An interview with O'Reilly

Software architecture vs code

While at the O'Reilly Software Architecture conference in Boston last week, I was interviewed by O'Reilly about a number of things, including the software architecture role and the tension between software architecture and code.

This interview originally appeared in Signals from the O'Reilly Software Architecture Conference 2015, a report that looks at some of the key insights from the event. The slides from my talk, titled Software architecture vs code, are available to view online/download.

Video - Agility and the essence of software architecture

Recorded at YOW! 2014 in Brisbane, Australia

This is just a quick note to say that the video of my "Agility and the essence of software architecture" talk from YOW! 2014 in Brisbane is now available to watch online. This talk covers the subject of software architecture and agile from a number of perspectives, focusing on how to create agile software systems in an agile way.

Agility and the essence of software architecture

The slides are also available to view online/download. A huge thanks to everybody who attended for making it such a fun session. :-)

Package by component and architecturally-aligned testing

I've seen and had lots of discussions about "package by layer" vs "package by feature" over the past couple of weeks. They both have their benefits, but there's a hybrid approach I now use that I call "package by component". To recap...

Package by layer

Let's assume that we're building a web application based upon the Web-MVC pattern. Packaging code by layer is typically the default approach because, after all, that's what the books, tutorials and framework samples tell us to do. Here we're organising code by grouping things of the same type.

Package by layer

There's one top-level package for controllers, one for services (e.g. "business logic") and one for data access. Layers are the primary organisation mechanism for the code. Terms such as "separation of concerns" are thrown around to justify this approach, and layered architectures are generally thought of as a "good thing". Need to switch out the data access mechanism? No problem, everything is in one place. Each layer can also be tested in isolation from the others around it, using appropriate mocking techniques, etc. The problem with layered architectures is that they often turn into a big ball of mud because, in Java anyway, you need to mark your classes as public for much of this to work.
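To make this concrete, here's a minimal sketch of the layout in Java (all package and class names here are hypothetical). Notice that every class has to be public, because its callers always live in a different package:

    // com/example/app/web/OrderController.java
    package com.example.app.web;

    import com.example.app.service.OrderService;

    public class OrderController {
        private final OrderService orderService = new OrderService();

        public String viewOrder(long orderId) {
            return orderService.findOrder(orderId);
        }
    }

    // com/example/app/service/OrderService.java
    package com.example.app.service;

    import com.example.app.repository.OrderRepository;

    public class OrderService {
        private final OrderRepository orderRepository = new OrderRepository();

        public String findOrder(long orderId) {
            return orderRepository.load(orderId);
        }
    }

    // com/example/app/repository/OrderRepository.java
    package com.example.app.repository;

    public class OrderRepository {
        public String load(long orderId) {
            return "order " + orderId; // JDBC/SQL details elided
        }
    }

Nothing stops a future version of OrderController from importing OrderRepository directly and bypassing the service layer; the compiler is perfectly happy with that, which is exactly how the big ball of mud creeps in.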

Package by feature

Instead of organising code by horizontal slice, package by feature seeks to do the opposite by organising code by vertical slice.

Package by feature

Now everything related to a single feature (or feature set) resides in a single place. You can still have a layered architecture, but the layers reside inside the feature packages. In other words, layering is the secondary organisation mechanism. The often-cited benefit is that it's "easier to navigate the codebase when you want to make a change to a feature", but this is a minor thing given the power of modern IDEs.

What you can do now, though, is hide feature-specific classes and keep them out of sight from the rest of the codebase. For example, if you need any feature-specific view models, you can create these as package-protected classes (as sketched below). The big question, though, is what happens when that new feature set C needs to access data from features A and B? Again, in Java, you'll need to start making classes publicly accessible from outside of the packages, and the big ball of mud will again emerge.
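A minimal sketch of that hiding, with hypothetical names again:

    // com/example/app/orders/OrderViewModel.java
    package com.example.app.orders;

    // Package-protected: invisible outside of the orders package,
    // so no other feature can accidentally depend on it.
    class OrderViewModel {
        String orderNumber;
        String customerName;
    }

    // com/example/app/orders/OrdersController.java
    package com.example.app.orders;

    public class OrdersController {
        // Free to use OrderViewModel, because it lives in the same package.
        OrderViewModel viewOrder(long orderId) {
            return new OrderViewModel();
        }
    }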

Package by layer and package by feature both have their advantages and disadvantages. To quote Jason Gorman from Schools of Package Architecture - An Illustration, which was written seven years ago:

To round off, then, I would urge you to be mindful of leaning too far towards either school of package architecture. Don't just mindlessly put socks in the sock drawer and pants in the pants drawer, but don't be 100% driven by package coupling and cohesion to make those decisions, either. The real skill is finding the right balance, and creating packages that make stuff easier to find but are as cohesive and loosely coupled as you can make them at the same time.

Package by component

This is a hybrid approach with increased modularity and an architecturally-evident coding style as the primary goals.

Package by component

The basic premise here is that I want my codebase to be made up of a number of coarse-grained components, with some sort of presentation layer (web UI, desktop UI, API, standalone app, etc) built on top. A "component" in this sense is a combination of the business and data access logic related to a specific thing (e.g. domain concept, bounded context, etc). As I've described before, I give these components a public interface and package-protected implementation details, which includes the data access code.

If that new feature set C needs to access data related to A and B, it is forced to go through the public interface of components A and B. No direct access to the data access layer is allowed, and you can enforce this if you use Java's access modifiers properly. Again, "architectural layering" is a secondary organisation mechanism. For this to work, you have to stop using the public keyword by default.

This structure raises some interesting questions about testing, not least about how we mock-out the data access code to create quick-running "unit tests".
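Before getting to testing, here's the skeleton of such a component (a minimal sketch with hypothetical names). Only the interface is public; the implementation and the data access code are package-protected, so the compiler itself prevents other components from reaching around the interface:

    // com/example/app/orders/OrdersComponent.java
    package com.example.app.orders;

    public interface OrdersComponent {
        String findOrder(long orderId);
    }

    // com/example/app/orders/OrdersComponentImpl.java
    package com.example.app.orders;

    // Package-protected implementation of the public interface.
    class OrdersComponentImpl implements OrdersComponent {
        private final OrdersRepository repository = new OrdersRepository();

        public String findOrder(long orderId) {
            return repository.load(orderId);
        }
    }

    // com/example/app/orders/OrdersRepository.java
    package com.example.app.orders;

    // The data access code is also package-protected, so code outside
    // of this package can't touch the database directly.
    class OrdersRepository {
        String load(long orderId) {
            return "order " + orderId; // JDBC/MySQL details elided
        }
    }

Something inside the package (e.g. a factory method, or your dependency injection framework) then hands clients an OrdersComponent without ever exposing the implementation class.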

Architecturally-aligned testing

The short answer is don't bother, unless you really need to. I've spoken and written about this before: architecture and testing are related. Instead of the typical testing triangle (lots of "unit" tests, fewer slower-running "integration" tests and even fewer, slower UI tests), consider this.

Architecturally-aligned testing

I'm trying to make a conscious effort to not use the term "unit testing" because everybody has a different view of how big a "unit" is. Instead, I've adopted a strategy where some classes can and should be tested in isolation. This includes things like domain classes, utility classes, web controllers (with mocked components), etc.

Then there are some things that are easiest to test as components, through the public interface. If I have a component that stores data in a MySQL database, I want to test everything from the public interface right back to the MySQL database. These are typically called "integration tests", but again, this term means different things to different people. Of course, treating the component as a black box is easier if I have control over everything it touches. If you have a component that is sending asynchronous messages or using an external, third-party service, you'll probably still need to consider adding dependency injection points (e.g. ports and adapters) to adequately test the component, but this is the exception, not the rule.

All of this still applies if you are building a microservices style of architecture. You'll probably have some low-level class tests, hopefully a bunch of service tests where you're testing your microservices through their public interface, and some system tests that run scenarios end-to-end. Oh, and you can still write all of this in a test-first, TDD style if that's how you work.
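As an illustration, a component test for the hypothetical orders component sketched earlier might look like this (JUnit 4), exercising everything from the public interface right back to a real test database with no mocking in between:

    import org.junit.Before;
    import org.junit.Test;
    import static org.junit.Assert.assertEquals;

    public class OrdersComponentTests {

        private OrdersComponent orders;

        @Before
        public void setUp() {
            // Wire the component up to a real (test) MySQL database
            // and insert some known data; hypothetical factory, details elided.
            orders = OrdersComponentFactory.createForTesting();
        }

        @Test
        public void findOrder_returnsTheOrder_whenItExistsInTheDatabase() {
            assertEquals("order 123", orders.findOrder(123));
        }
    }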

I'm using this strategy for some systems that I'm building and it seems to work really well. I have a relatively simple, clean and (to be honest) boring codebase with understandable dependencies, minimal test-induced design damage and a manageable quantity of test code.

This strategy also bridges the model-code gap, so that the resulting code actually reflects the architectural intent. In other words, we often draw "components" on a whiteboard when having architecture discussions, but those components are hard to find in the resulting codebase. Packaging code by layer is a major reason why this mismatch between the diagram and the code exists. Those of you who are familiar with my C4 model will probably have noticed the use of the terms "class" and "component". This is no coincidence. Architecture and testing are more related than perhaps we've admitted in the past.

p.s. I'll be speaking about this topic over the next few months at events across Europe, the US and (hopefully) Australia.

Security Concerns for Legacy Systems

An ongoing process

Information security is a quality attribute that can’t easily be retrofitted. Concerns such as authorisation, authentication, access and data protection need to be defined early so they can influence the solution's design.

However, many aspects of information security aren't static. External security threats are constantly evolving, and the maintainers of a system need to keep up to date in order to analyse them. This may force change on an otherwise stable system.

Functional changes to a legacy system also need to be analysed from a security standpoint. The initial design may have taken the security requirements into consideration (a quality attribute workshop is a good way to capture these) but are they re-considered when features are added or changed? What if a sub-component is replaced or services moved to a remote location? Is the analysis re-performed?

It can be tempting to view information security as a macho battle between evil, overseas hackers (people always assume they come from another country) and your own underpaid heroes, but many issues have simpler roots. Many data breaches are not hacks but basic errors - I once worked at a company where an accounting intern accidentally emailed a spreadsheet with everyone's salary to the whole company.

Let’s have a quick look at some of the issues that a long running, line-of-business application might face:

Lack of Patching

Have you applied all the vendors' patches? Not just to the application, but to the software stack beneath it? Has the vendor applied patches to the third-party libraries that they rely upon? What about the version of Java/.NET that the application is running on, or the OS beneath that? When an application is initially developed it will use the latest versions, but unless a full dependency tree is recorded, the required upgrades can be difficult to track. It is easy to forget these dependent upgrades even on an actively developed system.

Even if you do have a record of all components and subcomponents, there is no guarantee that, when upgraded, they will be compatible or work as before. The level of testing required can be high, and this acts as a deterrent to change - yet you only need a single broken component to put the entire system at risk.

Passwords

Passwords are every operations team's nightmare. Over the last 20 years, the best-practice advice for generating and storing passwords has changed dramatically. Users used to be advised to think of an unusual password and not write it down. However, it turns out that 'unusual' is actually very common, with people picking the same 'unusual' word. Leaked password lists from large websites have demonstrated how many users pick the same password. Therefore the advice, and the passwords that modern systems allow, have changed (often towards multi-word passphrases). Does your legacy system enforce this, or is it filled with passwords from a brute-force list?

Passwords also tend to get shared over time. What happens when someone goes on holiday, a weekly report needs to be run, but the template exists within that user's account? Often they are phoned up and asked for their password. This may indicate a feature flaw in the product, but it is very common. There are many ways to improve this, from frequent password changes to two-factor authentication, but these increase the burden on the operations team.

Does your organisation have an employee leaver's process? Do you suspend account access? If you have shared accounts ("everyone knows the admin password") this may be difficult or disruptive. Having a simple list (or preferably an automated script) to execute for each employee that leaves is important.

There are similar problems with cryptographic keys. Are they long enough to comply with the latest advice? Do they use a best-practice algorithm, or one with a known weakness? It is amazing how many websites use old certificates that should be replaced, or have even expired. And how securely are the keys themselves stored?
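As an aside, auditing certificate expiry is easy to automate. Here's a rough sketch in Java (example.com stands in for one of your own endpoints):

    import java.net.URL;
    import java.security.cert.Certificate;
    import java.security.cert.X509Certificate;
    import javax.net.ssl.HttpsURLConnection;

    public class CertificateExpiryCheck {
        public static void main(String[] args) throws Exception {
            HttpsURLConnection connection =
                    (HttpsURLConnection) new URL("https://example.com/").openConnection();
            connection.connect();
            for (Certificate certificate : connection.getServerCertificates()) {
                if (certificate instanceof X509Certificate) {
                    X509Certificate x509 = (X509Certificate) certificate;
                    // checkValidity() throws an exception if the certificate
                    // has expired or is not yet valid.
                    x509.checkValidity();
                    System.out.println(x509.getSubjectDN() + " expires " + x509.getNotAfter());
                }
            }
            connection.disconnect();
        }
    }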

Are any of your passwords or keys embedded in system files? This may have seemed safe when the entire system was on a single machine in a secure location, but if the system has been restructured this may no longer be the case. For example, if some of the files have been moved to a shared or remote location, it may be possible for an unauthorised party to scan them.

Moving from Closed to Open Networks

A legacy system might have used a private, closed network for reasons of speed and reliability, but it may now be possible to meet those quality attributes on an open network and vastly reduce costs. However, if you move services from a closed network to an open one, you have to reconsider the use of encryption on the connection. Security against eavesdropping/network sniffing was a fortunate side-effect of the network being private, so the requirement may not have been captured - it was a given. This can be dangerous if the original requirements are used as the basis for restructuring. These implicit quality attributes are important, and you should consider whether a change turns them into explicit requirements. You might find these cost-saving changes dropped on you with little warning by an excited accountant who thinks their brilliance has just halved your communications charges!

Moving to an open network will make services reachable by unknown clients. This raises issues ranging from denial-of-service attacks through to malicious clients attempting to use bad messages (such as SQL injection) to compromise a system. There are various techniques that can be applied at the network level to help here (VPNs, blocking unknown IPs, deep packet inspection, etc.) but ultimately the code running in the services needs to be security aware - and this is very, very hard to do to an entire system after it is written.

Migrating to an SOA or microservices architecture amplifies these effects, as a larger number of connections and endpoints now need to be secured. A well-modularised system may be easy to distribute, but intra-process communication is much more secure than inter-process or inter-machine communication.

Modernising Data Formats

Migrating from a closed, binary data format to an open one (e.g. XML) for messaging or storage makes navigating the data easier - but it makes casual scanning by an attacker easier as well. Relying on security by obscurity isn't a good idea (and this is not an excuse to avoid improving the readability of data), but many systems do. When improving data formats, you should reconsider where the data is being stored, what has access to it, and whether encryption is required.

Similar concerns should be addressed when making source code open source. Badly written code becomes available for inspection, and attack vectors can be examined. In particular, you should be careful to avoid leaking configuration into the source code if you intend to make it open.

New Development and Copied Data

If new features are developed for a system that has been static for a while, it is likely that new developer, test, QA and pre-production environments will be created (the originals will either be out of date or will not have been kept, due to cost). The quickest and most accurate way to create test environments is to clone production. This works well, but the copied data is just as important as the original. Do you treat this copied data with the same security measures as production? If it contains proprietary or confidential customer information, then you should. Note that the definition of 'confidential' varies, but you might be surprised at how broadly some regulators define it. You may also be restricted in the information that you can move out of the country - is your development or QA team located overseas?

Remember, you are not just restricting access to your system but to your data as well.

Server Consolidation

Systems that pushed the boundaries of computing power 15 years ago can now be run on a cheap commodity server. Many organisations consolidate their systems on a regular basis, replacing multiple old servers with a single powerful one, and may have been through this process many times. If so, how has it been done, and has it increased the visibility of these processes/services to others? If done correctly, with virtualisation tools, the virtual machines should still be isolated, but this is worth checking. A more subtle problem can be caused by the removal of the infrastructure between services. There may no longer be routers or firewalls between the services (or there may be virtual ones with a different setup) as they now sit on the same physical device. This means that a vulnerable, insecure server is less restricted - and therefore a more dangerous staging point if compromised.

A server consolidation process should, instead, be used as an opportunity to increase the security and isolation of services: virtual firewalls are easy to create, and monitoring can be improved.

Improved Infrastructure Processes

Modifications to support processes can create security holes. For example, consider the daily backup of an application's data. The architect of a legacy system may have originally expected backups to be placed onto magnetic tapes and stored in a fire-safe near the server itself (with periodic backups taken securely offsite).

A more modern process would use offsite, real-time replication. Many legacy systems have had their backup-to-tape processes replaced with a backup-to-SAN that is replicated offsite. This is simple to implement, faster, more reliable and allows quicker restoration. However, who now has access to these backups? When a tape was placed in a fire-safe, the only people with access to the copied data were those with physical access to the safe. Now it can be accessed by anyone with read permission in any location the data is copied to. Is this the same group of people as before? It is likely to be a much larger group (spread over a wide physical area) and could include those with borrowed passwords or those who have left the organisation.

Any modifications to the backup processes need to be analysed from an information security perspective. This is not just for the initial backup location but anywhere else the data is copied to.

Conclusion

Information security is an ongoing process with multiple drivers, both internal and external to your system. The actions required will vary greatly between systems, and depend on the system's architecture, its business function and the environment it exists within. Any of these can change and affect security. Architectural thinking and awareness are central to providing this, and a good place to start is a diagram and a risk-storming session (with a taxonomy).

Lightweight software architecture - an interview with Fog Creek

I recently did a short interview with the folks from Fog Creek (creators of Stack Exchange, Trello, FogBugz, etc) about lightweight approaches to software architecture, my book and so on. The entire interview is only about 8 minutes in length and you can watch/listen/read it on the Fog Creek blog.


Introducing Structurizr

Simple, versionable, up-to-date, scalable software architecture models

I've mentioned Structurizr in passing, but I've never actually written a post that explains what it is and why I've built it. First, some background.

"What tool do you use to draw software architecture diagrams?"

I get asked this question almost every time I run one of my workshops, usually just after the section where I introduce the C4 model and show some example diagrams. My answer to date has been "just OmniGraffle or Visio", and recommending that people use a drawing tool to create software architecture diagrams has always bugged me. My Simple Sketches for Diagramming Your Software Architecture article provides an introduction to the C4 model and my thoughts on UML.

Once you have a simple way to think about and describe the architecture of a software system (and this is what the C4 model provides), you realise that the options for communicating it are relatively limited. And this is where the idea for a simple diagramming tool was born. In essence, I wanted to build a tool where the data is sourced from an underlying model and all I need to do is move the boxes around on the diagram canvas.

Part 1: Software architecture as code

Structurizr initially started out as a web application where you would build up the underlying model (the software systems, people, containers and components) by entering information about them through a number of HTML forms. Diagrams were then created by selecting which type of diagram you wanted (system context, container or component) and then by specifying which elements you wanted to see on the diagram. This did work but the user experience, particularly related to data entry, was awful, even for small systems.

Behind the scenes of the web application was a simple collection of domain classes that I used to represent software systems, containers and components. Creating a software architecture model using these classes was really succinct, and it struck me that perhaps this was a better option. The trade-off here is that you need to write code in order to create a software architecture model but, since software architects should code, this isn't a problem. ;-)

These classes have become what is now Structurizr for Java, an open source library for creating software architecture models as code. Having the software architecture model as code opens a number of opportunities for creating the model (e.g. extracting components automatically from a codebase) and communicating it (e.g. you can slice and dice the model to produce a number of different views as necessary). Since the models are code, they are also versionable alongside your codebase and can be integrated with your build system to keep your models up to date. The models themselves can then be output to another tool for visualisation.
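To give a flavour, creating a minimal model and a system context view looks something like this (a sketch based on the Structurizr for Java examples; the library is still evolving, so treat the details as illustrative rather than definitive):

    import com.structurizr.Workspace;
    import com.structurizr.model.Model;
    import com.structurizr.model.Person;
    import com.structurizr.model.SoftwareSystem;
    import com.structurizr.view.SystemContextView;

    public class Example {
        public static void main(String[] args) {
            // The workspace wraps the model and its views.
            Workspace workspace = new Workspace("My System", "An example software architecture model.");
            Model model = workspace.getModel();

            // Add the people and software systems...
            Person user = model.addPerson("User", "A user of my software system.");
            SoftwareSystem softwareSystem = model.addSoftwareSystem("Software System", "My software system.");
            user.uses(softwareSystem, "Uses");

            // ...and then slice and dice the model to create views.
            SystemContextView contextView = workspace.getViews()
                    .createSystemContextView(softwareSystem, "Context", "The system context diagram.");
            contextView.addAllElements();
        }
    }

The resulting workspace can then be pushed to a visualisation tool, which is where the second half of the story comes in.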

Part 2: Web-based software architecture diagrams

structurizr.com is the other half of the story. It's a web application that takes a software architecture model (via an API) and provides a way to visualise it. Aside from changing the colour, size and position of the boxes, the graphical representation is relatively fixed. This in turn frees you up from messing around with creating static diagrams in drawing tools such as Visio.

Structurizr screenshot
A screenshot of Structurizr.

As far as features go, the list currently includes an API for getting/putting models, making models public/private, embedding diagrams into web pages, creating diagrams based upon different page sizes (paper and presentation slide sizes), exporting diagrams to a 300dpi PNG file (for printing or inclusion in a slide deck), automatic generation of a key/legend and a fullscreen presentation mode for showing diagrams directly from the tool. The recent webinar I did with JetBrains includes more information and a demo. Pricing is still to be confirmed, but there will be a free tier for individual use and probably some paid tiers for teams and organisations (e.g. for sharing private models).


An embedded software architecture diagram from structurizr.com (you can move the boxes).

It's worth pointing out that structurizr.com is my vision of what I want from a simple software architecture diagramming tool, but you're free to take the output from the open source library and create your own tooling to visualise the model. Examples include an export to DOT format (for importing into something like Graphviz), XMI format (for importing into UML tools), a desktop app, IDE plugins, etc.

That's a quick introduction to Structurizr and, although it's still a work in progress, I'm slowly adding more users via a closed beta, with the goal of opening up registration next month. It definitely scratches an itch that I have, and I hope other people will find it useful too.

JetBrains webinar recording: Software architecture as code

The lovely people at JetBrains have published the recording of the live webinar I did with them last week about software architecture as code. I've embedded the YouTube video below, but you should also go and take a look at their website because there are answers to a bunch of questions that I didn't get time to answer during the webinar itself.

If you've already seen one of my Software architecture vs code presentations, you should probably jump straight to the demo section where I show how to create a software architecture model with code and Structurizr. You can also get the slides and the code that I used.

Thanks again to JetBrains (especially Hadi Hariri, Trisha Gee and Robert Demmer) and to everybody who listened in.

Software Architecture for Developers author Simon Brown: the difference between architects and developers (an ituring.com.cn interview)

Simon Brown is a well-known independent software architecture consultant and trainer, and the founder of Coding the Architecture (CodingTheArchitecture.com), a website dedicated to discussing software architecture. He describes himself as a software architect who codes and a software developer who understands architecture. In the seven years since 2008, Simon has given more than a hundred talks in 28 countries on software architecture, technical leadership and the balance between these and agility, including well-received talks on "The Frustrated Architect" and "How to design secure architectures" at the ArchSummit conference held in China in August 2012. He has provided consulting and training for software teams in more than 20 countries, with clients ranging from small technology start-ups to globally recognised household names. Simon is the author of Software Architecture for Developers, a book in which he challenges conventional thinking and blurs the line between software development and architecture in the process, in order to restore the good name of software architecture.

Q: What is the biggest difference between a developer and an architect?

Architects write code regularly, just like developers. Simply put, the biggest difference between a developer and an architect is technical leadership. The software architecture role involves understanding the most significant architectural drivers and providing a design that takes them into account. An architect also needs to manage technical risks, actively evolve the architecture when necessary, and take responsibility for technical quality assurance. Fundamentally, the architect is a technical leadership role, and that is the biggest difference.


I'm speaking at the O'Reilly Software Architecture Conference

March 16-19, 2015 in Boston, MA

I'm thrilled to say that I'll be speaking at the inaugural O'Reilly Software Architecture Conference in Boston during March. The title of my session is Software architecture vs code and I'll be speaking about the conflict between software architecture and code. It's a 90-minute session, so I look forward to also discussing how we can solve this issue. Here's the abstract...

Software architecture and coding are often seen as mutually exclusive disciplines, despite us referring to higher level abstractions when we talk about our software. You've probably heard others on your team talking about components, services and layers rather than objects when they're having discussions. Take a look at the codebase though. Can you clearly see these abstractions or does the code reflect some other structure? If so, why is there no clear mapping between the architecture and the code? Why do those architecture diagrams that you have on the wall say one thing whereas your code says another? In fact, why is it so hard to automatically generate a decent architecture diagram from an existing codebase? Join us to explore this topic further.

Software Architecture Conference 2015

You can register with code FRIEND20 for a discount. See you there!

Live Webinar with JetBrains: Software Architecture as Code

I'm doing a live and free webinar with Trisha Gee and the other fine people over at JetBrains on February 12th at 15:00 GMT. The topic is "software architecture as code" and I'll be talking about/showing how you can create a software architecture model in code, rather than drawing static diagrams in tools such as Microsoft Visio.

Over the past few years, I've been distilling software architecture down to its essence, helping organisations adopt a lightweight style of software architecture that complements agile approaches. This includes doing "just enough" up front design to understand the significant structural elements of the software, some lightweight sketches to communicate that vision to the team, identifying the highest priority risks and mitigating them with concrete experiments. Software architecture is inherently about technical leadership, stacking the odds of success in your favour and ensuring that everybody is heading in the same direction.

But it's 2015 and, with so much technology at our disposal, we're still manually drawing software architecture diagrams in tools like Microsoft Visio. Furthermore, these diagrams often don't reflect the implementation in code, and vice versa. This session will look at why this happens and how to resolve the conflict between software architecture and code through the use of architecturally-evident coding styles and the representation of software architecture models as code.

Please sign up here if you'd like to join us.