Evolutionary design still requires up front thinking

There's a great post by Joshua Kerievsky titled The Day We Stopped Sprinting that itself references an older post called Evolutionary Design, which talks about the need to create a "primitive whole" initially, before iterating to improve it over time. The important thing to note here is that the primitive whole isn't just a bunch of unassembled parts, it's a "under-developed whole" and something that does work (to some extent, anyway). The example used in the illustration is a guitar, with the first version being something that looks like a very primitive guitar.

Something that's often missed here, or not discussed explicitly, is that you still need to do some up front thinking in order to get to that primitive whole, and to create an initial vision or starting point. After all, the primitive guitar is still a guitar, which implies that somebody somewhere had a vision early on that a guitar was the thing that was needed. In my experience, I've seen many teams misinterpret "evolutionary design" and "emergent architecture" to mean, "you don't need to do any design up front at all". As Dave Thomas says, "big design up front is dumb, but doing no design up front is even dumber".

During my software architecture workshops, I occasionally see teams draw nothing more than a visual summary of the requirements when asked to work in groups and design a software solution. The requirements are based upon a "financial risk system" for a bank, and sometimes groups will literally just draw a box labelled "Financial Risk System" before proudly proclaiming, "that's all of the up front design we need to do, we're agile". Yes, this really has, and still does happen from time to time.

Engaging with the problem

Many years ago, my boss gave me a small internal software project to work on. I don't remember the exact details, but he basically gave me a problem statement and told me to design a software solution. After a couple of hours, I presented my solution. It was instantly ripped apart and he told me that I hadn't properly engaged with the problem. He was right. I'd presented a very simple solution that didn't cater for any of the complexity of the problem space, mainly because I hadn't actually uncovered it yet. On my travels around the world, I regularly see the same thing, and people really struggle with the concept of doing up front design. Many of them also never really engage their minds in the problem, evidenced by one or more diagrams that present a very simplified and superficial view of the solution, such as the typical "logical view" diagrams below.

An example software architecture diagram

Whenever I'm doing an up front design exercise, I want it to be quick and efficient while still providing as much value as possible. The process of doing some up front design provides you with a way to engage with the problem space and create a starting point for your vision of the thing you want to build. For me, doing up front design is about understanding the structure of the thing you're going to build, creating a starting point and vision for the team to work with and identifying/mitigating the highest priority risks. As I've discovered, a simple way to make this happen is to encourage people to actually produce some lightweight yet structured artifacts, like the diagrams in my C4 model, as a part of the design process. When they do this, conversations change because it forces people to engage with the problem. Up front design is therefore a crucial part of a longer evolutionary design process. Without it, you're just lost.

PaaS for Java developers - Part 4

Previous parts of this blog post series have provided an overview of Cloud Foundry from a number of different perspectives; including the high-level concepts, vendor lock-in and the Pivotal Web Services marketplace services. In this part, we'll look at how Cloud Foundry makes it trivial to perform zero-downtime deployments.

Blue-Green Deployments

As a quick introduction to this topic, imagine that you have a Java web application running somewhere. A simple way to upgrade that application to a new version is to stop the application, update the relevant deployment artifacts (e.g. a .JAR or .WAR file), and then restart it. Some web application servers provide support for hot-swapping applications, but the principle is the same. Although this works, users of your application are likely to encounter downtime because the application will be unavailable for a short period of time. Over the years, we've created a number of techniques to deal with this issue, one of the most popular being Blue-Green Deployments, where a (physical or virtual) router is used to switch traffic from one running instance of your application to another. Although this might sound like an advanced technique, tools like Cloud Foundry make this feasible for teams of any size to achieve.

As we saw in previous blog posts, Structurizr consists of two Java/Spring web applications; a "Web Application" (serving HTML, CSS and JavaScript) and an "API Application" (allowing clients to GET or PUT software architecture workspaces). The build and deployment process is fully automated, triggered by TeamCity running on an Amazon EC2 server waiting for commits to the git repository. In summary, this build and deployment process performs the following steps:

  1. Resolve dependencies.
  2. Initialise build directories, increment build number, etc.
  3. Compile code (production and tests).
  4. Run unit/class tests.
  5. Run integration/component tests.
  6. Create deployment artifacts (e.g. .WAR files).
  7. Push the API Application to Pivotal Web Services.
  8. Run e2e/system tests on the API Application.
  9. Push the Web Application to Pivotal Web Services.
  10. Run e2e/system tests on the Web Application.
  11. Make the API Application live.
  12. Make the Web Application live.
  13. Generate and publish new software architecture diagrams and documentation.

Push applications to Pivotal Web Services

Assuming that the build and tests were successful, the build process will push each of the API and Web Applications to Pivotal Web Services. The Cloud Foundry command line interface is installed on the build server, and the build script simply uses the "cf push" command to push the .WAR files. The "--no-start" flag is used so that the application is pushed, but not started, and this is done so that application environment variables (e.g. configuration) can be set using the "cf env" command. Once the configuration has been set, the "cf scale" command is used to set the desired number of instances and RAM, before actually starting the application. At this point, the applications are running but only accessible using a temporary URL that includes the build number (e.g. "https://structurizr-web-123.cfapps.io").

With the applications running, the build script can now run a series of end-to-end tests (a mixture of "smoke tests" and system tests), in order to verify that the new versions of the applications are running as expected. These tests include scenarios such as signing in, getting/putting software architecture models, etc.

Making the new versions live

If the end-to-end tests pass, the next step is to make these new versions of the applications live. This involves using the Cloud Foundry command line interface to map the live URL to the new versions of the applications ("cf map-route"), while removing it from the old versions ("cf unmap-route"). This process makes use of the Cloud Foundry router, which allows you to configure the URLs that are used to access running applications. If everything is successful, finally, the previous versions of the applications are deleted. The whole build process takes less than 10 minutes. Here is some more information about how to do Blue-Green Deployments if you're interested.

The process of switching the live URLs to the new versions of the applications is what allows a zero-downtime deployment. The small caveat here is that any information that only resides in the memory space of the old versions of the applications is lost, of course. As an example, if HTTP session state is only stored in memory, users will be signed out once their requests are directed to a new instance of Apache Tomcat. There are a number of ways to deal with this problem (including session replication), but Structurizr makes use of Spring Session in conjunction with Redis, to instead store HTTP session information outside of the Apache Tomcat server instances, so that session information is retained during the deployment process.

And that's it ... a zero-downtime deployment process using nothing more than the Cloud Foundry command line interface. In part 5 I'll briefly discuss how to customise the deployment environment using Java buildpacks. Comments or questions? Tweet me at @simonbrown.

PaaS for Java developers - Part 3

Marketplace services

I want to start part 3 by saying that I really do like and recommend Pivotal Web Services and Cloud Foundry as a simple and robust way to deploy Java applications. I've been running Structurizr on Pivotal Web Services for over 3 years now and I've had very few issues with the core platform. The marketplace services, on the other, are a different story.

In addition to providing a deployment platform to run your code, most of the Platform as a Service providers (Pivotal Web Services, Heroku, Azure, etc) provide a collection of "marketplace services". These are essentially add-on services that give you easy access to databases, messaging providers, monitoring tools, etc. As I write this, the Pivotal Web Services marketplace includes many of the popular technologies you would expect to see; including MySQL, PostgreSQL, Redis, Memcached, MongoDB, RabbitMQ, etc.

MySQL as a service

Let's imagine that you're building a Java web application and you'd like to store data in a MySQL database. You have a few options. One option is to build your own database server somewhere like Amazon AWS. Of course, you need to have the skills to do this and, given that part 1 was all about the benefits of PaaS over building your own infrastructure, the DIY approach is not necessarily appealing for everybody. Another option is to find a "Database as a Service" provider that will create and run a MySQL server for you. ClearDB is one such example, and it's also available on the Pivotal Web Services marketplace. All you need to do is create a subscription to ClearDB through the marketplace (there is a free plan), connect to the database and create your schema. That's it. Most of the operational aspects of the MySQL database are taken care of; including backups and replication.

To connect your Java application to ClearDB, again, you have some options. The first is to place the database endpoint URL, username and password in configuration, like you might normally do. The other option is to use the Cloud Foundry command line interface to issue a "cf bind" command to bind your ClearDB database instance to your application instance(s), and use Cloud Foundry's auto-reconfiguration feature. If you're building a Spring-based application and you have a MySQL DataSource configured (some caveats apply), Cloud Foundry will automagically reconfigure the DataSource to point to the MySQL database that you have bound to your application. When you're getting started, this is a fantastic feature as it's one less thing to worry about. It also means that you don't need to update URLs, usernames and passwords if they change.

I used this approach for a couple of years and, if you look at the Structurizr changelog, you can see the build number isn't far off 1000. Each build number represents a separate (automated) deployment to Pivotal Web Services. So I've run a lot of builds. And most of them have worked. Occasionally though, I would see deployments fail because services (like ClearDB) couldn't be bound to my application instances. Often these were transient errors, and restarting the deployment process would fix it. Other times I had to raise a support ticket because there was literally nothing I could do. One of the big problems with PaaS is that you're stuck when it goes wrong, because you don't have access to the underlying infrastructure. Thankfully this didn't happen often enough to cause me any real concern, but it was annoying nonetheless.

More annoying was a little bug that I found with Structurizr and UTF-8 character encoding. When people sign up for an account, a record is stored in MySQL and a "please verify your e-mail address" e-mail is sent. If the person's name included any UTF-8 characters, it would look fine in the initial e-mail but not in subsequent e-mails. The problem was that the UTF-8 characters were not being stored correctly in MySQL. After replicating the problem in my dev environment, I was able to fix it by adding a characterEncoding parameter to the JDBC URL. Pushing this fix to the live environment is problematic though, because Cloud Foundry is automatically reconfiguring my DataSource URLs. The simple solution here is to not use automatic reconfiguration, and it's easy to disable via the Java buildpack or by simply not binding a MySQL database instance to the Java application. At this point, I'm still using ClearDB via the marketplace, but I'm specifying the connection details explicitly in configuration.

The final problem I had with ClearDB was earlier this summer. I would often see error messages in my logs saying that I'd exceeded the maximum number of connections. The different ClearDB plans provide differing levels of performance and numbers of connections. I think the ClearDB databases offered via the marketplace are multi-tenanted, and there's a connection limit to ensure quality of service for all customers. And that's okay, but I still couldn't work out why I was exceeding my quota because I know exactly how many app instances I have running and the maximum number of permitted connections in the connection pools per app instance. I ran some load tests with Apache Benchmark and I couldn't get the number of open connections to exceed what had been configured in the connection pool. Often I would be watching the ClearDB dashboard, which shows you the number of open connections, and my applications wouldn't be able to connect despite the dashboard only showing a couple of live connections.

Back to vendor lock-in and migration cost. The cost of migrating from ClearDB to another MySQL provider is low, especially since I'm no longer using the Cloud Foundry automatic reconfiguration mechanism. So I exported the data and created a MySQL database on Amazon RDS instead. For not much more money per month, I have a MySQL database running in multiple availability zones, with encrypted data at rest and I know for sure that the JDBC connection is happening over SSL (because that's how I've configured it).

E-mail delivery as a service

Another marketplace service that I used from an early stage is SendGrid, which provides "e-mail delivery as a service". There's a theme emerging here! Again, you can run a "cf bind" command to bind the SendGrid service to your application. In this case, though, no automatic reconfiguration takes place, because SendGrid exposes a web API. This raises the question of where you find the API credentials. One of the nice features of the marketplace services is that you can get access to the service dashboards (e.g. the ClearDB dashboard, SendGrid dashboard, etc) via the Pivotal Web Services UI, using single sign-on. The service credentials are usually found somewhere on those service dashboards.

After finding my SendGrid password, I hardcoded it into a configuration file and pushed my application. To my surprise, trying to connect to SendGrid resulted in an authentication error because my password was incorrect. So I again visited the dashboard and yes, the password was now different. It turns out that, and I don't know if this is still the case, the process of running a "cf bind" command would result in the SendGrid credentials being changed. What I didn't realise is that service credentials are set in the VCAP_SERVICES environment variable of the running JVMs, and you're supposed to extract credentials from there. This is just a regular environment variable, with JSON content. All you need to do is grab it and parse out the credentials that you need, either using one of the many code samples or libraries on GitHub to do this. From a development perspective, I now have a tiny dependency on this VCAP stuff, and I need to make sure that my local Apache Tomcat instance is configured in the same way, with a VCAP_SERVICES environment variable on startup.

Some time later, SendGrid moved to v3 of their API, which included a new version of the Java library. So I upgraded, which resulted in the API calls failing. After signing in to the SendGrid dashboard, I noticed that I now have the option of connecting via an API key. Long story short, I ditched the VCAP stuff and configured the SendGrid client to use the API with the API key, which I've also added to my deployment configuration.

Other services

I used the Pivotal SSL Service for a while too, which provides a way to upload your own SSL certificate. When used in conjunction with the Cloud Foundry router, you can serve traffic from your own domain name with a valid SSL certificate. I also had a few issues with this, resulting in downtime. The Java applications were still running and available via the cfapps.io domain, but not via the structurizr.com domain. I've since switched to using CloudFlare's dedicated SSL certificate service for $5 per month. I did try the free SSL certificate, but some people reported SSL handshake issues on some corporate networks when uploading software architecture models via Structurizr's web API.

I also used the free Redis marketplace service for a while, in conjunction with Spring Session, as a way to store HTTP session information. I quickly used up the quota on that though, and found it more cost effective to switch to a Redis Cloud plan directly with Redis Labs.

PaaS without the marketplace

There are certainly some benefits to using the marketplace services associated with your PaaS of choice. It's quick and easy to get started because you just choose a service, subscribe to it and you're ready to go. All of your services are billed from, and managed, in one place, so that's nice too. And, with Cloud Foundry, I can live with configuration via the VCAP_SERVICES; at least everything is in one place.

If you're just starting out with PaaS, I'd certainly take a look at the marketplace services on offer. Your mileage may vary, but I find it hard to recommend them for production use though. As I said at the start of this post, the core PaaS functionality on Pivotal Web Services has been solid for the three years I've been using it. Any instability I've experienced has been around the edge, related to the marketplace services. It's also unclear what you're actually getting in some cases, and where the services are running. If you look at the ClearDB plans, the free plan ("Spark DB") says that it's "Perfect for proof-of-concept and initial development", whereas the $100 per month "Shock DB" plan says "Designed for apps where high performance is crucial". These plans are not listed on the ClearDB website, so it's hard to tell whether they are multi-tenant or single-tenant services. Some of the passwords created by marketplace services also look remarkably short (e.g. 8 characters) considering they are Internet-accessible.

With all of this in mind, I prefer to sign up with a service directly and integrate it in the usual way. I don't feel that the pros of using the marketplace services outweigh the cons. I'm also further reducing my migration cost, should I ever need to move away from my PaaS. In summary then, the live deployment diagram for Structurizr now looks like this:

Structurizr - Deployment

The Java applications are hosted at Pivotal Web Services, and everything else is running outside, yet still within Amazon's us-east-1 AWS region. This should hopefully help to address another common misconception that you need to run everything inside of a PaaS environment. You don't. There's nothing preventing you from running Java applications on a PaaS, and have them connect to a database server that you've built yourself. And it gives you the freedom to use any technology you choose, whether it's available on the marketplace or not. You do need to think about collocation, performance and security, of course.

So that's a summary of my experience with the marketplace services. In part 4 I'll discuss more about my build/deployment script, and how straightforward it is to do zero-downtime, blue-green deployments via Cloud Foundry. Comments or questions? Tweet me at @simonbrown.

PaaS for Java developers - Part 2

Vendor lock-in?

In part 1, I introduced Platform as a Service (PaaS) and discussed how you can use Pivotal Web Services and Cloud Foundry as a way to easily deploy applications without worrying about the underlying infrastructure. A common misconception with all of this is that using Cloud Foundry (and Pivotal Web Services, or another implementation) results in vendor lock-in.

Back to Structurizr, which is a collection of tooling to visualise and document software architecture. The system context diagram looks like this:

Structurizr - Context
Structurizr - Context - Diagram key

In summary, authenticated users create and upload software architecture models using the Structurizr Client libraries (Java and .NET), and then view the content of those models via the web. Structurizr uses SendGrid to send e-mails, and all payment processing is performed by a combination of Taxamo and Braintree Payments. Some other services (e.g. CloudFlare, Pingdom and Papertrail) are also used, but not shown on the diagram.

From a (C4 model) containers perspective, Structurizr is as follows (the external services have been omitted from the diagram because they are not relevant to this discussion):

Structurizr - Containers
Structurizr - Containers - Diagram key

In essence, Structurizr is made up of a client application running in the web browser (HTML, CSS and JavaScript), while the server-side consists of a Java web application serving https://structurizr.com, another serving https://api.structurizr.com plus some data stores (MySQL, Redis and Amazon S3). The two Java web applications are running on Pivotal Web Services.

Both of the Java web applications are based upon Spring MVC and they are implemented following a number of the principles described in the twelve-factor methodology. In reality though, from a technical perspective, both applications are just typical Java web applications that have been designed to run on the cloud. Both applications are stateless and they don't write important information to the local file system.

Vendor lock-in and migration cost

Let's talk about vendor lock-in or, as Sam Newman says, "don't think lock-in, think migration cost". All development for Structurizr is done using IntelliJ IDEA on a Mac, with Vagrant being used for running local copies of MySQL and Redis. Nothing in the codebase is tied to, or even aware of, Cloud Foundry and I don't have Cloud Foundry running locally. The Java applications are standard Java EE .WAR files that are deployed to an Apache Tomcat instance running locally.

Pushing the Structurizr "Web Application" (using the "cf push" command) results in the web application being deployed on Pivotal Web Services and available at a URL of https://structurizr-web.cfapps.io. If I wanted to migrate that web application to another provider, here's what I would need to do.

  1. I would first need to find another PaaS that supports Java 8 and Apache Tomcat 8.x, or build my own server to do this. The JVM additionally requires that the "Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files" are installed, because the web application is making use of some of the stronger encryption algorithms when storing data.
  2. Given that there are multiple instances of the web application running (Pivotal Web Services transparently handles this behind a single URL), I need to mirror this setup too.
  3. The DNS for the structurizr.com domain is being managed by CloudFlare, and some CNAME records need to be changed to reflect the new deployment location.

That's it. My deployment script will need to change, but no code changes are required. Pivotal Web Services does provide some additional features on top of Cloud Foundry, such as the their dashboards, which are handy for monitoring and management, but they are not an essential part of my applications. In summary, I don't really have any vendor lock-in and the migration cost is low. After all, my Java web applications are just regular Java web applications, with no dependencies on Cloud Foundry.

Now, you might be thinking, "wait a minute, what about those data stores?". Well, that's a different story. Stay tuned for part 3 where I'll discuss my experiences with the additional marketplace services available at Pivotal Web Services and why I don't recommend that you use them. As before, if you have comments or questions, you can find me at @simonbrown on Twitter.

PaaS for Java developers - Part 1

Introduction

I've been a software developer for over 20 years, during which time I've built many different types of software systems in many different environments, the majority centred around Java and web technologies, but I've used others too. I've also been fortunate enough to have been involved in the full life cycle of software development; from inception through to delivery. What I haven't done much of is infrastructure. Sure, I've spun up a few servers but the majority of production infrastructure provisioning was performed by an infrastructure team. I don't think this is unusual. Provisioning production-grade infrastructure is a specialised task; requiring knowledge about installing, configuring and hardening operating systems, databases, load balancers, firewalls, etc. If I'm honest, it's not something I'm particularly interested in either, especially given the never-ending supply of things to learn in the software space.

I remember having a number of discussions with people at Craft Conference in Budapest, three years ago, discussing deployment options for my startup called Structurizr. An early demo was running on a single Ubuntu server at Rackspace, but this raised a number of infrastructure and operations questions that I didn't necessarily have the knowledge to answer. One of the suggestions I received was that I should look at Pivotal Web Services. I'd played with this during a pilot project to run programming clubs at some of the local schools here in Jersey, but I'd not considered it for actually running my own apps.

Pivotal Web Services and Cloud Foundry

Pivotal Web Services is a commercial service offering based upon Cloud Foundry, which provides a "Platform as a Service" (PaaS). In essence, Cloud Foundry provides an application deployment platform via an API, abstracting the underlying infrastructure, whether installed on a public cloud, private cloud or a bunch of servers in a datacenter.

Imagine that you're building a Java web application to run on Apache Tomcat. To get this running on the Internet, you need to provision a server somewhere, install Java and install Apache Tomcat before you can deploy your application. You also need to harden the server, configure SSL, apply patches on a regular basis, etc. And if you wanted two instances for better resilience, you now need to spin up a second server, repeat the previous steps and configure both servers to sit behind a load balancer (for example). Of course, you could, and probably would, use Vagrant, Chef, Puppet, Docker, etc to automate most of these steps.

With the Cloud Foundry command line interface installed, if I have a deployable Java .WAR or .JAR file, I can simply run a "cf push" command to deploy it to Pivotal Web Services. That's it! I can also use the "cf scale" command to ask Pivotal Web Services to scale up the application (e.g. add more RAM), or scale out the application (e.g. add more instances).

If I need to update the application, another "cf push" command is all it takes. Doing this will stop the currently running instance and replace it with the new version, so some downtime will be experienced. However, Cloud Foundry makes it super-easy to do Blue-Green deployments using the command line interface. As a Java developer, this gives you the ability to setup a zero-downtime continuous delivery pipeline in minutes, without any specialised knowledge of the underlying infrastructure.

Applications over infrastructure

As Joshua McKenty says on a recent podcast with Cisco Cloud, Pivotal Web Services and Cloud Foundry provides a higher level of abstraction for developers to work with. As a software developer, I'm dealing with applications, rather than infrastructure or containers. There's a lot of hype around Docker at the moment. And I think Docker is a fantastic technology. But, as a software developer, I want to deal with applications, not containers and infrastructure. I think that Docker is somewhat of a distraction for most software developers, but that's a different blog post.

In part 2, we'll look at some of the common misconceptions associated with PaaS, such as vendor lock-in. If you have comments or questions, you can find me at @simonbrown on Twitter.

Free e-books for software architecture meetups

To get more people thinking and talking about software architecture, I'm offering free copies of my Software Architecture for Developers ebooks for meetups. Simply organise a meetup on a software architecture related topic (see below) and send me a link to your meetup/event page by e-mail ([email protected]). I will help you promote the event on Twitter, etc. Then, a few days before the meetup itself, send me another e-mail indicating what the expected audience numbers will be and I'll send you a special URL that you can distribute to the attendees for them to download a free ebook related to the theme of the meetup.

Software Architecture for Developers: Volume 1 - Technical leadership and the balance with agility

Software Architecture for Developers: Volume 1

If you would like a copy of volume 1, try to organise a meetup related to the following topics: software architecture basics, the software architecture role, technical leadership, software architecture and agile, etc.

Software Architecture for Developers: Volume 2 - Visualise, document and explore your software architecture

Software Architecture for Developers: Volume 2

And if you would like copies of volume 2, try to organise meetup related to the following topics: diagramming software architecture, the C4 model, documenting software architecture, exploring software architecture, etc.

Drop me a note if you have any questions.

Grouping Components by Use Case

Easing the Navigation from Use Cases to Components

Recently I’ve been creating many context/system diagrams but have needed to link them to use cases. This is due to the nature of the development process and the need to identify affected components from the starting point of a use case.

For example, given the diagram:


A system context diagram for a financial risk system


and the following Use Cases:

  1. Import trade data from the Trade Data System.
  2. Import counterparty data from the Reference Data System.
  3. Join the two sets of data together, enriching the trade data with information about the counterparty.
  4. For each counterparty, calculate the risk that the bank is exposed to.
  5. Generate a report that can be imported into Microsoft Excel containing the risk figures for all counterparties known by the bank.
  6. Distribute the report to the business users before the start of the next trading day (9am) in Singapore.
  7. Provide a way for a subset of the business users to configure and maintain the external parameters used by the risk calculations.

I overlay a box that captures the components involved in an interaction within the use case.


A system context diagram for a financial risk system with a use case tagged


I’ll repeat for all the use cases - although this may look quite messy if you try to include all on the same diagram. Using different colours/lines for each use case can help here.


A system context diagram for a financial risk system with multiple use cases tagged


This allows me to:

  • Quickly identify which components are used in a use case.
  • Identify components affected by new use cases or changes to a current one.
  • Identify high coupling and ‘god objects’ (items that are used by everything).
  • Make sure all components are in a use case - otherwise they might not be necessary or there is a missing use case. (Important if you are driving development from uses cases/stories.)
  • Helps indicate where diagrams (or systems) could be split into multiple, simpler ones.

Structurizr has the ability to tag components with use cases, which you can then filter on to achieve a similar effect:


A system context diagram for a financial risk system with filtered use case


I find this to be a simple way to help bridge the gap between static and dynamic views and break down complex systems.

Do any of you do anything similar or have a completely different way to map use cases to components?

Visualising and documenting software architecture cheat sheets

My cheat sheet summarising the C4 model has now been updated, and I've created another to summarise my thoughts on how to document software architecture. Click the images for the full-size (A3) PDF file.

Visualising software architecture Documenting software architecture

I hope you find them useful!

PlantUML and Structurizr

Create models not diagrams

Despite the seemingly high unpopularity of UML these days, it continues to surprise me how many software teams tell me that they use PlantUML. If you've not seen it, PlantUML is basically a tool that allows you to create UML diagrams using text. While the use of text is very developer friendly, PlantUML isn't a modelling tool. By this I mean you're not creating a single consistent model of a software system and creating different UML diagrams (views) based upon that model. Instead, you're simply creating a single UML diagram at a time, which means that you need to take responsibility for the consistent naming of elements across diagrams.

Some teams I've worked with have solved this problem by writing small applications that generate the PlantUML diagram definitions based upon a model of their software systems, but these tend to be bespoke solutions that are never shared outside of the team that created them. Something I've done recently is create a PlantUML exporter for Structurizr. Using PlantUML in conjunction with the Structurizr open source library allows you to create a model of your software system and have the resulting PlantUML diagrams be consistent with that model. If you use the Structurizr's component finder, which uses reflection to identify components in your codebase, you can create component level UML diagrams automatically.

A PlantUML component diagram

Even if you're not interested in using my Structurizr software as a service, I would encourage you to at least use the open source library to create a model of your software system, extracting components from your code where possible. Once you have a model, you can visualise that model in a number of different ways.

Application Architecture and Ransomware

Protecting data from Cryptolocker and variants

Ransomware and Cryptolocker

Ransomware is an increasing threat to many organisations - I recently had a conversation with a (non-IT) friend whose employer had been affected, which is why I’m writing this. These are attacks where a system or data are made inaccessible until a ransom is paid. This form of extortion actually dates back to the 1980s but recent variants, such as Crytolocker, are very dangerous and destructive on modern networks.

Often the initial infection is via a phishing email that contains a link to a website, that if clicked, will download the malware. This will scan all files that the user has access to and starts encrypting them. Once the files are encrypted the user will be sent a message telling them of the infection and offering to decrypt in return for payment (usually in bitcoins). Of course the user has no guarantee that their files will be decrypted even if the ransom is paid.

Applications and Processes

If an individual's machine is infected then they might lose all their personal documents. If they are using remote drives and shares, which have multiple users, then the infection may also lock other people's files. If a user has access to a large number of files across an organisation then this could be devastating.

These are all files that a person has access to. This includes any files used by applications along with documents etc. Therefore if a developer or operational user becomes infected then the systems files they have access to can be affected. It’s very common for technical employees to have access to the files of production servers in order to make issue resolution easy. For example; log files, configuration files, data exports/imports etc.

If the technical users have write access to a mapped drive on a production server then it is trivial for the malware to encrypt these files. This may take down the service (if runtime files are affected) or even destroy the data making the service impossible to run even after a reinstall. Remember that your databases will ultimately have their data stored in files on a disk somewhere.

If people with elevated privileges are infected, you can lose entire systems as well as that person's individual files.

Preventative Actions

I won't give advice here on Endpoint Protection (antiviruses etc.) as that out-of-scope for this blog but there are many data related actions you should consider with respect to your applications.

Audit

Many of you will be reading this and thinking "well we don't allow access as you've described here" but technical staff will setup systems to make their jobs easier. Has your organisation ever performed a data audit and classification? Do you know what files, shares and sections of your network each user has access to? If you haven't then I'd strongly advise you do so - you may be surprised at what you find. There are many commercial and free tools to assist you in doing this.

Restrict user access

You should define your users, what groups they are in and what data they have access to. This is good practice anyway (for reasons of privacy, data loss prevention etc) but if you reduce the total number of files accessible than any infection will have less effect.

File Permissions

If someone really needs access to files do they require write access? Log files and configuration files are a perfect example. A user shouldn't be writing to a log file and if they want to change some configuration then they should go through your normal release process rather than hacking it in manually. If you can't release configuration quickly enough, then your release process may be your real issue...

Don't share users between people and applications

A person shouldn't be using an account used by an application and the applications shouldn't be using personal accounts. Again you may claim this isn't happening but technical users often take shortcuts like this to release quickly (or get around approval processes). A good audit should pick up on this.

Don't use the same user for all applications (or use root!)

It's tempting (for ease of management) to create a single account and get all applications to run as this account. If this account is compromised then all data for all applications are vulnerable. Use specific accounts for applications to reduce lateral movement between systems.

Don't give administration permissions to interactive accounts

If a login account is used to run a web browser or email then it should have restricted permissions. Likewise any administrative account should not be able to run a web browser or email. Separate the concerns!

Analyse your Backup Policy

How do you backup your data? If you are using online backups, that are accessible to an infected user, then all your backups may get corrupted too! Maybe you should consider using WORM (write once read many) technology or at least use separate processes to move and permission backups appropriately once they have been taken.

Some malware may be stealthy and stay on your system for a long time before making itself known. Therefore incremental backups can be corrupted far back in time. Make sure you regularly test your restoration processes too.

Conclusion

It's important to remember that your data is the most important part of your application and valuable to your organisation. If something has value then nefarious parties can seek to take advantage of this. It's hard to stop some attacks but you can minimise the damage if you are attacked.

The architecture of a system should take into account where data is stored, how it is permissioned and who/what has access to it. It's very easy to become obsessed with the latest design patterns but basic data management is important and shouldn't be forgotten.