As a central function, we are the team developers must come to for infrastructure design and hosting; our role is to ensure that all deployments adhere to policies and standards.
Change could only happen as fast as we could deliver it. Although we increased service throughput, we could not guarantee it: occasional spikes in workload (such as the response to the WannaCry outbreak) delayed projects.
Work in Progress
We needed to understand and control our work in progress (WIP). The rate of change slows over time as the body of policies grows (for example, security policies tightening to counter increasing threats), while requests become more frequent as the wider business grows more agile.
To understand WIP, we require all requests to come via the team’s project coordinator, and we set up a Kanban board to gain visibility into the work requests we receive.
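To illustrate the kind of visibility and control a Kanban board gives, here is a minimal sketch of WIP-limit enforcement. The column names, limits, and ticket titles are hypothetical, not the team’s actual board configuration:

```python
# Minimal Kanban-style WIP tracker. Column names and limits are
# illustrative, not the team's real board configuration.
from collections import defaultdict

class KanbanBoard:
    def __init__(self, wip_limits):
        self.wip_limits = wip_limits          # e.g. {"In Progress": 2}
        self.columns = defaultdict(list)

    def add(self, column, ticket):
        limit = self.wip_limits.get(column)
        if limit is not None and len(self.columns[column]) >= limit:
            raise RuntimeError(f"WIP limit reached for '{column}'")
        self.columns[column].append(ticket)

    def move(self, ticket, src, dst):
        self.columns[src].remove(ticket)
        self.add(dst, ticket)                  # re-checks the WIP limit

    def wip(self, column):
        return len(self.columns[column])

board = KanbanBoard({"In Progress": 2})
board.add("Backlog", "Provision VM")
board.add("Backlog", "Firewall change")
board.move("Provision VM", "Backlog", "In Progress")
```

Capping the “In Progress” column is what makes overload visible: once the limit is hit, new work must wait in the backlog rather than silently piling onto the team.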
‘The Three Ways’ Improvement Plan
To control the flow, we needed a new way of planning. We plan the backlog on Friday afternoons for the week ahead. On Monday mornings, we discuss what is expected for the week. On Friday mornings, we hold a retrospective to see what went well, what did not, and what tasks to carry over.
By managing infrastructure as code on public clouds and building Continuous Deployment pipelines, we have increased delivery speed; infrastructure delivery is no longer the bottleneck.
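A Continuous Deployment pipeline for infrastructure code typically chains static checks, a plan step, automated policy gates, and an apply step. The sketch below is purely illustrative: the stage names and string checks stand in for real tools (a production pipeline would invoke something like Terraform or CloudFormation), and the policy rule is invented:

```python
# Sketch of a CD pipeline for infrastructure-as-code. Stages and
# checks are hypothetical placeholders for real IaC tooling.
def run_pipeline(change):
    stages = [
        ("lint", lambda c: "resource" in c),                # static checks on the source
        ("plan", lambda c: True),                           # compute diff vs. live infra
        ("policy_check", lambda c: "public_ip" not in c),   # enforce standards automatically
        ("apply", lambda c: True),                          # deploy only if every gate passed
    ]
    for name, check in stages:
        if not check(change):
            return f"failed at {name}"
    return "deployed"

print(run_pipeline("resource vm { size = small }"))
print(run_pipeline("resource lb { public_ip = yes }"))
```

The key property is that policy enforcement is a pipeline stage, not a manual review: non-compliant changes are rejected before they reach the apply step, so the central team no longer sits on the critical path.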
By creating standard development templates, such as for containerization and auto-scaling, we maintain consistency within the architecture. Developers are empowered to manage infrastructure within the limits set by our policies. We publish these standards and run workshops across the business to explain them.
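One way to express such guard-railed templates is as a parameterised definition that validates developer input against policy limits. A hypothetical sketch, in which the field names, registry, and numeric limits are all invented for illustration:

```python
# Hypothetical standard service template: developers choose values,
# the template enforces policy limits. All limits are illustrative.
from dataclasses import dataclass

MAX_REPLICAS = 10                             # policy ceiling for auto-scaling
ALLOWED_REGISTRIES = ("registry.internal",)   # approved container registries

@dataclass
class ServiceTemplate:
    image: str
    min_replicas: int = 1
    max_replicas: int = 3

    def validate(self):
        registry = self.image.split("/")[0]
        if registry not in ALLOWED_REGISTRIES:
            raise ValueError("image must come from an approved registry")
        if not 1 <= self.min_replicas <= self.max_replicas <= MAX_REPLICAS:
            raise ValueError("replica bounds outside policy limits")
        return True

svc = ServiceTemplate("registry.internal/team/app:1.2", max_replicas=5)
svc.validate()
```

The point of the design is the split of responsibility: developers own the values, the central team owns the limits, and validation happens in code rather than in a review meeting.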
As the business moved towards Continuous Integration (CI), manual deployment of infrastructure was no longer fast enough. Automation is key to increasing flow, but automation alone is not enough: flow must move in only one direction to avoid rework.
To reduce backward flow, we run regular workshops with product teams to discuss changes collaboratively, so feedback is immediate.
We have given each architect and engineer responsibility for specific products and technologies, which encourages commitment and retains knowledge. Rotating the support rota prevents individuals from becoming pinch-points: when someone else supports a product, the documentation required to support it must exist. Any change to a product is peer-reviewed by the team, so feedback is immediate. This increases reliability and reduces repeat incidents.
Even with flow control and everything moving in one direction, systems do not always improve; arguably, over time they regress as technical debt accumulates. The next challenge was to improve the system itself.
Improvement comes from experiments: some will work and others will not. The skill is in embracing those that do (identified through fast feedback) and failing fast on those that do not, which quickly limits investment in unproductive changes.
We have a blameless culture: we learn from failure rather than assigning fault.
We foster an experimental culture within the team. Engineers are free to innovate and experiment with technologies that deliver the strategy, and improvements are presented back to the group for feedback, creating iterative improvement.
We have increased the release rate from one release every 3-6 months to up to five releases per day, improving quality because fixes ship immediately. Without monolithic releases, work no longer flows backwards and effort is not duplicated; freed from repetitive infrastructure delivery, Ops is focussed on high-value work, increasing productivity.
The Pride index in the internal “Great Place To Work” survey has increased by 8 points over the last three years.
Moving to DevOps has increased efficiency and productivity, and has significantly improved the Ops team’s morale.