One of the challenges with agile methods is to get a clear perspective on how to measure process improvements. I recently had a brief discussion with a C-level executive at a small organization about this. His concern was that cycle time was meaningless because it depended so much upon the size of the work package. So how do we use cycle time as a meaningful measurement? What else can we use to measure process improvement?
Let’s look at the difference in measuring cycle time in an agile vs. non-agile environment. Then we’ll get to other measurements.
Cycle Time, Waterfall and Agile
First, let’s define cycle time. From iSixSigma we have:
Cycle time is the total time from the beginning to the end of your process, as defined by you and your customer. Cycle time includes process time, during which a unit is acted upon to bring it closer to an output, and delay time, during which a unit of work is spent waiting to take the next action.
This definition is important because it gives us a clue about the potential difference between waterfall and agile methods of delivering value. Let’s imagine the typical process used in a waterfall environment. The following are the high-level steps:
- Customer / User / Stakeholder sees a need, validates it and submits a request to have that need fulfilled. This is when we start the clock on cycle time.
- The fulfillment organization (IT, Product Development, R&D) puts the request in a queue, backlog or requirements management system.
- Along with other requests, the fulfillment organization schedules the work on the request, usually by creating a project to fulfill it and other related requests. The project is estimated at a high level, the current status of in-flight projects is noted, and the new project is prioritized relative to other projects.
- At some point, based on the schedule and the reality of the work on other projects, the project containing our customer’s request is started. Here, “started” means that detailed requirements are gathered.
- After sufficient requirements are gathered, a detailed technical analysis is done including architecture, high-level design, risk analysis, etc.
- Development begins. (Note: many people mistakenly start measuring cycle time here.)
- Developers and testers work to validate the results of development and fix any problems discovered.
- Final acceptance testing is done.
- The results of the project are deployed to users, sold to the client, or in some other way passed back to the original requestor. This is when we stop the clock on cycle time.
So our true cycle time runs from the moment the customer’s request is formally submitted to the moment that request is fulfilled. There are a few important things to note here. First, there is a queue of work based on requests made but not yet scheduled. There is another queue for work scheduled but not yet started. We know that if we can reduce the size of these queues, we can improve cycle time in a general sense. Second, we know that most organizations of any significant size will have different queues based on the urgency of the request. For example, a high severity bug discovered in the production system of a company’s largest client will be treated differently than a wish list item for a small not-yet-client. These two requests won’t even go in the same queue: the high priority problem will be quickly escalated to a support or development team that can work on it immediately. Third, it is tempting for the development group to measure their local cycle time. This is a Really Bad Idea since it leads to sub-optimizing behaviors. For example, it is easy for the development team to improve their cycle time by sacrificing quality… but this just causes the QA cycle time to increase, and the overall (true) cycle time probably gets worse by more than the development group’s local cycle time improved.
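To make the difference between true and local cycle time concrete, here is a minimal sketch. The milestone names and dates are hypothetical, invented only for illustration; the point is where the clock starts and stops.

```python
from datetime import date

# Hypothetical milestone dates for a single request (illustrative only).
request = {
    "submitted": date(2023, 1, 9),             # customer formally submits the request
    "development_started": date(2023, 4, 3),   # where many teams mistakenly start the clock
    "development_finished": date(2023, 5, 1),
    "delivered": date(2023, 6, 12),            # result reaches the original requestor
}


def days_between(start, end):
    """Elapsed calendar days between two dates."""
    return (end - start).days


# True cycle time: from formal submission to delivery.
true_cycle_time = days_between(request["submitted"], request["delivered"])

# Local cycle time: what the development group sees if it only measures itself.
local_dev_cycle_time = days_between(
    request["development_started"], request["development_finished"]
)

# Everything outside development: queues, analysis, testing, deployment.
time_outside_development = true_cycle_time - local_dev_cycle_time

print(f"True cycle time:          {true_cycle_time} days")
print(f"Local dev cycle time:     {local_dev_cycle_time} days")
print(f"Time outside development: {time_outside_development} days")
```

In this made-up example most of the elapsed time falls outside development, which is why optimizing only the local number says little about what the customer experiences.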
Now let’s look at the steps that occur in an ideal agile environment:
- As before, the Customer / User / Stakeholder sees a need, validates it and submits a request to have that need fulfilled.
- That request is immediately placed in a ready state for the next iteration (cycle, sprint) of a delivery team. Elapsed time: maximum one month.
- The team completes the request, including all work to actually deliver/deploy, and the work is delivered to the stakeholder at the end of the iteration. Cumulative elapsed time: maximum two months.
So the ideal method of doing agile has a maximum cycle time of two months to deliver from the time a request is made… how many teams are doing this? Not many.
The ideal is extremely difficult to accomplish. Getting to that state requires that the development organization catches up to the business side so that there are zero pending requests at the start of each iteration. It also requires that the business side users and stakeholders are able to articulate their requests so that they are small, and appropriately detailed for the team doing the work.
A realistic agile implementation is a lot messier. Depending on the type of request, the cycle time for a piece of work can vary widely. Some low priority items may take years even in an agile environment. A low priority request is made and approved but then never quite makes it into a project… and then once in a project never quite makes it to the top of the team’s product backlog. This is interesting to look at sometimes, but it points out another important aspect of measuring cycle time: mostly we care about average cycle time (or some other statistically interesting aggregate measure).
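As a rough illustration of why the choice of aggregate matters, the sketch below summarizes a small set of cycle times. The numbers are invented, and the 90th percentile is just one possible "statistically interesting" measure, not one the discussion above prescribes.

```python
import statistics

# Hypothetical cycle times in days for ten completed requests.
# The last one is a low priority item that languished for over a year.
cycle_times_days = [12, 15, 18, 20, 22, 25, 30, 45, 60, 400]

mean_days = statistics.mean(cycle_times_days)
median_days = statistics.median(cycle_times_days)

# quantiles(n=10) returns nine cut points; the last one is the 90th percentile.
p90_days = statistics.quantiles(cycle_times_days, n=10)[-1]

print(f"Mean:            {mean_days:.1f} days")
print(f"Median:          {median_days:.1f} days")
print(f"90th percentile: {p90_days:.1f} days")
```

A single straggler pulls the mean well above the median, so it is worth reporting more than one aggregate when cycle times vary this widely.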
The predominant factor in most organizations’ cycle time is the number and size of the queues they use as work is processed. In most organizations there are several queues and most of them contain large numbers of requests or bits of work in process. Queues represent huge amounts of waste. It is easy to see that queue size and cycle time are closely related: the more items in a queue, the longer the cycle time.
This leads to a simple conclusion: regardless of lifecycle approach, reducing the size of an organization’s queues is one of the easiest ways to reduce cycle time. What are some common queues? There are often queues of projects, queues of enhancement requests, queues of defects to be fixed, queues of features, queues of tasks, queues of email (large inboxes), queues of approval requests, queues of production database changes. The number of queues increases the more an organization is oriented around functional groups, and the number of queues decreases the more an organization arranges work to be handled by cross-functional teams.
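The link between queue size and cycle time can be sketched with Little’s Law from queueing theory: for a stable process, average cycle time equals the average number of items in the system divided by the average completion rate. The function name and the numbers below are illustrative assumptions, not measurements from any real organization.

```python
def average_cycle_time(items_in_queue, throughput_per_week):
    """Little's Law: average cycle time = average items in system / average throughput.

    Holds for long-run averages of a stable process; it says nothing
    about the cycle time of any individual item.
    """
    if throughput_per_week <= 0:
        raise ValueError("throughput must be positive")
    return items_in_queue / throughput_per_week


# Hypothetical team that completes 5 requests per week.
throughput = 5

for queue_size in (40, 20, 10):
    weeks = average_cycle_time(queue_size, throughput)
    print(f"{queue_size} items queued -> about {weeks:.0f} weeks average cycle time")
```

With throughput held constant, halving the queue halves the average cycle time, without anyone working faster.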
Cycle Time and Work Package Size
This is where queueing theory and agile methods intersect really well. Cycle time is related to the load on your system, in particular on the teams that process your units of work. In most organizations, teams are created to handle work. The more work given to a team simultaneously, the higher their utilization level. Many organizations like high utilization levels because it seems to guarantee that people are doing valuable work all the time that they are paid to work. This is a completely false benefit and in fact is extremely destructive to overall productivity. From queueing theory we know that the cycle time for a piece of work increases sharply and nonlinearly (often loosely described as exponentially) with the utilization level, growing without bound as utilization approaches 100%. We see this whenever we over-load a server… but for some reason we fail to see this when we overload a person or a team or an organization even though it still happens.
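The effect of utilization can be sketched with the simplest textbook queueing model, a single-server M/M/1 queue, where average time in the system equals the average service time divided by (1 - utilization). Treating a team as one such server is an assumption made purely for illustration; real teams are more complicated, but the shape of the curve is the point.

```python
def mm1_cycle_time(service_time, utilization):
    """Average time in system for an M/M/1 queue: service_time / (1 - utilization).

    Assumes random (Poisson) arrivals, exponentially distributed service
    times, and a single server. Utilization must be in [0, 1).
    """
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1.0 - utilization)


# Hypothetical work item that takes 1 day of actual effort.
service_time_days = 1.0

for utilization in (0.5, 0.8, 0.9, 0.95, 0.99):
    days = mm1_cycle_time(service_time_days, utilization)
    print(f"{utilization:.0%} utilization -> {days:.1f} days average cycle time")
```

Under this model, moving from 50% to 90% utilization multiplies average cycle time by five, and each further step toward 100% costs more than the last.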
Cycle time is also related to the variability in the size of the work packages. Low variability means that the exponential factor related to load is low, and high variability means that the exponential factor is high. In other words, if you have a highway that only allows motorbikes, you ca