There are lots of reasons to measure the productivity of your R&D team, including understanding the overall level of the team’s performance, developing benchmarks, tracking progress, identifying high and low performers, improving processes and operations, justifying your investments, and determining resource allocation.
But if you’ve ever attempted to measure your software team’s productivity, you’ve probably run across several snags in the process, from trying to define what productivity actually means to attempting to identify the metrics that reflect that definition.
This post discusses the various challenges involved in measuring dev productivity and offers a KPI strategy to help you really understand the performance of your R&D teams.
Before you can even start to measure productivity, you need to define it. Usually, productivity is defined in terms of inputs and outputs, where you divide your output by your input to get your ROI. Using this method, we could, theoretically, measure the input and output of a software developer as follows:
Input: the resources invested, such as developer hours, headcount, and salary cost.
Output: the software delivered, such as lines of code, features shipped, or story points completed.
It’s easy to see the problem: the outputs themselves are very difficult to measure. Good measurements of output should correlate strongly with revenue; however, if we break down the metrics used to measure those outputs, the correlation tends to be pretty weak.
For example, one common metric for measuring developer productivity is lines of code (LOC). The thing is, for several reasons, more isn’t always better. For example, verbose code can hide bugs, copy-pasted code inflates the count without adding value, and a refactor that deletes code often improves the product while registering as negative output.
Further, the difference between physical lines of code and logical lines of code adds to the complexity involved in measuring LOC.
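To make the physical-vs-logical distinction concrete, here is a toy sketch (not a real LOC tool) showing how the two counts diverge for functionally identical code. The counting rules are deliberately simplified; real tools apply far more elaborate definitions of a "logical" statement.

```python
def physical_loc(source: str) -> int:
    """Count every non-blank line, including comments."""
    return sum(1 for line in source.splitlines() if line.strip())


def logical_loc(source: str) -> int:
    """Very rough statement count: skip blank lines and comments."""
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        count += 1
    return count


# Two snippets that do identical work, formatted differently.
snippet_a = "x = 1\ny = 2\nz = x + y\n"
snippet_b = "# compute z\nx = 1\n\ny = 2\n\nz = x + y\n"

print(physical_loc(snippet_a), logical_loc(snippet_a))  # 3 3
print(physical_loc(snippet_b), logical_loc(snippet_b))  # 4 3
```

The second snippet "produces" one extra physical line purely through a comment and blank lines, which is exactly why raw LOC rewards style rather than value.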
Considering the flaws inherent in defining software development productivity using inputs and outputs, it makes more sense to define it as a measurement of your team’s ability to quickly and efficiently create useful, good-quality software that’s easy to maintain and has high customer value.
Now that we’ve defined dev team productivity, we can try to figure out how to measure it.
The reality is, there’s no one metric that can be used to assess productivity, since each KPI, on its own, lacks important context. We’ve already discussed some of the problems inherent in measuring lines of code, but there are similar issues involved in all commonly used productivity metrics:
In addition to lines of code, other code-based metrics include commits, pull requests, code review turnaround time, code churn (also referred to as “rework”), code coverage, closed change requests, and bug fixes. While all of these metrics are possible indicators of productivity, the same issues apply: they’re all missing important context and can be gamed in a variety of ways that almost always sacrifice quality.
Velocity measures the amount of work your team can complete during an average sprint. The team assigns points to each story (based on estimated complexity, risk, and repetition), then calculates the average number of points completed per sprint over a period of time (a minimum of 5 sprints). This is one way to track your team’s overall progress and figure out how realistic your team’s goals are.
However, if we look at how we’ve defined productivity, it’s easy to see how quality can be sacrificed to maintain or improve velocity. Cognitive bias can also play a part by inflating the number of points assigned to each story.
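The velocity calculation described above reduces to a simple average. This minimal sketch (with made-up story-point numbers) enforces the five-sprint minimum so the average isn't skewed by one outlier sprint:

```python
def velocity(points_per_sprint: list, min_sprints: int = 5) -> float:
    """Average story points completed per sprint over a trailing window."""
    if len(points_per_sprint) < min_sprints:
        raise ValueError(f"Need at least {min_sprints} sprints for a stable average")
    return sum(points_per_sprint) / len(points_per_sprint)


# Hypothetical story points closed in the last 5 sprints.
completed = [34, 41, 28, 37, 40]
print(velocity(completed))  # 36.0
```

Because the inputs are team-assigned estimates, the number is only as honest as the estimation process feeding it, which is the gaming risk noted above.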
Function points are units that measure the functionality a software product or service delivers to its end users, meaning what the software can do in terms of tasks and services. The advantage of function points is that they are relatively agnostic to both the technology and the development model used. These points are assigned using the Function Point Analysis (FPA) rules, which cover 5 component types: external inputs (EI), external outputs (EO), external inquiries (EQ), internal logical files (ILF), and external interface files (EIF). However, since the assignment of function points is made by the team, they’re subjective and can be manipulated.
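A hedged sketch of how an unadjusted function point count is computed from those five component types. The weights below are the standard "average complexity" weights from Function Point Analysis; a full FPA count first classifies each component as low, average, or high complexity and weights it accordingly, and the component counts here are invented for illustration.

```python
# Standard FPA average-complexity weights per component type.
AVERAGE_WEIGHTS = {"EI": 4, "EO": 5, "EQ": 4, "ILF": 10, "EIF": 7}


def unadjusted_function_points(counts: dict) -> int:
    """Sum of (component count * average weight) across all component types."""
    return sum(AVERAGE_WEIGHTS[component] * n for component, n in counts.items())


# Hypothetical system: 6 inputs, 4 outputs, 3 inquiries,
# 2 internal logical files, 1 external interface file.
counts = {"EI": 6, "EO": 4, "EQ": 3, "ILF": 2, "EIF": 1}
print(unadjusted_function_points(counts))  # 83
```

Note that every number feeding this formula is a human judgment call (what counts as one "input"?), which is where the subjectivity the text describes creeps in.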
Pull request metrics, such as time to merge, lead time, size, flow ratio, and discussions, have recently grown in popularity as a way to measure development productivity. However, these metrics don’t factor in the effort or impact of the work performed. They can also be unfair to developers working on a legacy codebase compared to developers working on a greenfield project.
A sprint burndown chart is a graphic representation of how much work has been completed during a sprint and the total amount of work remaining in the sprint. While good for identifying issues such as scope creep and oversaturation of features, it lacks several important factors, as well as the context needed to get the full picture.
Both manager and peer evaluations are a good way to give real context to your team’s performance; however, they are highly subjective and open to abuse or bias.
Every company and development team is different, with its own set of dynamics and processes. Using a combination of the above metrics, then weighting them (and fine-tuning the weighting over time) in terms of their importance to your team and company leadership, is a good way to measure your team’s overall performance, dynamics, and the efficiency of your processes.
For example, you might create a set of criteria such as correlation with revenue, continuous delivery, quality, value delivered to the user, team cooperation, and objectiveness.
Then score each metric on a scale of 1-5, based on how well it meets those criteria. For example, you might give pull request metrics a high score for team cooperation but a low score for correlation with revenue, while function points could score high for revenue correlation but low for objectiveness.
Once scored, you can weight these KPIs using the scores to reach a final productivity score that more accurately reflects your team’s performance. In addition, a combination of metrics is less subject to abuse and bias than any single metric.
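The scoring-and-weighting scheme above can be sketched as a simple weighted sum. The criteria, their weights, and the 1-5 metric scores below are all invented for illustration; in practice each team would choose and tune its own.

```python
# Hypothetical criterion weights (must reflect what your leadership values).
criteria_weights = {
    "revenue_correlation": 0.4,
    "objectiveness": 0.3,
    "team_cooperation": 0.3,
}

# Hypothetical 1-5 scores: how well each metric satisfies each criterion.
metric_scores = {
    "pull_requests":   {"revenue_correlation": 2, "objectiveness": 4, "team_cooperation": 5},
    "function_points": {"revenue_correlation": 4, "objectiveness": 2, "team_cooperation": 2},
    "velocity":        {"revenue_correlation": 3, "objectiveness": 3, "team_cooperation": 4},
}


def weighted_score(scores: dict, weights: dict) -> float:
    """Weighted sum of a metric's criterion scores."""
    return sum(scores[criterion] * w for criterion, w in weights.items())


for metric, scores in metric_scores.items():
    print(metric, round(weighted_score(scores, criteria_weights), 2))
```

With these made-up numbers, pull request metrics come out ahead of function points, illustrating how the weighting surfaces which KPIs deserve more influence in your final composite score.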
It’s also important to measure the entire team rather than each individual member, since the true scope of software team productivity is far larger than any one developer on your team.
Yes, each team member’s success has real value, but since we’re trying to increase the creation of useful, high-quality, easy-to-maintain software, many more factors must be taken into account in order to get a real understanding of your team’s productivity.
Since single KPIs can be misleading when measuring software engineering productivity, we recommend using a combination of weighted KPIs, which can offer more in-depth and nuanced insights into your software team’s performance.
Tabnine Enterprise is an AI code generation tool that helps software engineering teams write high-quality code faster and more efficiently, accelerating the entire SDLC. Designed for use in enterprise software development environments, Tabnine Enterprise offers a range of features and benefits, including the highest security and compliance standards and features, as well as support for a variety of programming languages and IDEs.