Throughout my years as an engineering leader I've always been interested in engineering metrics as a topic. Compared to other orgs, Engineering is the hardest to represent quantitatively. And this makes it hard for stakeholders to be confident about how well we’re performing. We may feel qualitatively great about the products we're shipping, the innovations we're driving, the engineering culture we’re promoting, etc. – but how do we really compare to a world-class organization?
Software development is inherently hard to measure. Most of what we work on delivers value later, many steps downstream, making it hard to tie direct outcomes such as revenue or net revenue retention (NRR) to software development work. In many cases the products and enhancements we're working on now won’t be in customers' hands until next quarter or next year. And even once that work ships, revenue attribution is hard. When a customer buys a product, we might be able to tell that a given feature or model contributed to the sale, but how much? And how much was the value of that feature influenced by other features? The value is a complex function of the whole.
And probably because of this, there just haven't been as many established best practices for measuring and reporting on Engineering as there are for other functions. There hasn’t been a standard playbook for the engineering leader to reach for and pick a reporting strategy off the shelf.
But the situation has been steadily improving.
While there hasn’t been some breakthrough “Eureka” moment, recent years have seen steady progress such as the introduction of standard frameworks for engineering metrics like DORA and SPACE, as well as the emergence of packaged Engineering metrics products, sometimes called Software Engineering Intelligence (SEI) Platforms or Engineering Management Platforms (EMPs).
Engineering metrics is a huge topic and I couldn't possibly hope to cover it all in a single post. But for starters, I want to provide a high level survey: what’s happening in the key area of standard models, what’s happening in the product domain, and, at a high level, what actions I see Engineering leaders at various stages taking to integrate metrics strategies into their organizations. Even that scope proved a bit large for a single post, so in this first post, I’ll look at the landscape – what’s happening with standards and products. And next time we will dive into recommendations about how to proceed.
Background
Before we dive in, it’s worth asking, what are the goals of Engineering metrics? Of course, this is the “big question” – what do we need to measure and why? Assuming we have two very macro goals – (1) to have solid telemetry about the way our org is working to spot problems and opportunities for improvement, and (2) to be able to give a disciplined quantitative view of how well our org is performing to our stakeholders including the rest of the executive team and the board – then there are pretty clear themes that need to be included in any approach, for example:
Flow and Velocity: Is work flowing efficiently through the team? And if we make changes to the environment – improved processes, better tools, more staff, etc. – can we see the impact clearly in terms of measurable capacity improvements? This is probably the first area that most people, including our stakeholders, think of when we talk about “engineering metrics.” How much stuff are we producing? But of course it doesn’t come close to telling the whole story.
Delivery Predictability: Does the team provide good predictability to other functions and customers around reasonable delivery timeline expectations? This is usually a close second to pure velocity in terms of stakeholder expectations, and is legitimately important to running a business at any scale beyond the startup stage.
Allocation: Does the actual investment of resources, especially developer time, align with our product strategy, including high level goals such as achieving an appropriate balance of effort towards innovation versus keeping the lights on (KTLO)? This is among the highest value areas to measure, and one that’s sometimes underappreciated. There tends to always be so much focus on “going fast” when in many cases the best velocity improvement that can actually be achieved is by “going in the right direction.”
Quality: Do our engineering practices result in good quality software that both provides customers with a strong perception of quality and avoids costly bug fixing, rework, and incident response? This is frequently among the most valued areas of measurement for the team actually doing the real work of engineering, and provides the natural counterbalance against pure velocity, ensuring that we don’t just go fast today, but we also allow ourselves to go fast in the future.
Developer Experience: Is the engineering environment conducive to engineers doing good work without undue friction and with appropriate attention to continuous improvement? This area has become a major focus of engineering metrics strategies in recent years, as it’s among the best ways to actually uncover specific and actionable areas for improvement. Stakeholders may see this as a “soft” area of metrics since it tends to be measured by survey and include measures of sentiment, but it has proven to be one of the best ways to actually inform engineering management about where to focus attention.
Financial Efficiency: Is the team achieving its outcomes at reasonable costs relative to best practices and appropriate to the demands of the product strategy? Especially in the current economic environment, this area has elevated in importance.
State of the Standards
So we have a general picture of the types of things that we’d like to measure, but looking at the vast space of things we could measure, that still leaves us with a lot to decide. This is where standard models are emerging to play a critical role.
Especially given the complexity of measuring an engineering organization, we need standards rather than DIY approaches to know that our models are grounded in thoughtful design. Building on established standards means not only am I measuring things that have been proven to matter, but also that I can build on shared best practices around how to explain the relevance of these metrics to my stakeholders.
And without standards how can we possibly get a picture of appropriate benchmark values? On the key metrics that matter, how does an org that looks like mine typically perform? And where are my opportunities for improvement given my metrics? Standards create the opportunity to look across the industry and determine what good looks like from a quantitative perspective.
DORA
The DORA Framework, named for the DevOps Research and Assessment (DORA) group (currently part of Google), is the oldest and, without question, best known standard in the space; its adoption outstrips newer models by a wide margin. Reinforcing its popularity, the DORA group also publishes the widely read and influential annual DORA Report, which provides a research-based overview of the state of software development through the lens of DevOps practices, including an industry-wide survey benchmarking performance on the DORA metrics.
The magic of DORA is in the statistical approach it took to overcoming the challenges of measuring direct outcomes in software engineering. To develop the standard, the DORA group did a broad analysis of the correlation between various DevOps and Engineering metrics and associated company performance to find the metrics that are predictive of higher performing teams.
Briefly, the DORA metrics are (with a small calculation sketch after the list):
Deployment Frequency: This measures how often a team successfully releases code to production.
Lead Time for Changes: This tracks the time it takes for a code change to go from commit to production.
Change Failure Rate: This calculates the percentage of deployments that result in a failure in production.
Mean Time to Recovery (MTTR): This determines how quickly a team can restore service after a failure occurs.
Reliability (added more recently): A measure of how reliable the software is (e.g., error rates, availability, MTBF, etc.)
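To make a few of these concrete, here is a minimal sketch (in Python) of how deployment frequency, lead time for changes, and change failure rate might be computed from deployment records. The record shape, the weekly window, and the median summary are illustrative assumptions, not a prescribed implementation; real pipelines would derive these from CI/CD and SCM data.

```python
from datetime import datetime, timedelta
from statistics import median

# Illustrative records only: each deployment carries the commit timestamps it shipped.
deployments = [
    {"deployed_at": datetime(2024, 6, 3, 14, 0), "succeeded": True,
     "commit_times": [datetime(2024, 6, 2, 9, 30), datetime(2024, 6, 3, 11, 15)]},
    {"deployed_at": datetime(2024, 6, 5, 16, 0), "succeeded": False, "commit_times": []},
    {"deployed_at": datetime(2024, 6, 6, 10, 0), "succeeded": True,
     "commit_times": [datetime(2024, 6, 5, 17, 45)]},
]

window_days = 7
window_start = datetime(2024, 6, 1)
in_window = [d for d in deployments
             if window_start <= d["deployed_at"] < window_start + timedelta(days=window_days)]

# Deployment Frequency: successful production deploys in the window (here, one week).
deploy_frequency = sum(1 for d in in_window if d["succeeded"])

# Lead Time for Changes: commit-to-production time, summarized as a median in hours.
lead_times_hours = [
    (d["deployed_at"] - c).total_seconds() / 3600
    for d in in_window if d["succeeded"]
    for c in d["commit_times"]
]

# Change Failure Rate: share of deploys in the window that failed in production.
change_failure_rate = sum(1 for d in in_window if not d["succeeded"]) / len(in_window)

print(f"Deploys/week: {deploy_frequency}")
print(f"Median lead time (h): {median(lead_times_hours):.1f}")
print(f"Change failure rate: {change_failure_rate:.0%}")
```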
While DORA is extremely well known, influential, and increasingly widely adopted, it is not without its shortcomings:
One issue is its focus on DevOps operations, which makes it most applicable to SaaS / online service delivery environments; it’s less clear how to apply it to traditional software development teams, e.g., ones that ship OSS or on-prem software. Also, in practice, DORA is fairly coarse grained – metric values are bucketed into categories like “high,” “medium,” and “low,” giving a general sense of performance but failing to offer finer grained guidance on the ROI of investments. And perhaps the most common critique is that “correlation does not equal causation” – the DORA metrics are outcome metrics clearly indicative of performance, but they aren’t necessarily the best things to focus on in terms of how to improve. This property makes DORA both less clear as a management tool and harder to explain and justify to stakeholders – are these indicators, or knobs that I can turn?
Limitations aside, DORA has become widely adopted and is the closest thing to a de-facto standard for Engineering reporting out there.
SPACE
The SPACE framework was developed as a clear successor to DORA by a number of the original DORA researchers. The main goal of SPACE is to provide a more comprehensive picture of development metrics than DORA. In terms of adoption, SPACE has been widely influential in establishing a framework for understanding the categories of metrics that should be employed. However, as a high level framework with lots of latitude for selecting the actual concrete metrics, it’s difficult to tell how frequently SPACE is really being implemented. It’s a bit like the situation with Scrum, where most teams run a Scrum-like process of some form, but hardly anyone does it “by the book.”
The categories of SPACE metrics include:
Satisfaction and well-being: Engineer happiness, engagement, and job satisfaction (survey based) (closely related to the “Developer Experience” area described above).
Performance: Effectiveness and quality of the work (e.g., change failure rate, mean time to recovery, and the time taken to complete code reviews).
Activity: Amount of work or output, such as volume of deploys, code commits, pull requests, etc.
Communication and Collaboration: Evaluates how well engineers collaborate and communicate within teams (survey based).
Efficiency and Flow: Looks at how well engineers manage time and resources to complete tasks, including bottlenecks or delays (e.g., cycle time).
SPACE is clearly very comprehensive, but therein lies its main challenge. As a general framework with categories of metrics to consider, it is not very prescriptive. Adoption involves a design problem of selecting the right set of metrics for your organization. And without more actual standardization of the metrics, it isn’t nearly as supportive of broad based benchmarking as something like DORA.
Vendor Models
With DORA, we have something very specific and prescriptive, and supportive of benchmarking, but that’s a little too narrowly defined. With SPACE we have a solid template for understanding what a good metrics model should cover, but we don’t have the actual opinionated model. Unsurprisingly, vendors in the space, which I will talk about more below, have stepped in to fill the gap, and in doing so gain an advantage by supporting their proposed models in their products. Practically every vendor in the Engineering metrics space has a proprietary model, but the following are the most notable:
DX Core 4
Developed by DX, the Core 4 is probably the best known and most influential of the vendor supported models. To develop the Core 4, DX engaged a number of the authors of both DORA and SPACE as research advisors, and so their framework blends some of the best ideas from these models. Released last year, Core 4 has received widespread attention from thought leaders in the space.
LinearB Benchmarks
LinearB has published a wide range of benchmarks from across their customers, mostly centered around flow metrics as observed through PRs.
Jellyfish SEI Maturity Model
Jellyfish recently published their Software Engineering Intelligence (SEI) Maturity Model. Also clearly influenced by SPACE and providing good coverage across all of the SPACE areas, the Jellyfish SEI Maturity Model offers a noticeably pragmatic take on metrics, with an eye towards stakeholder consumability of the measures.
Vendor models are fairly early in their existence, so adoption is hard to assess. As a general expectation, it’s likely that truly proprietary models won’t really win the day, as the idea of measuring Engineering orgs is unlikely to be monopolized by a single vendor. Rather, I anticipate that these vendor models will contribute progress towards more general vendor-agnostic standards. But in the meantime, while there is not a clear winner in the standards space, vendor models offer pragmatic, well considered approaches to what to measure – looking beyond DORA, while being more opinionated than SPACE, and likely a breeze to adopt in your SEI platform of choice.
State of the Products
Speaking of SEI platforms, this product category has emerged as a vibrant and fast moving space. Years ago engineering metrics were a fully DIY problem. Every engineering leader had their pile of homegrown data extraction jobs, ETL scripts, spreadsheets, and SQL statements that made up their metrics implementation. If they were lucky enough to be in an organization with mature data warehousing and BI infrastructure these might be built on quality foundations, but often these were run on, shall we say, fairly primitive tools.
Today a whole crop of products has emerged to provide engineering leaders with out-of-the-box reporting and metrics capabilities for their function. Packaged SEI platforms are of course a win in terms of simplicity over the former DIY approach. But they're also a boon to the industry in terms of promoting and easing the adoption of standard metric frameworks, which in turn strengthens engagement with and improvement of these models, as well as making benchmarking across large numbers of orgs a possibility.
The Case for Specialized Products
On the surface it may seem like the value of products for engineering reporting is just around not reinventing the wheel. Why should every engineering org need to recreate the same jobs to pull data from key systems like Jira, the same transforms to populate a reasonable schema for reporting and BI, the same SQL to compute key metrics like cycle time, work in progress, and so on? But at the same time people familiar with data warehousing might ask, isn't this something I can do with standard tools?
Looking more closely at the problem, we find deeper reasons for packaged solutions being valuable in this space. As is so often the case, the problem is the data. The systems of work that support software engineering operations – ticketing systems like Jira, SCM systems like GitHub, etc. – simply weren't designed for reporting. These are transactional systems meant to support work.
For example if you want to look back in time and understand the state of the tickets in your Jira environment at various points in the past, you’ll quickly find that it’s a pain. There’s an API to get the history of changes for items, but it was designed to power a UI, not to support reporting. Some changes to issues, such as those applied with bulk updates, aren’t even tracked. To do it right, you really need a rigorous approach for tracking changes as they happen, and then you need to transform these into a reasonable data model for querying in aggregate and with time trending.
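As a rough illustration of the kind of plumbing involved, here’s a sketch that pulls one issue’s changelog from Jira’s REST API (using the changelog expand) and flattens its status transitions into rows you could load into a reporting table. The base URL, credentials, and issue key are placeholders, and the endpoint and field specifics should be verified against your Jira instance’s documentation; note also the caveat above that some changes (e.g., bulk updates) may not show up in the changelog at all.

```python
import requests
from requests.auth import HTTPBasicAuth

# Assumed environment: Jira Cloud with API token auth. Adjust for your instance.
JIRA_BASE = "https://your-domain.atlassian.net"        # hypothetical
AUTH = HTTPBasicAuth("you@example.com", "API_TOKEN")   # hypothetical credentials

def status_transitions(issue_key: str):
    """Return (timestamp, from_status, to_status) rows from an issue's changelog."""
    resp = requests.get(
        f"{JIRA_BASE}/rest/api/2/issue/{issue_key}",
        params={"expand": "changelog", "fields": "status"},
        auth=AUTH,
        timeout=30,
    )
    resp.raise_for_status()
    rows = []
    # Note: long changelogs are paginated; this sketch ignores pagination.
    for history in resp.json()["changelog"]["histories"]:
        for item in history["items"]:
            if item.get("field") == "status":
                rows.append((history["created"], item.get("fromString"), item.get("toString")))
    return rows

# Each row can then be loaded into a warehouse table keyed by issue and timestamp,
# which is what makes "state of the board as of last March" queries tractable.
for row in status_transitions("PROJ-123"):  # hypothetical issue key
    print(row)
```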
Can you do this yourself? Of course. It’s just code! Would I recommend it to a friend? That’s a different matter. Practically speaking, what most in-house implementations do is compromise on goals to make implementation costs reasonable, for example accepting some amount of imperfection in the data, accepting some limitations on the types of queries that can be run, etc. A packaged product that does this right and amortizes the cost across many customers is clearly the preferable solution.
SEI Platform Capabilities
The case for SEI products is clear, but what should you be looking for in terms of capabilities of a platform? Since the space is relatively new and quickly evolving, it can be a bit bewildering to try to sort this out by looking at vendor products and directly comparing what they offer. You’ll quickly find that there are many options and possibilities. But in terms of the core capabilities, the product architecture is actually pretty clear, consisting of these areas:
Integration
Job one of an SEI platform is gathering all of the data from your engineering systems into one place for analysis. As discussed above, this may sound easy, but it represents no small amount of both challenge and value.
The anchor systems that are required are without a doubt the ticketing system (Jira, GitHub Issues, etc.) and the SCM system. These are the foundations for understanding most core metrics around the flow of work and productivity. Probably the next priorities in services environments are DevOps systems such as CI/CD and incident management (e.g., PagerDuty or similar). These feed very core metrics such as DORA deployment frequency, change failure rate, and time to restore service.
Beyond these core systems, there are many possibilities, from observability systems to pull in more application workload data, calendar and collaboration systems to better understand how people are working and allocating effort, security systems to understand activity related to security, and many more.
Packaged Metrics
Of course, pulling the data together is only part of the challenge; making sense of it all is the real trick. SEI platforms really deliver value on the basis of the packaged analytics they provide.
At this point, having decent coverage of the metrics standards, especially DORA, is table stakes. Most leading SEI products offer DORA metrics out of the box, along with good coverage of metrics that align with SPACE, even if they’re not always clearly labeled as such. Coverage of the standards along with basic variations, for example, different ways of looking at cycle time at various levels such as the epic, the story, and the PR, provide a strong foundation for implementing a metrics strategy.
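For illustration, here’s a tiny sketch of the “cycle time at multiple levels” idea: the same summary computed separately for PRs, stories, and epics. The data and field names are made up; a real SEI platform would derive these durations from SCM and ticketing timestamps and slice them by team and time window.

```python
from statistics import median

# Illustrative (made-up) completed work items at different levels; cycle time in hours.
work_items = [
    {"level": "pr",    "cycle_hours": 6},
    {"level": "pr",    "cycle_hours": 18},
    {"level": "pr",    "cycle_hours": 48},
    {"level": "story", "cycle_hours": 120},
    {"level": "story", "cycle_hours": 260},
    {"level": "epic",  "cycle_hours": 1100},
]

def cycle_time_summary(items, level):
    """Median and worst-case cycle time (hours) for completed items at a given level."""
    durations = [i["cycle_hours"] for i in items if i["level"] == level]
    return median(durations), max(durations)

for level in ("pr", "story", "epic"):
    med, worst = cycle_time_summary(work_items, level)
    print(f"{level}: median {med}h, max {worst}h")
```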
And beyond these core metrics, most platforms provide metrics designed to support specific use cases, which we’ll discuss below.
Developer Experience / Surveys
Looking at self reported developer experience via surveys is a practically universal best practice. It’s widely agreed that we cannot look at system metrics such as PR flow alone, but rather need to understand the sentiment of the Engineers doing the work as well as their recommendations for the best areas to improve.
Given this universal importance of developer experience surveys to any sound metrics strategy, functionality to execute these surveys, along with thoughtfully designed survey questions to start with, is basically table stakes for SEI platforms at this point.
Benchmarks / Goals
Early experiences with SEI products, before they had matured and achieved adoption, and also before so much progress had been made on metrics standards, often left Engineering leaders with complicated tea leaves to interpret. Was the PR cycle time I might discover in my team a problem? Actually OK? Somewhere in between?
To address this, it’s essential that SEI platforms provide some opinionated guidance, especially for key metrics, to help leaders discern whether a given value looks good or is cause for concern. Beyond this, it’s important that metrics platforms integrate a notion of goaling: the ability to set objectives for metrics and quickly see in a UI or via alerts when things are off. In other words, SEI platforms should ideally be able to drive action, not just provide data.
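As a toy sketch of what goaling can look like mechanically, the snippet below checks a few hypothetical metric values against targets and flags the misses. The metric names, targets, and values are all invented for illustration; a real platform would surface this through dashboards and alerting integrations.

```python
# Hypothetical goal definitions: metric name -> (direction, target value).
goals = {
    "pr_cycle_time_hours": ("<=", 24),
    "deploys_per_week": (">=", 5),
    "change_failure_rate": ("<=", 0.15),
}

# Hypothetical current values, as a platform's metrics layer might compute them.
current = {"pr_cycle_time_hours": 31.0, "deploys_per_week": 7, "change_failure_rate": 0.12}

def missed_goals(goals, current):
    """Yield (metric, value, direction, target) for every goal that is currently missed."""
    for metric, (direction, target) in goals.items():
        value = current[metric]
        ok = value <= target if direction == "<=" else value >= target
        if not ok:
            yield metric, value, direction, target

for metric, value, direction, target in missed_goals(goals, current):
    # In a real platform this would surface on a dashboard or fire a Slack/email alert.
    print(f"ALERT: {metric} is {value}, target is {direction} {target}")
```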
Packaged Use Cases
Beyond looking at core metrics of organizational performance and health, SEI platforms seize on the opportunity presented by having so much valuable data captured to drive additional specific use cases. These vary by platform, but common examples include:
Team Operations - help frontline managers spot when work is stuck (e.g., a PR that isn’t getting attention) and trigger action.
Financial Operations - help understand and report on the costs behind investments, including providing valuable input for reporting on capitalizable software costs, a basic accounting requirement in many software organizations that falls to the Engineering team.
Delivery Management - help track delivery commitments, including highlighting delivery risk and helping the organization achieve greater predictability.
Capacity Planning - help organizations ensure adequate staffing based on team objectives and standing capacity requirements such as maintenance / support / KTLO.
AI Impact
Perhaps the top specific metrics use case on the minds of most Engineering leaders is understanding the implications of AI. How is adoption trending? What impact are we seeing in terms of productivity, and how is that trending? What will this mean for planning, including thinking about future headcount needs as well as budgeting for the additional compute and license costs of AI coding assistants?
SEI platforms, which are already tracking core productivity measures such as flow metrics, are seizing the opportunity to help Engineering leaders manage the path to AI-enabled operations. The key ingredient for analyzing AI impact is getting a picture of what AI usage looks like, and tools such as GitHub Copilot are starting to provide APIs for such data. We can expect capabilities in this area to co-evolve with the tools, and they will likely represent a top area of innovation for SEI platforms.
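As a sketch of what that raw ingredient might look like, here’s a call to GitHub’s organization-level Copilot metrics endpoint to pull daily adoption figures. The endpoint path, required token scopes, and response field names are assumptions to verify against GitHub’s current documentation, since this API surface is still evolving.

```python
import requests

# Assumptions to verify against current GitHub documentation: the endpoint path,
# required token scopes, and response field names may differ or change over time.
ORG = "your-org"          # hypothetical organization
TOKEN = "GITHUB_TOKEN"    # hypothetical token with the appropriate Copilot scopes

resp = requests.get(
    f"https://api.github.com/orgs/{ORG}/copilot/metrics",
    headers={
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {TOKEN}",
    },
    timeout=30,
)
resp.raise_for_status()

# The response is a series of per-day usage snapshots; summarize the adoption trend.
for day in resp.json():
    date = day.get("date")
    active = day.get("total_active_users")
    engaged = day.get("total_engaged_users")
    print(f"{date}: active={active}, engaged={engaged}")
```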
Leading Products
To do a true overview of the products in this space would be a lengthy post in and of itself, and I won't attempt it here. It's safe to say that, from humble beginnings less than a decade ago, the market has grown into an explosion of innovation and new product development.
Briefly, some of the leading names include (listed in chronological order by original founding date):
Pluralsight Flow
Flow is the evolution of the former GitPrime product, which Pluralsight acquired back in 2019. GitPrime saw very rapid adoption in the early wave of Engineering metrics products, building on a focus on improving engineering flow by highlighting metrics and alerts based on activity in Git and Git services such as GitHub.
Jellyfish
Jellyfish is probably the longest tenured product looking at a broad spectrum of Engineering metrics, not just Git activity. Perhaps best known for its patented ML-based “work allocation” model, Jellyfish helps to expose metrics about where actual investment is directed, making it useful for looking at Engineering ROI and financial operations, as well as strategy alignment. This strategy alignment component gives it a strong foundation for reporting at the exec and board level, where execution against strategy and well managed resource allocation are key concerns. (Disclosure: I’m currently working as an Advisor with Jellyfish)
LinearB
Founded in 2018, LinearB provides a wide range of engineering insights, in particular from workflow in Git. Beyond coverage of the key metric categories, LinearB is perhaps best known for its integration into engineering workflows, for example with Slack alerting to ensure that work doesn’t get stuck.
DX
DX was originally built with a primary focus around assessing developer experience through surveys, but over time expanded their platform to offer a comprehensive set of metrics. One of DX’s key strengths is its extensibility; it’s built around a standard data warehouse / lakehouse model that allows new sources of data to be added, and the underlying queries that represent metrics to be customized. Additionally DX has engaged key researchers in the engineering metrics space, including many of the DORA and SPACE authors, as research advisors, ensuring that they are positioned to offer strong product support for standard metrics as they evolve.
Of course I’ve only scratched the surface here. There are many other products available with different strengths and specialties, and it seems like new entrants are arriving pretty quickly! There’s also a growing availability of metrics in other types of platforms. For example, Internal Developer Portals (IDPs) naturally have a comprehensive view of much of the organization's portfolio, from services to repositories to teams, making them a natural consumption point for metrics about those entities. Not to mention the built-in analytics in systems of work like Jira and GitHub. It’s safe to say that the space is still pretty complex, with lots of choices and a changing landscape.
What Next?
As we’ve seen, there’s lots happening on both the standards and the products fronts. On the one hand, this is great news. Now more than ever, Engineering leaders don’t need to invent approaches to measuring and reporting on their organizations. But at the same time, expectations about the sophistication and thoughtfulness of Engineering reporting are high and only increasing.
And there’s what feels like an unprecedented level of transformation happening in the software industry, making analytical, data-driven management more important than ever. On one front, the massive AI revolution that’s underway is changing the fundamentals of software development - not just how we build, but also how we plan and budget. Meanwhile, we continue to operate in a complex economic context: no longer just “post zero interest rates” and all the efficiency concerns that era brought, but increasingly a volatile environment shaped by the political sphere, placing a premium on our ability to be agile and adjust to changing business demands.
So charting a smart course in terms of engineering metrics and SEI is critical.
Next time we’ll dive into recommendations about how to approach it all!