DORA Metrics: The Engineering Leader's Guide to Measuring Team Performance
Most engineering leaders know they should be measuring team performance. The problem isn't a lack of data — it's that the most commonly tracked metrics are either useless or actively misleading.
Story points completed. Lines of code. Number of tickets closed. I've seen engineering organizations obsess over all of these, and in every case, the metrics told leadership what they wanted to hear while the actual state of the team slowly deteriorated underneath.
DORA metrics are different. Developed by the DevOps Research and Assessment team (originally led by Dr. Nicole Forsgren, Jez Humble, and Gene Kim), they emerged from years of rigorous research into what actually separates high-performing engineering organizations from the rest. The result is four metrics that, taken together, give you a remarkably clear picture of how well your team delivers software.
I've used these metrics across multiple organizations, from 10-person startups to teams of 50+, and they've consistently been the most honest signal I've found. Here's how they work, how to implement them, and — just as importantly — how to avoid the traps that make them counterproductive.
The Four DORA Metrics
Deployment Frequency
How often does your team deploy code to production?
This is the most straightforward metric of the four, and it's the one I start with when working with a new team. Deployment frequency is a proxy for batch size. Teams that deploy frequently are, by definition, shipping smaller changes. Smaller changes are easier to review, easier to test, easier to understand when something breaks, and easier to roll back.
High-performing teams deploy on demand — multiple times per day. Low performers deploy on a monthly or longer cadence, typically gated by manual release processes and fear of production incidents.
I want to be clear about something: deployment frequency isn't about moving fast for its own sake. It's about reducing risk. A team that ships one massive release every two weeks is taking on significantly more risk per deployment than a team that ships twenty small changes a day. The small-change team has shorter feedback loops, better traceability, and faster recovery when things break.
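To make this concrete, here's a minimal sketch of the calculation, assuming you can export a list of production deploy dates from your CI system (the function and its input shape are illustrative, not any specific tool's API):

```python
from datetime import date, timedelta

def deployment_frequency(deploy_dates, window_days=30):
    """Average production deploys per day over a trailing window.

    deploy_dates: one datetime.date per production deployment,
    e.g. exported from your CI/CD system's deploy log.
    """
    cutoff = date.today() - timedelta(days=window_days)
    # Count only deploys inside the trailing window.
    recent = [d for d in deploy_dates if d >= cutoff]
    return len(recent) / window_days
```

A value of 1.0 or higher means the team deploys at least daily on average; anything below roughly 0.03 means monthly or less.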
Lead Time for Changes
How long does it take for a commit to reach production?
This measures the elapsed time from when a developer pushes code to when that code is running in production and available to users. It captures everything in the delivery pipeline: code review wait time, CI/CD execution time, staging environment availability, QA processes, and deployment mechanics.
For high performers, this is under a day — often under an hour. For low performers, it can be weeks or months.
When I audit a team's lead time, the bottleneck is almost never the build pipeline itself. It's usually one of three things: pull requests sitting in review queues for days, a staging environment that's shared and constantly broken, or a manual approval gate that requires someone who's in meetings all day.
Lead time is where you find the process debt that silently kills team velocity. Every day of lead time is a day where a developer has moved on to new work, lost context on the change, and will be slower to respond when issues surface.
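Finding that bottleneck is mostly a matter of breaking each change's lead time into stages and comparing medians. A rough sketch — the stage names here are assumptions; use whatever timestamps your VCS and CI system actually expose:

```python
from statistics import median

def bottleneck_stage(changes):
    """Return the pipeline stage with the largest median duration.

    changes: list of dicts mapping stage name -> hours spent, one dict
    per shipped change, e.g. {"review_wait": 26, "ci": 0.5, "deploy": 0.2}.
    (Hypothetical stage names for illustration.)
    """
    stages = changes[0].keys()
    # Median per stage resists the occasional outlier change.
    medians = {s: median(c[s] for c in changes) for s in stages}
    worst = max(medians, key=medians.get)
    return worst, medians
```

In my experience the answer this returns is usually "review_wait", not the build itself — which is the point of measuring rather than guessing.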
Change Failure Rate
What percentage of deployments cause a failure in production?
This is the quality counterweight to the speed metrics. Deployment frequency and lead time push you toward shipping faster; change failure rate tells you whether you're shipping responsibly. A "failure" here means a production incident, a rollback, a hotfix, or any degradation that requires remediation.
High-performing teams have a change failure rate between 0% and 15%. Low performers sit above 45%.
The interesting thing about this metric is that it doesn't have a linear relationship with speed. You might expect that teams deploying more frequently would have higher failure rates. The research shows the opposite. High-performing teams deploy more often and break things less. That's because frequent deployment enables smaller changes, which are inherently less risky and easier to validate.
If your change failure rate is high, look at test coverage, code review rigor, and whether your CI pipeline actually catches real issues. In my experience, a high change failure rate almost always traces back to one of these: tests that don't test meaningful behavior, code reviews that are rubber stamps, or a staging environment that doesn't resemble production.
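The calculation itself is trivial once you can label each deployment; the hard part is the join between deploy logs and incident or rollback records. A sketch, with the 'caused_incident' flag as an assumed field:

```python
def change_failure_rate(deployments):
    """Fraction of deployments that required remediation.

    deployments: list of dicts with a boolean 'caused_incident' flag.
    In practice you'd derive that flag by joining your deploy log with
    incident tickets, rollbacks, and hotfix deploys.
    """
    if not deployments:
        return 0.0
    failures = sum(1 for d in deployments if d["caused_incident"])
    return failures / len(deployments)
```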
Mean Time to Restore (MTTR)
When something breaks in production, how quickly do you recover?
MTTR measures the average time from when a production incident is detected to when service is restored. This isn't about root cause analysis or post-mortems — it's about how fast you can get the system working again for your users.
High-performing teams recover in under an hour. Low performers take days to weeks.
This metric reveals the maturity of your incident response and your deployment infrastructure. Can you roll back a bad deploy in minutes? Do you have clear ownership of production issues? Can your team diagnose problems quickly, or does every incident turn into a multi-day investigation?
I've seen teams with excellent deployment frequency and lead time but terrible MTTR. They ship fast, but when things break, they panic. There's no runbook, no clear escalation path, and rollbacks are a manual process that nobody has practiced. MTTR is what exposes that gap.
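Measuring MTTR just needs detection and restoration timestamps from your incident tool; a minimal sketch (the input shape is an assumption):

```python
from datetime import datetime

def mttr_hours(incidents):
    """Mean time to restore, in hours.

    incidents: list of (detected_at, restored_at) datetime pairs,
    e.g. exported from your incident-management tool.
    """
    if not incidents:
        return 0.0
    # Total downtime in seconds across all incidents, then average.
    total = sum((restored - detected).total_seconds()
                for detected, restored in incidents)
    return total / len(incidents) / 3600
```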
Why These Four Metrics Work Together
The power of DORA metrics is that they're balanced. You can't game one without it showing up in another.
Push deployment frequency without improving your pipeline and practices? Change failure rate goes up. Optimize for low failure rate by adding heavy review gates? Lead time suffers. Focus entirely on speed and ignore operational maturity? MTTR exposes you.
This is why velocity alone is a terrible metric. A team can inflate velocity by gaming estimation. A team can close lots of tickets by working on easy things. But a team that deploys frequently, with short lead times, low failure rates, and fast recovery? That team is genuinely performing well. There's no way to fake all four simultaneously.
I think of DORA metrics as a health check for the entire delivery system: code, pipeline, process, and operational readiness. They don't tell you everything, but they tell you enough to know where to dig deeper.
How to Implement DORA Metrics Without Making Your Team Hate You
Here's where most organizations go wrong. They discover DORA metrics, get excited, and immediately turn them into targets. Dashboards go up, goals get set, and within a quarter the metrics are being gamed or resented.
Start With Measurement, Not Targets
When I introduce DORA metrics at a new organization, I spend the first 4–6 weeks just measuring. No targets. No judgment. Just visibility.
The goal of this phase is to establish a baseline and start conversations. When engineers see that the average lead time is 5 days and the biggest contributor is code review wait time, the improvement ideas come from the team — not from management. That's the dynamic you want.
Use Them for Team-Level Conversations, Not Individual Performance Reviews
DORA metrics are team-level metrics. The moment you tie them to individual performance reviews, you've destroyed their value. Engineers will start gaming deployments, avoiding risky changes, and sandbagging estimates to protect their numbers.
I use these metrics in team retrospectives, not in 1:1s. The question is always: "What's slowing us down?" not "Why isn't this number higher?"
Automate Collection
If engineers have to manually report on DORA metrics, the data will be unreliable and the process will feel like busywork. These metrics should come directly from your toolchain: CI/CD system for deployment frequency and lead time, incident management for MTTR, and deployment logs for change failure rate.
Most modern CI/CD platforms (GitHub Actions, GitLab CI, CircleCI) can surface this data natively or with minimal integration. If you're using something like LinearB, Sleuth, or Faros AI, the collection is essentially automatic.
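As one example of what "minimal integration" can look like: GitHub's REST API exposes a repository's deployments, each with a created_at timestamp, and a rough deploys-per-week figure falls out of the payload directly. This sketch only parses a response body you've already fetched; authentication and pagination are left out:

```python
import json
from datetime import datetime

def deploys_per_week(payload):
    """Rough deploys/week from a GitHub Deployments API response body.

    payload: JSON string as returned by
    GET /repos/{owner}/{repo}/deployments, where each item carries an
    ISO-8601 'created_at' timestamp.
    """
    dates = [datetime.fromisoformat(d["created_at"].replace("Z", "+00:00"))
             for d in json.loads(payload)]
    if len(dates) < 2:
        return float(len(dates))
    # Spread the deploy count over the span the payload covers.
    span_days = max((max(dates) - min(dates)).days, 1)
    return len(dates) / span_days * 7
```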
Focus on One Metric at a Time
Don't try to improve all four simultaneously. Pick the one that's the biggest bottleneck and focus on that.
In my experience, the usual order is: lead time first (because the bottlenecks are often low-hanging fruit), then deployment frequency (because improving lead time usually enables this naturally), then change failure rate (because shipping more frequently surfaces quality gaps), and finally MTTR (because it requires investment in operational maturity that's separate from the delivery pipeline).
Benchmarks: Where Does Your Team Stand?
The DORA research categorizes teams into four performance levels. Here's a rough summary.
Elite performers deploy on demand (multiple times per day), with lead time under one hour, change failure rate between 0–15%, and time to restore service in under one hour.
High performers deploy between once per day and once per week, with lead time between one day and one week, change failure rate of 16–30%, and time to restore under one day.
Medium performers deploy between once per week and once per month, with lead time between one week and one month, change failure rate of 16–30%, and time to restore under one day.
Low performers deploy less than once per month, with lead time between one month and six months, change failure rate of 16–30% (but often with much longer remediation), and time to restore between one week and one month.
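If you want a dashboard to display those tiers, a simple threshold function does it. This one uses deployment frequency only, with thresholds approximated from the summary above:

```python
def deploy_frequency_tier(deploys_per_day):
    """Map average daily deploy count to a DORA-style performance tier.

    Thresholds approximate the bands summarized above:
    daily or more -> elite, weekly+ -> high, monthly+ -> medium, else low.
    """
    if deploys_per_day >= 1:
        return "elite"
    if deploys_per_day >= 1 / 7:
        return "high"
    if deploys_per_day >= 1 / 30:
        return "medium"
    return "low"
```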
Most teams I work with fall somewhere between medium and high performers. The biggest lever for moving from medium to high is almost always lead time — specifically, removing unnecessary wait states in the delivery pipeline.
What DORA Metrics Don't Tell You
These metrics are a powerful signal, but they're not the complete picture. There are things that matter deeply that DORA doesn't capture.
Developer experience. A team can have great DORA numbers and still be miserable. If engineers are burning out to hit those numbers, the metrics are misleading.
Business impact. DORA measures delivery performance, not whether you're building the right things. A team that ships the wrong features quickly is still failing.
Code quality and long-term maintainability. A team can maintain low change failure rates while accumulating technical debt that will slow them down in 12 months.
This is why I always pair DORA metrics with qualitative signals: developer satisfaction surveys, voluntary attrition, time-to-productivity for new hires, and how often engineers describe their codebase as something they're proud of. Numbers and vibes, together.
The Real Value Is in the Trends
I want to leave you with one more observation. The absolute numbers matter less than the direction.
A team with a 4-day lead time that's trending toward 2 days is in a better position than a team with a 1-day lead time that's creeping toward 3. The trend tells you whether the team is improving its practices or slowly accumulating friction.
Review DORA metrics monthly. Look at 90-day rolling averages, not weekly snapshots. Weekly variation is noise. Monthly trends are signal. And when a metric changes significantly — in either direction — treat it as a prompt for investigation, not a verdict.
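In code terms, that means smoothing the daily series before you look at it. A small sketch of a trailing rolling average (window in days):

```python
def rolling_average(daily_values, window=90):
    """Trailing rolling average of a daily metric series.

    Early entries average over however many days exist so far, so the
    output has the same length as the input.
    """
    smoothed = []
    for i in range(len(daily_values)):
        # Slice the trailing window ending at day i (inclusive).
        chunk = daily_values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed
```

Plot the smoothed series, not the raw one — the question you're answering is direction, not day-to-day position.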
DORA metrics aren't a scorecard. They're a diagnostic tool. Used well, they surface the systemic issues that are hardest to see from inside the day-to-day work. Used poorly, they become another set of numbers that leadership watches and engineers resent.
The difference, as always, comes down to how you use them.
I'm an engineering leader and consultant who helps teams build better delivery practices, scale their organizations, and measure what actually matters. If your team is struggling with delivery performance or you're not sure where to start with engineering metrics, let's talk.
Hit like if you enjoyed this post!