Your First SRE Hire Is Not Your Best Firefighter

There is a moment, somewhere around your fifth or sixth serious outage, when someone in leadership says it out loud: "We need an SRE." Reliability has stopped being a background concern and started costing you customers, sleep, and credibility. So you decide to fix it with a hire.
Then comes the second decision, the one that quietly determines whether any of this works. Who?
And almost everyone reaches for the same person. The engineer who always shows up when production is on fire. The one who knows where the bodies are buried, who can SSH into the right box at 2am and bring the system back from the dead. The hero. You promote them, change their title to "SRE," and tell yourself reliability is now handled.
I have watched this fail more times than I can count. And I have done it myself, which is how I know how seductive it is. The problem is that you have just taken your best firefighter and asked them to become a fire marshal, and those are not the same job. One puts out fires. The other makes sure they never start.
What firefighting actually selects for
Let's be precise about what your hero is good at, because they are genuinely good at something.
Great firefighters are fast under pressure. They hold enormous amounts of system context in their head. They pattern-match symptoms to causes faster than anyone else, often without being able to explain how. They are calm when the dashboards are red and everyone else is panicking. These are real, valuable skills.
But notice what they are. They are skills for operating in the failure state. The whole talent is built around the moment things have already gone wrong. The faster and more heroic someone is at recovery, the more their value is tied to failures continuing to happen.
This is the trap. Firefighting rewards the person on whom the system most depends. The engineer who can fix anything is also, very often, the engineer who built things only they understand. They are the single point of failure wearing a cape. Their indispensability during an outage is the flip side of the fragility they have quietly created.
I once worked with a team where one engineer handled roughly eighty per cent of all production incidents. Leadership loved him. He was clearly the most reliable person on the team. Except the systems were not reliable at all. They failed constantly. He was simply very good at the recovery. Whenever he was out of the room, the mean time to recovery tripled, because nobody else had ever been allowed to learn.
That is not a reliability function. That is a dependency.
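If you suspect your team looks like this, your incident history can usually tell you. Below is a minimal sketch in Python, assuming your incident tracker can export one record per incident with the engineer who led the recovery and how many minutes it took. The `Incident` type and both helper functions are hypothetical names for illustration, not part of any particular tool.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Incident:
    # Hypothetical record shape; adapt it to whatever your tracker exports.
    responder: str             # engineer who led the recovery
    minutes_to_recover: float  # time from detection to recovery


def responder_concentration(incidents: list[Incident]) -> tuple[str, float]:
    """Return the most frequent lead responder and their share of all incidents."""
    counts = Counter(i.responder for i in incidents)
    top, n = counts.most_common(1)[0]
    return top, n / len(incidents)


def mttr_split(
    incidents: list[Incident], responder: str
) -> tuple[Optional[float], Optional[float]]:
    """Mean time to recovery when `responder` led the recovery vs. when someone else did."""
    with_r = [i.minutes_to_recover for i in incidents if i.responder == responder]
    without_r = [i.minutes_to_recover for i in incidents if i.responder != responder]
    return (
        mean(with_r) if with_r else None,
        mean(without_r) if without_r else None,
    )


if __name__ == "__main__":
    # Made-up data purely to exercise the functions.
    incidents = [Incident("alice", 30.0) for _ in range(8)] + [
        Incident("bob", 90.0),
        Incident("carol", 90.0),
    ]
    top, share = responder_concentration(incidents)
    print(top, share)
    print(mttr_split(incidents, top))
```

A lopsided share combined with a large MTTR gap is a signal worth investigating, not proof. The hero may simply be handed the hardest incidents, or the easiest ones, which skews the comparison either way.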
What the SRE role actually requires
Site reliability engineering, done properly, is mostly about preventing the thing your firefighter loves. The day-to-day is not heroics. It is the patient, unglamorous work of making heroics unnecessary.

Here is what the role genuinely demands, and almost none of it overlaps with being a good firefighter.
Systems thinking. Not "where is this bug" but "what class of bug is this, and what structural change removes the whole class." A good SRE looks at an incident and sees a category. They are bored by the individual fire and interested in the conditions that produced it.
An automation mindset that borders on laziness. The best SREs are constitutionally unwilling to do the same manual task twice. Where a firefighter feels a quiet pride in being the one who knows the runbook, an SRE feels irritation that a human has to run the runbook at all. They want to automate themselves out of the loop. Crucially, they are happy to make themselves unnecessary. That is the exact opposite of the hero's instinct.
Comfort saying no to launches. This is the one most people underestimate. A real SRE will stand in front of a product team that wants to ship on Friday and say, "Not until there's a rollback path and an alert on this." That requires a spine, and it requires not needing to be liked in that moment.
Influence without authority. Your first SRE will not own the services they are trying to make reliable. Other teams do. So the entire job is convincing engineers who do not report to them to change how they build, test, and deploy. That is a political and persuasive skill far more than a technical one.
Product and engineering empathy. A good SRE understands why the product team wants to move fast and does not treat reliability as a holy war against shipping. They find the path that lets the business move quickly without setting itself on fire. They translate between "we need this feature" and "we need to sleep."
Read that list again and ask honestly: does it describe your best firefighter? Occasionally, yes. Usually, no. The firefighter is selected for the opposite of half of these.
Why your best senior engineer is often the wrong choice
I want to be careful here, because this is not about your senior engineer being bad. They are not. They are being asked to do a job that runs against their grain and against their incentives.
They love the firefighting. This is the part nobody says out loud. For a lot of strong engineers, the outage is the best part of the week. It is where they are visibly the hero, where the adrenaline is high and the gratitude is immediate. Ask that person to spend their days writing alerting rules and chasing other teams to add health checks, and watch the energy drain out of them. You have taken away the part of the job they actually wanted. Worse, you have given them a subtle incentive for fires to keep happening, because fires are when they shine.
They create single points of failure, including themselves. The senior engineer who knows everything has usually built a system that requires knowing everything. Making them the SRE entrenches that. Instead of distributing knowledge, you have now formalised the bottleneck and given it a job title.
They do not automate themselves out of a job. Not out of malice. It is just hard to systematically dismantle the thing that makes you valuable and admired. The runbook in their head is their status. Asking them to turn it into a script that anyone can run is asking them to give away the source of their importance. Most people, consciously or not, will not.
There is also a simple seniority problem. Your most senior firefighter has spent years accumulating tribal knowledge of your specific systems. That knowledge is precious during an incident and almost worthless as a foundation for reliability practice, which is about general patterns, not your particular pile of legacy code. You are spending your most expensive context on the wrong problem.
Why hiring externally often works better
The counterintuitive move is to bring in someone who knows nothing about your systems.
It feels wrong. How can someone who has never seen your stack make it more reliable? But that lack of context is a feature. An external SRE has not absorbed your team's learned helplessness. They have not made peace with the deploy process that everyone internally has stopped questioning because "that's just how it is." They walk in and ask why the database has no replica, why there are no SLOs, why a deploy requires a manual checklist of fourteen steps. These questions are obvious to an outsider, which is exactly why insiders have stopped asking them.
An external hire also arrives with patterns from other organisations. They have seen what good looks like somewhere else. Your internal hero has only ever seen your systems, so their idea of "better" is anchored to your current ceiling. The outsider's ceiling is higher because they have stood under a taller one.
And critically, an external SRE has no political investment in the current architecture. They did not build the fragile thing, so they can say it is fragile without it being a confession. They can say no to a launch without years of relationships making that awkward. They start with the influence-without-authority muscle precisely because they have no other authority to fall back on.
This does not mean never promote internally. It means the person you promote should be chosen for the SRE traits, not for being the best at the job you are trying to eliminate. Sometimes that is a quieter mid-level engineer who keeps writing the little scripts nobody asked for and keeps gently pushing for better monitoring. That person is showing you who they are. Look for them.
What to screen for
When I am hiring or selecting for this role, I am not primarily testing systems design knowledge, though it has to be there. I am testing for temperament. Here are a few things I deliberately probe:

- "Tell me about a time you made yourself redundant." A good SRE lights up at this. They have a story about automating away their own on-call pain or documenting themselves out of a bottleneck. A firefighter struggles, because their best stories are about being indispensable.
- "Tell me about a launch you blocked or delayed." I want to hear that they have said no, that they sat with the discomfort, and ideally that they found a way to let the team ship safely rather than just standing in the doorway. If they have never pushed back on a launch, they have never really done the job.
- How they talk about incidents. Do they tell war stories about the heroic fix, or do they talk about the postmortem, the root cause, and what structural change stopped it recurring? The first is a firefighter. The second is an SRE. Listen for which part of the story they get excited about.
- How they handle "the other team won't cooperate." The right answer is never "I'd escalate" as a first move. It is about understanding the other team's pressures, making the reliable path the easy path, and winning the argument with empathy rather than authority.
Screen for the boring person who is quietly obsessed with making things not break. Not the exciting person who is brilliant when things do.
Be honest about the politics
There is a hard conversation buried in all of this, and pretending otherwise helps no one.
If you do not promote your hero, they may be hurt. They have carried your production stability on their back and now you are hiring someone over their head. That stings, and it is reasonable for it to sting. You owe them a real conversation about why, and ideally a path that values what they are genuinely great at. Some of the best deep engineers are happiest as principal individual contributors who get pulled into the hardest problems, not as the owner of a reliability function. Frame it that way, honestly, and not as a consolation prize.
And if you do bring in an external SRE, protect them. They will arrive and start saying uncomfortable things about systems your most senior people built and your leadership signed off on. They will have no relationship capital and a stack of unpopular truths. Without explicit air cover from someone senior, they will be ground down within a quarter and either leave or go quiet, and a quiet SRE is useless. Their job is to say no and to insist on standards. If the org's reflex is to side with whoever wants to ship, you have hired a scapegoat, not a reliability function.
The bottom line
The instinct to crown your best firefighter is so natural that you will feel slightly insane resisting it. Resist it anyway.
Reliability is not the presence of someone heroic to fix things. It is the absence of the need for one. The person who builds that is usually not the person who is best at the heroics, because the two roles pull in opposite directions. One is rewarded for the system depending on them. The other succeeds precisely when the system depends on no one.
So when you make your first SRE hire, do not ask who saves you during the fire. Ask who would quietly make the fire impossible, and then never get the credit because nothing went wrong.
That person is the one you want. They are just much harder to notice, because their best work is the work you never see.