what is SRE (to me)?

step 1, clarify your stance

what does SRE mean to me

Site Reliability Engineer (SRE) is a role invented and popularized by Google through their books. These days companies of much smaller scale have SRE roles, sometimes as a new function, other times as an evolution of ops or production engineering. Just like the term DevOps, it means different things to different people, so it's important to clarify your position when discussing it.

To me, the existence of an SRE role at a company is recognition from leadership that they either have or will soon have a reliability problem and want someone to address it, or that if they had a reliability problem, it would become very expensive very quickly. What this team is empowered to do to fix or prevent those issues varies quite a bit depending on resources, team size, company culture, and politics.

I think SRE is... a blend of operations and a willingness to develop solutions as needed, with a heavy dose of developer education. In exchange for not having product chase us for features, SRE is in a tug-of-war with product to ensure existing features are stable and scalable, with developers in the middle. We are here to get people to care about production, without being the only ones in the line of fire.

Also, this reminds me of the Conway's law post on the (broken) incentive structure for SREs...