The SRE Report 2023: Forecasts and the Current Economy

Published
February 28, 2023

Today’s guest blog is written by Keri Melich, an SRE at Nobl9 and one of the contributors to The SRE Report 2023.

As questions and challenges loom over the tech industry and the larger economy, now is a perfect time to step back and learn from the past. As reliability engineers, we regularly use Service Level Objectives (SLOs) to understand the performance, reliability, and trends of our systems, and to inform and prioritize our decisions. At a more macro level, Catchpoint’s annual SRE Report helps us learn about the state of our field so that our teams and management can make better decisions.

Let’s take a deeper look at some of these industry trends and consider how they might change as a result of the current volatile economy.

Learning from anomalies

Overall, I believe the next SRE Report will show a significant shift in its insights, born out of changes we’re likely to witness in the 2023 working year, as problems we once thought were solved creep back up as a result of strained business decisions.

Line graph showing self-reported toil measurements in 2020, 2021 and 2022 (Catchpoint/SRE Report)

Challenges with tool sprawl and the amount of time spent on toil, both of which have recently been trending downward year over year, could spike upward quickly as teams revert to in-house builds or more free-tier tools to satisfy budget constraints. These sorts of anomalous spikes will quickly come to define the next set of SRE Report findings, covering 2023 and 2024.

Understanding these expected anomalies and how they will affect our teams will help reliability practitioners make better decisions and, hopefully, protect us from their consequences, such as an increase in toil. And just as we do when we monitor a service, we need to acknowledge and track these anomalies (this is where the data from The SRE Report is so valuable) so that we can hold effective retrospectives on how well we prepared for them.
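To make "tracking these anomalies" a little more concrete, here is a minimal sketch of how a team might flag a reversal in a metric that had been trending downward, such as self-reported toil. The function name `flag_trend_reversals`, the `tolerance` parameter, and the sample values are all hypothetical and chosen only for illustration; they are not figures from The SRE Report, and a real team would likely use whatever anomaly detection its monitoring stack already provides.

```python
def flag_trend_reversals(series, tolerance=0.0):
    """Return the labels where a value rises right after a decline.

    series: list of (label, value) pairs in chronological order.
    tolerance: minimum increase over the previous value to count as a spike.
    """
    flagged = []
    for i in range(2, len(series)):
        prev_prev = series[i - 2][1]
        prev = series[i - 1][1]
        current = series[i][1]
        was_declining = prev < prev_prev        # the metric had been improving
        spiked = (current - prev) > tolerance   # and has now jumped back up
        if was_declining and spiked:
            flagged.append(series[i][0])
    return flagged


# Hypothetical placeholder values (percent of time spent on toil),
# NOT data from The SRE Report.
toil_percent_by_year = [
    ("year 1", 40.0),
    ("year 2", 35.0),
    ("year 3", 30.0),
    ("year 4", 38.0),
]

print(flag_trend_reversals(toil_percent_by_year, tolerance=2.0))
```

With these placeholder values, only the final period is flagged, since it is the first increase after a run of declines. The same check could be run on a team's own quarterly numbers before a retrospective.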

Smaller teams are improving communication practices

As SREs, we are uniquely positioned to make or break a significant cultural shift identified in 2023’s SRE Report, one that should help us all with a sticking point that holds steady year after year: communication. Although many of us are bearing the weight of company cutbacks in one way or another, the unexpected upside may be that smaller teams give us a better chance of tightening our bonds and improving our communication practices.

If you had asked me a year ago how we could improve our interdepartmental communication, I would’ve said we needed more DevOps tools that help each department, team, or individual see the larger picture of how their work affects the customer-facing product. While I still think that’s a very useful and viable option, and there is an increasing number of open-source tools that do just that, I’ve found we’re all already communicating more as we work through our experiences of the economic downturn. What may have started in crisis may in fact help us escape our stagnation.

Finding a forum to share or listen to stories of our mistakes and successes can also help unblock us in ways we couldn’t possibly have found on our own. Look out for the conferences and meetups that help spread great ideas and stories of what works, like Learning from Incidents (LFI), SLOconf, and the upcoming SRECon.

The value of annual retrospectives

In some ways, 2023 is one large incident in the history of our industry. With anomalies expected in next year's SRE Report findings, we can all benefit from taking part in next year’s survey (look out for it around June!) and from holding a retrospective at the end of the year:

  • How has 2023 affected our teams?
  • What business decisions contributed to anomalies in our practices? What was the impact of those decisions on our teams?  
  • Looking back, would we have done anything differently in response?
  • What anomalies did we see in our practices compared to previous years and quarters?  
  • What did we do well this year despite our anomalies?  
  • Where did we get “lucky” this year, and what future improvements can we implement within our teams to ensure we stay “lucky”?
  • How did our team compare with the industry averages benchmarked in the SRE Report?  

I encourage all of us to participate in a similar retrospective each year to help foster greater transparency and communication within our companies. Everyone from junior individual practitioners to executives will have valuable feedback to contribute. The best way to deal with any incident is to focus on our awareness of its past, present, and future. 2023 has already affected so many teams in countless ways; let’s not forget to learn from these changes.

Learn more

Get The SRE Report 2023 (no registration required).

Keep an eye out for a talk by Leo Vasiliou and Kurt Andersen, co-authors of The SRE Report 2023, at SRECon next month!
