What’s Happening at SRE From Home
After a brief postponement, Catchpoint’s “SRE from Home” live event is fast approaching. On Thursday, July 23rd, join us between 12 and 4:20 PM for a series of talks, polling, and panels to explore how Site Reliability Engineers are adapting to “all-remote” operations and what we can learn from each other!
“We know SREs are being relied on to be even more resilient than ever right now, and from their homes,” says Elena Mendis, Manager of Events and Field Marketing at Catchpoint. “This community event is an opportunity to get together online, gain some inspiration from a killer line-up of speakers, and ask questions of each other about best practices and new frameworks within the current reality of all-remote.”
Working with our amazing partners, Gremlin, Packet, LaunchDarkly, NS1, Blameless, and Honeycomb, we’ve pulled together a terrific line-up of speakers. More details below!
Welcome (12 PM)
Peter Saulitis, Manager, Brand Marketing at Catchpoint, welcomes everyone to SREfh.
Yes, You Can Improve Your Team’s Wellness (12:05-12:35 PM)
Most talks about wellness focus on fortifying the people, but that’s only part of the story: you need to address your sources of toil, or else you’re only paying attention to the symptoms. This talk breaks down the science behind stress and burnout, and how you can apply that to create an on-call process that supports your team’s wellness.
Jaime Woo, Co-founder at Incident Labs
Jaime began his career as a molecular biologist before following his passion for communications, working at DigitalOcean, Riot Games, and Shopify, where he launched the engineering communications function. He co-founded Incident Labs: check out Ovvy Insights, which helps provide teams the right data to improve incident response and return hours for planned work. He has spent two years learning about mental health and mindfulness. He is also an avid lover of dumplings.
DevOps Parenting (12:35-12:50 PM)
There’s no such thing as a staging environment when it comes to parenting. Every decision you make is in production. In this talk, Dawn will share her experiences adapting to parenting, working, and schooling and how she incorporates DevOps principles into her parenting style.
Dawn Parzych, Developer Advocate at LaunchDarkly
Dawn is a Developer Advocate at LaunchDarkly, where she uses her storytelling prowess to write and speak about the intersection of technology and psychology. She enjoys helping people be more successful at work and in life. She makes technical information accessible, avoiding buzzwords and jargon whenever possible. Dawn has spoken at DevOpsDays, Velocity, Interop, and Monitorama. Her articles have appeared in numerous technical publications. In her free time, she serves as an organizer for Write/Speak/Code and the Seattle DevOps Meetup, and is on the organizing committee for DevOpsDays Seattle.
OK, so you are not Google. What should SRE mean for your organization? (12:50-1:15 PM)
SRE as defined by Google is not applicable to most organizations. Organizations need to take the thought process and culture behind Google’s SRE and adapt it just enough to make it suitable and viable for their own business needs. As I see it today, large enterprises are mostly failing at this. They are either attempting to adopt SRE in its purest form, not realizing they are not Google, or changing (corrupting) it so thoroughly to suit how they have always done things, and their broken culture, that what they call SRE is SRE in name only.
This session will delve into the underlying philosophy behind SRE and present practical approaches to adapt and adopt SRE in the enterprise.
Sanjeev Sharma, Principal Analyst at Accelerated Strategies
Sanjeev Sharma is an internationally known DevOps, Cloud Transformation, and Data Modernization thought leader, technology executive, and author. Sanjeev’s industry experience includes tenures as CTO, Technical Executive, and Cloud Architect leader. As a former IBM Distinguished Engineer, Sanjeev was recognized at the highest levels of IBM’s core of technical leaders. He is currently a Principal Analyst at Accelerated Strategies. Sanjeev provides leadership to drive the adoption of cutting-edge solutions, architectures, and strategies for DevOps and Cloud transformations, and advises C-level and senior technical executives leading these transformations. Sanjeev published his second bestselling book, ‘The DevOps Adoption Playbook’, in 2017. He regularly blogs and podcasts on DevOps, Cloud, and Data Modernization on his popular blog http://sdarchitect.blog
THE PANDEMIC BRIEF: Assuring Essential Services (1:15-1:35 PM)
Henri Helvetica, Freelance Developer
Henri is a freelance developer who has turned his interests to a potpourri of performance engineering with pinches of user experience. When not reading the deluge of daily research docs and case studies, or indiscriminately auditing sites in devtools, Henri can be found contributing back to the community, co-programming meetups including the Toronto Web Performance Group, or volunteering his time for lunch and learns at various bootcamps.
Otherwise, he’s tooling with music production software or, with near certainty, training and focusing on running the fastest 5k possible.
BREAK & NETWORKING (1:35-2:30 PM)
Emerging from Burnout (2:30-2:45 PM)
Amy Tobey, Staff SRE at Blameless
Amy has worked in web operations for 20 years at companies of every size, touching everything from kernel code to user interfaces. When she’s not working, she can usually be found around her home in San Jose, caring for her family, practicing piano, or running slowly in the sun.
Live Panel – Ask an SRE (2:50-3:40 PM)
Our live panel is an opportunity for a discussion with four expert SREs about how we are adapting to “all-remote” operations and what we can learn from each other about adapting our processes and communication to unfamiliar team environments. Questions won’t only be about “remote” and “working from home”. We want to answer any pressing questions the community has around SRE as a practice, from resiliency to testing.
Here’s a sample of some of the great questions we’ve received from SREs in our survey that we may ask the panel:
- How do teams manage asynchronous communication vs. synchronous collaboration during work from home situations to minimize meeting fatigue?
- How do you manage the incident management flow in an agile way?
- What engineering tasks must happen on-site (i.e. not from home) in software development?
- What are some good resources for planning a home office?
- We have no dashboards on TVs, and still, everything stands. What does this say about dashboards on TVs?
MODERATORS
Holly Allen is the head of reliability at Slack, with SRE, Monitoring, and Resilience Engineering in her portfolio. She is tireless in her efforts to make Slack the software reliable and scalable, and Slack the company a delightful place to work. Prior to Slack, Holly worked at startups and DreamWorks Animation, and was Director of Engineering at 18F, a civic tech startup in the US government.
Tony is a 25-year Internet industry veteran who has served in various Network Engineering and Operations leadership roles, including at Google and DoubleClick. Tony spearheads the management and operations of all Catchpoint’s monitoring data centers, supporting Catchpoint’s expanding corporate strategy and delivering stable, secure, and reliable operations.
Liz is a developer advocate, labor and ethics organizer, and Site Reliability Engineer (SRE) with 16+ years of experience. She is an advocate at Honeycomb for the SRE and Observability communities, and previously was an SRE working on products ranging from the Google Cloud Load Balancer to Google Flights.
Lex Neva is interested in all things related to running large, massively multiuser online services. He has years of Systems Engineering, tinkering, and troubleshooting experience and perhaps loves incident response more than he ought to. He’s previously worked for Linden Lab, DeviantArt, and Heroku and currently works as an SRE at Fastly helping to make sure the Internet keeps running.
Maira is an Application Engineer at Autodesk, based in Novi, Michigan. She is obsessed with learning, especially the learning process that accompanies on-boarding monitoring concepts for better site and service performance and availability. She has dedicated her past years to site reliability, working with different Synthetic and RUM monitoring tools.
Maintaining Mean-Time-to-Joy: Managing a Global Incident at Netflix (3:40-4:15 PM)
J. Paul Reed began his career in the trenches as a build/release and operations engineer. After launching a successful consulting firm, he now spends his days as a Senior Applied Resilience Engineer on Netflix’s Critical Operations & Reliability Engineering (CORE) team, focusing on incident analysis, systemic risk identification and mitigation, applied Resilience Engineering, and human factors expressed in the streaming leader’s various sociotechnical systems.
Tim is a Site Reliability Engineer at Netflix, working on the team responsible for the reliability of the Streaming Platform. Prior to becoming an SRE at Netflix, he worked at startups in roles focused on the operation, reliability, and security of their applications and infrastructure, as well as assuming the commander role in active security incidents. While he has largely relinquished security responsibilities in his current position, security is still an area he is deeply passionate about and remains a focus of the work he does.
Toast (4:15-4:20 PM)
Grab your favorite beverage and join us for a toast to wrap up SREfh and look ahead to the future!
Register Here!
If you haven’t yet grabbed your space at SREfh, there’s still time. Sign up here and get your swag pack! If you’re not sure of your availability, feel free to sign up regardless as we will send out a recording of the sessions via email afterward.