Staff Site Reliability Engineer
A little more about the team:
Honeycomb’s Site Reliability Engineering (SRE) team works at the intersection of infrastructure, developer experience, and organizational enablement. We lead technically complex, cross-team projects that improve reliability, scale systems, and make life easier for engineering teams. We’re trusted across the company to set direction, solve ambiguous problems, and build processes that run smoothly. Our work spans AWS infrastructure, Kubernetes and Helm, Terraform, and other tools, aligned with the sociotechnical needs of scaling a fast-growing company. We’re a collaborative, diverse team that values experimentation, data-driven decisions, and maintaining a safe environment for healthy debate and innovation.
What you’ll do in the role:
- Lead technically complex, cross-functional projects to help Honeycomb scale.
- Help manage and grow our vendor relationships (such as AWS), including strategic negotiations, and help others do the same.
- Build organizational trust through transparent communication with engineering leadership and stakeholders.
- Shape how the SRE team engages with the rest of Honeycomb (program management, embedding, education).
- Drive technical improvements in AWS, Kubernetes, Helm, and Terraform usage.
- Contribute to platform strategy and vision.
- Improve and refine processes to ensure smooth operations and reduce friction for engineering teams.
- Serve as an Incident Commander, train others to do the same, and participate in the Platform on-call rotation.
- Improve the internal observability of our systems through technical projects and enablement.
- Help the organization navigate tradeoffs between reliability and its other goals and priorities.
- Optional: act as an external ambassador through blog posts, conference talks, and presentations with support from our DevRel team.
What you’ll bring:
- Strong experience in leading cross-team or organizational-level technical initiatives.
- Strong experience in AWS and Kubernetes.
- Solid Helm, Terraform, and CI/CD skills.
- Experience with software engineering (Golang is a plus).
- Exceptional communication skills, with the ability to manage up, negotiate, influence, and bring stakeholders along.
- A balance of technical depth and organizational perspective, with the ability to scale systems and processes.
- Familiarity with observability concepts (SLOs, instrumentation) and data-driven decision making.
- Experience with incident and change management.
- Comfort operating in ambiguity, with a bias for action and experimentation.
- Interest in both the technical and human sides of reliability engineering.
- A curiosity to learn how people and systems work, and the willingness to make them partners in your initiatives.
Base salary, based on level of experience:
$210,485 - $236,500 USD
What you'll get when you join the Hive:
- A stake in our success - generous equity with an employee-friendly stock program
- It’s not about how strong a negotiator you are - our pay is based on transparent levels relative to experience
- Time to recharge - Unlimited PTO and paid sabbatical
- A remote-first mindset and culture (really!)
- Home office, co-working, and internet stipend
- 100% coverage for employees and 75% for dependents across all benefits
- Up to 16 weeks of paid parental leave, regardless of path to parenthood
- Annual development allowance
- And much more...