
Senior/Staff Site Reliability Engineer
Mochi Health’s mission is to be the discovery layer of healthcare. We are building a platform that makes it easier for patients to find the right providers, access the right medications, and take control of their health with transparency and trust.
Over the past few years, we have experienced rapid growth by combining operational excellence, clinical expertise, and innovative technology to deliver care that is more human, intuitive, and effective. From pharmacy pricing transparency and personalized medication management, to long-term medical record access and community-based chronic illness support, Mochi is creating a new model of care that empowers patients, providers, and pharmacies alike.
We believe the future of healthcare is personal, and we are building the technology to power it. At Mochi Health, you will join a team that values inclusivity, collaboration, and bold thinking, and you will have the opportunity to do the most meaningful work of your career.
$250,000 - $300,000
Full-time / Onsite (5 days/week)
About The Role
We’re looking for a Senior/Staff Site Reliability Engineer to build Mochi’s AI-driven APM and incident management system: one that doesn’t just alert and page, but learns. This is a foundational role at the intersection of SRE, platform engineering, and applied AI: you’ll design the feedback loops (human-in-the-loop / RLHF-style), guardrails, and automation that let our reliability posture improve over time.
You’ll own the systems and workflows that turn incidents into intelligence: automated triage, root cause analysis, remediation, and bug-fix proposals (PRs, test runs, staged rollouts) when issues are code-level.
If you’re excited by the idea of building a self-improving SRE “copilot”, this job is for you.
What You’ll Do
- Build an AI-driven SRE platform that ingests telemetry (logs/metrics/traces), deploy events, and incident artifacts to detect anomalies, summarize failures, and propose mitigations.
- Design a human-in-the-loop learning loop (RLHF-style) so the system gets better with every incident: capturing decisions, outcomes, and postmortems into training/evaluation data.
- Create safe auto-remediation capabilities: runbook execution, automated rollbacks, and feature-flag actions, all with strong guardrails, auditability, and progressive rollout controls.
- Build tooling that can propose bug fixes: generate well-scoped PRs, run tests, and support canary releases, with clear handoff and approval flows.
- Define and operationalize SLOs/SLIs and error budgets for critical user journeys (patient onboarding, provider workflows, pharmacy fulfillment, billing, etc.).
- Level up observability end-to-end: alert quality, dashboarding, tracing standards, and “unknown unknown” detection.
- Lead incident response excellence: on-call improvements, incident command, blameless postmortems, and driving systemic fixes that reduce repeat failures.
- Partner with product + engineering teams to reduce toil and improve reliability via better architecture, load testing, resilience testing, and capacity planning.
- Establish reliability standards and patterns across the org (golden signals, deployment safety, dependency management, fault isolation).
Who You Are
- 7+ years in SRE / platform / infrastructure engineering, with a track record of owning production reliability at scale.
- Deep experience operating Kubernetes-based systems in the cloud (AWS preferred), including networking, autoscaling, rollout strategies, and incident mitigation.
- Strong software engineering ability: you can debug production issues across services, understand failure modes, and contribute code when needed (Python/Go/TypeScript are all great).
- Expert-level grasp of observability and incident response: metrics, logs, tracing, alerting design, and postmortem-driven improvements.
- Comfortable building automation that touches production and obsessive about safety: least-privilege access, audit logs, approvals, canaries, and rollback.
- Excited by AI tooling and agentic workflows (or already experienced): LLM-based triage/summarization, retrieval over runbooks/postmortems, evaluation harnesses, and feedback loops.
- Strong communication and collaboration skills: you can lead during incidents, write clearly, and align teams around reliability priorities.
- Startup mindset: you move fast, take end-to-end ownership, and love turning ambiguity into shipped systems.
- Excited to work in-person with our team in San Francisco.
Nice to Haves
- Experience building LLM-powered internal tools (incident copilots, automated debugging, RAG over docs/runbooks) and/or RLHF-style feedback pipelines.
- Familiarity with security and compliance in regulated environments (HIPAA, SOC 2, audit requirements, PHI handling).
- Experience with chaos engineering / game days and resilience testing programs.
- Experience building CI/CD guardrails and progressive delivery systems (canaries, automated verification, safe rollout policies).
- Prior work on distributed tracing standards (OpenTelemetry), service meshes, or large-scale event-driven systems.
Our Core Technologies Include: AWS, Kubernetes, Postgres, Redis, TypeScript/Node.js, Python, SQL (plus whatever we need to build a world-class reliability platform)
Life at Mochi
At Mochi, we believe your best work happens when you feel your best—so we’ve designed an environment that fuels your creativity, supports your growth, and makes every day exciting.
🥗 Daily Meals and Espresso Bar - Breakfast, lunch, and dinner every weekday. Our on-site barista keeps the espresso and matcha flowing all day
💰 Pre-Tax Commuter Perks - Save on transit and parking through pre-tax commuter benefits
💸 Top-of-Market Compensation - We offer competitive salaries along with generous equity packages so you can share in the success you help create
💣 Profitable and Rapid Growth - We’re scaling fast, with financial discipline and long-term vision. No VC constraints, just sustainable momentum and smart decisions
🚀 High-Impact Work - Help shape the future of digital healthcare. Your work here directly improves lives and scales nationwide
👩‍💻 World-Class Team - Collaborate with teammates from Tesla, SpaceX, Citadel, Harvard, IIT, and more. We value excellence, humility, and empathy in equal measure
✨ Comprehensive Benefits - 401(k) with match, generous time off, life insurance, and high-quality medical, dental, and vision plans
💊 Mochi Health Membership - We cover your monthly subscription fee so you can experience the same care as our patients (medications not included)
🌴 Time to Recharge - Enjoy unlimited PTO, generous company holidays, and true flexibility. We trust you to take the time you need to rest, reset, and thrive
🧘 Wellness First - From weekly mindfulness sessions to group workouts and fitness perks, your physical and mental health are top priority
🎉 Team Socials and Community - We make time to connect through regular socials, happy hours, and spontaneous events. Our stocked kitchen doesn’t hurt either
📍 Downtown SF HQ - Our San Francisco office is just steps from BART, Muni, and great food. It’s designed for deep work and casual collaboration
--
The base salary for this full-time position ranges from $250,000 to $300,000, in addition to equity and benefits. The salary range listed in each job posting represents the minimum and maximum targets for new hire salaries across all locations. Actual compensation within this range is determined by various factors, such as job-related skills, experience, relevant education or training, and location.
Workplace Policy
Mochi Health is an in-person company based in San Francisco, CA. Our team works together in person five days a week to foster collaboration, innovation, and strong connections. We believe that face-to-face interaction builds a culture of excellence and allows us to deliver the best outcomes for the patients and providers we serve.
For office-based roles, the standard schedule is Monday through Friday, 9:00 a.m. to 7:00 p.m. Actual hours may vary depending on business needs and role responsibilities. All employees receive meal and rest breaks in accordance with applicable state and local laws.
For designated remote roles, this in-person policy does not apply.
Equal Opportunity
Mochi Health is an Equal Opportunity Employer. We make all employment decisions based solely on merit. We provide equal employment opportunities to all applicants and employees without discrimination on the basis of race, religion, color, national origin, gender (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender identity, gender expression, age, status as a protected veteran, disability status, or any other applicable legally protected characteristic. We prohibit any form of discrimination or harassment. This policy applies to all terms and conditions of employment, including hiring.
Candidate Privacy Notice
Please review Mochi Health's Candidate Privacy Notice here.
Accommodations
Mochi Health complies with the Americans with Disabilities Act (ADA), as amended by the ADA Amendments Act, and all applicable state or local laws. We will reasonably accommodate qualified individuals with a disability during the application process and throughout employment as required by law.
If you need any assistance or accommodations due to a disability, please contact us at hr@joinmochi.com.