
Senior Site Reliability Engineer
At Tactile Medical, we specialize in developing at-home therapy devices to treat lymphedema, chronic venous insufficiency and respiratory illnesses.
The Senior Site Reliability Engineer (SRE) is responsible for ensuring reliability, observability, and operational excellence across Tactile Medical’s digital products and internal platforms. This includes the digital therapy ecosystem (mobile apps, React portals, clinician tools), the .NET API layer, CosmosDB backed data platforms, WooCommerce commerce components, Azure Service Bus integrations, and the cloud infrastructure that supports regulated medical device workflows. This role sits at the intersection of DevOps, cloud operations, compliance, and product support, ensuring that production systems meet uptime expectations and regulatory requirements while enabling rapid iteration for the Digital Solutions and Software Engineering teams. The SRE will help establish and mature the operational reliability strategy — including incident management, performance monitoring, infrastructure automation, and continuous improvement — with a specific focus on supporting a regulated medical device + digital health environment. The systems managed by this role directly impact patients and device connectivity, making reliability and quality essential to business continuity and patient outcomes.
Accountabilities & Responsibilities
Production Ownership & Incident Response:
- Serve as the operational owner for the production environment supporting Tactile’s digital solutions.
- Lead incident response processes, coordinating with Digital, IT, Marketing, Operations, and Product Support teams.
- Participate in on-call rotation and oversee escalation pathways for Tier 2 & 3 technical support.
- Ensure post incident documentation aligns with regulated quality expectations (e.g., CAPA inputs, RCA documentation in accordance with ISO 13485 / QMS processes).
Observability, Monitoring & Data Quality:
- Build and maintain end to end observability across: Native and hybrid mobile applications, Patient, partner and internal portals, Device connectivity & data ingestion services, Payment and WooCommerce commerce flows, .NET backend services and Azure integrations
- Build dashboards and alerts in Datadog, Azure Monitor (or preferred tools) to detect anomalies.
- Conduct database level investigations for usage analytics, reliability metrics, and management level reporting (patient usage trends, connectivity patterns, error rates).
Automation, Infrastructure & Cloud Operations:
- Lead infrastructure automation using Terraform and Azure DevOps
- Automate monitoring configuration, system audits, log standards, and compliance-related reporting.
- Collaborate with IT Security and Compliance to maintain operational readiness for HIPAA and internal QMS audits.
- Support the transition of legacy components toward more scalable and modern cloud patterns where needed.
Reliability Engineering & Development Partnership:
- Define and maintain SLOs, SLIs, and reliability metrics that balance innovation velocity with platform stability.
- Work closely with developers to embed reliability into CI/CD, code quality, test coverage, and deployment patterns.
- Lead post incident reviews and manage the continuous reliability improvement backlog.
- Offer guidance on resilient architecture decisions, retry patterns, failure modes, and performant API design.
Cross-Functional Collaboration:
- Act as a reliability subject matter expert across Digital Solutions, Product, Engineering, Product Support, and Security.
- Ensure production change control aligns with quality and regulatory expectations.
- Support compliance documentation for software releases, infrastructure changes, and security controls.
Qualifications
Education & Experience
Required:
- Bachelor’s degree in Computer Science, Information Technology, or related field.
- Master’s degree or certifications (e.g., Azure, Kubernetes, SRE) are a plus.
- 5+ years of experience in Site Reliability Engineering, DevOps, or Infrastructure roles.
- Proven experience in regulated industries (healthcare, finance, etc.) is highly desirable.
- Hands-on experience with: Cloud platforms (AWS, GCP, Azure), Container orchestration (Kubernetes, Nomad), Monitoring tools (Prometheus, Grafana, Datadog), CI/CD pipelines and automation frameworks
Knowledge & Skills
- Strong understanding of CI/CD pipelines, version control workflows, and automated deployment practices, including the ability to build and maintain secure, reliable pipelines in Azure DevOps or GitHub Actions.
- Knowledge of how APIs work, their endpoints, request methods, authentication mechanisms, response formats, error handling, and rate limiting
- Familiarity with web services technologies, including HTTP/HTTPS protocols, JSON, XML, and data serialization/deserialization methods
- Understanding of various data formats and protocols used in API communication, such as JSON, XML, CSV, and protocol buffers. Knowledge of data transformation techniques to convert data between different formats
- Experience with integration platforms, preferably Dell Boomi, and ability to use these platforms to build, deploy, and manage integrations between systems
- Basic understanding of databases and SQL (Structured Query Language) and ability to query databases, retrieve data, and perform data manipulations as part of integration processes
- Understanding of security principles and best practices for API integration, including authentication, authorization, encryption, and data privacy regulations (e.g., GDPR, HIPAA)
- Goal oriented with solid planning and time management skills
- Excellent communication, follow through, attention to detail, documentation and collaboration skills
- Ability to think critically and use strong problem-solving skills
- A team-oriented personality with the initiative to accomplish goals
- Able to simultaneously manage many details and priorities
Competencies
- Technical Excellence — Drives high quality, scalable, reliable systems.
- Operational Rigor — Ensures production processes meet QMS and regulatory expectations.
- Collaboration — Works fluidly across engineering, product, and operational teams.
- Critical Thinking — Anticipates reliability risks and mitigates proactively.
- Leadership — Influences without authority; leads incident response and remediation.
- Adaptability — Comfortable operating in a fast-paced environment with evolving technology stacks.
Our total compensation package includes medical, dental and vision benefits, retirement benefits, employee stock purchase plan, paid time off, parental leave, family medical leave, volunteer time off and additional leave programs, life insurance, disability coverage, and other life and work wellness benefits and discounts. Benefits may be subject to generally applicable eligibility, waiting period, contributions, and other requirements and conditions.
Below is the starting salary or hourly range for this position, although offers may differ based on the candidate's location, job-specific knowledge, skills and experience.
US Pay Range
$93,600 - $131,040 USD
To learn more about our Privacy Statement follow this link - https://tactilemedical.com/privacy-statement/
To learn more about our California Privacy Notice follow this link - https://tactilemedical.com/california-privacy-notice/
Create a Job Alert
Interested in building your career at Tactile Medical? Get future opportunities sent straight to your email.
Apply for this job
*
indicates a required field