Senior Site Reliability Engineer

Pune, India

About AppDirect

Become a digital, global citizen and enable the new generation of digital entrepreneurs around the world. AppDirect offers a subscription commerce platform to sell any product, through any channel, on any device - as a service. We power millions of subscriptions worldwide for organizations. We do this through our values-driven culture, one that enables you to Be Seen, Be Yourself, and Do Your Best Work.

About the DevOps Platform Team

Our mission is to provide a robust Internal Developer Platform to AppDirect’s engineering teams, which makes it easy, safe and fun to design, implement, release and maintain the world’s leading subscription commerce platform. We are proud to be core contributors and maintainers of AppDirect’s Software Development Lifecycle (SDLC), through close alignment with Reliability, Quality, Data, InfoSec, Cloud, and other technology leadership.

We enable DevOps culture through our self-service, automated CI/CD platform. Currently, teams use the platform to make more than 3000 code deliveries every month, to 700 applications, on AWS, Azure, and on-premises environments, while remaining ISO27001, SOC2 and PCI compliant. Our Datadog instrumentation gives teams clear insights, monitoring, and alerting to maintain the availability of their experiences.

What you'll do and how you'll have an impact

  • Be the founding SRE for India within the DevOps Platform Team, establishing operating rhythms, guardrails, and best practices that raise reliability across hundreds of services and 30+ Kubernetes clusters.
  • Lead global incident management from India time zones: triage and drive resolution as Incident Commander, coordinate war rooms, manage stakeholder communications, and publish timely status page updates.
  • Maintain automations to enable on-call rotations, escalation policies, and incident workflows in PagerDuty, Datadog and Slack.
  • Create actionable runbooks to reduce MTTA/MTTR.
  • Define and operationalize SLIs/SLOs and error budgets with product and engineering teams; coach teams on using error budgets for release decisions and reliability trade-offs.
  • Create high-signal observability: instrument services, tune alerts to reduce noise, and build reliability dashboards in Datadog.
  • Own planned maintenance: plan and schedule maintenance windows, coordinate execution across teams and environments (AWS, Azure, on-prem), communicate broadly, and verify recovery with clear rollback plans.
  • Eliminate toil through automation: build ChatOps, status page automation, auto-remediation workflows, and runbooks-as-code; integrate incident and maintenance workflows into CI/CD (Jenkins, Argo).
  • Drive production readiness: define PRR checklists, bake reliability gates into pipelines, and improve deployment strategies (blue/green, progressive delivery).
  • Partner with DevOps Platform Engineers to harden the Internal Developer Platform and improve developer experience while maintaining compliance requirements (e.g., ISO27001, SOC2, PCI).
  • Lead blameless postmortems, track corrective actions, and maintain a reliability backlog that measurably improves availability, latency, and change success rate.
  • Mentor engineers and evangelize SRE principles through documentation, training, and a reliability guild/community of practice.

What we're looking for

  • 4+ years in SRE/Production Engineering/DevOps operating distributed systems and microservices at scale, including Kubernetes and containerized workloads.
  • Proven incident response leadership: incident triage and coordination, clear stakeholder/customer communications, status page management, and creation of robust runbooks.
  • Strong observability skills, ideally in Datadog (metrics, logs, traces, dashboards, monitors), or familiarity with Prometheus/Grafana, New Relic, Dynatrace, or similar tools.
  • Expertise designing actionable alerts tied to SLIs/SLOs and managing error budgets.
  • Hands-on with CI/CD and release engineering: GitHub Actions, Argo (or similar), progressive delivery, feature flags, and safe rollout/rollback patterns.
  • Proficiency in at least one programming language (Golang preferred) plus Bash.
  • Ability to automate incident workflows, status page updates, and remediation tasks via APIs and ChatOps.
  • Solid foundations in Linux, networking, web protocols, DNS/TLS, load balancers/CDNs, and performance/capacity analysis.
  • Experience with databases and messaging systems is a plus.
  • Cloud fluency in Kubernetes, AWS and/or Azure – understanding of multi-tenant, multi-region, and hybrid/on-prem environments.
  • Security-minded and comfortable working within compliance frameworks.
  • Infrastructure as Code experience (Terraform, Ansible, Kubernetes or similar) and Git-centric workflows.
  • Excellent written and verbal communication skills, with the ability to translate technical detail into concise business updates under pressure.
  • Self-starter comfortable with ambiguity and a founding-role mindset: high ownership, bias for action, data-driven decision making, and a passion for eliminating toil.
  • Willingness to participate in on-call during India hours and collaborate with global teams for follow-the-sun coverage.

At AppDirect, we believe that innovation thrives in an environment that houses diversity of excellence, experience and thought. We respect each AppDirector as their own fingerprint: unique, with no one alike. We foster an environment of inclusion without regard to race, religion, age, sexual orientation, or gender identity, enabling AppDirectors to embrace their uniqueness to do their best work. As such, we strongly encourage applications from Indigenous peoples, racialized people, people with disabilities, people from gender and sexually diverse communities, and/or people with intersectional identities.

At AppDirect, we take privacy very seriously. For more information about our use and handling of personal data from job applicants, please read our Candidate Privacy Policy. For more information on our general privacy practices, please see the AppDirect Privacy Notice: https://www.appdirect.com/about/privacy-notice

 
