Site Reliability Engineer II

Bengaluru, India

About EarnIn

As one of the pioneers of earned wage access, EarnIn is passionate about building products that deliver real-time financial flexibility for people living paycheck to paycheck. Our community members access their earnings as they earn them, with options to spend, save, and grow their money without mandatory fees, interest rates, or credit checks.

We’re fortunate to have an incredibly experienced leadership team, combined with world-class funding partners like A16Z, Matrix Partners, DST, Ribbit Capital, and a very healthy core business with a tremendous runway. We’re growing fast and are excited to continue bringing world-class talent onboard to help shape the next chapter of our growth journey.

POSITION SUMMARY

We have a real passion for delivering the best product experience for our community members. We work closely with all teams and share responsibility for rapidly delivering production-ready features to our community. We build or contribute to infrastructure, reliability tooling, and practices that help teams ship quickly and safely. We think a lot about things like good alert hygiene, friendly runbooks, clear SLOs, and how to make deployments feel boring in the best possible way.

As an SRE, you are a well-rounded practitioner in designing, observing, and operating our systems in production. Rather than following established playbooks, you are starting to write them. You work with confidence across observability tooling, incident response, and infrastructure-as-code, and you know how to communicate tradeoffs clearly to the engineers and teams around you.

This position will be hybrid, based in our Bengaluru office, with 2 days a week required in the office, as part of our expanding site. EarnIn provides excellent employee benefits, including healthcare, internet/cell phone reimbursement, a learning and development stipend, and opportunities to collaborate with and travel to our Palo Alto HQ and Bangkok Site. Our salary ranges are determined by role, level, and location. We are unable to provide visa sponsorship or immigration support for this position.

WHAT YOU'LL DO

  • Design systems with resilience, graceful degradation, and capacity in mind.
  • Define and measure SLOs and SLIs that actually reflect what our customers feel.
  • Use Datadog (logging, metrics, APM) together with CloudWatch to build signal-heavy, noise-light observability.
  • Configure alerting and routing that reach engineers through incident.io, where we run incident management and on-call, so that when a human gets paged, it really matters.
  • Continuously improve our incident lifecycle, from fast detection and solid triage, through clear communication, to blameless, actionable follow-ups.
  • You will combine solid software fundamentals with reliability thinking so our systems are highly available, easy to debug, and a joy to work on. You know that the only good 2 a.m. alert is the one that never fires in the first place.
  • You have a software background and are passionate about optimizing both quality of service and developer experience.
  • You are calm and collected, cool under pressure, and not afraid to voice your opinion even in the heat of an incident. You can turn a wall of logs and a few graphs into a clear hypothesis instead of pure adrenaline.
  • You are excited to work with both technical and non-technical teams throughout our organization and can explain SLOs and error budgets in plain language.
  • You have experience working with large-scale, secure, and performant distributed systems, including the fun parts like retries, backoff, and timeouts that actually work together.
  • You are genuinely excited about AI, not just as a productivity tool, but as a platform for building smarter, more autonomous SRE operations. You build with it, validate what it produces, and push its limits responsibly.
  • You are passionate about learning new technologies and adopting the right tools to manage services in production, keeping SLAs and MTTR in mind at all times, instead of just adding another dashboard.
  • You have the ability to plan and execute on reliability and operability initiatives for the team, with an eye toward growing your scope and impact over time.
  • You ask, “What is the SLO?” when someone says, “The system is slow,” and you get suspicious when a graph is perfectly flat for too long.
  • You can laugh about on-call life, you have opinions about which alerts should never wake a human, and you may or may not have strong feelings about incident channel naming conventions in incident.io.

WHAT WE'RE LOOKING FOR 

  • Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
  • 3+ years of experience in an SRE or Software Engineering role.
  • Hands-on coding experience in any two programming languages.
  • Experience successfully managing production environments, and an understanding that you need more than a for-loop and SSH to make it happen.
  • A strong belief that observability is critically important to running highly available and performant services, not an optional nice-to-have.
  • Experience using SLOs, SLIs, and KPIs to guide decisions, prioritize work, and explain tradeoffs, not just decorate dashboards or slide decks.
  • Familiarity with part or all of the SRE book, and the ability to contextualize it for different engineering teams and cultures rather than treating it as a one-size-fits-all checklist.
  • Proficiency with AI-assisted development tools (e.g., GitHub Copilot, Cursor, ChatGPT, or similar) or prompt engineering as part of your software development workflow to reduce operational toil, accelerate incident root cause analysis, and optimize infrastructure-as-code workflows.
  • Demonstrated experience building or meaningfully contributing to agentic AI workflows: runbook automation, AI-assisted alert triage, LLM-driven postmortem generation, or similar. You know how to validate what AI produces before it touches production.
  • Hands-on experience shepherding services from design to production, through incident learnings, and into a state where on-call actually gets quieter over time.
  • Experience tackling production incidents, learning from them, and turning those lessons into concrete technical and process changes that make it much harder for the same problem to happen again.
  • Interest in mentoring peers and a belief that investing in people is one of the highest-leverage ways to improve reliability and reduce toil.

#LI-Hybrid

At EarnIn, we believe that the best way to build a financial system that works for everyday people is by hiring a team that represents our diverse community. Our team is diverse not only in background and experience but also in perspective. We celebrate our diversity and strive to create a culture of belonging. EarnIn does not unlawfully discriminate based on race, color, religion, sex (including pregnancy, childbirth, breastfeeding, or related medical conditions), gender identity, gender expression, national origin, ancestry, citizenship, age, physical or mental disability, legally protected medical condition, family care status, military or veteran status, marital status, registered domestic partner status, sexual orientation, genetic information, or any other basis protected by local, state, or federal laws. EarnIn is an E-Verify participant. 

EarnIn does not accept unsolicited resumes from individual recruiters or third-party recruiting agencies in response to job postings. No fee will be paid to third parties who submit unsolicited candidates directly to our hiring managers or HR team.
