Back to jobs
New

Senior Site Reliability Engineer

New York (Remote) or Boston (Remote)

In the time it takes you to read this job description, RapidSOS will have handled ~1,380 emergencies.

At RapidSOS, we are committed to using technology to build a safer, stronger future and working together to save lives. We’re in an exciting phase of growth, welcoming new members from across the globe to our mission-driven, ambitious, and inclusive team. Our work is founded on our values of elevating purpose, inventing tomorrow, delivering with urgency, serving with integrity, and winning together, all of which support a company culture where people can innovate, collaborate, grow, and, above all, make an impact. 

RapidSOS is ​​the leading public safety AI company that unlocks mission-critical intelligence for first responders and security teams – enabling faster, smarter and more accurate emergency response. Real-time data from the world’s largest safety network of 700M+ devices, 200+ global enterprises, and 23,000+ federal, state and local agencies fuels the RapidSOS HARMONY AI engine that delivers this intelligence to those who need it most. Learn more at www.RapidSOS.com.

What this role is about:
Are you excited to work on systems where reliability directly impacts real-world outcomes? At RapidSOS, we build technology that powers emergency response, ensuring critical data gets to the right place at the right time. When these systems degrade or fail, the impact is real and reliability isn’t a background function. It’s fundamental to how our product shows up in critical moments.

We’re seeking a Senior Site Reliability Engineer to own the performance and stability of services that operate at scale in real-world, high-stakes environments. You’ll work across infrastructure-as-code, container orchestration, CI/CD pipelines, and service-level application code, identifying and resolving issues at their root cause while proactively shaping how systems are built to improve reliability from the start. You’ll go beyond surface-level fixes, digging into everything from service behavior in Kubernetes to application-level decisions that impact performance, cost, and reliability. You’ll collaborate closely with engineering teams to improve how our systems are built, observed, and operated. Along the way, you’ll help shape how we approach reliability as a discipline—closing visibility gaps, improving resilience, and ensuring our platform performs when it matters most.

What you’ll do: 

  • Own performance and reliability outcomes: Ownership of how application-level decisions create system-level impact, including connection pooling, database architecture, traffic routing patterns, and memory allocation. Collaboration with engineering teams that own specific domains, partnering directly to improve reliability and performance across their systems.
  • Design for system resilience: Responsibility for strengthening reliability through proactive design decisions, including safer deployment patterns, failover strategies, and redundancy approaches that improve system behavior under stress.
  • Build observability into system behavior: Proactively instrument services with structured logging, metrics, and alerting so systems are easier to understand and debug. The focus is on creating clear signals from production behavior before issues escalate.
  • Own incidents from signal to resolution: Ownership of production issues from first signal through resolution, including investigation across infrastructure and application layers, root cause identification, and implementation of fixes that restore stability and strengthen system behavior long term.
  • Work across the stack without a permission slip: You’ll work across infrastructure-as-code, container orchestration, CI/CD pipelines, and service-level application code. When issues come up, you don’t wait for a handoff—ownership is taken directly and driven through to resolution.

What we’re looking for in our ideal candidate: 

  • 5+ years of professional engineering experience with deep expertise in Python 
  • Real cloud infrastructure experience with AWS: networking, managed databases, cost implications of traffic routing decisions, IAM, DNS-based routing and failover
  • Hands-on kubernetes experience with containerized workloads in production across EKS, ECS, or Fargate, you can read events, understand resource limits, know when to drain vs. delete a node, and understand the tradeoffs between orchestration models
  • Strong understanding of distributed systems and how they fail, including resource exhaustion, replication lag, queue backpressure, and other common failure modes
  • Experience operating high-throughput messaging systems (RabbitMQ, Kafka, AWS SNS / SQS, etc.)  and the infrastructure around them, including infrastructure-as-code (e.g., Terraform) and CI/CD pipelines, with an emphasis on improving reliability and scalability
  • Experience building or improving observability through logging, metrics, and alerting
  • Demonstrable experience in using AI to safely and securely enhance velocity, improve reliability and recoverability of services
  • Strong communication and interpersonal skills; is a team player with a positive attitude 
  • Highly self-motivated; ability to adapt and learn quickly in a fast-paced environment with a strong sense of ownership
  • Strong proficiency in coding best practices – ability to write clean, maintainable, and testable code 
  • Demonstrated expertise in problem solving – comfortable working across both infrastructure and application layers to diagnose and resolve issues at the source
  • Ability and willingness to collaborate in-person a few times per quarter, or as needed

Nice-to-have experience (but not required!):

  • Experience supporting production systems in an on-call or similar capacity where reliability matters
  • Experience with observability and GitOps tooling; hands-on with Datadog (APM, alerting), Elasticsearch/OpenSearch, and ArgoCD-based GitOps deployments; comfortable modernizing legacy CI/CD pipelines (e.g., Concourse, Jenkins) toward cloud-native approaches

What we offer: 

  • The chance to work with a passionate team on solving one of the largest challenges globally 
  • Competitive salary and benefits and equity participation 
  • A dynamic, flexible and fun start-up work environment with a highly talented team

If you're curious to learn more about RapidSOS, you can check out https://rapidsos.com/blog/ 

Starting pay for a successful applicant will depend on a variety of job-related factors, which may include experience, relevant skills, training, education, location, business needs, or market demands. The salary range for this role is $160,000 - $195,000. This role will also be eligible to receive equity options. #LI-Remote

RapidSOS is proud to be an equal opportunity workplace. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, or Veteran status. 

Interested in the role but you don’t meet 100% of the requirements? We’d love to hear from you! We encourage you to apply; we’d be excited to see if your unique skill set and experience could be a match.

Create a Job Alert

Interested in building your career at RapidSOS? Get future opportunities sent straight to your email.

Apply for this job

*

indicates a required field

Phone
Resume/CV*

Accepted file types: pdf, doc, docx, txt, rtf

Cover Letter

Accepted file types: pdf, doc, docx, txt, rtf


Select...
Select...
Select...

Voluntary Self-Identification

For government reporting purposes, we ask candidates to respond to the below self-identification survey. Completion of the form is entirely voluntary. Whatever your decision, it will not be considered in the hiring process or thereafter. Any information that you do provide will be recorded and maintained in a confidential file.

As set forth in RapidSOS’s Equal Employment Opportunity policy, we do not discriminate on the basis of any protected group status under any applicable law.

Select...
Select...
Race & Ethnicity Definitions

If you believe you belong to any of the categories of protected veterans listed below, please indicate by making the appropriate selection. As a government contractor subject to the Vietnam Era Veterans Readjustment Assistance Act (VEVRAA), we request this information in order to measure the effectiveness of the outreach and positive recruitment efforts we undertake pursuant to VEVRAA. Classification of protected categories is as follows:

A "disabled veteran" is one of the following: a veteran of the U.S. military, ground, naval or air service who is entitled to compensation (or who but for the receipt of military retired pay would be entitled to compensation) under laws administered by the Secretary of Veterans Affairs; or a person who was discharged or released from active duty because of a service-connected disability.

A "recently separated veteran" means any veteran during the three-year period beginning on the date of such veteran's discharge or release from active duty in the U.S. military, ground, naval, or air service.

An "active duty wartime or campaign badge veteran" means a veteran who served on active duty in the U.S. military, ground, naval or air service during a war, or in a campaign or expedition for which a campaign badge has been authorized under the laws administered by the Department of Defense.

An "Armed forces service medal veteran" means a veteran who, while serving on active duty in the U.S. military, ground, naval or air service, participated in a United States military operation for which an Armed Forces service medal was awarded pursuant to Executive Order 12985.

Select...

Voluntary Self-Identification of Disability

Form CC-305
Page 1 of 1
OMB Control Number 1250-0005
Expires 04/30/2026

Why are you being asked to complete this form?

We are a federal contractor or subcontractor. The law requires us to provide equal employment opportunity to qualified people with disabilities. We have a goal of having at least 7% of our workers as people with disabilities. The law says we must measure our progress towards this goal. To do this, we must ask applicants and employees if they have a disability or have ever had one. People can become disabled, so we need to ask this question at least every five years.

Completing this form is voluntary, and we hope that you will choose to do so. Your answer is confidential. No one who makes hiring decisions will see it. Your decision to complete the form and your answer will not harm you in any way. If you want to learn more about the law or this form, visit the U.S. Department of Labor’s Office of Federal Contract Compliance Programs (OFCCP) website at www.dol.gov/ofccp.

How do you know if you have a disability?

A disability is a condition that substantially limits one or more of your “major life activities.” If you have or have ever had such a condition, you are a person with a disability. Disabilities include, but are not limited to:

  • Alcohol or other substance use disorder (not currently using drugs illegally)
  • Autoimmune disorder, for example, lupus, fibromyalgia, rheumatoid arthritis, HIV/AIDS
  • Blind or low vision
  • Cancer (past or present)
  • Cardiovascular or heart disease
  • Celiac disease
  • Cerebral palsy
  • Deaf or serious difficulty hearing
  • Diabetes
  • Disfigurement, for example, disfigurement caused by burns, wounds, accidents, or congenital disorders
  • Epilepsy or other seizure disorder
  • Gastrointestinal disorders, for example, Crohn's Disease, irritable bowel syndrome
  • Intellectual or developmental disability
  • Mental health conditions, for example, depression, bipolar disorder, anxiety disorder, schizophrenia, PTSD
  • Missing limbs or partially missing limbs
  • Mobility impairment, benefiting from the use of a wheelchair, scooter, walker, leg brace(s) and/or other supports
  • Nervous system condition, for example, migraine headaches, Parkinson’s disease, multiple sclerosis (MS)
  • Neurodivergence, for example, attention-deficit/hyperactivity disorder (ADHD), autism spectrum disorder, dyslexia, dyspraxia, other learning disabilities
  • Partial or complete paralysis (any cause)
  • Pulmonary or respiratory conditions, for example, tuberculosis, asthma, emphysema
  • Short stature (dwarfism)
  • Traumatic brain injury
Select...

PUBLIC BURDEN STATEMENT: According to the Paperwork Reduction Act of 1995 no persons are required to respond to a collection of information unless such collection displays a valid OMB control number. This survey should take about 5 minutes to complete.