
Senior Site Reliability Engineer
Roadie, a UPS company, is a leading logistics and delivery platform that helps businesses tackle the complexities of modern retail with unmatched delivery coverage, flexibility and visibility. Reaching 97% of U.S. households across more than 30,000 zip codes — from urban hubs to rural communities — Roadie provides seamless, scalable solutions that meet a variety of delivery needs.
With a network of more than 310,000 independent drivers nationwide, Roadie offers flexible delivery solutions that make complex logistics challenges easy, including solutions for local same-day delivery, delivery of big and bulky items, ship-from-store and DC-to-door.
Roadie is seeking a Senior Site Reliability Engineer to join our growing Technical Operations Team. We are looking for a candidate who has experience implementing site reliability principles, as well as production-level Kubernetes experience. The ideal candidate is a skilled problem solver with deep knowledge of site reliability practices, standard DevOps principles, AWS, scripting languages and Kubernetes.
What You'll Do
- Build systems that optimize the uptime and reliability of our platform, and support the management and optimization of our software delivery pipeline, observability and infrastructure operations
- Maintain, support, and engineer production and non-production Kubernetes clusters (EKS), as well as Elasticsearch (ES), MSK, RDS, and ElastiCache (Redis) clusters
- Deploy and maintain monitoring and logging solutions based on Prometheus, Loki, Thanos, Grafana, OpenTelemetry and New Relic
- Collaborate with cross-functional teams to identify and address potential bottlenecks, optimize resource utilization, and proactively prevent system failures
- Define and manage SLOs, SLIs and error budgets
- Develop processes, tools and automation to reduce toil across engineering teams
- Plan and forecast service capacity and demand, assess cost optimization, and tune systems and software
- Debug production/non-production issues
- Take part in 24/7 on-call rotation
Technology We're Using Now
- Python, Ruby on Rails, Golang
- React/Redux, Objective-C and Swift, Android
- Postgres, Redshift, Redis, Kafka
- AWS/GCP
- Docker/Kubernetes
- OpenTelemetry/Prometheus/Thanos/Loki/Grafana/New Relic/Sentry
- Git/CircleCI
- ArgoCD
What You Bring
- 6+ years in various SRE roles
- 6+ years in various DevOps/Systems Engineering roles
- 6+ years of experience building and managing production Kubernetes infrastructure
- 6+ years of experience with popular scripting languages (Python, Ruby, Bash, etc.)
- Experience with infrastructure-as-code tools such as Terraform or Crossplane
- Experience with CI/CD tools (CircleCI, etc.)
- Experience with GitOps tools (ArgoCD)
- Experience using a broad range of AWS technologies (RDS, Elasticsearch, VPC, EKS, S3, CloudFront, MSK, ElastiCache, CloudWatch, etc.)
- Experience developing and maintaining YAML templating systems (Helm charts, Kustomize, etc.)
- Must be able to work independently, be self-motivated and handle multiple priorities
- Comfortable working in a fast-paced agile environment
Finally, a willingness to admit what you don’t know, and learn what you need to learn quickly.
Why Roadie?
- Competitive compensation packages
- 100% covered health insurance premiums for yourself
- 401k with company match
- Tuition and student loan repayment assistance (that’s right: Roadie will contribute directly to your existing student loans!)
- Flexible work schedule with unlimited PTO
- Monthly 3-day weekends
- Monthly WFH stipend
- Paid sabbatical leave: tenured team members are given time to rest, relax, and explore
- The technology you need to get the job done