Senior Site Reliability Engineer - Multi-Cloud Environment
Who We Are
Building the Backbone of Distributed Applications.
At Orkes, we are reimagining how developers build reliable, scalable, event-driven applications without wrangling complex infrastructure.
Born from Netflix's battle-tested open-source project, OSS Conductor, our platform now powers billions of mission-critical workflows across fintech, e-commerce, logistics, and healthcare, helping businesses run their core operations with confidence.
We’re solving some of the hardest infrastructure problems: how do you make distributed systems reliable, observable, and scalable – without every team having to reinvent the wheel? We absorb the complexity of distributed systems, letting teams build with confidence.
These are the workflows behind every transaction, every delivery, every patient check-in, quietly orchestrating our digital world.
We're building the next generation of resilient, scalable infrastructure that powers millions of users worldwide. Our platform operates across multiple cloud environments, ensuring 99.99% uptime while handling massive traffic spikes and complex distributed workloads.
We believe that reliability isn't just about keeping the lights on; it's about creating systems so robust and well-designed that our engineering teams can innovate fearlessly, knowing their applications will scale seamlessly and recover gracefully from any failure. And the best part is we’re just getting started.
What You’ll Do
- Write clean, maintainable code and champion engineering best practices for infrastructure automation.
- Design and implement multi-cloud infrastructure spanning AWS, GCP, Azure, and hybrid environments.
- Solve complex problems in distributed computing and event-driven systems, including designing high availability architectures.
- Develop sophisticated monitoring and alerting systems that provide deep visibility into distributed applications
- Collaborate closely with product, design, and engineering teammates to ship high-impact features
- Automate everything – from infrastructure provisioning to incident response and capacity planning
- Lead incident response and conduct thorough post-mortems to continuously improve system reliability
- Collaborate with engineering teams to embed reliability principles into application design from day one
- Optimize costs across multi-cloud deployments while maintaining performance and reliability standards.
- Drive technical discussions and decisions that influence the future of the platform
- Work with Orkes SDKs and code written in Java.
- 5+ years of SRE/DevOps experience with production systems at scale
- Ability to code in a statically compiled language, preferably Java. By definition, an SRE is a software engineer, not a sysadmin.
- Deep understanding of distributed systems, microservices, and cloud-native architecture
- Experience with REST, gRPC, and asynchronous messaging systems (Kafka, RabbitMQ, etc.)
- Strong problem-solving skills and a passion for learning new technologies
- A collaborative mindset—we win as a team
- Expert-level knowledge of at least one cloud provider (AWS, GCP, or Azure), including its core services, networking, and security models, plus exposure to the other two providers.
- Strong programming skills in Java, Go, or similar languages – you're a software engineer first.
- Deep understanding of containerization and orchestration (Docker, Kubernetes) across cloud environments, including the corresponding security hardening.
- Experience with Infrastructure as Code (Terraform, Pulumi) for multi-cloud deployments.
- Proficiency with monitoring tools (Prometheus, Grafana, DataDog, New Relic) and log aggregation systems
- Solid grasp of networking concepts including load balancing, CDNs, DNS, and security best practices.
- Experience with service mesh technologies (Istio, Linkerd, Consul Connect)
- Knowledge of database administration across cloud-native and traditional systems
- Familiarity with chaos engineering and disaster recovery testing
- Understanding of compliance frameworks (SOC2, PCI DSS, HIPAA) in multi-cloud environments
- Experience with orchestration engines or workflow systems (e.g., Conductor, Camunda, Temporal)
- Familiarity with Kubernetes and cloud-native environments (AWS/GCP/Azure)
- Contributions to open-source infrastructure projects
- Background in security engineering or FinOps practices
- Experience with edge computing and CDN optimization
- You’ll be working on real and difficult problems that developers face every day
- You’ll join a team that thinks deeply about the systems we build and the people who rely on them
- We’re a fast-growing team backed by top-tier investors and used by some of the world’s biggest companies
- Remote-friendly, flexible working hours, and a strong culture of trust and ownership
- You’ll have a huge impact on the direction of our technology and our company
Not 100% sure you fit the description? That’s okay.
We care more about your potential and passion than checking every box. If this role excites you, we want to hear from you.
Ready to help shape the future of distributed applications?
Apply now and build something developers will love.
More Details:
The base salary for this role is $180,000 to $220,000 USD for the West Coast, $160,000 to $200,000 USD for the East Coast, and CAD 170,000 to 200,000. When determining compensation, a number of factors will be considered: skills, experience, job scope, location, and competitive compensation market data.
- Start Date: ASAP
- Status: Full time
- Type: Remote
- Location: United States (PT hours), Canada (PT hours)
- Department: Engineering
- Reports to: Head of Engineering
Benefits
- Comprehensive health coverage including medical, dental, and vision
- Flexible PTO
- Support for personal development
At Orkes, we are committed to building a team that reflects a rich tapestry of perspectives, identities, and professional experiences. We believe that diversity is not just a checkbox, but a driving force behind innovation, creativity, and success. By embracing a variety of backgrounds, we cultivate an inclusive environment where every team member feels valued and empowered to bring their authentic selves to work.
Join us at Orkes and be a part of a team where your unique perspectives are not only welcomed but celebrated. Together we are shaping the future of technology by leveraging the strength that comes from embracing diversity in all its forms. Your journey with us is an opportunity to contribute to something greater and make a lasting impact.