
AI Engineer - Site Reliability Researcher
About Traversal
Traversal is the AI Site Reliability Engineer (SRE) for the enterprise—already trusted by some of the largest companies in the world to troubleshoot, remediate, and even prevent the most complex production incidents. Our mission is to free engineers from endless firefighting and enable them to focus on creative, high-impact work.
Our roots remain deeply embedded in AI research, and we’re channeling that scientific rigor and creativity into building the premier AI agent lab for the enterprise. What we’re proudest of is assembling the most talented yet nicest group of individuals, ranging from researchers at MIT, Harvard, and Berkeley to world-class engineers from industry (Citadel Securities, Cockroach Labs, Datadog, DE Shaw, ServiceNow, Glean, Perplexity, Pinecone, and more), to take on one of the hardest problems for AI to solve. Without the entire team, none of this would be possible.
The Role
Site Reliability Engineering and troubleshooting are at the core of what Traversal does. That’s simple to say, hard to do, and even harder to explain. SREs analyze customer issues; SRE Researchers figure out how that analysis is done, then work with engineering to teach the AI to replicate the process. Our target users are experienced SREs (like you), so be prepared to put yourself in the mindset of end users and help shape the product directly. In short, Traversal wants to model your troubleshooting talent in code, putting you at the nexus of current customers, potential customers, developers, AI engineers, UI experts, and more.
We’re entering a phase of rapid growth driven by the needs of customers from mid-market to Fortune 100 enterprises. We need people with an engineering mindset who enjoy solving puzzles and have the flexibility to do something different every day. You’ll play a key role in establishing the SRE research practices that allow us to exceed customer expectations today, tomorrow, and beyond.
Responsibilities
- Troubleshooting Disparate Systems: Our customers use a wide variety of platforms, so flexibility and curiosity are critical
- External Interface: Gather requirements from new customers, guide them through onboarding, and maintain positive relationships to ensure their success
- Internal Collaboration: Partner with engineering, AI, and product teams, passing along what you learn from end-users, as well as your own input
- Evaluation and Analysis: Use your troubleshooting and customer RCAs to evaluate Traversal’s performance and find ways to improve it
- Incident Management: Lead and improve our internal on-call and incident response processes, including alerting, debugging, and postmortems
Requirements
- 5+ years of experience as an SRE, infrastructure engineer, or similar role in fast-paced environments
- Innate ability to debug distributed systems (e.g., bare metal, VMs, Kubernetes, Docker, containers), understand how you did it, and explain it to others
- Expertise with observability and metrics tools (Datadog, Elasticsearch, Grafana, OpenTelemetry, Prometheus, ServiceNow, Splunk, etc.) and incident response
- Understanding of networking, including routers, switches, firewalls, VPNs, etc.
- Hands-on experience with cloud environments (AWS, Azure, DigitalOcean, GCP) and Infrastructure as Code tools such as Helm and Terraform
- Experience supporting cloud, on-prem, and hybrid deployments
Nice to Have
- Background in developer productivity tooling or internal platform teams
- Prior experience building systems that connect infra events to developer workflows
- Exposure to agentic systems or AI observability platforms
Compensation
We offer competitive compensation, startup equity, health insurance, and additional benefits. The U.S. base salary range for this full-time, in-person role in New York is $150,000–$300,000, plus equity and benefits. Our salary ranges are based on location, level, and role. Individual compensation is determined by experience, skills, and job-related knowledge.
Why You Should Join Us
We’ll make sure you’re fully supported with health insurance, a great tech setup, flexible time off, and plenty of in-office snacks. We offer competitive salary and equity packages, and we give thoughtful consideration to every hire on our small, high-impact team.
Traversal is fully in-office, 5 days a week, based in New York near Madison Square Park. We have a collaborative, hard-working culture and are energized by building the future of AI-powered software maintenance.
Working here means owning meaningful parts of the product, having the flexibility to move fast, and learning constantly. This is a place to grow your career, make a real impact, and help define a new category of infrastructure software.
