
AI Researcher
About Traversal
Traversal is building an AI site reliability engineer that identifies, root-causes, and fixes production issues within complex software infrastructure. We're a team of some of the best AI talent, including professors at Columbia and Cornell with PhDs from MIT and Berkeley, with over 20 years of research experience pushing the frontier of AI agents and Causal AI. Traversal is already deployed at some of the largest enterprises in the world, helping improve the resilience of mission-critical systems that serve millions of people worldwide and reducing MTTD/MTTR by up to 90%.
The Role
As an AI Researcher at Traversal, you’ll work on training intelligent AI agents to perform root cause analysis (RCA) in complex, high-pressure production environments, even when labeled data is limited or nonexistent. Your work will shape the foundational reasoning capabilities of our agentic systems, powering insights that help engineers detect, diagnose, and resolve incidents across massive volumes of observability data.
You’ll lead research across areas like LLM fine-tuning, reward modeling, synthetic data generation, and deep reinforcement learning, all in service of building more robust, interpretable, and autonomous agents. This is a high-impact role at the intersection of AI, infrastructure, and systems reliability, ideal for those with deep research experience looking to translate bold ideas into real-world impact.
Responsibilities
- LLM & Agent Research: Prototype and evaluate novel prompting strategies, reasoning workflows, and tool-use policies for agents handling large-scale observability data and complex troubleshooting workflows.
- Training & Alignment: Explore and apply fine-tuning, reinforcement learning, and reward modeling techniques to align AI behavior with real-world SRE workflows and debugging practices.
- Synthetic Data & Experimentation: Design pipelines to generate synthetic incidents and observability signals, enabling scalable training and testing in data-scarce scenarios.
- Cross-Team Collaboration: Work closely with AI engineers, infrastructure teams, and product leads to bring research into production and close the loop between experimentation and impact.
- Stay on the Frontier: Track the latest developments in LLMs, agentic architectures, and AI alignment, and translate those insights into actionable improvements.
Requirements
- PhD in Computer Science, Machine Learning, Statistics, Electrical Engineering, or related field (or equivalent research experience).
- Strong foundation in deep learning, reinforcement learning, or robotics.
- Hands-on experience with fine-tuning LLMs.
- Ability to design and run rigorous experiments, analyze outcomes, and iterate quickly.
- Demonstrated ability to collaborate across engineering and research boundaries and communicate complex ideas clearly.
Nice to Have
- Background in observability (logs, metrics, traces) or production infrastructure debugging.
- Experience with RLHF, synthetic data pipelines, or LLM evaluation tooling.
- Contributions to open-source agent frameworks or academic publications.
- Familiarity with Terraform, Kubernetes, or ML orchestration platforms.
Compensation
We offer competitive compensation, startup equity, health insurance, and additional benefits. The U.S. base salary range for this full-time, in-person role in New York is $150,000–$300,000, plus equity and benefits. Our salary ranges are based on location, level, and role. Individual compensation is determined by experience, skills, and job-related knowledge.
Why You Should Join Us
We’ll make sure you’re fully supported with health insurance, a great tech setup, flexible time off, and plenty of in-office snacks. We offer competitive salary and equity packages and give careful consideration to every hire on our small, high-impact team.
Traversal is fully in-office, 5 days a week, based in New York near Madison Square Park. We have a collaborative, hard-working culture and are energized by building the future of AI-powered software maintenance.
Working here means owning meaningful parts of the product, having the flexibility to move fast, and learning constantly. This is a place to grow your career, make a real impact, and help define a new category of infrastructure software.