Senior Engineering Manager, Compute
About Us
Companies at the frontier of the AI revolution run on Temporal. OpenAI runs on Temporal, handling millions of requests. Cursor runs its cloud coding agents on Temporal at over 50 million actions a day across 7M+ workflows, and more than a third of the pull requests its users merge now come from those agents. Replit, Lovable, Abridge, and Hebbia build their agents on it too. In the last year alone, AI-native companies executed 1.86 trillion actions on Temporal Cloud, and the curve is still bending upwards. Backed by a recent $300M Series D at a $5B valuation, we are building the durable execution layer the agentic era depends on.
The Compute team owns the layer all of that runs on. We are looking for a Senior Engineering Manager to lead the effort to make every aspect of Temporal's compute invisible to our customers, so they can focus on application-layer innovation while we handle the compute muck. This is a rare, build-the-foundation mandate: the compute substrate that the world's most demanding AI workloads will run on. We want a leader who has operated compute at planet scale, thinks in fleets, goodput, and cost-per-unit-of-compute, and pairs that with the operational rigor to run a service that frontier-AI companies bet production on.
Responsibilities
- Strategic direction for Compute: Own the strategy and standards of excellence for the compute layer that the world's agents run on, across design, delivery, and operations. Build a culture of ownership, quality, and customer-first decision-making.
- Technical leadership: Lead, hire, and grow a high-ownership team; roll up your sleeves and go deep into the trenches by staying close to design docs and code rather than managing from a distance. Coach engineers, level them up, and clear the friction that slows them down.
- Roadmap & trajectory: Drive the arc from today's compute toward the next generation of compute platforms. Ground prioritization in customer and design-partner feedback, and turn ambiguous, fast-moving requirements into predictable, iterative delivery.
- Operational excellence: When you run frontier AI in production, reliability is the product. Own operations, run on-call and incident response, and drive blameless postmortems and the systemic fixes that prevent recurrence.
- Technical depth: Guide the hard architectural decisions for large-scale, multi-tenant compute, where technical concerns cut across workload isolation and security, scheduling, fleet efficiency / utilization / goodput, and performance, while ensuring the platform is reliable and efficient for the workloads that depend on it.
- Capacity, supply & economics: Own utilization, capacity and supply planning, and the cost-per-unit-of-compute and margin profile of the fleet, across CPU compute today and accelerated compute ahead.
- Cross-team & customer execution: Partner with leadership, Product, SDK, UX/DX, Security, and design-partner customers to align priorities and unblock delivery. Communicate progress, tradeoffs, and risk clearly to technical and non-technical audiences alike.
Qualifications
- Proven experience leading software engineering teams that build and operate large-scale compute platforms or fleets, with strong operational practices.
- 12+ years in software and/or infrastructure engineering, including 7+ years of people management and demonstrated ownership of delivery and live-site outcomes.
- Deep distributed-systems and compute-infrastructure expertise, with the hands-on judgment to guide architecture and execution up close rather than from a distance.
- Experience operating multi-tenant compute that other people's production workloads depend on.
- Bachelor's degree in Computer Science or related field, or equivalent practical experience; advanced degree a plus.
- Excellent communication skills, with the ability to partner across engineering, product, and leadership and fold customer feedback into the roadmap.
Required Skills
- Strong leadership, coaching, and performance management; ability to grow engineers and build a healthy, accountable, high-ownership team.
- Excellence in execution: planning, prioritization, and delivering iterative milestones in an ambiguous, fast-moving environment while managing unplanned work.
- Fleet thinking: utilization, goodput, capacity and supply planning, and cost discipline as first-class engineering concerns.
- Live-site reliability craft: on-call, incident management & response, and postmortem-driven continuous improvement.
- Strong command of the building blocks of a compute platform: multi-tenant isolation and security, scheduling, and resource management.
- Ability to review and raise the bar on technical artifacts (design docs, code reviews) across a distributed-systems codebase.
Preferred Experience
- MicroVMs and virtualization (Firecracker, gVisor, Edera) or managed-compute primitives (AWS Fargate, GCP Cloud Run, AWS Lambda), and/or Kubernetes internals.
- Building serverless or hosted-compute products from 0 to 1, including the rapid-delivery-vs-durable-platform tradeoffs that come with it.
- Multi-cloud delivery across AWS and GCP.
- Cold-start, warm-pool, and scheduling/latency optimization for on-demand compute.
- Agent sandboxes, secure execution of untrusted code, or other AI-agent infrastructure.
- GPU / accelerated compute: fractional GPUs (MIG, MPS, time-slicing), GPU scheduling, training vs. inference fleets, and multi-tenant GPU isolation.
---
The ideal candidate is a strategic thinker with a hands-on approach, energized by building the compute foundation the AI era runs on. They are comfortable shaping a space that doesn't fully exist yet, are obsessed with reliability when customers bet production on it, default to working backwards from customers, and balance the speed to ship 0-to-1 against the durable design a planet-scale platform demands.
Compensation
- The estimated pay range for this role is $320,000 - $335,000.
- This role is eligible to participate in Temporal's equity plan.
- Unlimited PTO, 12 Holidays + 2 Floating Holidays
- 100% Premiums Coverage for Medical, Dental, and Vision
- AD&D, LT & ST Disability, and Life Insurance (Standard & Supplemental Available)
- Empower 401K Plan
- Additional Perks for Learning & Development, Lifestyle Spending, In-Home Office Setup, Professional Memberships, WFH Meals, Internet Stipend and more!
Paid Time Off (PTO) and Benefits outside the United States vary by country, and are issued in partnership with Remote.com. Additionally, Temporal offers perks to all international employees for learning & career development, a lifestyle spending account, in-home office setup (in addition to company-issued hardware), professional memberships, work-from-home meals, and access to the Calm app for mental wellness.
Travel
Temporal is a globally distributed, collaborative team that values opportunities for in-person connection. Occasional travel may be required for company events, team offsites, and other meaningful moments that bring us together.
- $3,600 / Year Work from Home Meals
- $1,800 / Year Professional Enrichment (Career Development & Professional Memberships)
- $1,200 / Year Lifestyle Spending Account
- $1,000 / Year In-Home Office Setup (In addition to Temporal issued equipment - laptop, monitor, keyboard, mouse, trackpad, and extension power cable at no cost to you)
- $74 / Month Reimbursement for Internet
- Calm App Subscription for Mental Health & Wellness
