Systems Engineer: Real-Time Engine

Seattle, Washington

About the Role

We're building the engine that powers our AI avatar: a real-time interactive loop that continuously senses the user (audio and video), orchestrates inference across multiple models, manages state, and renders a coherent audio-visual response within tight latency budgets.

Traditional real-time systems are hard because the timing requirements are strict. This one is harder: its components are neural networks with variable latency and non-deterministic outputs, and there is no way to pause the user while the models think. You're building a system that has to feel instantaneous while running inference that isn't. This is the runtime that makes a human-AI conversation feel alive.

You’ll own this runtime and collaborate closely with our research team on how models are invoked, how conversational context is assembled, and how response quality is balanced against latency. As an early engineer on a small, well-funded team, you’ll have direct influence over architecture decisions.

What You’ll Do

  • Build and own the server-side real-time engine: session lifecycle, state management, and the architecture of the interaction loop, including the timing and scheduling layer that keeps the loop coherent
  • Integrate GPU-backed model inference into the real-time loop, wiring model outputs into the engine's state and render pipeline
  • Develop performance tooling for latency breakdowns (TTFO, steady-state), tracing, profiling, and regression detection
  • Collaborate with product and research to define how the system behaves at its boundaries — APIs, event streams, and the invariants the engine guarantees to the rest of the stack

Required Skills

  • Real-time streaming systems experience. You’ve built systems that operate on a continuous real-time loop with hard per-tick latency budgets, where output must never stall.
  • Strong Python and async programming. You need to be productive immediately in Python — asyncio should be second nature. The key skill is writing prototype code with clean enough architecture that it survives a language port.
  • Systems programming background. The production system will be written in Rust. You don’t need to know Rust today, but you should have experience in at least one systems language (Rust, C++, Go) and be motivated to adopt Rust.
  • Concurrency and state machine design. Experience designing concurrent systems: async runtimes, thread models, lock contention, schedulers. In particular, experience managing multiple in-flight async processes with cancellation, priority switching, and preemption.
  • Strong intuition for latency. Profiling, tail behavior, and tradeoffs between throughput and responsiveness. Ability to reason about end-to-end pipelines across CPU and GPU boundaries.
  • Comfort building from scratch under time pressure. This is a “design the architecture and ship it” role, not a “maintain existing infrastructure” role. You’re comfortable with ambiguity and rapid iteration.

Bonus Points

  • Experience with real-time media systems: WebRTC, RTP/RTCP, jitter buffers, A/V sync
  • Experience with real-time tick-loop architectures (e.g., game engines, simulation runtimes, audio DSP pipelines, robotics)
  • Experience with GPU inference serving and optimization: Triton, TensorRT, vLLM, CUDA profiling
  • Building LLM agent orchestration systems
  • Familiarity with streaming generation systems: incremental decoding and mid-stream control, lock-free data structure design

 

Nuance Labs Key Facts

  • $10M seed round backed by Accel, South Park Commons, Lightspeed, and top angels including Synthesia’s former CPO.

  • A world-class team of PhDs from MIT, UW, and Oxford with decades of industry experience at Apple and Meta, advancing real-time avatars from cutting-edge research to products used by millions.

  • In-person collaboration, 5 days a week at Seattle HQ.
