
Senior / Staff AI Research Engineer, Real-Time Inference

Milpitas, CA

Why RoboForce

RoboForce is an AI robotics company developing Physical AI–powered Robo-Labor for dull, dirty, and dangerous work. The company's robots are engineered for demanding industrial environments, with a focus on real-world deployment and scalability.
 
We are looking for a Senior / Staff AI Research Engineer, Real-Time Inference to make embodied AI practical on the edge. In this role, you will drive the full stack of model optimization — from CUDA kernel engineering to quantization and compression — to deploy high-performance AI models on edge compute platforms powering RoboForce robots in the field.
 
Responsibilities
  • Develop and optimize inference pipelines for embodied AI models (VLA, perception, world models) targeting real-time execution on edge hardware such as NVIDIA Jetson platforms.
  • Implement CUDA-level optimizations including custom kernels, memory layout tuning, and hardware-aware graph compilation to minimize model latency.
  • Apply and advance model compression techniques — quantization (INT8/FP16/INT4), pruning, distillation, and structured sparsity — to achieve production-grade throughput on constrained devices.
  • Profile and debug end-to-end inference stacks using tools such as NVIDIA Nsight, TensorRT, and Triton to identify and eliminate performance bottlenecks.
  • Collaborate with ML research and robotics teams to co-design model architectures that meet real-time control-loop latency requirements.
  • Establish benchmarking frameworks to evaluate model performance across latency, throughput, power consumption, and accuracy tradeoffs on target hardware.
Requirements
  • Master's degree in Computer Science, Electrical Engineering, or a related field with 4+ years of experience, or a PhD.
  • Deep expertise in CUDA programming, GPU architecture, and low-level kernel optimization, including custom kernel authoring with tools such as Triton.
  • Hands-on experience with model quantization, pruning, distillation, and deployment using frameworks such as TensorRT, ONNX Runtime, TVM, or Triton.
  • Proficiency in C++ and Python; strong systems programming and performance profiling skills.
  • Experience deploying ML models on edge or embedded hardware (e.g., NVIDIA Jetson, Orin, or equivalent ARM/GPU SoCs).
  • This role requires in-office collaboration with the teams five days a week.
Bonus Qualifications
  • Familiarity with embodied AI models — VLA, multimodal transformers, or diffusion-based policies — and their inference characteristics.
  • Familiarity with compiler-based optimization pipelines such as XLA, torch.compile, or MLIR for graph-level model acceleration.
  • Understanding of robotics system constraints such as control-loop timing, sensor fusion latency, and memory bandwidth limits on edge SoCs.
  • Publication or production work in efficient deep learning or on-device ML systems.
Benefits
  • Competitive stock options/equity programs.
  • Health, dental, and vision insurance, 401(k) plan.
  • Visa sponsorship and green card support for qualified candidates.
  • Lunches and dinners, a fully stocked kitchen, and regular team-building events.
