
Member of Technical Staff, Post-training
About Hark
Hark is an artificial intelligence company building advanced, personalized intelligence: one that is proactive, multimodal, and capable of interacting with the world through speech, text, vision, and persistent memory.
We're pairing that intelligence with next-generation hardware to create a universal interface between humans and machines. While today's AI largely operates through chat boxes and decade-old devices, Hark is focused on what comes next: agentic systems that interact naturally with people and the real world.
To get there, we're developing multimodal models and next-generation AI hardware together, designed from the ground up as a single, unified interface for a new era of intelligent systems.
About the Role
We are looking for a Member of Technical Staff, Post-training, to lead the development of post-training strategies that define how our models acquire coding, computer use, and agentic capabilities at scale.
This role sits at the frontier of a rapidly emerging discipline — one where reinforcement learning, simulation, and large-scale model training converge to produce agents that can reason, plan, and act over long horizons. There is no established playbook here. We're looking for researchers and engineers who can bring rigor and creativity from adjacent fields — RL, robotics, game-playing systems, compiler tooling, formal verification, or program synthesis — and apply them to the next generation of coding and agentic AI.
Responsibilities
- Design and implement post-training strategies, primarily RL-based, to develop strong coding agents capable of multi-step reasoning, tool use, and long-horizon task completion.
- Build and scale simulation and scaffolding environments for agentic RL: code execution sandboxes, computer use environments, tool-calling harnesses, and verifiable reward signals.
- Develop reward modeling pipelines — including outcome-based, execution-based, and process-based reward signals — and iterate on them based on training dynamics.
- Scale synthetic data generation and trajectory distillation pipelines that feed RL training and improve sample efficiency.
- Design and run rigorous ablations to understand how algorithm choice, data mixture, reward shaping, and scale interact in the agentic setting.
- Build evaluation frameworks grounded in real agent tasks — code correctness, execution success, multi-step tool use — to measure progress and guide iteration.
- Collaborate with mid-training, infrastructure, and product teams to translate research insights into durable improvements to the model.
Requirements
- Strong background in machine learning, with hands-on experience training or fine-tuning large models — LLMs, multimodal models, or equivalent systems.
- Deep understanding of reinforcement learning: policy optimization, reward design, exploration, and the interplay between environment design and agent behavior.
- Experience building or working within simulation or execution environments (e.g., code interpreters, sandboxed execution, game environments, robotics simulators).
- Proven ability to design and execute rigorous experiments, with strong intuition for diagnosing training failures and scaling bottlenecks.
- Proficiency in Python and PyTorch; comfort working across research and systems code.
- Ability to work in a fast-moving, research-forward environment where the right approach is often unknown at the outset.
We expect strong candidates to come from a range of backgrounds — RL research, robotics, competitive programming systems, compilers, formal methods, or large-scale ML — rather than post-training specifically. The field is new enough that directly relevant experience is rare; what matters is depth, rigor, and transferability.
Bonus Qualifications
- Experience with RL algorithms applied to language or code: RLHF, DPO, GRPO, PPO, or similar paradigms in the LLM setting.
- Familiarity with coding agent benchmarks and evaluation environments (e.g., SWE-bench, HumanEval, LiveCodeBench, competitive programming judges).
- Background in reward modeling — outcome-based, process-based, or learned reward signals.
- Experience with trajectory-based training, imitation learning, or data distillation from stronger models or human demonstrations.
- Prior work on computer use, GUI agents, or tool-using LLMs (e.g., OSWorld, WebArena-style tasks).
- Experience training or scaling models at 10B+ parameters, with attention to efficiency, stability, and GPU utilization.
- Contributions to open-source ML projects or publications at top venues (NeurIPS, ICML, ICLR, EMNLP, COLM, etc.).
Compensation
The pay offered for this position may vary based on several individual factors, including job-related knowledge, skills, and experience. The total compensation package may also include additional components and benefits depending on the specific role. This information will be shared if an employment offer is extended.