
Member of Technical Staff - Open Endedness

San Francisco

About Vmax

Vmax is an applied research lab developing AI capable of open-ended learning. We are building systems to exceed humans in all capacities by optimising beyond the local maxima of learning from human expertise.

About the role

A core focus of ours is agents that can learn to find their own objectives in the world. We are looking for researchers to design and build new ways of using RL where the formulation of rewards and tasks needs to be discovered, rather than given.

Responsibilities

  • Develop RL methods for agents that can discover useful objectives, tasks and curricula without relying entirely on human-specified rewards.
  • Design systems for open-ended learning, including unsupervised/automated environment design, asymmetric self-play, and intrinsic motivation.
  • Build training loops where agents learn from interaction, exploration, novelty, competence progress, self-generated challenges, or other nonstandard reward signals.
  • Investigate how agents can avoid collapse into trivial, degenerate, or easily exploitable objectives.
  • Own and develop a research agenda within Vmax, from identifying promising directions to executing experiments and communicating results.

Minimum Requirements

  • PhD or equivalent experience in machine learning, reinforcement learning, artificial intelligence, or a closely related field.
  • Track record of strong technical work, demonstrated through publications, open-source projects, deployed systems, competitions, or equivalent contributions.
  • Deep understanding of reinforcement learning.
  • Strong interest in open-ended learning.
  • Experience with LLM post-training.
  • Strong empirical research ability, including designing experiments, choosing meaningful baselines, running ablations, and diagnosing unexpected results.
  • Strong programming ability in Python and experience with at least one major ML framework such as PyTorch or JAX.
  • Ability to work independently on ambiguous research problems and turn high-level ideas into concrete experimental programs.
  • Ability to collaborate effectively with researchers and engineers on ambiguous, fast-moving technical problems.
  • Clear written and verbal communication of technical ideas, results, tradeoffs, and risks.

Nice to have

  • Experience with open-ended learning, automatic curriculum generation, intrinsic motivation, self-play, goal-conditioned RL, unsupervised skill discovery, multi-agent RL, quality-diversity, or evolutionary methods.
  • Familiarity with methods such as POET, population-based training, or multi-agent RL.
  • Experience designing benchmarks or evals for generalization, exploration, long-horizon learning, or behavioral diversity.
  • Demonstrated taste for identifying non-obvious research directions and converting them into tractable experiments.

Role specific location policy

  • This role is based in our San Francisco office; for exceptional candidates we are willing to consider a hybrid arrangement.

Compensation

The expected salary range for this position is $300,000 - $500,000 USD.


