Director, Evaluations
LawZero is a non-profit building safe-by-design AI systems. We’re building the Scientist AI, an advanced AI system designed from the ground up to be both highly capable and safe. As we develop both general‑purpose Scientist AI models and safety guardrails for frontier LLMs, we need rigorous, independent evaluation of every capability and safety claim we make. We are looking for a Director of Evaluations to build, lead, and grow LawZero’s Evaluations Team.
This is a foundational hire. You will define what world‑class evaluation looks like at LawZero, build the team and infrastructure to deliver it, and ensure that evaluations remain independent of the main research stream so that capability and safety claims can be trusted both internally and by the wider AI and AI safety community.
Key responsibilities
- Define LawZero’s evaluations strategy and roadmap, prioritising what needs to be measured and when, in close coordination with both research and product teams.
- Build up the Evaluations Team during your first 3–6 months, scaling to roughly 8–10 people across research, engineering, dataset and benchmark design, and red‑teaming.
- Operate the team independently of the main research and product streams in order to avoid conflicts of interest, including designing novel benchmarks that can be applied apples‑to‑apples to evaluate both the Scientist AI and frontier LLMs.
- Oversee the design and construction of new datasets, tasks, and virtual or interactive environments to measure the performance of the Scientist AI across capabilities, safety (including honesty and goal-directedness), explainability, causal mechanisms, and detection of adversarial attacks.
- Lead evaluation of the Scientist AI when deployed as a guardrail around frontier models, including its ability to comply with harm specifications, detect and block harmful responses, explain its decisions, and resist adversarial attacks such as jailbreaks, prompt injection, and data poisoning.
- Establish and lead our automated and manual red‑teaming programmes, both in‑house and in partnership with external providers, to stress test the Scientist AI as a general‑purpose model and as a guardrail.
- Lead the construction of internal tooling and infrastructure needed to run evaluations at scale, automating and standardizing the pipeline wherever possible.
- As needed and where possible, directly support the research and product streams with their own internal evaluation and benchmarking needs, to unblock and speed up their work.
- Own LawZero’s public communication of evaluation results, including model and system cards, technical reports, peer‑reviewed publications and blog posts, to build trust with the wider AI safety community.
- Represent LawZero externally on evaluations and AI safety measurement, including engagements with AI safety institutes, research collaborators, and grant funders.
Skills and qualifications
- An advanced degree (MSc or higher) in machine learning, computer science, or a closely related field.
- 10+ years of experience in machine learning, with at least 5 years in a leadership role building or scaling technical teams working on real-world ML products.
- Hands‑on expertise in designing and running large‑scale evaluations of LLMs or other frontier ML systems across capabilities, safety, and adversarial robustness.
- A track record of building evaluation datasets, benchmarks, or interactive environments from scratch, including for safety‑relevant properties such as honesty, sycophancy, refusal behaviour, and adversarial robustness.
- Strong written and verbal communication skills, including the ability to translate technical results for non‑technical audiences such as executives, funders, and policymakers.
- Comfortable operating in a research‑driven, fast‑moving environment with significant ambiguity, and able to bring structure to it without slowing it down.
Nice to have:
- Experience leading red‑teaming exercises (automated, manual, or both) and working with third‑party evaluation or red‑teaming partners.
- Experience releasing open‑source datasets, benchmarks, or evaluation tooling.
- Familiarity with current AI safety policy and standards work (UK AISI, US AISI, NIST, EU AI Act, etc.).
- Experience contributing to or coordinating with external safety institutes, grant funders, or government bodies.
What we offer
- The opportunity to contribute to a unique mission with a major impact
- Comprehensive health benefits
- A minimum of 20 vacation days per year, available from your start date
- A minimum retirement savings employer contribution of 4%
- Generous flexible benefits designed to contribute to your well-being
- A team of passionate experts in their field
- A collaborative and inclusive work environment with offices in the heart of Little Italy, in the trendy Mile-Ex district, close to public transportation.
About LawZero
LawZero is a non-profit organization committed to advancing research and creating technical solutions that enable safe-by-design AI systems. Its scientific direction is based on new research and methods proposed by Professor Yoshua Bengio, the most cited AI researcher in the world. Based in Montreal, LawZero’s research aims to build non-agentic AI that learns primarily to understand the world rather than to act in it, giving truthful answers to questions based on transparent and externalized probabilistic reasoning. Such AI systems could be used to accelerate scientific discovery, to provide oversight for agentic AI systems, and to advance the understanding of AI risks and how to avoid them. LawZero believes that AI should be cultivated as a global public good—developed and used safely towards human flourishing. For more information, visit www.lawzero.org
You belong here
At LawZero, diversity is important to us. We value a work environment that is fair, open and respectful of differences. We welcome applications from highly qualified individuals interested in working towards our mission in a respectful, inclusive and collaborative setting.
Your personal information will be collected and processed by LawZero to evaluate your application for employment in compliance with our Privacy Policy. Under privacy laws in force in your country of residence, you may have several privacy rights, such as to request access to your personal information or to request that your personal information be rectified or erased. Details on how you can exercise your rights can be found in our Privacy Policy.