
Principal AI Systems Engineer
About Atari
Atari is an interactive entertainment company and an iconic gaming industry brand recognized worldwide for its multi-platform games and licensed products. Atari owns and/or manages a portfolio of more than 400 games and franchises, including globally recognized brands such as Asteroids®, Centipede®, Missile Command®, Pong®, and RollerCoaster Tycoon®.
The Atari family of brands includes Digital Eclipse, Nightdive Studios, Infogrames, AtariAge, MobyGames, as well as Coatsink, Early Morning Studios, and Stormteller Games—spanning game development, publishing, and community experiences worldwide.
Atari operates internationally, with offices in New York and Paris as well as in Germany and India.
Overview
Architect, build, and own AI systems that automate expert-intensive technical workflows end-to-end — from CLI frameworks, MCP servers, and agent tooling through to production deployment, business outcome tracking, and continuous improvement. You solve real business problems with AI, ensure solutions are fully implemented and adopted, and measure whether they are actually working.
What You’ll Do
System Architecture and Framework Design
- Own the end-to-end architecture of AI automation systems: workflow decomposition, component communication, human checkpoint placement, and failure behaviour.
- Design and build the internal CLI frameworks, reusable libraries, and agent scaffolding that all pipelines are built on.
- Author and maintain agent instruction files (SKILL.md, CLAUDE.md, system prompts), context configurations, and MCP server definitions that govern agent behaviour and tool access.
- Configure and govern AI coding environments such as Claude Code and Codex CLI: wiring MCP servers, defining tool permissions, building slash commands, and setting the engineering standards the team operates by.
- Evaluate and document architectural trade-offs across reliability, latency, cost, and maintainability.
Pipeline Development and Integration
- Build production-grade AI pipelines in Python: orchestration logic, tool definitions, structured prompting, context assembly, output schema validation, and retry and escalation strategies.
- Integrate AI systems with external tooling — version control, build pipelines, platform SDKs, compliance databases, internal APIs — with well-defined permission boundaries and failure handling.
- Design context assembly: how domain knowledge, runtime state, retrieved documents, and tool outputs compose into the precise input each pipeline stage needs.
- Build and operate multi-agent systems: orchestrator-worker patterns, agent memory management, structured handoffs, and conflict resolution between concurrent agents.
Prompt Engineering
- Design, write, and version all system prompts and agent instructions as first-class engineering artefacts: role definition, behavioural constraints, output format contracts, tool usage rules, and fallback handling — with chain-of-thought, few-shot, and structured output techniques applied deliberately per task.
- Own output schema design and prompt regression testing: define format contracts downstream stages depend on, validate against them, and maintain a ground-truth eval set that gates every instruction or model change before deployment.
Context Engineering, Knowledge, and Domain Collaboration
- Engineer context windows with precision: determine what enters each pipeline stage, in what order, and at what granularity — balancing model comprehension and task accuracy against token cost and latency through compression, summarisation, and selective retrieval where needed.
- Work directly with domain experts to extract specialist knowledge and translate it into agent behaviour: decision logic, edge case handling, quality criteria, and failure modes.
- Partner with the RAG Engineer to define retrieval requirements — what knowledge is needed, under what conditions, at what granularity — ensuring retrieved context is precise and actionable.
- Build and maintain structured runtime knowledge assets: curated document corpora, rule sets, decision trees, and validation reference libraries.
Evaluation, Quality, and Reliability
- Build and own the evaluation framework: test suites, regression benchmarks, LLM-as-judge pipelines, per-stage quality metrics, and end-to-end pass rates against expert-validated ground truth.
- Implement production monitoring across all pipelines: latency, token consumption, per-stage success rates, failure distribution, and output quality drift, using tooling such as LangFuse, Arize, or equivalent.
- Run structured failure analysis: diagnose root causes in context assembly, orchestration logic, knowledge coverage, or tool integration — and implement targeted fixes.
- Define and track automation rate as a first-class metric: workflow instances completing end-to-end without human intervention and the verified quality of those outputs.
- Measure and report on the business effectiveness of deployed systems: whether each solution is achieving its intended objective, where gaps remain, and what changes would close them.
Governance, Safety, and Technical Leadership
- Implement full audit trails: inputs received, tools called, outputs produced, and human review triggers — making system behaviour traceable and defensible.
- Enforce versioning of all agent instructions, context configurations, and system prompts as engineered artefacts subject to regression testing and controlled rollout.
- Set the technical standard for AI system development within the organisation — architecture patterns, evaluation practices, framework conventions, and quality gates — documented so what you build is maintainable by others.
- Collaborate with engineering, domain, and product teams throughout implementation: answering technical questions, unblocking dependencies, and ensuring the system is correctly adopted by everyone depending on it.
- Engage engineering leadership on roadmap priorities, scope trade-offs, and technical risk with clear technical perspective.
Requirements & Qualifications
- Proven track record of building production AI automation systems from scratch — independently scoping, architecting, and delivering end-to-end from framework design through deployment and ongoing improvement.
- Hands-on expertise with AI coding agents — Claude Code, Codex CLI, Cursor, or equivalent — including MCP server configuration, tool permission design, agent instruction authoring, and CLI-driven workflow automation.
- Experience designing, building, and deploying MCP servers and custom tools from scratch: tool schema design, authentication, permission boundaries, and governing what agents can access and act on.
- Experience creating internal tooling and automation that measurably improved engineering team efficiency — reducing manual processes, accelerating workflows, and enabling teams to operate at higher leverage.
- Experience defining and tracking productivity and efficiency metrics for AI systems: automation rate, time-to-completion reduction, human intervention rate, and business outcome measurement.
- Experience working with data scientists and domain experts to implement AI solutions that measurably improved team productivity and workflow efficiency.
- Deep prompt engineering and context engineering practice: system prompt authoring, few-shot design, chain-of-thought and structured output techniques, token budget management, context window strategy, and prompt versioning with regression testing.
- Proficiency with LLM orchestration frameworks — LangChain, LangGraph, LlamaIndex, AutoGen, or equivalent — with the judgment to know when to use them and when to build leaner.
- Experience building internal CLI frameworks, agent scaffolding, and reusable libraries that others build on.
- Production Python engineering: modular, testable, well-logged code with proper error handling.
- Experience integrating AI systems with external tools and APIs: tool definition design, permission management, rate limit and failure handling.
- Experience building AI evaluation frameworks: test suites, regression benchmarks, LLM-as-judge pipelines, and production quality monitoring.
- Cloud platform experience (AWS, Azure, or GCP): deploying and monitoring AI workloads with containerisation and orchestration tooling.
Preferred / Nice-to-Have
- Experience in the gaming industry: game development pipelines, engine architectures such as Unity or Unreal, platform certification processes, or cross-platform porting workflows.
- Familiarity with game engine scripting, asset pipelines, or platform-specific SDK integration (Xbox GDK, PlayStation SDK, or similar).
To Apply
Please submit your resume and a brief cover letter outlining your experience and interest in the role. If you have a portfolio, you are also welcome to include a link to it.
EEO Statement
Atari is an equal opportunity employer and we are committed to providing a workplace free from harassment and discrimination. We are committed to equal employment regardless of race, religion or lack thereof, color, national origin, gender, sexual orientation, gender identity or expression, age, marital status, medical condition, veteran status, ancestry, disability status, pregnancy, parental status, genetic information, political affiliation, or any other status protected by the laws or regulations in the locations where we operate.