MLOps Engineer
About Turing
Based in Palo Alto, California, Turing is one of the world's fastest-growing AI companies, accelerating the advancement and deployment of powerful AI systems. Turing helps customers in two ways: working with the world's leading AI labs to advance frontier model capabilities in thinking, reasoning, coding, agentic behavior, multimodality, multilingualism, STEM, and frontier knowledge; and leveraging that expertise to build real-world AI systems that solve mission-critical priorities for Fortune 500 companies and government institutions. Turing has received numerous awards, including recognition as one of Forbes' "America's Best Startup Employers," the #1 spot on The Information's annual list of "Most Promising B2B Companies," and a place on Fast Company's annual list of the "World's Most Innovative Companies." Turing's leadership team includes AI technologists from industry giants Meta, Google, Microsoft, Apple, Amazon, and Twitter, as well as McKinsey, Bain, Stanford, Caltech, and MIT. For more information on Turing, visit www.turing.com. For information on upcoming Turing AGI Icons events, visit go.turing.com/agi-icons.
About the Role
Turing is looking for an MLOps Engineer to join our growing AI research engineering team. Your primary responsibility will be to manage and optimize our Ray clusters on GCP/GKE, which we use for multi-node, multi-GPU fine-tuning, inference, and reinforcement learning with large language models (LLMs). In addition, you'll help streamline our experimental workflows by maintaining reproducible environments, resolving dependency issues, and automating key parts of the infrastructure. This role is ideal for someone who is excited about working closely with AI researchers and helping scale the infrastructure behind cutting-edge LLM training and experimentation.
Key Responsibilities
● Manage and maintain Ray clusters deployed on GCP/GKE to support distributed LLM training and inference.
● Optimize multi-node, multi-GPU workloads for both fine-tuning and inference pipelines using Ray, Kubernetes, and GCP services.
● Assist the research team with environment debugging, dependency management, and containerization (e.g., CUDA/PyTorch/Flash-Attn stacks).
● Build and maintain reusable infrastructure templates (e.g., Terraform modules, Helm charts) for reproducible research environments.
● Monitor system performance and optimize cluster resource allocation and autoscaling.
● Support CI/CD workflows for experiment tracking and deployment pipelines.
● Collaborate with research engineers to improve the usability, reliability, and scalability of our training infrastructure.
Requirements
● 3+ years of experience in DevOps/MLOps roles with a focus on machine learning infrastructure.
● Solid hands-on experience with Ray, Kubernetes (GKE preferred), and multi-GPU orchestration.
● Proficiency with GCP services (Compute Engine, GCS, IAM, VPC, etc.).
● Strong working knowledge of Python and shell scripting.
● Experience managing CUDA-based environments for training and inference with PyTorch.
● Familiarity with containerization (Docker) and environment isolation (Conda, virtualenv).
● Experience with IaC tools (Terraform, Helm).
● Strong troubleshooting skills in distributed environments (networking, storage, job failures, etc.).
Nice to Have
● Experience with LLM training, LoRA fine-tuning, or RLHF pipelines.
● Familiarity with FlashAttention, DeepSpeed, FSDP, or other large-scale model optimization techniques.
● Knowledge of CI/CD tools (GitHub Actions, ArgoCD) and experiment tracking (e.g., MLflow, Weights & Biases).
● Exposure to event-driven compute or serverless functions on GCP.
● Ability to write clean internal tooling (e.g., dashboards, CLI utilities).
Advantages of joining Turing:
- Amazing work culture (Super collaborative & supportive work environment; 5 days a week)
- Awesome colleagues (Surround yourself with top talent from Meta, Google, LinkedIn, etc., as well as people with deep startup experience)
- Competitive compensation
- Flexible working hours
- Full-time remote opportunity
Don't meet every single requirement? Studies have shown that women and people of color are less likely to apply to jobs unless they meet every single qualification. Turing is proud to be an equal opportunity employer. We do not discriminate on the basis of race, religion, color, national origin, gender, gender identity, sexual orientation, age, marital status, disability, protected veteran status, or any other legally protected characteristic. At Turing, we are dedicated to building a diverse, inclusive, and authentic workplace, so if you're excited about this role but your past experience doesn't align perfectly with every qualification in the job description, we encourage you to apply anyway. You may be just the right candidate for this or other roles.
For applicants from the European Union, please review Turing's GDPR notice here.
Apply for this job