
Machine Learning Ops & Infrastructure Engineer
About Noble Machines
Noble Machines (formerly Under Control Robotics) builds multipurpose robots to support human workers in the world's toughest jobs—turning dangerous work from a necessity into a choice. Our work demands reliability, robustness, and readiness for the unexpected—on time, every time. We're assembling a mission-driven team focused on delivering real impact in heavy industry, from construction and mining to energy. If you're driven to build rugged, reliable products that solve real-world problems, we'd love to talk.
Position Overview
At Noble Machines AI, we are pushing the boundaries of machine learning and artificial intelligence. To support our rapid pace of innovation, we are looking for an experienced ML Ops & Infrastructure Engineer to build the foundational systems that power our AI development.
In this role, you will sit at the critical intersection of our Research and Engineering teams. You won’t just be maintaining systems; you will be architecting the high-performance ML infrastructure that enables our researchers to seamlessly transition from data collection and model training to evaluation and production. If you are passionate about scalable compute, elegant data platforms, and robust deployment pipelines, we want you on our team.
Responsibilities
- End-to-End ML Infrastructure: Design, build, and maintain a highly scalable and reliable machine learning infrastructure that accelerates the research and development lifecycle.
- Data Platform & Management: Architect and manage robust data ingestion, collection, and processing pipelines. You will own the data platforms that ensure our models are trained on high-quality, consistently versioned datasets.
- Training & Evaluation Pipelines: Build and optimize the environments used for distributed model training, hyperparameter tuning, and automated model evaluation.
- Cloud Compute Orchestration: Manage and orchestrate heavy compute workflows seamlessly across AWS and/or Google Cloud Platform (GCP), optimizing for both performance and cost.
- Containerization & Kubernetes: Take full ownership of containerizing ML workloads and orchestrating them via Kubernetes (K8s) to ensure high availability, scalability, and reproducibility.
- Cross-Functional Collaboration: Partner closely with ML Researchers and Software Engineers to understand their bottlenecks, gather requirements, and build tooling that makes their workflows frictionless.
Requirements
- Proven Industry Experience: 3+ years of hands-on industry experience building scalable ML infrastructure, MLOps platforms, or data engineering systems.
- Cloud & Orchestration Mastery: Deep expertise in cloud platforms (AWS or GCP) and modern containerization and orchestration tools, specifically Docker and Kubernetes (K8s).
- Software Engineering Fundamentals: Strong programming skills in Python, alongside experience with bash scripting and version control (Git).
- Data & Pipeline Expertise: Hands-on experience building large-scale data management pipelines and using workflow orchestration tools (e.g., Airflow, Argo, Kubeflow, or similar).
- Relevant Domain Background: While explicit robotics experience is not required, we highly value candidates with backgrounds in hardware-interfacing AI, autonomous driving, computer vision, or other high-complexity ML fields.
Nice to Have
- Experience with Infrastructure as Code (IaC) tools like Terraform.
- Familiarity with distributed training frameworks (e.g., PyTorch DDP, Horovod, Ray).
- Experience implementing model observability, monitoring, and data drift detection in production environments.
- A background handling large volumes of unstructured data (video, sensor data, spatial data).
To apply, submit your resume here or email people@noblemachines.ai. To increase your chances of being selected for an interview, we encourage you to include up to TWO examples of your most representative work featuring hardware demonstrations.
