
ML Platform Engineer

Austin, TX

About the team

The ML Platform team at Avride builds the infrastructure that powers large-scale ML training and data processing for autonomous driving. We sit between Cloud Platform and ML engineers, turning low-level compute, storage, and networking primitives into an ML platform that teams actually use — scalable orchestration, distributed compute, and production-grade tooling for the full model lifecycle.

 

About the role

As an ML Platform Engineer at Avride, you'll own critical pieces of the ML stack: workflow orchestration, distributed execution, resource governance, and performance. You will shape how ML teams across the company run experiments and train models at scale, building the abstractions and services that make training workloads on Kubernetes reliable, cost-efficient, and fast, with strong reliability guarantees and an excellent developer experience.

 

What you will do

  • Build and scale our ML compute platform on Kubernetes, using Argo Workflows for training, evaluation, and data processing orchestration
  • Design and implement core platform capabilities, including a Ray-based internal SDK for distributed execution, and multi-tenant resource governance — scheduling, priorities, quotas, and policy enforcement across GPU, CPU, memory, and IO
  • Improve end-to-end training throughput and platform efficiency by optimizing data access patterns, caching, and removing bottlenecks in storage, network, and resource contention
  • Work directly with ML teams to debug complex workload issues, drive root-cause analysis, and turn recurring problems into platform-level fixes
  • Evaluate, integrate, and extend open-source tooling (Argo Workflows, Ray, Kubernetes ecosystem) to meet evolving platform needs

 

What you will need

  • Strong proficiency in Python or Go; C++ is a plus
  • Track record of designing and building scalable, maintainable systems and services
  • Experience operating production services end-to-end: APIs, reliability practices, observability
  • Deep knowledge of Kubernetes: how scheduling, resource management, controllers, and pod lifecycle actually behave under pressure
  • Solid Linux and systems debugging skills: performance investigation, networking, storage/IO
  • Ability to troubleshoot complex production issues across logs, metrics, and traces and drive them to resolution

 

Nice to have

  • Experience with Argo Workflows, Ray, MLflow, or comparable distributed ML tooling
  • Hands-on experience building or operating large-scale ML training systems: GPU scheduling, distributed training, training data pipelines
  • Track record of optimizing resource usage and performance in distributed environments

 

Candidates must be authorized to work in the U.S. Relocation sponsorship is not offered, and remote work options are not available.

Apply for this job
