Sr Software Engineer - ML Infrastructure
Company Overview:
Let’s be real: AI isn’t magic. It’s a tool, and it’s only as powerful as the systems, workflows, and, most importantly, the people powering it. Yurts is unlocking all of that unstructured, hard-to-find, disconnected data and turning it into a secure knowledge management platform that’s actually designed to serve people, wherever they work.
Born from a partnership with the Department of Defense to meet the demands of mission-critical environments, Yurts is redefining how teams harness AI for productivity and performance. Named after the resilient, adaptable yurt, our platform is designed to work with the systems you use—not replace them.
In collaboration with Nvidia, HPE, and Oracle, we offer secure, innovative solutions, including air-gapped deployments for defense, a full MLOps analytics framework to drive cost efficiencies, and a world-class proprietary RAG system serving answers you can trust. Yurts unifies applications, data, and workflows, protecting investments and saving customers money.
Join us to shape the future of AI, creating solutions that elevate both technology and the people who use it.
Responsibilities:
- Design, deploy, and maintain robust ML infrastructure using Kubernetes and containerization technologies to enable seamless and scalable deployment of machine learning models.
- Utilize your deep knowledge of CUDA and GPU-accelerated computing to optimize ML inference, delivering high-performance and low-latency models for demanding applications.
- Champion DevOps practices and streamline CI/CD pipelines to enhance the software development lifecycle and increase deployment efficiency.
- Lead efforts to develop and implement model scheduling and autoscaling strategies, dynamically allocating resources based on real-time inference demands to ensure optimal resource utilization.
- Collaborate with cross-functional teams, taking an active role in architectural discussions and hands-on development to drive innovation and push the boundaries of ML infrastructure.
Requirements:
- 5+ years of relevant experience in ML infrastructure development.
- 1+ years of professional development experience with Rust.
- Extensive experience with Kubernetes and containerization technologies, with a proven ability to deploy and manage distributed systems at scale.
- Hands-on experience in optimizing ML inference using CUDA and GPU-accelerated computing, achieving significant performance gains for complex ML models.
- Deep understanding of DevOps practices and experience implementing CI/CD pipelines, ensuring a smooth and efficient development and deployment process.
- Demonstrated expertise in model scheduling and autoscaling techniques, allowing dynamic resource allocation to meet varying inference workloads.
- Strong architectural and software development skills, with a passion for crafting elegant and efficient solutions that push the boundaries of ML infrastructure capabilities.
Preferred Qualifications (not mandatory):
- Experience in deploying and managing machine learning models in cloud environments such as AWS, GCP, or Azure.
- Knowledge of machine learning frameworks such as TensorFlow, PyTorch, or ONNX, and their integration with inference engines.
Note:
At Yurts, we believe in harnessing the power of ML infrastructure to achieve outstanding performance. If you're interested in exploring the possibilities of machine learning and its potential impact, check out our blog post "The bridge to enterprise AI" on the Yurts Enterprise AI blog for fascinating insights. This will give you a glimpse of the exciting world of ML infrastructure and its applications.
Compensation Information
$200,000 - $265,000 USD