Machine Learning Infrastructure Engineer
At DNOVO Labs, we’re revolutionizing drug delivery through advanced artificial intelligence. We apply cutting-edge machine learning techniques to accelerate the discovery and optimization of innovative delivery systems. Our AI-driven approach enables rapid exploration of vast chemical spaces, allowing us to design and iterate on molecular structures that can transform how drugs and vaccines are transported in the body. This work is tightly coupled with our state-of-the-art wet lab, where we rapidly synthesize and test the most promising candidates, creating a seamless loop between in silico predictions and experimental validation.
Role Overview
As a Machine Learning Infrastructure Engineer, you will be responsible for both developing advanced ML pipelines and managing the robust infrastructure that powers our algorithms. Your role involves architecting and maintaining scalable, distributed systems, optimizing cloud resources, and ensuring the reliability of our computational environment. You’ll build efficient data processing pipelines while also handling infrastructure management tasks such as capacity planning, performance tuning, and implementing DevOps practices. Your ability to balance pipeline development with infrastructure management will be crucial in empowering our research team to make groundbreaking discoveries in drug delivery and therapeutics.
Key Responsibilities
- Design, implement, and maintain state-of-the-art machine learning pipelines for biological simulations
- Manage and optimize large-scale distributed computing infrastructure, including cloud resources and on-premises systems
- Implement DevOps practices to improve deployment, monitoring, and maintenance of ML systems
- Optimize inference and training workloads across our computational environment
- Develop tools and frameworks to enhance the efficiency and reproducibility of our ML experiments
- Collaborate with ML researchers to translate their algorithms into production-ready code
- Implement robust monitoring, logging, and alerting systems for our entire ML infrastructure
- Continuously improve the scalability, performance, and cost-effectiveness of our distributed systems
- Stay up to date with the latest advancements in ML infrastructure and incorporate new technologies as appropriate
You may thrive in this role if you:
- Have 3-5+ years of experience building core infrastructure.
- Have experience running inference clusters at scale.
- Have experience operating orchestration systems such as Kubernetes at scale.
- Take pride in building and operating scalable, reliable, secure systems.
- Have experience with DevOps practices.
- Can work well with both business-oriented and research-oriented team members.