Role: Senior Software Engineer - Infrastructure
Team: Machine Learning Pod
Location: Hayes Valley, San Francisco, CA
Basic Job Details
Job Type: Full Time
Work Model: Hybrid
Remote Days: Monday & Friday
Office Days: Tuesday, Wednesday, Thursday
Job Description
As a Senior Software Engineer within our Machine Learning Team, you will tackle complex challenges in distributed systems and ML operations to enhance our machine learning infrastructure. You’ll build scalable ML infrastructure from the ground up - supporting model deployment, distributed training, real-time inference, and more. You’ll be a key partner to the Data Science team, helping bring value to production quickly and reliably. This role requires a blend of advanced Python programming skills within production environments and expertise in distributed computing.
Responsibilities
Own Core ML Infrastructure:
Build and scale distributed systems for ML training, serving, and inference.
Design and implement real-time ML workflows that power core product features.
Implementation of Distributed Systems:
Build robust distributed systems tailored for efficient ML training and seamless operational deployment.
Feature Engineering Enhancement:
Streamline and manage both online and offline feature stores, optimizing feature engineering processes for greater efficiency.
Real-Time ML Workflow Enhancement:
Improve real-time machine learning workflows to support dynamic decision-making and automate core operational processes.
Platform Level Ownership:
Lead the development of ML Ops systems, including model deployment, monitoring, and experiment tracking.
Architect and manage scalable feature stores for online and offline usage.
AI-Driven Optimization:
Contribute to agentic AI systems for freight matching, ETA prediction, and load scheduling.
Support systems that improve Stop Estimation Accuracy and Cross-Mode Optimization.
Production Ready Engineering:
Write production-grade Python that operates at scale, with reliability and performance top of mind.
Collaborate across engineering and data science to turn models into resilient software systems.
Required Qualifications
Production Python Expertise:
Advanced Python proficiency in large-scale production environments.
Distributed Systems Expertise:
Experience building scalable backend or ML infrastructure using distributed computing techniques.
Strong background in AWS and cloud-native data/compute services.
Machine Learning Operations:
Hands-on experience with distributed training pipelines, model serving, and monitoring.
Deep familiarity with SQL (OLTP & OLAP), feature engineering, and caching patterns.
Preferred Qualifications
5 to 8 years of backend or ML infrastructure experience.
Proven track record building production ML workflows at scale.
Experience in the logistics, transportation, or freight industry is a bonus.
The Perks
Competitive Base Salary
Long Term Cash Incentive Plans
Annual Company Bonus
401k with Matching
Hybrid Work Schedule
Comprehensive Health Coverage
Hyper-Stable, publicly traded Enterprise
Employee Stock Purchase Program (15% discount to market value)
Collaborative, Tech-Forward, Cozy Office environment in Hayes Valley
Compensation Range: The annual base salary range for this position is $200,000 - $250,000*
Compensation will vary based on factors including skill level, transferable knowledge, and experience. Note that the above does not represent total compensation, which also includes our LTI Package. In addition to base salary, Baton's full-time employees are eligible for an annual company performance bonus.