Principal Site Reliability Engineer, ML Platform
About Zscaler
Serving thousands of enterprise customers around the world including 45% of Fortune 500 companies, Zscaler (NASDAQ: ZS) was founded in 2007 with a mission to make the cloud a safe place to do business and a more enjoyable experience for enterprise users. As the operator of the world’s largest security cloud, Zscaler accelerates digital transformation so enterprises can be more agile, efficient, resilient, and secure. The pioneering, AI-powered Zscaler Zero Trust Exchange™ platform, which is found in our SASE and SSE offerings, protects thousands of enterprise customers from cyberattacks and data loss by securely connecting users, devices, and applications in any location.
Named a Best Workplace in Technology by Fortune and others, Zscaler fosters an inclusive and supportive culture that is home to some of the brightest minds in the industry. If you thrive in an environment that is fast-paced and collaborative, and you are passionate about building and innovating for the greater good, come make your next move with Zscaler.
Our Engineering team built the world’s largest cloud security platform from the ground up, and we keep building. With more than 100 patents and big plans for enhancing services and increasing our global footprint, the team has made us and our multitenant architecture today's cloud security leader, with more than 15 million users in 185 countries. Bring your vision and passion to our team of cloud architects, software engineers, security experts, and more who are enabling organizations worldwide to harness speed and agility with a cloud-first strategy.
Processing billions of transactions and generating trillions of data points daily, we believe data is the key to disrupting the cybersecurity market through AI. Reporting to the EVP of AI Innovations, who leads this team directly under our CEO, this is a career-defining opportunity to influence Zscaler’s AI strategy and drive innovation. This position is hybrid, based in our New Jersey office three days a week. Exceptional remote candidates will also be considered.
As a Principal Site Reliability Engineer - ML Platform, you will:
- Architect, build, and maintain large-scale distributed systems to support end-to-end AI pipelines, including data collection, feature engineering, model training, evaluation, deployment, and real-time serving
- Act as the owner of Site Reliability Engineering (SRE) for AI-driven applications deployed on AWS, ensuring performance, availability, observability, and scalability
- Collaborate with the engineering team to design and implement CI/CD pipelines, infrastructure provisioning, scripted automation for deployment and customer-facing services, and robust monitoring frameworks for real-time statistics and performance tracking across production systems
- Drive innovation and best practices in integrating Kubernetes, ArgoCD, and similar tools into cloud environments, with a focus on AI/ML pipelines and GPU-based cloud structures (e.g., SkyPilot)
- Serve as the group's FinOps expert and AWS admin, taking ownership of hosting cost optimization and all administrative aspects of the AWS account for ZAIRe
What We're Looking for (Minimum Qualifications):
- 10+ years of experience in Site Reliability Engineering, cloud infrastructure, and/or applications architecture, with a strong foundation in Kubernetes and Docker
- Proven programming expertise in Python, SQL, and distributed processing technologies such as Spark, BigQuery, or Apache Beam
- Hands-on experience building and maintaining CI/CD pipelines, leveraging infrastructure-as-code tools like ArgoCD, Terraform, or similar
- Strong knowledge of cloud platforms (AWS preferred, GCP acceptable), including certification or equivalent skills specific to cloud-native system management
- Bachelor's degree in Computer Science, Engineering, or a related field
What Will Make You Stand Out (Preferred Qualifications):
- Working knowledge of AI/ML pipelines and frameworks (e.g., SkyPilot, mobile ML training) and experience with GPU-optimized cloud infrastructure
- Experience with SQL/NoSQL databases, ML automation platforms, and tools for full production lifecycle of AI-based products
- Advanced degree (Master’s or Ph.D.) in Computer Science, Machine Learning, or related field, with a demonstrated ability to lead projects and innovate quickly in a fast-paced environment
Zscaler’s salary ranges are benchmarked and are determined by role and level. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position across all US locations and could be higher or lower based on a multitude of factors, including job-related skills, experience, and relevant education or training.
The base salary range listed for this full-time position excludes commission/bonus/equity (if applicable) + benefits.
Base Pay Range
$164,500 - $235,000 USD
At Zscaler, we are committed to building a team that reflects the communities we serve and the customers we work with. We foster an inclusive environment that values all backgrounds and perspectives, emphasizing collaboration and belonging. Join us in our mission to make doing business seamless and secure.
Our Benefits program is one of the most important ways we support our employees. Zscaler proudly offers comprehensive and inclusive benefits to meet the diverse needs of our employees and their families throughout their life stages, including:
- Various health plans
- Time off plans for vacation and sick time
- Parental leave options
- Retirement options
- Education reimbursement
- In-office perks, and more!
Learn more about Zscaler’s Future of Work strategy, hybrid working model, and benefits here.
By applying for this role, you agree to adhere to applicable laws, regulations, and Zscaler policies, including those related to security and privacy standards and guidelines.
Zscaler is committed to providing equal employment opportunities to all individuals. We strive to create a workplace where employees are treated with respect and have the chance to succeed. All qualified applicants will be considered for employment without regard to race, color, religion, sex (including pregnancy or related medical conditions), age, national origin, sexual orientation, gender identity or expression, genetic information, disability status, protected veteran status, or any other characteristic protected by federal, state, or local laws. See more information by clicking on the Know Your Rights: Workplace Discrimination is Illegal link.
Pay Transparency
Zscaler complies with all applicable federal, state, and local pay transparency rules.
Zscaler is committed to providing reasonable support (called accommodations or adjustments) in our recruiting processes for candidates who are differently abled, have long term conditions, mental health conditions or sincerely held religious beliefs, or who are neurodivergent or require pregnancy-related support.