Senior MLOps Engineer
Razorpay was founded by Shashank Kumar and Harshil Mathur in 2014. Razorpay is building a new-age digital banking hub (Neobank) for businesses in India, with the mission to enable frictionless banking and payments experiences for businesses of all shapes and sizes. What started as a B2B payments company is now processing billions of dollars of payments for lakhs of businesses across India.
The Role:
We are seeking a skilled Senior MLOps Engineer to join our team and drive the scalability and reliability of Razorpay’s machine learning infrastructure. In this role, you will work closely with Data Scientists, Machine Learning Engineers, and other stakeholders to streamline the deployment and maintenance of machine learning models, ensuring robust production-grade solutions.
Key Responsibilities:
- Collaborate Effectively: Partner with Data Scientists and ML Engineers to understand model requirements and transform them into efficient, scalable production solutions.
- Enhance ML Infrastructure: Contribute to key projects including our feature store, model-serving platform, and model registry to elevate Razorpay’s machine learning capabilities.
- Optimize ML Pipelines: Oversee and refine machine learning pipelines, focusing on training, evaluation, and deployment. Enhance processes to boost cost efficiency and reduce runtime, leveraging platforms like DataRobot.
- Improve Real-time Reliability: Design and implement strategies to reduce latency and increase reliability for real-time feature and model serving.
- Implement Best Practices: Establish and promote version control standards, CI/CD processes, and automated testing for robust and reliable ML model deployment.
- Manage Cloud Infrastructure: Oversee cloud resources, manage ML-related data storage and compute instances, and drive improvements across the infrastructure stack.
- Stay Current with Emerging Technologies: Continuously explore and integrate new tools and advancements in the MLOps ecosystem to refine and improve our systems.
- Document and Share Knowledge: Maintain comprehensive documentation of processes, architectures, and workflows to foster knowledge sharing and team cohesion.
Skills and Qualifications:
- Build & Manage Cloud Environments: Experience managing cloud environments, with a strong preference for AWS. Proficiency in infrastructure automation tools like Terraform and configuration management tools like Helm, Puppet, Chef, or Ansible.
- Proficient in CI/CD & Version Control: Skilled with CI/CD tools and version control systems (e.g., Git) to maintain reliable deployment processes.
- Containerization & Orchestration: Hands-on experience with Docker, Kubernetes, and other containerization technologies.
- ML Frameworks & Tools: Knowledgeable in machine learning concepts and frameworks (TensorFlow, PyTorch, Scikit-learn) and MLOps platforms like MLflow, Kubeflow, or DataRobot.
- Distributed Systems Experience: Ability to build and maintain low-latency, distributed backend APIs.
- Problem-solving & Collaboration: Strong analytical skills and the ability to work collaboratively in a team-oriented environment.
Mandatory Qualifications:
- 3-5 years of experience in DevOps, with an emphasis on MLOps practices.
- Strong cloud infrastructure experience, particularly with AWS.
- Expertise in scripting languages (e.g., Python, Shell, Ruby, Go) and troubleshooting Linux production environments.
- Knowledge of network concepts in AWS and infrastructure operations, especially in regulated environments such as banking.
- Proficient in monitoring, logging, and database technologies (e.g., MariaDB/MySQL).
Preferred Qualities:
- Background in a product-focused organization.
- Demonstrated ability to manage infrastructure at scale.
- Side projects or contributions to open-source projects (e.g., on GitHub).
- Experience with backend programming languages.