Site Reliability Engineer
Bangalore
Key Responsibilities:
- Design, build, and operate cloud infrastructure
- Plan and drive cost optimization and efficiency improvements for cloud resources
- Develop and implement automation tools to enhance operational efficiency
- Define and measure Service Level Objectives (SLOs) and Service Level Indicators (SLIs)
- Set up system monitoring, configure alerts, and continuously improve incident handling
- Respond to outages and conduct postmortem analyses
- Perform capacity planning and performance tuning
- Build and improve CI/CD pipelines
- Manage identities and access for development tools
Required Skills & Experience:
- Hands-on experience designing and operating cloud infrastructure on AWS
- Practical experience in Linux/Unix system administration
- Experience with Infrastructure as Code tools (e.g., Terraform, Ansible)
- Proficiency in container technologies (e.g., Docker, Kubernetes)
- Experience setting up and maintaining monitoring systems
- Experience building and operating CI/CD pipelines
- Basic understanding of networking
Preferred Skills & Experience:
- Hands-on experience designing and operating cloud infrastructure on Google Cloud or Azure
- Experience operating microservices architectures
- Proven track record in analyzing cloud resource usage and driving cost efficiency
- Programming skills in Python, TypeScript, Shell scripting, or similar
- Experience in database administration
- Knowledge of security best practices
Ideal Candidate Attributes:
- Self-driven with a strong understanding of objectives and the ability to take initiative
- Excellent teamwork and communication skills
- Ability to propose and implement process improvements to enhance team productivity through automation
- Highly motivated to continuously learn and acquire new technical knowledge
- Strong comprehension of complex systems
- Capable of creating and maintaining clear and thorough documentation