Associate Architect
Job Description: Associate Architect - SRE
Company Overview: Myntra’s Engineering team builds the technology platform that
empowers our customers’ shopping experience and enables the smooth flow of
products from suppliers to our customers’ doorsteps. We work on areas such as
building massive-scale web applications, engaging user interfaces, big-data analytics,
mobile apps, workflow systems, inventory management, etc. We are a small technology
team where each individual has a huge impact. You will have the opportunity to be part
of a rapidly growing organization and gain exposure to all the parts of a comprehensive
e-commerce platform.
About The Team: The Cloud Platform Engineering (CPE) group is responsible for
developing and managing platforms that allow Myntra’s tech products to be deployed
and run at scale. The CPE team builds and maintains centralized and high-scale
platforms for sophisticated application security frameworks, log collection, monitoring
systems, access management, secret management, database access, change
management systems, build, release and deployment. You will be part of the SRE team
under the CPE division.
Position: Associate Architect - Site Reliability Engineering (SRE)
Location: Bengaluru
Employment Type: Full-time
Role Overview:
We are seeking an Associate Architect - SRE to drive the reliability, scalability, and
observability of our e-commerce platform. This role will focus on designing and
implementing robust monitoring and logging platforms, building automation in Python or
Golang, and ensuring high availability of services on Kubernetes and cloud
environments. The ideal candidate will work closely with engineering teams to create
scalable and automated solutions that enhance performance and system stability.
Responsibilities: Hosting infrastructure and setting up the core platform form the
backbone of any system. As part of this team, you will:
● Architect and implement a scalable monitoring and logging platform stack to
enhance observability across microservices.
● Design, develop, and maintain automation scripts and tools using Python or
Golang to streamline infrastructure operations.
● Optimize cloud infrastructure (AWS, GCP, or Azure) and Kubernetes-based
deployments for high availability and resilience.
● Collaborate with development and operations teams to define best practices for
reliability, scalability, and incident management.
● Enhance security, compliance, and fault tolerance in the cloud and Kubernetes
ecosystem.
● Analyze system metrics and logs to proactively identify performance bottlenecks
and prevent failures.
● Lead post-mortem analysis and drive root cause identification for production
incidents.
● Drive capacity planning, workload optimization, and cost monitoring by adopting
best practices.
● Apply a good understanding of protocols such as gRPC and of Layer 7 security.
Requirements:
● Strong experience in designing and implementing monitoring/logging platforms
(e.g., Prometheus, Grafana, ELK, Loki, OpenTelemetry, Datadog, New Relic, etc.).
● Hands-on experience in automation and scripting using Python or Golang.
● Proficiency in managing and scaling Kubernetes (K8s) clusters in production.
● Experience with cloud platforms such as AWS, GCP, or Azure, and understanding
of their networking, security, and scalability aspects.
● Knowledge of Infrastructure as Code (IaC) tools like Terraform, Helm, or Ansible.
● Experience in CI/CD tooling (Jenkins, GitHub Actions, ArgoCD, etc.).
● Experience with service mesh technologies like Istio or Linkerd.
● Understanding of SRE principles such as SLIs, SLOs, and error budgets.
● Familiarity with incident response and post-mortem best practices.
● Strong problem-solving skills.