
SRE - 2 (Big Data)
About PhonePe Limited:
Headquartered in India, its flagship product, the PhonePe digital payments app, was launched in Aug 2016. As of April 2025, PhonePe has over 60 Crore (600 Million) registered users and a digital payments acceptance network spread across over 4 Crore (40+ million) merchants. PhonePe also processes over 33 Crore (330+ Million) transactions daily with an Annualized Total Payment Value (TPV) of over INR 150 lakh crore.
PhonePe’s portfolio of businesses includes the distribution of financial products (Insurance, Lending, and Wealth) as well as new consumer tech businesses (Pincode - hyperlocal e-commerce and Indus AppStore Localized App Store for the Android ecosystem) in India, which are aligned with the company’s vision to offer every Indian an equal opportunity to accelerate their progress by unlocking the flow of money and access to services.
Culture:
At PhonePe, we go the extra mile to make sure you can bring your best self to work, Everyday!. And that starts with creating the right environment for you. We empower people and trust them to do the right thing. Here, you own your work from start to finish, right from day one. PhonePe-rs solve complex problems and execute quickly; often building frameworks from scratch. If you’re excited by the idea of building platforms that touch millions, ideating with some of the best minds in the country and executing on your dreams with purpose and speed, join us!
Job Overview:
As a Site Reliability Engineer (SRE) specializing in DataPlatform OnPremise, you will play a critical role in deployment, ensuring the reliability, scalability, and performance of our Cloudera Data Platform (CDP) infrastructure. You will collaborate closely with cross-functional teams to design, implement, and maintain robust systems that support our data-driven initiatives. The ideal candidate will have a deep understanding of Data Platform, strong troubleshooting skills, and a proactive mindset towards automation and optimization.You will play a pivotal role in ensuring the smooth functioning, operation, performance and security of large high density Cloudera-based infrastructure.
Roles and Responsibilities:
- Work on tasks related to implementation of Cloudera Data Platform Cloudera Data Platform on-premises and be a part of planning, installation, configuration, and integration with existing systems.
- Infrastructure Management: Manage and maintain the Cloudera-based infrastructure, ensuring optimal performance, high availability, and scalability. This includes monitoring system health, and performing routine maintenance tasks.
- Strong troubleshooting skills and operational expertise in areas such as system capacity, bottlenecks, memory, CPU, OS, storage, and networking.
- Creating Runbooks and automating them using scripting tools like Shell scripting, Python etc.
- Working knowledge with any of the configuration management tools like Terraform, Ansible or SALT
- Data Security and Compliance: Implement and enforce security best practices to safeguard data integrity and confidentiality within the Cloudera environment. Ensure compliance with relevant regulations and standards (e.g., GDPR, HIPAA, DPR).
- Performance Optimization: Continuously optimize the Cloudera infrastructure to enhance performance, efficiency, and cost-effectiveness. Identify and resolve bottlenecks, tune configurations, and implement best practices for resource utilization.
- Capacity Planning: Planning and performance tuning of Hadoop clusters, Monitor resource utilization trends and plan for future capacity needs. Proactively identify potential capacity constraints and propose solutions to address them.
- Collaborate effectively with infrastructure, network, database, application, and business intelligence teams to ensure high data quality and availability.
- Work closely with teams to optimize the overall performance of the PhonePe Hadoop ecosystem.
- Backup and Disaster Recovery: Implement robust backup and disaster recovery strategies to ensure data protection and business continuity. Test and maintain backup and recovery procedures regularly.
- Develop tools and services to enhance debuggability and supportability.
- Patches & Upgrades: Routinely apply recommended patches and perform rolling upgrades of the platform in accordance with the advisory from Cloudera, InfoSec and Compliance.
- Documentation and Knowledge Sharing: Create comprehensive documentation for configurations, processes, and procedures related to the Cloudera Data Platform. Share knowledge and best practices with team members to foster continuous learning and improvement.
- Collaboration and Communication: Collaborate effectively with cross-functional teams including data engineers, developers, and IT operations personnel. Communicate project status, issues, and resolutions clearly and promptly.
Skills Required:
- Bachelor's degree in Computer Science, Engineering, or related field.
- Proficiency in Linux system administration, shell scripting, and networking concepts including IPtables, and IPsec.
- Strong understanding of networking, open-source technologies, and tools.
- 3-5 years of experience in the design, set up, and management of large-scale Hadoop clusters, ensuring high availability, fault tolerance, and performance optimization.
- Strong understanding of distributed computing principles and experience with Hadoop ecosystem technologies (HDFS, MapReduce, YARN, Hive, Spark, etc.).
- Experience with Kerberos and LDAP.
- Strong Knowledge of databases like Mysql,Nosql,Sql server
- Hands-on experience with configuration management tools (e.g., Salt,Ansible, Puppet, Chef).
- Strong scripting skills (e.g., PERL,Python, Bash) for automation and troubleshooting.
- Experience with monitoring and logging solutions (e.g., Prometheus, Grafana, ELK stack).
- Knowledge of networking principles and protocols (TCP/IP, UDP, DNS, DHCP, etc.).
- Experience with managing *nix based machines and strong working knowledge of quintessential Unix programs and tools (e.g. Ubuntu, Fedora, Redhat, etc.)
- Excellent communication skills and the ability to collaborate effectively with cross-functional teams.
- Excellent analytical, problem-solving, and troubleshooting skills..
- Proven ability to work well under pressure and manage multiple priorities simultaneously.
Good To Have:
- Cloudera Certified Administrator (CCA) or Cloudera Certified Professional (CCP) certification preferred.
- Minimum 2 years of experience in managing and administering medium/large hadoop based environments (>100 machines), including Cloudera Data Platform (CDP) experience is highly desirable.
- Familiarity with Open Data Lake components such as Ozone, Iceberg, Spark, Flink, etc.
- Familiarity with containerization and orchestration technologies (e.g. Docker, Kubernetes, OpenShift) is a plus
- Design,develop and maintain Airflow DAGs and tasks to automate BAU processes,ensuring they are robust,scalable and efficient.
PhonePe Full Time Employee Benefits (Not applicable for Intern or Contract Roles)
- Insurance Benefits - Medical Insurance, Critical Illness Insurance, Accidental Insurance, Life Insurance
- Wellness Program - Employee Assistance Program, Onsite Medical Center, Emergency Support System
- Parental Support - Maternity Benefit, Paternity Benefit Program, Adoption Assistance Program, Day-care Support Program
- Mobility Benefits - Relocation benefits, Transfer Support Policy, Travel Policy
- Retirement Benefits - Employee PF Contribution, Flexible PF Contribution, Gratuity, NPS, Leave Encashment
- Other Benefits - Higher Education Assistance, Car Lease, Salary Advance Policy
Our inclusive culture promotes individual expression, creativity, innovation, and achievement and in turn helps us better understand and serve our customers. We see ourselves as a place for intellectual curiosity, ideas and debates, where diverse perspectives lead to deeper understanding and better quality results. PhonePe is an equal opportunity employer and is committed to treating all its employees and job applicants equally; regardless of gender, sexual preference, religion, race, color or disability. If you have a disability or special need that requires assistance or reasonable accommodation, during the application and hiring process, including support for the interview or onboarding process, please fill out this form.
Read more about PhonePe on our blog.
Apply for this job
*
indicates a required field