Site Reliability Engineer

LATAM

Welcome to 10Pearls!  

We believe in harnessing the power of technology for social good through our core values: innovate, modernize, and accelerate.

 

About us

We are 10Pearls, an award-winning digital development company that helps businesses with product design, development, and technology acceleration. We have a culture of innovation, uniquely designed to help businesses transform, digitalize, and scale by leveraging digital technology.

 

The Role

We are seeking a well-rounded SRE/DevOps Engineer with practical experience in infrastructure operations and DevOps practices. This role is ideal for an experienced jack of all trades who enjoys solving complex technical challenges and implementing reliable, scalable solutions. You will work closely with the Head of SRE to enhance our infrastructure, automate processes, and improve observability and disaster recovery strategies.

Responsibilities

  • Develop and maintain scalable test automation frameworks for UI (React/TypeScript), API, and performance testing
  • Build and implement automation using Cypress or Playwright, REST-assured, and k6
  • Collaborate with engineers to integrate automated testing into the development lifecycle and improve test coverage (unit, integration, and end-to-end)
  • Drive Shift-Left Testing by ensuring quality is embedded early in the development process
  • Optimize test execution in AWS environments using Terraform and CloudFormation
  • Implement and maintain CI/CD pipelines using GitHub Actions, Jenkins, or similar tools
  • Mentor junior engineers, conduct code reviews, and advocate for best practices in test automation
  • Ensure compliance with healthcare security and regulatory standards (HIPAA, SOC 2, etc.)

Requirements

    • Enhance disaster recovery and multi-region capabilities to improve system resilience.
    • Improve monitoring, alerting, and observability with tools such as Grafana, Prometheus, and Sentry.
    • Support on-call processes by enhancing alerting strategies and automating responses where possible.
    • Collaborate with development and operations teams to address reliability challenges and enhance performance.
    • Automate routine tasks using scripting languages and Infrastructure as Code tools.
    • Contribute to postmortems and continuous improvement initiatives to enhance system stability.
    • Maintain clear documentation of infrastructure, processes, and disaster recovery.

    Skills and Qualifications:

    • Experience: 2-4 years in Site Reliability Engineering, DevOps, or a related role.
    • Infrastructure: Experience with AWS (EC2, Lambda, S3, Route 53), Kubernetes, Docker.
    • Automation: Hands-on experience with Terraform or other Infrastructure as Code tools.
    • Monitoring and Observability: Familiarity with monitoring and alerting systems (OpenSearch, Grafana, Prometheus, Sentry).
    • CI/CD: Experience with pipeline tools such as GitLab CI or similar.
    • Scripting: Proficiency in Python, Bash, or similar scripting languages.
    • Operations: Basic knowledge of networking, DNS, and load balancing.
    • Problem-Solving: Strong troubleshooting skills with the ability to respond to incidents effectively.
    • Collaboration: Ability to work within a team and support cross-functional initiatives.
    • Documentation: Clear and concise technical writing skills.

    Nice to Have:

    • Experience with multi-region AWS setups and disaster recovery planning.
    • Knowledge of Grafana monitoring and dashboards.
    • Understanding of compliance frameworks (HIPAA, HITRUST).

 

 

Thank you for applying for this position. We’re more than thrilled to start reviewing your profile and great skills! This is the first step in our selection process, so you will be hearing back from our awesome recruitment team about the next steps 😀

10Pearls Team
