
Site Reliability Engineer (SRE)
Mattermost is seeking a highly skilled Site Reliability Engineer (SRE) to help design, operate, and improve the infrastructure powering our secure, mission-critical collaboration platform. As part of our globally distributed Engineering team, you will focus on reliability, scalability, performance, and automation across cloud and hybrid environments.
You will play a key role in ensuring our systems are observable, resilient, and efficient, working closely with development, security, and operations teams to deliver exceptional uptime and performance to our customers in defense, government, and critical infrastructure sectors.
Responsibilities Include:
- Build, maintain, and optimize containerized workloads for production environments
- Implement infrastructure-as-code for repeatable and reliable deployments
- Implement and maintain cloud environments that meet the regulatory and security requirements (e.g., FedRAMP, DoD) of customers in highly regulated domains
- Establish and maintain observability solutions for monitoring, alerting, and performance tuning
- Perform incident response for production systems, including root cause analysis and remediation
- Drive automation to reduce manual operations and improve system reliability
- Collaborate across teams to design scalable, secure, and compliant architectures
- Participate in an on-call rotation for production systems
Requirements:
- BS in Computer Science, Cybersecurity, Software Engineering, or a related technical field, or equivalent experience; plus 5+ years of relevant experience in site reliability engineering, DevOps, or cloud infrastructure roles
- Strong background in container orchestration platforms, ideally Kubernetes
- Proven experience with infrastructure-as-code tooling, ideally Terraform
- Proven experience with cloud service providers, ideally AWS
- Experience designing and maintaining monitoring and alerting solutions
- Strong skills in troubleshooting and performance tuning for distributed systems
- Proficiency in at least one scripting or programming language for automation
- Excellent communication skills and ability to work in distributed teams
- For candidates residing in the U.S.: This role may require the ability to obtain and maintain a U.S. government security clearance in the future. As such, U.S. applicants must be U.S. citizens and eligible under applicable clearance requirements.
- Applicants must meet eligibility requirements for access to export-controlled information as defined by U.S. export control laws, including EAR and ITAR.
Preferences:
- Familiarity with observability stacks such as Grafana and Prometheus
- Knowledge of high-availability, disaster recovery, and scaling strategies
- Experience in highly regulated industries such as defense, finance, or critical infrastructure
- Experience with U.S. federal compliance frameworks and authorization processes, including FedRAMP, DoD ATO, NIST 800-53, and related government standards.
- Experience preparing, delivering, and maintaining software offerings through AWS Marketplace and other cloud provider marketplaces (e.g., Azure Marketplace, Google Cloud Marketplace), including packaging, compliance validation, and ongoing operational support.
- Open-source contributions related to reliability or infrastructure tooling
- Certifications in cloud infrastructure, reliability, or DevOps engineering (e.g., CKA, CKAD, AWS Certified Solutions Architect)
Mattermost takes a market-based approach to pay, and pay may vary depending on your location. The successful candidate’s starting pay will be determined based on job-related skills, experience, qualifications, work location, and market conditions. The salary range below may be modified in the future.
Salary Range
$150,000 - $190,000 USD