Azure Site Reliability Engineer
Pavilion Payments enables the world’s gaming entertainment leaders to create amazing consumer experiences and maximize spend across all of their physical and digital properties. Our complete suite of payment solutions enables safe, secure, and trusted cash access at the cage, on the casino floor, or online. Our compliance and security solutions offer additional layers of automation and risk protection. And our analytics solutions enable clients to view performance across all of their gaming properties.
About the Role
As Pavilion Pay’s inaugural Azure Site Reliability Engineer (SRE), you will play a foundational role in building a resilient infrastructure and ensuring high availability across our systems. This position, part of IT Operations, will work closely with Network, Cloud Infrastructure, DevOps, and Cloud Architects to implement best practices in system reliability, observability, and automated response. This role emphasizes reliability, platform management, and network security.
Key Responsibilities:
Reliability and Incident Management
- Establish and track reliability metrics such as Latency, Traffic, Errors, and Capacity, focusing on uptime across applications and products, with plans to expand monitoring to kiosk and edge networks.
- Develop and refine monitoring systems using Grafana to ensure comprehensive visibility, focusing on continuous improvements in reliability.
- Establish robust processes for incident response and root cause analysis, leveraging Opsgenie to ensure timely and structured responses.
- Work with Tailscale, SUSE, F5, and Palo Alto firewalls to support secure, resilient network connectivity and load balancing.
Platform Management and Service Objectives
- Collaborate with IT leadership to define and maintain service level objectives (SLOs) and monitor performance against these standards.
- Structure and optimize platform management with a focus on supporting uptime in our production environment.
Automation, IaC, and CI/CD Pipelines
- Develop and maintain Terraform configurations for scalable, repeatable infrastructure deployment, focusing on minimizing manual tasks and ensuring resource consistency.
- Work with DevOps to optimize CI/CD workflows using Azure DevOps, focusing on pipeline automation and deployment efficiency.
- Automate repetitive tasks and enhance deployment processes within AKS and Azure environments, aiming to reduce potential deployment bottlenecks.
Network and Security Collaboration
- Partner with network engineers to optimize and maintain F5 load balancers and Palo Alto Networks/Panorama for secure, resilient network operations.
- Collaborate with security teams to ensure network traffic and access patterns align with security best practices, integrating observability into network operations.
Requirements (Must have)
- Technical Skills: Proficiency with SUSE, AKS, Linux, Azure Cloud, Grafana, Rancher, Terraform, and Azure DevOps pipelines.
- Monitoring Tools: Strong experience with Grafana for observability and Opsgenie for incident response, with a focus on maintaining uptime and proactive alerts.
- Automation and Scripting: Proficiency in scripting (e.g., Bash, Python) and experience with Tailscale for secure networking solutions.
- Problem-Solving Mindset: Experience in identifying and remediating performance and security issues, focusing on proactive, long-term solutions.
First 90 Days:
- Understand the Product: Deepen familiarity with Pavilion Pay's products and their interdependencies.
- Microsoft Azure (Cloud & Networking): Act as a backup for the Cloud Engineer, managing Azure resources and infrastructure monitoring.
- Automation Deployment: Design an automation validation process that tests Terraform and Infrastructure as Code (IaC) for tasks such as backups, patching, and uptime.
- Platform Familiarization: Gain familiarity with platform elements, especially Terraform, CI/CD, and SLO definitions, to support a highly reliable production environment.
Your skills and our values
- Working with Others - Works with other department employees to accomplish individual and company service levels and goals. Displays enthusiasm and promotes a friendly group working environment.
- Embracing Change - Remains open-minded and reacts quickly to new information. Adapts to varying department changes.
- Attention to Detail – Stays alert in a high-paced environment. Follows detailed procedures and ensures accuracy in documentation and data.
- Decision-Making and Problem-Solving – Recognizes possible system issues that may affect payment entry processing and alerts Supervisor.
*****Applicants must be US citizens and authorized to work in the USA, now and in the future.*****
Pavilion Payments provides equal employment opportunities to all employees and applicants for employment without regard to race, color, religion, sex (including pregnancy), national origin, ancestry, age, marital status, sexual orientation, gender identity or expression, disability, veteran status, genetic information or any other basis protected by law. Applicants requiring reasonable accommodation in the application and/or interview process should notify a representative of the Human Resources Department.