Senior Site Reliability Engineer
Visier is the leader in people analytics and we believe in a 'people-first' approach to business strategy. Our innovative technology transforms the way that organisations make decisions, allowing them to elevate their employees and drive better business outcomes. Embarking on an exciting new chapter in our growth story, we are looking for talented individuals who can help both Visier and our customers grow, evolve and win!
Visier’s Shared Services SRE team is responsible for operating the cloud infrastructure underlying our technology platform and for working with the development teams to effectively use these technologies in production environments. We are also responsible for our AWS integration, API gateway, Cassandra, Kafka, Vault, and Consul implementations, data science workbench, and the network infrastructure and security that tie everything together.
Our job is to provide the infrastructure that lets our analytics platform and services scale.
What you'll be doing...
- Deploying and maintaining highly available services in AWS using Terraform, CloudFormation, and Jenkins
- Debugging production issues at any level, from the hardware and OS kernel up to the application, working hand in hand with developers to improve application behaviour
- Working with the Kong API gateway to provide secure & reliable API access to our customers and partners
- Writing secure code to safeguard Visier and our customers' data, including developing our application security infrastructure
- Optimizing our diagnostics infrastructure components like Splunk, CloudWatch, and Prometheus
- Supporting large clusters of third-party systems like Cassandra, Postgres, and Kafka
- Preparing for and simulating disasters of all sorts. We’re mission-critical for our customers and need to stay up, no matter what
- Working closely with other development teams to design the infrastructure to support application features
What you'll bring to the table...
- Extensive practical experience in networking, network security, firewalls, routing, DNS, and advanced Linux operating system technologies and security.
- Hands-on proficiency with AWS services, including EC2, S3, RDS, IAM, Lambda, and VPC
- Strong knowledge of deployment and configuration management tools
- Skilled in developing and maintaining Terraform code and modules for robust Infrastructure as Code (IaC) implementations.
- Experience with deep troubleshooting and root cause analysis, with a demonstrated ability to diagnose and resolve complex issues affecting system reliability and performance
- Experience in system security patching, ensuring timely updates across diverse environments to maintain system integrity and resilience
- Strong experience with container technologies such as Kubernetes or ECS
- Strong experience with infrastructure as code tools and languages such as Terraform
- Coding skills in Java, Scala, Python, or Groovy
Most importantly, you share our values...
- You roll up your sleeves
- You make it easy
- You are proud
- You never stop learning
- You play to win