Senior Site Reliability Engineer (Crypto Exchange)

Hong Kong

We are working with a decentralised exchange that aims to combine the best of centralised and decentralised exchanges (CEXs and DEXs), focusing on building a safe, simple and scalable platform for trading. They differentiate themselves by offering institutional-level systems and support whilst remaining on-chain and decentralised.

We are seeking a Senior Site Reliability Engineer to join the team and ensure the stability, scalability, and performance of a cutting-edge platform. You will balance production reliability with engineering-driven automation, reducing manual processes through innovative tooling and process improvements. This role requires a strong commitment to on-call ownership and a passion for building resilient, observable, and self-healing infrastructure.

Key Responsibilities

  • Design, implement, and maintain scalable infrastructure for a high-performance, low-latency trading platform.
  • Operate and enhance Kubernetes and Nomad-based environments to ensure system stability, scalability, and security.
  • Develop infrastructure automation and deployment pipelines using Terraform, Ansible, ArgoCD, and GitHub Actions.
  • Collaborate with engineering teams to streamline service onboarding, automate repetitive tasks, and improve deployment efficiency.
  • Enhance observability and reliability through improved logging, metrics, tracing, and alerting using the Grafana ecosystem.
  • Perform root cause analysis and postmortems for production incidents, driving continuous improvements in system resilience and incident response.
  • Work with security and compliance teams to ensure infrastructure meets regulatory and organizational standards.
  • Support multi-environment deployments (dev, staging, testnet, mainnet) with a focus on safe rollouts, rollbacks, and configuration management.
  • Contribute to capacity planning, cost optimization, and infrastructure scaling strategies to support platform growth.

Experience & Skills Requirements

  • 5+ years of relevant experience as a DevOps engineer or SRE.
  • Proven ability to participate in an on-call rotation, demonstrating ownership in incident response and a focus on long-term system stability.
  • Extensive experience operating and maintaining low-latency, distributed systems in production environments.
  • Proficiency with cloud-native platforms and container orchestration tools, including AWS, GCP, Kubernetes, and Nomad.
  • Strong knowledge of Linux/Unix internals and the TCP/IP networking stack.
  • Proficiency in one or more of: Bash, Go, or Python.
  • Expertise in root cause analysis, performance tuning, and system-level debugging in complex service architectures.
  • Experience building and managing end-to-end infrastructure, including infrastructure as code, CI/CD pipelines, and monitoring systems.
  • Familiarity with modern GitOps workflows and tools such as GitHub Actions, ArgoCD, Argo Workflows, and Argo Events.
  • Ability to own production systems end-to-end, from infrastructure as code to automated monitoring and deployment workflows.
  • A pragmatic approach that favours depth, ownership, and a bias for action over broad familiarity.
  • Bonus: experience with the Aeron messaging system.
