Job Application for Senior Software Engineer, Site Reliability Engineering (Ad Cloud) at Appier

About Appier

Appier is a software-as-a-service (SaaS) company that uses artificial intelligence (AI) to power business decision-making. Founded in 2012 with a vision of democratizing AI, Appier’s mission is turning AI into ROI by making software intelligent. Appier now has 17 offices across APAC, Europe, and the U.S., and is listed on the Tokyo Stock Exchange (ticker number: 4180). Visit www.appier.com for more information.

About Ad Cloud Serving Services Team

On Appier's Ad Cloud Serving Services Team, we build and maintain the critical systems that power the core of Appier's advertising technology. These systems handle massive volumes of high-concurrency, real-time requests and must keep ad delivery stable, low-latency, and high-performing. Our SREs are crucial to keeping these services reliable under extreme load, and they continuously improve product availability and performance through innovation and automation.

About the role

We are looking for a Senior Site Reliability Engineer (SRE) to join our Serving Services Team. You'll have the opportunity to tackle the unique, large-scale challenges at Appier, using your expertise in coding, system design, and automation to optimize existing systems, build infrastructure, and improve service reliability and development efficiency. If you are passionate about solving complex system challenges and eager to play a pivotal role in a rapidly growing and impactful product line, we invite you to join us.

Responsibilities

Engage in and improve the entire lifecycle of services, from design and development through deployment, operation, and refinement, ensuring the high availability and performance of the Serving Services team's core systems.
Design and implement scalable cloud-native solutions: Actively participate in service architecture design, especially Kubernetes-based containerized deployment and management strategies, ensuring stable operation and efficient scaling of systems.
Drive automation and Infrastructure as Code (IaC): Reduce repetitive work and enhance deployment, monitoring, and operational efficiency by developing automation tools and scripts (e.g., using Python, Shell Script, Terraform, Ansible).
Perform in-depth optimization and troubleshooting: Leverage your deep understanding of large-scale distributed systems (across multi-cloud environments such as AWS and Azure) to diagnose and resolve complex performance bottlenecks and network issues (including those related to service mesh, Istio, and Calico), and conduct root cause analysis.
Establish and maintain robust observability: Design, implement, and optimize monitoring (Prometheus, Grafana), logging, and tracing systems to ensure transparency of system health and support rapid problem identification and resolution.
Practice Continuous Integration and Continuous Deployment (CI/CD): Collaborate closely with development teams to optimize deployment pipelines (e.g., using Jenkins, GitHub Actions), promoting strategies like blue/green deployments and canary releases to achieve zero-downtime service upgrades.
Mentor and share knowledge: As a senior member, mentor junior engineers within the team, sharing your expertise and best practices to collectively enhance the team's technical capabilities.
Participate in on-call rotation: Handle critical incidents and conduct blameless postmortems.

About you

[Minimum qualifications]

Bachelor’s degree in Computer Science, a related field, or equivalent practical experience.
3-5+ years of software development or SRE-related experience, including at least 2 years designing, analyzing, and troubleshooting large-scale distributed systems.
Proficiency in Kubernetes operations, deployment, troubleshooting, and optimization, with practical experience managing production-grade K8s clusters.
Strong command of at least one programming language (e.g., Python, Go) and proficiency in shell scripting for automation.
Solid understanding of Linux system administration, networking (TCP/IP, HTTP, DNS), and system programming.
Extensive hands-on experience in planning, deploying, and operating services in production.

[Preferred qualifications]

Experience with Service Mesh (e.g., Istio) implementation, operation, or troubleshooting.
Deep experience with major public cloud platforms (e.g., AWS, GCP, Azure), including cloud resource provisioning and optimization.
Familiarity with building, managing, and optimizing CI/CD toolchains (e.g., Jenkins, GitHub Actions).
Experience in database management (MongoDB, Cassandra, MySQL, PostgreSQL), including setup, backup & restore, and system tuning.
Operational experience with large-scale data systems (e.g., data access, collection, processing, and storage).
Basic knowledge of security concepts (e.g., firewall setup, proper security policy design, network attack defense).
Experience in mentoring or leading junior engineers.
