Senior Engineer (Shenzhen) - DevOps, Infrastructure Operations and AIOps
Shenzhen
Purpose of Position
- The Senior Engineer (DevOps, Infrastructure Operations and AIOps) will be responsible for the design, implementation, enhancement, and support of the company’s cloud and infrastructure platforms, deployment pipelines, operational tooling, and service reliability capabilities. The incumbent will work closely with engineering, product, security, and business teams to ensure that technology platforms are stable, scalable, secure, and efficient in supporting business growth.
- The incumbent is expected to bring strong hands-on experience in DevOps, infrastructure operations, automation, and production support in a fast-paced, dynamic environment, together with practical exposure to AIOps capabilities that improve monitoring, incident detection, alerting, troubleshooting, and operational efficiency.
Job Description
- Design, build, maintain, and optimize infrastructure platforms, DevOps pipelines, and operational tooling to support reliable and scalable technology delivery.
- Manage and enhance cloud and on-premise infrastructure environments, including compute, storage, network, access control, backup, disaster recovery, and related platform services.
- Build and improve CI/CD pipelines, release automation, environment provisioning, and infrastructure-as-code capabilities to strengthen delivery speed, consistency, and quality.
- Work closely with application engineering teams to ensure solutions are deployable, observable, secure, and operationally supportable across development, testing, and production environments.
- Support production operations, service monitoring, incident response, root cause analysis, problem remediation, and continuous service improvement.
- Implement and maintain observability capabilities including logging, metrics, tracing, dashboards, and alerting to improve system visibility and operational responsiveness.
- Drive automation initiatives across infrastructure operations, deployment, configuration management, patching, capacity management, and operational workflows.
- Contribute to AIOps use cases including intelligent alert correlation, anomaly detection, predictive monitoring, incident triage support, operational insights, and workflow automation to improve efficiency and reduce manual effort.
- Collaborate with security teams to ensure infrastructure and DevOps practices align with security, compliance, vulnerability management, and access governance requirements.
- Support system resilience through backup, recovery planning, failover readiness, and disaster recovery testing.
- Work with third-party vendors and service providers where required to maintain service quality, issue resolution, and platform stability.
- Document infrastructure standards, runbooks, procedures, and best practices to strengthen team knowledge sharing and operational consistency.
Requirements
- Strong hands-on engineering experience in DevOps, infrastructure operations, site reliability, or platform engineering roles.
- Good understanding of cloud infrastructure, network fundamentals, system administration, environment management, backup and recovery, and operational support practices.
- Solid experience with CI/CD, infrastructure as code, configuration management, scripting, automation, and deployment tooling.
- Experience in production support, incident management, troubleshooting, root cause analysis, and operational improvement in business-critical environments.
- Strong understanding of observability and monitoring practices including logs, metrics, tracing, dashboards, and alerting.
- Practical experience with AIOps, intelligent operations tooling, or AI-enabled monitoring and automation capabilities is preferred.
- Familiarity with applying AI or machine learning concepts in operations use cases such as anomaly detection, event correlation, incident prediction, intelligent alerting, and workflow automation is a plus.
- Good understanding of infrastructure security, access control, patching, vulnerability management, and compliance requirements.
- Experience working with engineering teams in agile and fast-paced delivery environments.
- Strong problem-solving skills with the ability to troubleshoot across infrastructure, platform, deployment, and application support layers.
- A track record of promoting automation, operational excellence, reliability, and engineering best practices.
- 5-8 years of relevant working experience in DevOps, infrastructure engineering, platform operations, or related technology roles.
- Experience in eCommerce, retail, or high-transaction consumer-facing environments is a plus.
- Experience in multicultural and fast-paced organizations is preferred.
- Strong communication and collaboration skills.
- Proficient in spoken and written Cantonese and English.
Interested in building your career at CASETiFY? Get future opportunities sent straight to your email.
