Member of Technical Staff — CI Engineer

Palo Alto, CA

About the Role

RadixArk is hiring a Member of Technical Staff — CI Engineer to own the infrastructure that keeps SGLang moving. Our CI system runs 300+ GPU tests across NVIDIA, AMD, Intel, and Ascend hardware pools, gating every commit to one of the fastest-growing open-source LLM inference engines. When CI is green and fast, 100+ contributors ship with confidence. When it isn't, the entire project stalls. That bottleneck is your problem to solve.

You won't just maintain pipelines; you'll architect them. You'll replace brittle static thresholds with regression-based detection, harden runners against supply-chain attacks from fork PRs, and cut cycle times so contributors get feedback in minutes, not hours. You'll work directly with core maintainers, hardware partners, and the open-source community to keep the system that gates every merge trustworthy, fast, and secure.

This is not a role for someone who wants to write CI YAML and walk away. It's for an engineer who treats CI infrastructure the way we treat serving infrastructure — as a system worth designing well.

 

What You’ll Do

  • Own CI reliability end-to-end — triage failures, distinguish real regressions from flaky tests and infra issues, keep main green
  • Build regression-based CI — replace hardcoded static thresholds with automated baseline comparison (metrics pipeline, durable storage, detection logic)
  • Harden runner infrastructure — ephemeral runners, container isolation, security hardening for fork PR execution
  • Cut CI time — right-size eval suites, deduplicate server startups, separate PR smoke tests from nightly full runs
  • Improve developer experience — faster feedback, clearer failure messages, workflow orchestration
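
For illustration only, the automated baseline comparison mentioned in the second bullet could look roughly like the sketch below. It assumes a metric where higher is better (for example, throughput). The function name, the 5% tolerance, the minimum sample count, and the sample numbers are all hypothetical and are not taken from SGLang's CI.

```python
from statistics import median


def is_regression(history, current, rel_tolerance=0.05, min_samples=5):
    """Return True if `current` is more than `rel_tolerance` below the
    median of recent baseline runs (assumes higher metric = better)."""
    if len(history) < min_samples:
        # Too little baseline data to judge; do not fail the build.
        return False
    baseline = median(history)
    return current < baseline * (1.0 - rel_tolerance)


# Hypothetical throughput values from recent runs on main.
baseline_runs = [1000.0, 1012.0, 995.0, 1003.0, 990.0]

print(is_regression(baseline_runs, 985.0))  # small dip, inside tolerance
print(is_regression(baseline_runs, 900.0))  # roughly 10% below the median
```

Comparing against a rolling median rather than a hardcoded number lets the threshold follow real performance changes on main, while the tolerance absorbs ordinary run-to-run noise.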

 

Requirements

  • 3+ years operating CI/CD at scale (GitHub Actions, Buildkite, Jenkins, GitLab CI, or similar)
  • Deep knowledge of Linux, Docker, and GPU computing
  • Experience managing self-hosted runners
  • Strong Bash and Python
  • Security mindset: CI supply chain risks, fork PR attack vectors, runner hardening
  • Experience with NVIDIA GPU drivers, CUDA, NCCL, and InfiniBand/RDMA in CI contexts
  • Familiarity with ML inference workloads (model loading, KV cache, quantization)

 

Nice to Have

  • Large open-source project CI experience (100+ contributors)
  • AMD ROCm or Intel XPU CI pipelines

 

What Success Looks Like

  • Day 20 — Full CI landscape understood, daily triage taken over, top recurring flaky tests fixed, PR CI time reduced 30%+
  • Day 40 — Regression-based checks live on nightly CI, ephemeral runner prototype deployed, runner isolation in place
  • Day 60 — Zero flaky tests. Main CI 100% green when no real regression exists

 

About RadixArk

RadixArk is an infrastructure-first company built by engineers who've shipped production AI systems, created SGLang (20K+ GitHub stars, the fastest open LLM serving engine), and developed Miles (our large-scale RL framework).

We're on a mission to democratize frontier-level AI infrastructure by building world-class open systems for inference and training.

Our team has optimized kernels serving billions of tokens daily, designed distributed training systems coordinating 10,000+ GPUs, and contributed to infrastructure that powers leading AI companies and research labs.

We're backed by well-known infrastructure investors and partner with NVIDIA, Google, AWS, and frontier AI labs.

Join us in building infrastructure that gives real leverage back to the AI community.

 

Compensation

We offer a competitive base salary with meaningful equity, comprehensive health benefits, and flexible work arrangements. Compensation is determined by location, level, and experience.

 

Equal Opportunity

RadixArk is an Equal Opportunity Employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more.

 

How to Apply

Reach out via Slack or email. CI fix PRs to major open-source projects are worth more than a resume.



