Job Application for Data Infrastructure Engineer at RadixArk

About the Role

RadixArk is seeking a Data Engineer / Backend Engineer to build SGLang’s built-in usage statistics and data intelligence system from 0 to 1. This role is critical to helping us understand real-world usage, guide product decisions, monitor competitors, and enable data-driven strategy across the company.

You will own the full lifecycle of data systems—from instrumentation and pipelines to monitoring, reporting, and early predictive modeling. This role requires strong execution, clear data judgment, and a hacker mindset suited for fast-moving, resource-constrained environments.

Requirements

Must-Have

2–3 years of experience in data engineering or backend development
Proven 0-to-1 experience building data systems (not just maintaining existing ones)
Strong understanding of what data matters and why for product and business decisions
Proficiency in Python and SQL
Experience building data pipelines and APIs
Familiarity with data privacy and compliance requirements (e.g. GDPR, opt-in telemetry, anonymization)

Nice-to-Have

Experience at early-stage startups (Series A–C), delivering quickly with limited resources
Open-source contributions
Experience with AI / ML infrastructure
Experience building systems that scaled 10× or more
Prior experience implementing telemetry or analytics systems

Responsibilities

Built-in Usage Statistics (Week 1)

Add opt-in, anonymous usage statistics to SGLang
Design data collection schemas covering:
- GPU count
- Token throughput
- Model types
- Hardware configuration
Implement strong privacy protection mechanisms, including data anonymization and differential privacy
Achieve 50%+ deployment coverage

Tech Stack: Python, PostgreSQL / TimescaleDB, Redis
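
To give candidates a feel for this work, here is a minimal sketch of an opt-in usage record with a hashed identifier, plus a Laplace mechanism for publishing noisy aggregate counts. It is an illustration under stated assumptions, not SGLang's actual implementation: the `UsageEvent` fields, `build_event`, `anonymize_id`, and `noisy_count` are all hypothetical names, and a production design would also need salt management and a privacy budget.

```python
import hashlib
import random
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class UsageEvent:
    """Hypothetical telemetry record; field names are illustrative only."""
    install_id: str            # salted hash, never a raw hostname or IP
    gpu_count: int
    tokens_per_second: float
    model_type: str
    hardware: str


def anonymize_id(raw_id: str, salt: str) -> str:
    """One-way salted SHA-256 so the raw identifier is never transmitted."""
    return hashlib.sha256((salt + raw_id).encode("utf-8")).hexdigest()


def build_event(opted_in: bool, raw_id: str, salt: str, gpu_count: int,
                tokens_per_second: float, model_type: str,
                hardware: str) -> Optional[dict]:
    """Return a serializable event, or None when the user has not opted in."""
    if not opted_in:
        return None
    event = UsageEvent(anonymize_id(raw_id, salt), gpu_count,
                       tokens_per_second, model_type, hardware)
    return asdict(event)


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Laplace mechanism for a count query with sensitivity 1."""
    return true_count + laplace_noise(1.0 / epsilon, rng)
```

In this sketch, noise is applied to aggregate counts (for example, the number of deployments reporting a given model type) rather than to individual records. A smaller `epsilon` gives stronger privacy at the cost of noisier statistics.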

Data Pipeline & Infrastructure (Weeks 2–3)

Collect usage data from distributed SGLang deployments into a centralized database
Implement real-time data aggregation (hourly / daily / weekly)
Build data quality monitoring and alerting
Implement data export APIs for internal and external use

Tech Stack: Airflow or Prefect, PostgreSQL, FastAPI, Grafana
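
As a rough illustration of the aggregation step, the sketch below rolls timestamped events into hourly, daily, or weekly buckets in plain Python. The `(timestamp, tokens)` event shape and the function names are assumptions made for this example. In practice this logic would more likely live in SQL or in a scheduled Airflow or Prefect task.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def bucket_start(ts: datetime, granularity: str) -> datetime:
    """Truncate a timestamp to the start of its hour, day, or ISO week."""
    day = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if granularity == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    if granularity == "day":
        return day
    if granularity == "week":
        # weekday() is 0 for Monday, so this snaps back to Monday 00:00.
        return day - timedelta(days=day.weekday())
    raise ValueError(f"unsupported granularity: {granularity}")


def aggregate(events, granularity: str) -> dict:
    """Sum event counts and token totals per time bucket.

    `events` is an iterable of (timestamp, tokens) pairs (hypothetical shape).
    """
    totals = defaultdict(lambda: {"events": 0, "tokens": 0})
    for ts, tokens in events:
        bucket = totals[bucket_start(ts, granularity)]
        bucket["events"] += 1
        bucket["tokens"] += tokens
    return dict(totals)


if __name__ == "__main__":
    utc = timezone.utc
    sample = [
        (datetime(2024, 1, 1, 10, 5, tzinfo=utc), 100),
        (datetime(2024, 1, 1, 10, 55, tzinfo=utc), 50),
        (datetime(2024, 1, 3, 11, 0, tzinfo=utc), 7),
    ]
    for bucket, stats in sorted(aggregate(sample, "hour").items()):
        print(bucket.isoformat(), stats)
```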

Competitor & Ecosystem Monitoring (Week 4)

Automatically collect and track data from:
- GitHub APIs
- PyPI download statistics
Monitor relevant news sources and social media mentions
Build storage, scheduling, and update mechanisms for competitive intelligence

Optimization & Expansion (Months 2–3)

Improve usage coverage to 75%+
Build a data quality scoring system
Implement automated reporting
Begin developing predictive models based on usage and ecosystem signals

About RadixArk

RadixArk is an infrastructure-first company built by engineers who have shipped production AI systems, created SGLang (20K+ GitHub stars, the fastest open LLM serving engine), and developed Miles, our large-scale reinforcement learning framework.

We are on a mission to democratize frontier-level AI infrastructure by building world-class open systems for inference and training. RadixArk is backed by leading infrastructure investors and partners with Google, AWS, and frontier AI labs.

Compensation

We offer competitive compensation, including equity, comprehensive health benefits, and flexible work arrangements. Compensation is determined by location, level, and experience.

Equal Opportunity

RadixArk is an Equal Opportunity Employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more.
