Data Infrastructure Engineer

Palo Alto

About the Role

RadixArk is seeking a Data Engineer / Backend Engineer to build SGLang’s built-in usage statistics and data intelligence system from 0 to 1. This role is critical to helping us understand real-world usage, guide product decisions, monitor competitors, and enable data-driven strategy across the company.
You will own the full lifecycle of data systems—from instrumentation and pipelines to monitoring, reporting, and early predictive modeling. This role requires strong execution, clear data judgment, and a hacker mindset suited for fast-moving, resource-constrained environments.

Requirements

Must-Have
  • 2–3 years of experience in data engineering or backend development
  • Proven 0-to-1 experience building data systems (not just maintaining existing ones)
  • Strong understanding of what data matters and why for product and business decisions
  • Proficiency in Python and SQL
  • Experience building data pipelines and APIs
  • Familiarity with data privacy and compliance requirements (e.g. GDPR, opt-in telemetry, anonymization)

Nice-to-Have
  • Experience at early-stage startups (Series A–C), delivering quickly with limited resources
  • Open-source contributor
  • Experience with AI / ML infrastructure
  • Built systems that scaled 10× or more
  • Prior experience implementing telemetry or analytics systems

Responsibilities

Built-in Usage Statistics (Week 1)

  • Add opt-in, anonymous usage statistics to SGLang
  • Design data collection schemas covering:
    • GPU count
    • Token throughput
    • Model types
    • Hardware configuration
  • Implement strong privacy protection mechanisms, including data anonymization and differential privacy
  • Achieve 50%+ deployment coverage
Tech Stack: Python, PostgreSQL / TimescaleDB, Redis

Data Pipeline & Infrastructure (Weeks 2–3)

  • Collect usage data from distributed SGLang deployments into a centralized database
  • Implement near-real-time data aggregation with hourly, daily, and weekly rollups
  • Build data quality monitoring and alerting
  • Implement data export APIs for internal and external use
Tech Stack: Airflow or Prefect, PostgreSQL, FastAPI, Grafana

Competitor & Ecosystem Monitoring (Week 4)

  • Automatically collect and track data from:
    • GitHub APIs
    • PyPI download statistics
  • Monitor relevant news sources and social media mentions
  • Build storage, scheduling, and update mechanisms for competitive intelligence

Optimization & Expansion (Months 2–3)

  • Improve usage coverage to 75%+
  • Build a data quality scoring system
  • Implement automated reporting
  • Begin developing predictive models based on usage and ecosystem signals

About RadixArk

RadixArk is an infrastructure-first company built by engineers who have shipped production AI systems, created SGLang (20K+ GitHub stars, the fastest open LLM serving engine), and developed Miles, our large-scale reinforcement learning framework.
We are on a mission to democratize frontier-level AI infrastructure by building world-class open systems for inference and training. RadixArk is backed by leading infrastructure investors and partners with Google, AWS, and frontier AI labs.

Compensation

We offer competitive compensation, including equity, comprehensive health benefits, and flexible work arrangements. Compensation is determined by location, level, and experience.

Equal Opportunity

RadixArk is an Equal Opportunity Employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more.
