Senior Scalability Engineer - Observability

Remote

About Judi Health

Judi Health is an enterprise health technology company providing a comprehensive suite of solutions for employers and health plans, including:

  • Capital Rx, a public benefit corporation delivering full-service pharmacy benefit management (PBM) solutions to self-insured employers,
  • Judi Health™, which offers full-service health benefit management solutions to employers, TPAs, and health plans, and
  • Judi®, the industry’s leading proprietary Enterprise Health Platform (EHP), which consolidates all claim administration-related workflows in one scalable, secure platform.

Together with our clients, we’re rebuilding trust in healthcare in the U.S. and deploying the infrastructure we need for the care we deserve. To learn more, visit www.judi.health.

Location: Remote 

Position Summary: 

Join our Scalability team as a Senior Scalability Engineer focused on observability platform development and engineering productivity. In this role, you will define, own, and build Judi Health's organization-wide observability strategy, tooling, and platform products. Beyond maintaining infrastructure, you'll architect and develop a custom observability platform that gives engineering teams powerful, fast, and cost-effective visibility into every layer of our infrastructure, from application logs and metrics to distributed traces. You'll build production-grade internal products using React/TypeScript frontends with Python and Rust backends, creating tools that fundamentally improve how engineers at Judi Health debug, monitor, and optimize their systems. You'll work closely with leadership and cross-functional teams, and your work will be foundational to platform stability, performance optimization, and developer productivity across our rapidly growing healthcare platform.

Position Responsibilities: 

In this role, you'll own the observability infrastructure that powers our engineering organization. You will:  

  • Architect observability platform: Design, implement, and maintain the LGTM stack (Loki, Grafana, Tempo, Mimir/Prometheus) as the primary observability platform across all engineering teams, making architectural decisions that balance cost, performance, and developer experience.
  • Build internal observability products: Design and develop production-grade internal platform products with React/TypeScript frontends and Python/Rust backends that provide engineers with powerful log search, metrics visualization, and trace analysis capabilities.
  • Develop custom log indexing systems: Architect and build high-performance log indexing solutions using Rust that process logs and provide sub-second search across billions of log lines at a fraction of the cost.
  • Integrate SQL analytics for logs: Design and implement solutions leveraging AWS Athena or similar SQL query engines (DuckDB, ClickHouse) for ad-hoc log analysis and historical queries, enabling engineers to run complex SQL queries over S3-based log data for deep investigations and trend analysis. 
  • Create advanced query interfaces: Build sophisticated web interfaces that allow engineers to query logs, metrics, and traces with features like saved queries, query templates, correlation analysis, and pattern detection, supporting both full-text search and SQL-based analytics. 
  • Balance cloud-native and open-source: Architect solutions that thoughtfully leverage both AWS-managed services (CloudWatch, Athena, Kinesis) and open-source tooling (LGTM stack, Quickwit) to optimize for cost, performance, and operational flexibility based on use case requirements. 
  • Integrate AWS observability: Design seamless integration between AWS CloudWatch Logs/Metrics and our custom observability platform, providing unified visibility across managed and self-hosted infrastructure. 
  • Build intelligent alerting: Develop smart dashboards, monitors, and alerting systems that reduce noise, detect anomalies, and help teams respond to incidents quickly. 
  • Partner with engineering teams: Work directly with product teams to integrate observability into their services, establish logging and metrics standards, and instrument code effectively, serving as the observability subject matter expert. 
  • Enable performance optimization: Provide the observability foundation that allows the Scalability team to identify performance bottlenecks, track optimization impact, and measure platform stability with data-driven insights. 
  • Establish observability standards: Define and document comprehensive observability standards including structured logging patterns, metric naming conventions, trace instrumentation, dashboard design principles, and query best practices. 
  • Drive platform adoption: Lead workshops, create documentation, and build self-service tooling that democratizes observability across engineering, making it easy for teams to adopt best practices. 
  • Demonstrate technical leadership: Mentor engineers on observability practices, lead architecture reviews for instrumentation approaches, and represent the Scalability team in cross-functional planning. 
  • Work in an Agile/Scrum environment to continually deliver value to stakeholders and clients. 
  • Code of Conduct: Responsible for adherence to the Capital Rx Code of Conduct including reporting of noncompliance. 

Required Qualifications: 

  • 10+ years of software engineering or infrastructure engineering experience with demonstrated progression into technical leadership roles. 
  • Several years of experience leading technical initiatives, building platform products, or serving as a subject matter expert on observability infrastructure. 
  • Strong experience with React/TypeScript for frontend development and Python (Flask/SQLAlchemy) for backend services. 
  • LGTM stack expertise: Deep production experience with Loki, Grafana, Tempo, and Prometheus/Mimir for logs, metrics, and distributed tracing at scale. 
  • AWS observability: Extensive experience with AWS CloudWatch Logs and Metrics, including custom metrics, CloudWatch Logs Insights, dashboard creation, and integration patterns. 
  • SQL analytics for logs: Production experience with SQL-based log analytics using AWS Athena, DuckDB, or similar query engines for analyzing structured and semi-structured data at scale. 
  • Cloud-native and open-source balance: Demonstrated ability to architect solutions leveraging both managed cloud services and open-source tooling, understanding trade-offs between operational overhead, cost, flexibility, and vendor lock-in. 
  • Search and indexing experience: Hands-on experience building or operating search systems using OpenSearch, Elasticsearch, Lucene, Tantivy, or similar search and analytics engines. 
  • Performance-critical systems: Experience building high-performance systems that process large volumes of data efficiently (millions of log lines, high-cardinality metrics). 
  • Systems thinking: Deep understanding of distributed systems, microservices architectures, and the complex observability challenges they present. 
  • Data at scale: Proven track record handling high-volume structured and unstructured logging data, identifying patterns, and building efficient search/query solutions that perform well under load. 
  • Product mindset: Ability to build internal platform products that engineers love to use, with attention to UX, performance, and reliability. 

Preferred Qualifications: 

  • Rust development experience: Production experience with Rust for building high-performance data processing, indexing, or search systems. Strong interest in learning Rust is acceptable if combined with systems programming experience in C/C++/Go. 
  • Infrastructure as code: Experience with Terraform for managing observability infrastructure and AWS resources. 
  • Additional observability platforms: Experience architecting or operating Datadog, New Relic, Splunk, or other enterprise observability platforms. 
  • Advanced query languages: Deep expertise with PromQL, LogQL, SQL optimization, and query optimization for high-cardinality data. 
  • Columnar storage formats: Experience with Parquet, ORC, or other columnar storage formats for efficient log storage and analytics on S3. 
  • Incident management: Experience designing incident response workflows, postmortem processes, and SLO/SLI frameworks that drive reliability improvements. 
  • Cost optimization: Track record of reducing observability costs while maintaining or improving capabilities (e.g., CloudWatch → S3/custom indexing migration). 
  • Data pipelines: Experience with streaming data pipelines, ETL processes, or real-time data processing. 
  • Distributed tracing: Deep knowledge of OpenTelemetry, Jaeger, Zipkin, or distributed tracing architectures. 
  • Git expertise and experience working in a monorepo. 
  • Previous Pharmacy Benefits Manager (PBM) or healthcare technology experience. 
  • Experience building developer tools or internal platforms that improve engineering productivity. 

This range represents the low and high end of the anticipated base salary range for the NY-based position. The actual base salary will depend on several factors, such as experience, knowledge, and skills, and on whether the location of the job changes. 

Nothing in this position description restricts management’s right to assign or reassign duties and responsibilities to this job at any time. 

Salary Range

$160,000 - $220,000 USD

All employees are responsible for adherence to the Capital Rx Code of Conduct including the reporting of non-compliance. This position description is designed to be flexible, allowing management the opportunity to assign or reassign duties and responsibilities as needed to best meet organizational goals.

Judi Health values a diverse workplace and celebrates the diversity that each employee brings to the table. We are proud to provide equal employment opportunities to all employees and applicants for employment and prohibit discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, medical condition, genetic information, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws. 

By submitting an application, you agree to the retention of your personal data for consideration for a future position at Judi Health. More details about Judi Health's privacy practices can be found at https://www.judi.health/legal/privacy-policy.
