Senior/Principal AI Performance Engineer

Remote

CIQ OVERVIEW

CIQ builds the enterprise infrastructure that powers the world's most demanding workloads. From the operating system layer through AI infrastructure, high-performance computing, and cloud-native orchestration, CIQ delivers the speed, security, scalability, and sovereignty that major enterprises, government agencies, and research institutions depend on.

CIQ is the founding support and services partner of Rocky Linux and the developer of the RLC Pro family of Enterprise Linux distributions, Fuzzball workload orchestration, Warewulf Pro cluster provisioning, and Ascender Pro automation. Our customers include some of the largest and most technically sophisticated organizations in the world, working across HPC, AI/ML, defense, and regulated industries.

We are a company of builders, operators, and open source practitioners. If you want to do work that matters, at a company that is genuinely changing how enterprise infrastructure gets built and run, we want to talk.

POSITION SUMMARY

CIQ is seeking a highly experienced Senior or Principal AI Performance Engineer to own and drive AI/ML innovation across our product portfolio. This role sits at the intersection of AI engineering and systems performance: the right candidate brings deep expertise in model inference optimization, training workflows, and production AI deployment, combined with a strong instinct for performance at the systems level.

In this role, you will be the AI engineering standard-bearer at CIQ. You will design and build turnkey AI workload examples, both internal reference pipelines and customer-facing solutions, ensuring that CIQ’s AI story is always compelling, practical, and demonstrably best-in-class. You will integrate deeply with Fuzzball, CIQ’s cloud-native computing platform, running AI workloads end to end through it and helping customers do the same.

KEY RESPONSIBILITIES

This role is leveled as Senior or Principal based on qualifications and demonstrated capabilities.

AI Inference Optimization

  • Design, implement, and tune inference pipelines for large language models and other AI workloads, targeting maximum throughput and minimum latency.
  • Apply state-of-the-art optimization techniques: quantization (INT4/INT8/FP8), model pruning, speculative decoding, continuous batching, and kernel fusion.
  • Optimize inference-serving stacks, including vLLM, TensorRT-LLM, ONNX Runtime, and similar frameworks, for production deployment on CIQ’s OS platform.
  • Profile and tune GPU/accelerator utilization across the full inference stack, from model weights and memory bandwidth to CUDA kernels and driver overhead.
  • Establish inference performance baselines and regression detection across CIQ’s AI-focused solutions.

AI Training Workflows

  • Design and optimize distributed training pipelines for large-scale models, including data, model, tensor, and pipeline parallelism strategies.
  • Tune training efficiency through mixed-precision training, gradient checkpointing (activation recomputation), and optimizer-level improvements.
  • Benchmark training throughput and scaling efficiency across multi-GPU and multi-node configurations on CIQ’s infrastructure.
  • Collaborate with infrastructure and performance teams to resolve training bottlenecks at the network (RDMA/InfiniBand), storage, and OS layers.
  • Stay current on frontier model architectures and training techniques, including MoE models, RLHF pipelines, and emerging post-training methods.

Turnkey AI Examples & Reference Workloads

  • Build and maintain a library of turnkey AI workload examples that run on CIQ’s platform, covering inference serving, fine-tuning, batch processing, RAG pipelines, and agentic workflows.
  • Develop both internal reference pipelines for CI/testing and customer-facing examples designed for immediate productivity on CIQ’s OS and Fuzzball.
  • Package workloads using containers to deliver portable, reproducible AI environments across HPC and cloud-native settings.
  • Create compelling, well-documented demos and reference architectures that communicate CIQ’s AI capabilities to technical and business audiences alike.
  • Partner with product and customer success teams to translate real-world AI use cases into reusable, production-quality examples.

AI Engineering & Tooling

  • Build and maintain AI-powered engineering tooling, using LLM-based agents, automated analysis pipelines, and AI-assisted code generation to accelerate the broader engineering organization.
  • Champion an AI-first development culture: identify opportunities where AI tooling can reduce toil, surface insights faster, and improve software quality across CIQ’s products.
  • Evaluate and integrate emerging AI frameworks, libraries, and hardware as they become relevant to CIQ’s customers and product roadmap.
  • Contribute to open-source AI tooling and frameworks where relevant, reinforcing CIQ’s technical reputation in the community.

Fuzzball Integration

  • Develop deep expertise in CIQ’s Fuzzball platform, including its architecture, scheduling model, and workload execution environment.
  • Integrate AI training, inference, and pipeline workloads into Fuzzball-based CI/CD and production pipelines.
  • Contribute to Fuzzball’s AI workload story: ensure the platform is a first-class environment for running AI workloads efficiently and at scale.
  • Help characterize and improve Fuzzball’s performance for AI-specific access patterns and resource demands.

Cross-Functional Collaboration

  • Develop broad familiarity with the full CIQ product portfolio, including Rocky Linux and RLC (and its variants), Fuzzball, Apptainer, and Warewulf, and understand how AI workloads interact with each layer.
  • Collaborate closely with the Performance Engineering team to ensure AI workloads benefit from and contribute to CIQ’s systems-level optimization work.
  • Partner with product and customer success teams to translate real-world AI pain points into engineering priorities and measurable outcomes.
  • Document and communicate findings clearly, from low-level profiling data to executive-level summaries.
  • Contribute to technical publications, conference presentations, and thought leadership that reinforces CIQ’s reputation as an AI-forward infrastructure company.

NEEDED TO SUCCEED

Successful candidates will have: 

  • Deep, hands-on expertise in LLM inference optimization, including serving frameworks (vLLM, TensorRT-LLM, ONNX Runtime), quantization techniques, and GPU memory management.
  • Strong background in distributed AI training, including frameworks such as PyTorch FSDP, DeepSpeed, Megatron-LM, or JAX/XLA.
  • Proven experience building production AI pipelines and packaging AI environments for reproducible, portable deployment (containers, Apptainer/Singularity, or equivalent).
  • Fluency with GPU/accelerator profiling tools: NVIDIA Nsight, PyTorch Profiler, CUDA performance analysis, and related tooling.
  • Familiarity with HPC environments (job schedulers such as Slurm and PBS, parallel filesystems, RDMA/InfiniBand, and MPI) and with the intersection of HPC and modern AI workloads.
  • Experience integrating AI workloads into CI/CD pipelines and building automated testing and benchmarking frameworks.
  • Comfort using and building with LLM-based tools and agentic frameworks to accelerate engineering work.
  • Excellent analytical skills, with the ability to form hypotheses, design experiments, and draw actionable conclusions from complex profiling data.
  • Strong written and verbal communication skills; able to present findings to both deeply technical audiences and business stakeholders.
  • A collaborative, humble, and always-learning mindset, combined with the confidence to champion AI engineering as a first-class concern.

EDUCATION AND EXPERIENCE

  • PhD in Computer Science, Machine Learning, Computer Engineering, or a related field strongly preferred; equivalent industry experience considered.
  • 10+ years of industry experience in AI/ML engineering, systems software, or a closely related discipline.
  • Demonstrated track record of measurable, published, or production-deployed AI performance improvements at scale.
  • Experience working in or with open-source AI ecosystems (PyTorch, Triton, ONNX, Hugging Face, etc.) is a strong plus.
  • Background with cloud-native, containerized, and/or HPC computing environments preferred.

BENEFITS

  • Medical, dental, and vision insurance.
  • Flexible paid time off.
  • Employee stock options.
  • Remote work; no travel required for most positions.

 


U.S. Standard Demographic Questions

We invite applicants to share their demographic background. If you choose to complete this survey, your responses may be used to identify areas of improvement in our hiring process.

Voluntary Self-Identification

For government reporting purposes, we ask candidates to respond to the below self-identification survey. Completion of the form is entirely voluntary. Whatever your decision, it will not be considered in the hiring process or thereafter. Any information that you do provide will be recorded and maintained in a confidential file.

As set forth in CIQ’s Equal Employment Opportunity policy, we do not discriminate on the basis of any protected group status under any applicable law.


If you believe you belong to any of the categories of protected veterans listed below, please indicate by making the appropriate selection. As a government contractor subject to the Vietnam Era Veterans Readjustment Assistance Act (VEVRAA), we request this information in order to measure the effectiveness of the outreach and positive recruitment efforts we undertake pursuant to VEVRAA. Classification of protected categories is as follows:

A "disabled veteran" is one of the following: a veteran of the U.S. military, ground, naval or air service who is entitled to compensation (or who but for the receipt of military retired pay would be entitled to compensation) under laws administered by the Secretary of Veterans Affairs; or a person who was discharged or released from active duty because of a service-connected disability.

A "recently separated veteran" means any veteran during the three-year period beginning on the date of such veteran's discharge or release from active duty in the U.S. military, ground, naval, or air service.

An "active duty wartime or campaign badge veteran" means a veteran who served on active duty in the U.S. military, ground, naval or air service during a war, or in a campaign or expedition for which a campaign badge has been authorized under the laws administered by the Department of Defense.

An "Armed forces service medal veteran" means a veteran who, while serving on active duty in the U.S. military, ground, naval or air service, participated in a United States military operation for which an Armed Forces service medal was awarded pursuant to Executive Order 12985.


Voluntary Self-Identification of Disability

Form CC-305
OMB Control Number 1250-0005
Expires 04/30/2026

Why are you being asked to complete this form?

We are a federal contractor or subcontractor. The law requires us to provide equal employment opportunity to qualified people with disabilities. We have a goal of having at least 7% of our workers as people with disabilities. The law says we must measure our progress towards this goal. To do this, we must ask applicants and employees if they have a disability or have ever had one. People can become disabled, so we need to ask this question at least every five years.

Completing this form is voluntary, and we hope that you will choose to do so. Your answer is confidential. No one who makes hiring decisions will see it. Your decision to complete the form and your answer will not harm you in any way. If you want to learn more about the law or this form, visit the U.S. Department of Labor’s Office of Federal Contract Compliance Programs (OFCCP) website at www.dol.gov/ofccp.

How do you know if you have a disability?

A disability is a condition that substantially limits one or more of your “major life activities.” If you have or have ever had such a condition, you are a person with a disability. Disabilities include, but are not limited to:

  • Alcohol or other substance use disorder (not currently using drugs illegally)
  • Autoimmune disorder, for example, lupus, fibromyalgia, rheumatoid arthritis, HIV/AIDS
  • Blind or low vision
  • Cancer (past or present)
  • Cardiovascular or heart disease
  • Celiac disease
  • Cerebral palsy
  • Deaf or serious difficulty hearing
  • Diabetes
  • Disfigurement, for example, disfigurement caused by burns, wounds, accidents, or congenital disorders
  • Epilepsy or other seizure disorder
  • Gastrointestinal disorders, for example, Crohn's Disease, irritable bowel syndrome
  • Intellectual or developmental disability
  • Mental health conditions, for example, depression, bipolar disorder, anxiety disorder, schizophrenia, PTSD
  • Missing limbs or partially missing limbs
  • Mobility impairment, benefiting from the use of a wheelchair, scooter, walker, leg brace(s) and/or other supports
  • Nervous system condition, for example, migraine headaches, Parkinson’s disease, multiple sclerosis (MS)
  • Neurodivergence, for example, attention-deficit/hyperactivity disorder (ADHD), autism spectrum disorder, dyslexia, dyspraxia, other learning disabilities
  • Partial or complete paralysis (any cause)
  • Pulmonary or respiratory conditions, for example, tuberculosis, asthma, emphysema
  • Short stature (dwarfism)
  • Traumatic brain injury

PUBLIC BURDEN STATEMENT: According to the Paperwork Reduction Act of 1995 no persons are required to respond to a collection of information unless such collection displays a valid OMB control number. This survey should take about 5 minutes to complete.