Senior Researcher

London, UK
CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence. Trusted by leading AI labs, startups, and global enterprises, CoreWeave combines superior infrastructure performance with deep technical expertise to accelerate breakthroughs and turn compute into capability. Founded in 2017, CoreWeave became a publicly traded company (Nasdaq: CRWV) in March 2025. Learn more at www.coreweave.com.
 
We're proud to be a Living Wage accredited Employer.

 

Role Overview

We are looking for a Senior Researcher to join Monolith’s Research team, now part of CoreWeave. This is a high-impact, high-ownership role for a researcher who combines deep technical expertise in machine learning, statistical modelling, optimisation, and large-scale systems data with the ability to take complex, ambiguous problems from first principles through to production.

The Monolith Data Science team is building a layered reliability and intelligence platform that shifts CoreWeave from reactive troubleshooting to proactive reliability engineering. The platform spans telemetry ingestion, feature engineering, anomaly detection, failure prediction, distributed straggler detection, performance modelling, workload optimisation, and agentic root cause analysis.

You will work closely with Fleet, Infrastructure, AI Platform, engineering, product, and client-facing teams to improve cluster reliability, increase effective utilisation, reduce MTTR, protect uptime, and turn large-scale GPU infrastructure telemetry into measurable operational and commercial impact.

This is not a traditional data science role focused on dashboards, business metrics, or standard forecasting. The role sits at the intersection of applied research, GPU infrastructure, high-performance computing, distributed systems, reliability engineering, telemetry, optimisation, and Physical AI. It demands rigorous scientific thinking, strong execution, and comfort working in a high-ambiguity environment where the right problem framing is often as important as the final model.

What You’ll Do

Research Leadership & Strategy

  • Contribute meaningfully to Monolith and CoreWeave’s research direction by identifying high-leverage problems in GPU infrastructure analytics, cluster reliability, workload performance, scheduling, and utilisation.
  • Originate novel research directions for turning raw infrastructure telemetry into actionable intelligence, rather than simply applying standard machine learning or data science techniques.
  • Evaluate emerging methods across statistical modelling, machine learning, observability, optimisation, simulation, reinforcement learning, anomaly detection, and autonomous diagnostics, providing well-grounded technical judgement on which approaches are most likely to create real-world impact.
  • Champion rigour, reproducibility, and scientific integrity across research outputs, experiments, prototypes, and production validation.
  • Help establish a research foundation for understanding how large-scale GPU systems behave, why workloads underperform, where bottlenecks emerge, and how reliability can be improved proactively.

Technical Depth & Execution

  • Lead the design and development of sophisticated statistical, machine learning, and optimisation systems for large-scale GPU infrastructure telemetry, including compute, networking, storage, workload, and distributed systems data.
  • Develop advanced models and methodologies to optimise GPU utilisation, workload scheduling, infrastructure efficiency, and system reliability.
  • Build models and methods for anomaly detection, failure prediction, distributed straggler detection, degraded workload identification, bottleneck diagnosis, and agentic root cause analysis.
  • Design experiments, analyse large-scale system telemetry, and prototype predictive and optimisation algorithms that directly inform production systems.
  • Drive technical decisions on difficult modelling problems involving noisy time-series data, high-dimensional telemetry, causal inference, uncertainty, robustness, generalisation, and out-of-distribution behaviour.
  • Explore simulation, digital-twin, reinforcement learning, and adaptive scheduling approaches where they can improve understanding or optimisation of GPU clusters and distributed training environments.
  • Take end-to-end ownership of research work from problem framing and exploratory analysis through prototype development, validation, and collaboration with engineering teams on production deployment.
  • Maintain deep personal technical expertise; remain a hands-on contributor in Python and modern scientific computing / machine learning tooling.

Organisational Influence & Collaboration

  • Serve as a strong technical voice within the research organisation, helping shape how Monolith approaches complex infrastructure intelligence problems.
  • Work closely with Fleet, Infrastructure, AI Platform, engineering, product, and customer-facing teams to ensure research work lands with real operational and commercial impact.
  • Translate research findings into production-ready prototypes, deployable solutions, and technical recommendations that improve performance, reliability, utilisation, and cost efficiency.
  • Contribute to research practices and norms that improve how the team handles ambiguous, high-dimensional, real-world systems problems.
  • Communicate complex technical work and its implications clearly to a range of audiences, from close technical collaborators to senior leadership and external stakeholders.
  • Help build a shared understanding of how large-scale AI infrastructure behaves, where it fails, and how it can be made more reliable, efficient, and intelligent.

Technical Focus

  • Applied machine learning for GPU infrastructure and distributed systems
  • Large-scale telemetry ingestion, feature engineering, and infrastructure analytics
  • GPU cluster reliability, utilisation, observability, and performance analysis
  • Anomaly detection, degradation detection, and failure prediction
  • Distributed straggler detection and workload performance diagnosis
  • Agentic root cause analysis and autonomous diagnostic systems
  • Time-series, high-dimensional, structured, and operational systems data
  • Performance modelling for distributed workloads and AI training jobs
  • Workload scheduling, capacity planning, forecasting, and resource allocation modelling
  • Optimisation techniques including stochastic optimisation, convex optimisation, reinforcement learning, and adaptive scheduling
  • Simulation and digital-twin approaches for complex infrastructure systems
  • Causal inference, controlled experiments, hypothesis testing, and statistical validation
  • End-to-end research systems: data pipelines, prototypes, validation, deployment, and monitoring

What We’re Looking For

  • 8+ years of experience, or equivalent research experience, applying statistical modelling, machine learning, optimisation, or applied AI to large-scale datasets.
  • MS or PhD in Computer Science, Statistics, Applied Mathematics, Machine Learning, Physics, Engineering, or a related quantitative field.
  • Strong proficiency in Python and scientific computing libraries such as NumPy, pandas, SciPy, scikit-learn, PyTorch, or TensorFlow.
  • Experience working with large-scale structured datasets, time-series data, infrastructure telemetry, performance data, sensor data, or other complex operational data.
  • Experience designing and analysing controlled experiments, including A/B testing, hypothesis testing, causal inference, or rigorous model validation.
  • Experience building and validating predictive models in production or research environments.
  • Experience with distributed data systems such as Spark, Ray, Dask, or similar.
  • Proficiency in SQL and working with large-scale structured data.
  • Strong understanding of optimisation techniques such as linear programming, convex optimisation, stochastic optimisation, reinforcement learning, or adaptive scheduling.
  • Demonstrated ability to solve ambiguous technical problems where the right approach is not already known.
  • Ability to translate research findings into production-ready prototypes, deployable workflows, or operational tooling.
  • Strong scientific judgement, including experimental design, reproducibility, validation, and awareness of uncertainty.
  • The ability to communicate clearly and influence across research, engineering, product, infrastructure, and leadership audiences.

Preferred Experience

  • PhD with published research in systems optimisation, distributed computing, ML systems, performance modelling, reliability engineering, scientific computing, or a related area.
  • Experience with GPU workloads, distributed training, AI infrastructure, HPC, or large-scale compute environments.
  • Familiarity with Kubernetes, containerised workloads, cloud-native systems, or distributed infrastructure.
  • Experience developing reinforcement learning, adaptive scheduling, autonomous diagnostics, or agentic systems.
  • Background in capacity planning, forecasting, resource allocation modelling, or infrastructure efficiency.
  • Experience with observability, hardware telemetry, performance monitoring, root cause analysis, or failure prediction.
  • Contributions to open-source machine learning, systems, infrastructure, or scientific computing projects.

Wondering If You’re a Good Fit?

We believe in investing in our people and value candidates who bring diverse experiences to our teams, even if they are not a 100% skill or experience match.

You may be a strong fit if:

  • You love uncovering hidden failure patterns in massive, noisy infrastructure datasets.
  • You are curious about building autonomous or agentic systems that investigate, explain, and optimise complex system behaviour.
  • You have deep expertise in predictive modelling, reinforcement learning, optimisation, statistical modelling, or large-scale data analysis.
  • You enjoy working from first principles on problems where the correct approach is not obvious.
  • You are interested in GPU infrastructure, distributed systems, AI training workloads, reliability engineering, and the operational behaviour of large-scale compute environments.
  • You want your research to move beyond analysis and into systems that improve real-world performance, uptime, utilisation, and cost.

Why CoreWeave?

At CoreWeave, we work hard, have fun, and move fast. We’re in an exciting stage of hyper-growth, operating at the centre of the demand for large-scale accelerated compute. We’re not afraid of a little chaos, and we’re constantly learning. Our team cares deeply about how we build our product and how we work together, which is represented through our core values:

  • Be Curious at Your Core
  • Act Like an Owner
  • Empower Employees
  • Deliver Best-in-Class Client Experiences
  • Achieve More Together

By joining Monolith’s Research team within CoreWeave, you will work on problems that sit directly at the frontier of AI infrastructure: how massive GPU systems behave, why workloads underperform, how they fail, and how they can be made more reliable, efficient, and intelligent.

This is an opportunity to help build a new category of infrastructure intelligence — one that moves beyond monitoring and dashboards toward systems that can understand, explain, predict, and optimise the behaviour of large-scale GPU clusters.

We support and encourage an entrepreneurial outlook and independent thinking. We foster an environment that encourages collaboration and enables the development of innovative solutions to complex problems. As the organisation continues to grow, the opportunities to shape new technical directions are constantly expanding. You will be surrounded by some of the best talent in the industry, who will want to learn from you, too.

To fulfill our obligation to protect client data, successful applicants offered employment with CoreWeave will be required to complete a basic criminal record check, conducted in compliance with GDPR. Employment offers are conditional upon receiving satisfactory check results.

What We Offer

In addition to a competitive salary, we offer a variety of benefits to support your needs, including:

  • Family-level Medical Insurance
  • Family-level Dental Insurance 
  • Generous Pension Contribution 
  • Life Assurance at 4x Salary 
  • Critical Illness Cover 
  • Employee Assistance Programme 
  • Tuition Reimbursement
  • Work culture focused on innovative disruption

Benefits may vary by location. 

Equal Opportunity 

CoreWeave is an equal opportunity employer, committed to fostering an inclusive and supportive workplace. All qualified applicants and candidates will receive consideration for employment without regard to race, color, religion, sex, disability, age, sexual orientation, gender identity, national origin, veteran status, or genetic information.

Recruitment Agencies

CoreWeave does not accept speculative CVs. Any unsolicited CVs received will be treated as the property of CoreWeave and your Terms & Conditions associated with the use of CVs will be considered null and void.

Any unsolicited CVs sent by your company to us – that is to say, in any situation where we have not directly engaged your company in writing to supply candidates for a specific vacancy – will be considered by us to be a “free gift”, leaving us liable for no fees whatsoever should we choose to contact the candidate directly and engage the candidate’s services, and will in no way establish any prior claim by your company to representation of that candidate should the candidate’s details also be submitted by any other party.

Export Control Compliance

This position requires access to export controlled information. To conform to U.S. Government export regulations applicable to that information, the applicant must be either (A) a U.S. person, defined as a (i) U.S. citizen or national, (ii) U.S. lawful permanent resident (green card holder), (iii) refugee under 8 U.S.C. § 1157, or (iv) asylee under 8 U.S.C. § 1158, (B) eligible to access the export controlled information without a required export authorization, or (C) eligible and reasonably likely to obtain the required export authorization from the applicable U.S. government agency. CoreWeave may, for legitimate business reasons, decline to pursue any export licensing process.

 

Updated privacy notice - UK and EU Job Applications 

When you apply to a job on this site, the personal data contained in your application will be collected by CoreWeave UK Ltd. (“Controller”), which is located at Phosphor (6th Floor), 133 Park Street, London, SE1 9EA and can be contacted by emailing careers.eu@coreweave.com. Controller’s data protection officer can be contacted at privacy@coreweave.com. Your personal data will be processed for the purposes of managing Controller’s recruitment-related activities, which include setting up and conducting interviews and tests for applicants, evaluating and assessing the results thereof, and as is otherwise needed in the recruitment and hiring processes. Such processing is legally permissible under Art. 6(1)(f) of (i) Regulation (EU) 2016/679 (General Data Protection Regulation (“GDPR”)) and (ii) the GDPR as it forms part of the laws of the UK (“UK GDPR”), as necessary for the purposes of the legitimate interests pursued by the Controller, which are the solicitation, evaluation, and selection of applicants for employment. Your personal data will be shared with Greenhouse Software, Inc., a cloud services provider located in the United States of America and engaged by Controller to help manage its recruitment and hiring process on Controller’s behalf. With respect to transfers originating from the UK or the European Economic Area ("EEA") to a country outside the UK or the EEA, we implement the appropriate transfer mechanism(s) and other appropriate solutions to address cross-border transfers as required by applicable law. You may request a copy of the suitable mechanisms we have in place by contacting us at privacy@coreweave.com.

Your personal data will be retained by Controller as long as Controller determines it is necessary to evaluate your application for employment. Where permitted by applicable law, we may also retain your personal data for a limited period after the recruitment process ends in order to consider you for future job opportunities, respond to legal claims, or comply with record-keeping obligations. Under the GDPR and the UK GDPR, you have the right to request access to your personal data, to request that your personal data be rectified or erased, and to request that processing of your personal data be restricted. You also have the right to data portability. In addition, you may lodge a complaint with the relevant supervisory authority: (i) A list of Europe’s data protection authorities can be found here; and (ii) for the UK, this is the Information Commissioner's Office. 

For additional information, please see our Privacy Policy.

Voluntary Self-Identification

For government reporting purposes, we ask candidates to respond to the below self-identification survey. Completion of the form is entirely voluntary. Whatever your decision, it will not be considered in the hiring process or thereafter. Any information that you do provide will be recorded and maintained in a confidential file.

As set forth in CoreWeave Europe’s Equal Employment Opportunity policy, we do not discriminate on the basis of any protected group status under any applicable law.

If you believe you belong to any of the categories of protected veterans listed below, please indicate by making the appropriate selection. As a government contractor subject to the Vietnam Era Veterans Readjustment Assistance Act (VEVRAA), we request this information in order to measure the effectiveness of the outreach and positive recruitment efforts we undertake pursuant to VEVRAA. Classification of protected categories is as follows:

A "disabled veteran" is one of the following: a veteran of the U.S. military, ground, naval or air service who is entitled to compensation (or who but for the receipt of military retired pay would be entitled to compensation) under laws administered by the Secretary of Veterans Affairs; or a person who was discharged or released from active duty because of a service-connected disability.

A "recently separated veteran" means any veteran during the three-year period beginning on the date of such veteran's discharge or release from active duty in the U.S. military, ground, naval, or air service.

An "active duty wartime or campaign badge veteran" means a veteran who served on active duty in the U.S. military, ground, naval or air service during a war, or in a campaign or expedition for which a campaign badge has been authorized under the laws administered by the Department of Defense.

An "Armed forces service medal veteran" means a veteran who, while serving on active duty in the U.S. military, ground, naval or air service, participated in a United States military operation for which an Armed Forces service medal was awarded pursuant to Executive Order 12985.

Voluntary Self-Identification of Disability

Form CC-305
OMB Control Number 1250-0005
Expires 04/30/2026

Why are you being asked to complete this form?

We are a federal contractor or subcontractor. The law requires us to provide equal employment opportunity to qualified people with disabilities. We have a goal of having at least 7% of our workers as people with disabilities. The law says we must measure our progress towards this goal. To do this, we must ask applicants and employees if they have a disability or have ever had one. People can become disabled, so we need to ask this question at least every five years.

Completing this form is voluntary, and we hope that you will choose to do so. Your answer is confidential. No one who makes hiring decisions will see it. Your decision to complete the form and your answer will not harm you in any way. If you want to learn more about the law or this form, visit the U.S. Department of Labor’s Office of Federal Contract Compliance Programs (OFCCP) website at www.dol.gov/ofccp.

How do you know if you have a disability?

A disability is a condition that substantially limits one or more of your “major life activities.” If you have or have ever had such a condition, you are a person with a disability. Disabilities include, but are not limited to:

  • Alcohol or other substance use disorder (not currently using drugs illegally)
  • Autoimmune disorder, for example, lupus, fibromyalgia, rheumatoid arthritis, HIV/AIDS
  • Blind or low vision
  • Cancer (past or present)
  • Cardiovascular or heart disease
  • Celiac disease
  • Cerebral palsy
  • Deaf or serious difficulty hearing
  • Diabetes
  • Disfigurement, for example, disfigurement caused by burns, wounds, accidents, or congenital disorders
  • Epilepsy or other seizure disorder
  • Gastrointestinal disorders, for example, Crohn's Disease, irritable bowel syndrome
  • Intellectual or developmental disability
  • Mental health conditions, for example, depression, bipolar disorder, anxiety disorder, schizophrenia, PTSD
  • Missing limbs or partially missing limbs
  • Mobility impairment, benefiting from the use of a wheelchair, scooter, walker, leg brace(s) and/or other supports
  • Nervous system condition, for example, migraine headaches, Parkinson’s disease, multiple sclerosis (MS)
  • Neurodivergence, for example, attention-deficit/hyperactivity disorder (ADHD), autism spectrum disorder, dyslexia, dyspraxia, other learning disabilities
  • Partial or complete paralysis (any cause)
  • Pulmonary or respiratory conditions, for example, tuberculosis, asthma, emphysema
  • Short stature (dwarfism)
  • Traumatic brain injury

PUBLIC BURDEN STATEMENT: According to the Paperwork Reduction Act of 1995 no persons are required to respond to a collection of information unless such collection displays a valid OMB control number. This survey should take about 5 minutes to complete.