Principal System Test Engineer
About Graphcore
At Graphcore, we’re building the future of AI compute. We’re a team of semiconductor, software and AI experts, with deep experience in creating the complete AI compute stack - from silicon and software to infrastructure at datacenter scale. As part of the SoftBank Group, backed by significant long-term investment, we are delivering key technology into the fast-growing SoftBank AI ecosystem. To meet the vast and exciting AI opportunity, Graphcore is expanding its teams around the world. We are bringing together the brightest minds to solve the toughest problems, in a place where everyone has the opportunity to make an impact on the company, our products and the future of artificial intelligence.
Job Summary
As a Principal System Test Engineer, you will be part of the System Test Team responsible for developing hardware manufacturing test solutions for High Performance Modules (HPM), Server Blades, and Multi-node racks as part of the production assembly process. You will act as the technical lead for the creation of robust test executive software that sequences tests and integrates test content from multiple internal and external sources. You will develop manufacturing test content and test fixtures informed by a deep understanding of silicon, board, and system design. Working cross-functionally with hardware, firmware, and software teams, you will ensure the right tools and functionality exist to efficiently validate product quality at scale.
The Team
The Product Test and Diagnosis team is responsible for identifying and diagnosing hardware defects introduced during the manufacturing process. The team defines, develops, and executes an end-to-end test strategy spanning silicon, board-level assemblies, server blades, and rack-scale systems. Testing is performed across the full product lifecycle, including manufacturing and deployed field environments.
Product Test and Diagnosis is part of the Manufacturing Operations organisation, which also comprises Manufacturing Technology, Supply Chain, and Quality teams. The group operates as a global team, with engineers based in the UK (Bristol and Cambridge), Taiwan, India, and the United States.
Responsibilities and Duties
- Provide technical leadership for the Hardware System Test Team, setting engineering direction and acting as the go-to expert.
- Provide technical leadership within the broader end-to-end test strategy by defining and evolving the manufacturing system test approach for High Performance Modules, Server Blades, and Multi-node racks.
- Architect, develop, and own the test executive software used in both manufacturing and lab environments to orchestrate test sequencing, logging, results aggregation, and integration of test content from diverse sources.
- Lead the technical definition and development of manufacturing test content, leveraging deep knowledge of silicon, board, and system design to detect defects and prevent escapes.
- Partner with internal software, firmware, and platform teams to define and secure the required hooks, diagnostics, telemetry, and APIs needed for effective test and diagnosis.
- Establish and enforce best practices for test architecture, code quality, coverage, traceability, release management, and configuration control of test software and content.
- Design and drive data-driven test improvements using manufacturing metrics (yield, escapes, false fails, coverage, cycle time) and lead technical initiatives to reduce cost and improve robustness.
- Lead complex root-cause investigations spanning manufacturing and field returns; drive corrective actions across design, validation, and manufacturing engineering.
- Technically lead the definition, design, and validation of manufacturing test fixtures for High Performance Modules and Server Blades, working with external subcontractors to ensure fixtures meet functional, reliability, and test coverage requirements.
- Ensure test solutions are scalable, maintainable, and production-ready, including documentation, factory deployment plans, and second-line technical support for production issues.
- Provide technical mentorship for engineers in the team, including design reviews, debugging guidance, and raising engineering standards across hardware/software boundaries.
- Support supplier/contract manufacturing engagements as needed, including test station readiness, deployment qualification, golden unit strategy, and ongoing production support.
Candidate Profile
Essential:
- Extensive experience developing and deploying manufacturing test solutions for high-performance servers, accelerators (GPU-class), or comparable large-scale compute hardware.
- Proven technical leadership as an individual contributor, owning architecture and driving alignment across multiple teams without direct line management responsibility.
- Deep technical understanding across silicon, board, and system design sufficient to define effective manufacturing test coverage and lead complex debug/root-cause analysis.
- Strong cross-functional collaboration skills, with experience driving requirements and delivery with internal software/firmware/platform teams to enable testability and diagnosability.
- Hands-on experience using Linux, OpenBMC, and vendor-specific test tools, diagnostics, and system-level utilities to test hardware in lab and manufacturing environments.
- Strong software engineering skills with scripting and automation (e.g., Python, Bash, or similar), including building reliable test infrastructure and integrating heterogeneous test content.
Desirable:
- Familiarity with rack-scale/multi-node systems, high-speed interconnects, and production qualification of complex assemblies.
- Experience improving manufacturing KPIs (yield, test time, false fail reduction) through data-driven test optimisation and robust statistical/diagnostic approaches.
- Knowledge of open-source tools commonly used for manufacturing test, system bring-up, and hardware diagnostics.
- Experience designing testability/diagnosability features into products (e.g., telemetry, self-test hooks, debug interfaces) in partnership with design teams.
Benefits
In addition to a competitive salary, Graphcore offers flexible working, a generous annual leave policy, private medical insurance and health cash plan, a dental plan, pension (matched up to 5%), life assurance and income protection. We have a generous parental leave policy and an employee assistance programme (which includes health, mental wellbeing, and bereavement support). We offer a range of healthy food and snacks at our central Bristol office and have our own barista bar!
We welcome people of different backgrounds and experiences; we’re committed to building an inclusive work environment that makes Graphcore a great home for everyone. We offer an equal opportunity process and understand that there are visible and invisible differences in all of us. We can provide a flexible approach to interview and encourage you to chat to us if you require any reasonable adjustments.
Applicants for this position must hold the right to work in the UK. Unfortunately, we are unable to provide visa sponsorship or support for visa applications at this time.