Director, Platform Management & Observability
About Graphcore
How often do you get the chance to build a technology that transforms the future of humanity? Graphcore products have set the standard in made-for-AI compute hardware and software, gaining global attention and industry acclaim. Now we are developing the next generation of artificial intelligence compute with systems that will allow AI researchers to develop more advanced models, help scientists unlock exciting new discoveries, and power companies around the world as they put AI at the heart of their business. Graphcore recently joined SoftBank Group – bringing large and ongoing investment from one of the world’s leading backers of innovative AI companies.
Job Summary
As the engineering Director for Platform Management & Observability, you will be responsible for building, managing and guiding a team of talented engineers focussed on the architecture, implementation and deployment of highly scalable management solutions for AI infrastructure built using our next-generation products. Covering monitoring, observability, control, and data centre infrastructure management, you will work closely with software, cloud and customer-facing teams to establish first-hand knowledge of these solutions, enabling the creation of proofs of concept, reference designs and integrations with third-party tooling.
Your team will work closely with product, architecture and other delivery teams to ensure that functionally complete, simple-to-deploy and easy-to-use solutions are deployed internally to support engineering efforts and to supply reference designs to our customers.
Responsibilities and Duties
- Manage a team contributing to all phases of overall product development, from product definition, architecture, and design, through implementation, debugging, testing and early customer support.
- Deliver and manage the operation of an internal management & observability service that engineering teams use for debugging, performance analysis, benchmarking, test/QA and similar work on our systems, from system bring-up through customer release, at all scales.
- Foster evaluations of new technologies and innovation to both anticipate future customer needs and develop a strategy for Graphcore data centre management solutions.
- Take ownership of rapidly prioritizing team objectives in response to dynamic business objectives.
- Identify and act on opportunities for process improvement within the team, leading initiatives to enhance team efficiency and improve quality.
- Work with product management, other engineering team leads, our customer-facing teams, and internal customers to ensure timely delivery of team deliverables.
- Champion quality by ensuring solutions are continually and thoroughly tested.
- Work with senior management to establish strategic plans and objectives.
- Mentor and guide junior engineers; coach managers and team leads.
- Foster a culture of continuous learning and improvement.
Skills and Experience
- BSc or MSc degree in Computer Engineering, Computer Science, or a related field, or equivalent experience.
- Over 10 years of proven experience managing engineering teams.
- Comfortable working on complex issues where problems are not clearly defined and where fundamental principles do not fully apply.
- Detail-oriented and comfortable with multitasking in a dynamic environment with shifting priorities and changing requirements.
- Strong analytical, creative, and problem-solving skills.
- Excellent written and verbal communication skills.
- Experience in the use of Jira and Confluence for project management.
Desirable:
- 14+ years of relevant post-degree experience.
- Familiarity with component technologies such as Prometheus, Grafana, OpenTelemetry, ClickHouse, Kafka and Superset, in addition to common integrated stacks such as Elastic Stack, Better Stack and LGTM.
- Familiarity with commercial observability solutions like Datadog, Dynatrace and Splunk.
Benefits
In addition to a competitive salary, Graphcore offers flexible working, a generous annual leave policy, private medical insurance and health cash plan, a dental plan, pension (matched up to 5%), life assurance and income protection. We have a generous parental leave policy and an employee assistance programme (which includes health, mental wellbeing, and bereavement support). We offer a range of healthy food and snacks at our central Bristol office and have our own barista bar!
We welcome people of different backgrounds and experiences; we’re committed to building an inclusive work environment that makes Graphcore a great home for everyone. We offer an equal opportunity process and understand that there are visible and invisible differences in all of us. We can provide a flexible approach to interview and encourage you to chat to us if you require any reasonable adjustments.
Applicants for this position must hold the right to work in the UK. Unfortunately, we are unable to provide visa sponsorship or support for visa applications at this time.