Reliability Engineering Intern - Datacenter RAS
About Celestial AI
As Generative AI continues to advance, the performance drivers for data center infrastructure are shifting from systems-on-chip (SoCs) to systems of chips. In the era of Accelerated Computing, data center bottlenecks are no longer limited to compute performance; they now include the system’s interconnect bandwidth, memory bandwidth, and memory capacity. Celestial AI’s Photonic Fabric™ is the next-generation interconnect technology that delivers a tenfold increase in performance and energy efficiency compared to competing solutions.
The Photonic Fabric™ is available to our customers in multiple technology offerings, including optical interface chiplets, optical interposers, and Optical Multi-chip Interconnect Bridges (OMIB). This allows customers to easily incorporate high bandwidth, low power, and low latency optical interfaces into their AI accelerators and GPUs. The technology is fully compatible with both protocol and physical layers, including standard 2.5D packaging processes. This seamless integration enables XPUs to utilize optical interconnects for both compute-to-compute and compute-to-memory fabrics, achieving bandwidths in the tens of terabits per second with nanosecond latencies.
This innovation empowers hyperscalers to enhance the efficiency and cost-effectiveness of AI processing by optimizing the XPUs required for training and inference, while significantly reducing the TCO2 impact. To bolster customer collaborations, Celestial AI is developing a Photonic Fabric ecosystem consisting of tier-1 partnerships that include custom silicon/ASIC design, system integrators, HBM memory, assembly, and packaging suppliers.
ABOUT THE ROLE
We are seeking a highly motivated Reliability Engineering Intern to join our Datacenter RAS (Reliability, Availability, and Serviceability) team, with a focus on Silicon Photonics integration. This role is ideal for students interested in the intersection of hardware reliability, optical interconnects, and large-scale system performance.
You will work on evaluating and improving the reliability of silicon photonics components and subsystems deployed in hyperscale datacenter environments, contributing to the long-term uptime and serviceability of next-generation compute and networking infrastructure.
ESSENTIAL DUTIES AND RESPONSIBILITIES
- Support the development and execution of RAS strategies for silicon photonics-based interconnects in datacenter systems.
- Assist in reliability testing, lifetime modeling, and failure mode analysis of photonic components (e.g., lasers, modulators, photodetectors, optical transceivers).
- Analyze field return data and lab test results to identify trends, root causes, and opportunities for design or process improvements.
- Collaborate with cross-functional teams (hardware, packaging, systems, and software) to ensure RAS requirements are met for photonic integration.
- Contribute to the development of monitoring and diagnostics tools for early detection of photonic degradation or failure in deployed systems.
- Help build or enhance data pipelines and dashboards for tracking reliability metrics and system health indicators.
- Document findings and present recommendations to engineering and leadership teams.
QUALIFICATIONS
- Pursuing a Bachelor’s, Master’s, or doctoral degree in Electrical Engineering, Optical Engineering, Computer Engineering, or a related field.
- Knowledge of silicon photonics, optical communication systems, or semiconductor device physics.
- Familiarity with RAS principles in large-scale systems or datacenter environments is a strong plus.
- Experience with data analysis tools (e.g., Python, MATLAB, JMP) and database systems.
- Exposure to optical test equipment and reliability testing standards (e.g., Telcordia, JEDEC) is a plus.
- Strong analytical, communication, and documentation skills.
- Passion for solving complex problems at the intersection of hardware reliability and system-level performance.
WHAT YOU'LL GAIN
- Hands-on experience with cutting-edge silicon photonics technologies in real-world datacenter applications.
- Exposure to RAS methodologies and system-level reliability engineering.
- Mentorship from industry experts and opportunities to present your work to technical leaders.
- A chance to contribute to the future of scalable, high-speed, and energy-efficient data infrastructure.
LOCATION: Santa Clara, CA
This paid summer internship offers a competitive hourly rate of $40.00. Please note that as an intern, you will not be eligible for company-sponsored benefits, including paid time off, health insurance, life insurance, stock options, or retirement plans.
We offer great benefits (health, vision, dental, and life insurance) and a collaborative, continuous-learning work environment where you will get a chance to work with smart and dedicated people developing the next-generation architecture for high-performance computing.
Celestial AI Inc. is proud to be an equal opportunity workplace and is an affirmative action employer.