DC Server Deployment Engineer
About xAI
xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge.
Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity.
We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company’s mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important.
All engineers and researchers are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates.
About the Role
This is a full-time, onsite role based at our Memphis data center, where you will work alongside our Site Operations (SiteOps) engineers.
xAI is looking for a skilled Server Deployment Engineer with a strong foundation in data center server hardware and system software who is eager to expand their expertise in server deployment and optimization within a high-performance environment. As a Server Deployment Engineer at xAI, you will play a critical role in validating, testing, integrating, and provisioning server hardware in our data center. Collaborating closely with internal system software teams and external vendors, you will ensure server quality and system health and keep server intake and provisioning efficient. You are detail-oriented, quick to learn, and precise in execution. Your deep understanding of the equipment allows you to identify inefficiencies and optimize server intake and repair processes.
Responsibilities
- Server Testing and Integration: Execute comprehensive server testing, integration, and provisioning within xAI data centers to ensure seamless deployment and operation of high-performance computing environments.
- System Diagnostics and Remediation: Diagnose and troubleshoot system faults in collaboration with vendors, implementing effective solutions to maintain optimal system performance.
- Hardware Management: Maintain a high throughput of compute and storage hardware intake, ensuring efficient processing, deployment, and integration of new hardware components.
- Automation and Tool Development: Develop, optimize, and maintain scripts and tools to automate processes, enhance system monitoring, and improve overall data center operations.
- Vendor and Team Collaboration: Lead and facilitate technical discussions with external vendors and internal teams to ensure alignment on system requirements, performance standards, and issue resolutions.
Basic Qualifications
- High school diploma or equivalency certificate
- 3+ years of hands-on experience working with server, storage, compute, and network hardware, including troubleshooting, maintenance, and repair of servers and networking infrastructure
Preferred Skills and Experience
- Technical Expertise in Linux/Unix: Extensive experience in Linux/Unix environments as a system administrator or developer, with deep knowledge of multiple Linux distributions, the Linux boot process, and core system engineering principles.
- Scripting and Automation Skills: Proficiency in Python, Bash, or similar scripting languages, with the ability to develop scripts for automation, monitoring, and system optimization.
- Networking and Distributed Systems: Strong understanding of Ethernet networking at scale, including experience with distributed systems and network configuration in complex environments.
- Advanced Troubleshooting Skills: Demonstrated ability to diagnose complex hardware and software issues, apply systematic problem-solving approaches, and implement effective resolutions.
- Strong Communication and Collaboration: Excellent communication and interpersonal skills with the ability to work effectively with cross-functional teams and external vendors.
- Adaptability and Commitment: Highly motivated, with a strong commitment to working in a fast-paced and dynamic environment, demonstrating a proactive and hands-on approach to challenges.
Additional Requirements
- Ability to work for extended periods when necessary, including tasks that require standing or moving hardware components.
- Willingness to work evenings, weekends, or extended hours as needed to support critical operations and meet project deadlines.
- Must comply with pre-employment and ongoing random drug and alcohol testing, in accordance with company policies.
- Comfortable working in an environment with sustained exposure to noise.
Why Join Us?
Join a pioneering team at the forefront of AI and data center innovation, where your work will directly impact the development of next-generation technologies. Thrive in a fast-paced, dynamic workplace that encourages creativity, continuous learning, and personal development, offering ample opportunities to advance your skills and career. Work alongside top experts and thought leaders in the industry, collaborating on cutting-edge technologies that are redefining the landscape of AI, data centers, and high-performance computing.
xAI is an equal opportunity employer and does not unlawfully discriminate based on race, color, religion, ethnicity, ancestry, national origin, sex (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender, gender identity, gender expression, age, disability, medical conditions, genetic information, marital status, military or veteran status, or any other applicable legally protected characteristics.
Qualified applicants with arrest or conviction records will be considered for employment in accordance with all applicable federal, state, and local laws, including the San Francisco Fair Chance Ordinance, Los Angeles County Fair Chance Ordinance for Employers, and the California Fair Chance Act.
For Los Angeles County (unincorporated) Candidates:
xAI reasonably believes that criminal history may have a direct, adverse and negative relationship on the following job duties, potentially resulting in the withdrawal of a conditional offer of employment:
- Access to information technology systems and confidential information, including proprietary and trade secret information, and/or user data;
- Interacting with internal and/or external clients and colleagues; and
- Exercising sound judgment.