Senior Infrastructure Engineer (Bare Metal)
About Telnyx
Telnyx is an industry leader that's not just imagining the future of global connectivity—we're building it. From architecting and amplifying the reach of a private, global, multi-cloud IP network, to bringing hyperlocal edge technology right to your fingertips through intuitive APIs, we're shaping a new era of seamless interconnection between people, devices, and applications.
We're driven by a desire to transform and modernize what's antiquated, automate the manual, and solve real-world problems through innovative connectivity solutions. As a testament to our success, we're proud to stand as a financially stable and profitable company. Our robust profitability allows us not only to invest in pioneering technologies but also to foster an environment of continuous learning and growth for our team.
Our collective vision is a world where borderless connectivity fuels limitless innovation. By joining us, you can be part of laying the foundations for this interconnected future. We're currently seeking passionate individuals who are excited about the opportunity to contribute to an industry-shaping company while growing their own skills and careers.
We are currently seeking engineers passionate about bare metal infrastructure, AI/HPC platforms, Kubernetes-native systems, and high-performance networking technologies.
You will deploy and maintain our edge data centers for compute, AI, and storage workloads, contribute to the installation, migration, updates, and integration of Kubernetes infrastructure services and modules, and participate in the development of next-generation distributed infrastructure platforms.
Responsibilities:
- Design, deploy, and manage highly available, scalable, and secure infrastructure solutions, including Kubernetes on bare metal and Rook-managed Ceph storage platforms.
- Design and maintain Kubernetes and Rook/Ceph platforms for engineering team consumption.
- Deploy and operate GPU-accelerated infrastructure for AI and high-performance compute workloads using NVIDIA and AMD datacenter GPUs, including H200, B200/B300, and AMD MI300 series hardware.
- Architect and maintain high-performance networking stacks leveraging RoCE, InfiniBand, NVLink, Mellanox SR-IOV, virtual functions (VFs), and advanced NIC technologies.
- Design and operate high-performance storage architectures leveraging NVMe-oF (NVMe over Fabrics) technologies for low-latency distributed storage workloads.
- Develop and operate storage infrastructure using Rook for Ceph lifecycle management within Kubernetes environments.
- Build and maintain Kubernetes-native infrastructure platforms using KubeVirt, software-defined networking (SDN), WireGuard, and container networking technologies such as Calico, Flannel, and Cilium with eBPF.
- Design and implement Kubernetes network policies to isolate workloads, secure east-west traffic, and uphold infrastructure security best practices.
- Develop Kubernetes Operators, controllers, and automation services for infrastructure lifecycle management and platform orchestration.
- Contribute to infrastructure software engineering efforts focused on infrastructure reconciliation, idempotent automation, and declarative systems management.
- Develop internal tooling, APIs, and automation frameworks to support large-scale bare metal and AI infrastructure operations.
- Manage Linux kernel-level performance tuning, hardware enablement, and low-level systems troubleshooting.
- Deploy and maintain GPU, networking, and hardware drivers using Kubernetes Operators and containerized lifecycle management techniques.
- Evaluate and recommend new technologies and tools to improve the efficiency, performance, and scalability of infrastructure platforms.
- Ensure the reliability, performance, and scalability of our edge data centers.
- Troubleshoot and resolve complex infrastructure issues across compute, networking, storage, and Kubernetes layers.
- Participate in architecture design, technical planning, and documentation for new infrastructure initiatives.
Essential Qualifications:
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- 3–5 years of production infrastructure engineering experience.
- Strong Kubernetes production experience, preferably in bare metal environments.
- Experience developing Kubernetes Operators, controllers, or other Kubernetes-native automation components.
- Strong programming and software engineering experience focused on infrastructure automation, reconciliation loops, and idempotent systems management.
- Experience building distributed systems, infrastructure automation, or platform engineering tooling.
- Experience deploying and operating Rook-managed Ceph clusters and high-performance NVMe-backed storage platforms in production Kubernetes environments.
- Strong Linux systems administration and Linux kernel troubleshooting knowledge.
- In-depth knowledge of Linux networking and distributed systems.
- Experience with container-native virtualization platforms such as KubeVirt.
- Experience with SDN technologies, WireGuard, and container networking technologies including Calico, Flannel, and/or Cilium with eBPF.
- Experience implementing Kubernetes network policies and workload isolation strategies.
- Experience deploying and managing GPU-enabled infrastructure and associated drivers/operators in Kubernetes environments.
- Strong understanding of high-performance networking technologies including RoCE, InfiniBand, Mellanox SR-IOV, and virtual function (VF) networking.
- Strong problem-solving and troubleshooting skills.
Preferred Qualifications:
- Experience with NVIDIA datacenter GPUs including H200 and B200/B300 platforms.
- Experience with AMD MI300 series accelerators.
- Familiarity with NVLink and GPU interconnect technologies.
- Experience with NVMe-oF (NVMe over Fabrics) architectures and high-performance storage networking.
- Experience with VXLAN, software-defined networking, SR-IOV technologies, and advanced NIC offloading.
- Familiarity with BGP and with configuring FRR or BIRD.
- Experience contributing to open-source infrastructure or Kubernetes ecosystem projects.
- Experience designing or operating AI/HPC infrastructure platforms.
- Familiarity with eBPF-based observability, networking, or security tooling.
- Experience with GitOps workflows and Kubernetes operational models.
#LI-Brazil
#LI-ARGENTINA
