Head of IT Engineering – HPC, AI Infrastructure & Low-Latency Systems (Chicago)

Chicago, Illinois, United States

ThinkMarkets is a global financial technology company specializing in multi-asset trading solutions for thousands of clients around the world. Our flagship ThinkTrader platform lets clients trade the world 24 hours a day. Our mission is to bridge the gap between traders, investors, and platforms by providing access to global markets and thousands of products, giving our clients the ability to trade the world in the palm of their hand. We use the latest technologies to give traders seamless access to our proprietary trading platforms.

At the core of our infrastructure is a commitment to performance, reliability, and innovation. As we scale, our focus on ultra-low latency, HPC environments, and AI-driven infrastructure continues to shape the future of high-frequency, high-throughput financial systems. 

We are seeking a Head of IT Engineering with deep cross-functional expertise across network engineering, systems architecture, and high-performance compute environments. You will lead the design, deployment, and ongoing optimization of a globally distributed, ultra-low latency infrastructure supporting advanced trading and AI workloads. 

This is a hands-on leadership role requiring both strategic oversight and deep technical fluency in performance tuning, automation, and fault-tolerant design. You’ll lead a global team while collaborating closely with product engineering, AI/ML teams, security, and operations. 

Key Responsibilities:  

  • Lead a cross-functional team of 10–15 engineers (network, systems, DevOps, application), including hiring, coaching, and performance management
  • Architect and manage ultra-low latency infrastructure to support mission-critical trading platforms and AI workloads 
  • Drive the deployment and scaling of high-performance computing servers, GPU clusters, and AI/ML pipelines 
  • Implement and maintain performance-optimized hybrid cloud environments (bare-metal, on-prem, private, and public cloud) 
  • Oversee low-latency networking architectures using technologies such as DPDK, SR-IOV, RoCE, and PTP for deterministic timing 
  • Develop and refine comprehensive observability solutions using Prometheus, Grafana, ELK, Zabbix, and APM tooling 
  • Lead root cause analysis and post-mortem processes for complex incidents involving performance degradation or systems failure 
  • Drive adoption of infrastructure-as-code (IaC) and automation frameworks (Terraform, Ansible, Python, CI/CD pipelines) 
  • Collaborate with InfoSec and compliance teams to ensure alignment with frameworks like ISO 27001, NIST 800-53, and PCI DSS 
  • Manage global infrastructure operations with a 24/7 on-call rotation for system availability and resilience 
  • Own long-term infrastructure roadmap planning, including AI/ML compute capacity and global data center expansion

Requirements:

  • 10+ years of hands-on experience in systems, infrastructure, or network engineering 
  • 3+ years in a leadership or management role overseeing complex infrastructure at scale 
  • Proven experience managing ultra-low latency systems in trading, fintech, or HPC environments 
  • Deep technical expertise in Linux (RHEL/CentOS/Ubuntu) and Windows Server, including kernel tuning and system hardening 
  • Expert knowledge of low-latency networking, including multicast, BGP, QoS, hardware offloading (SR-IOV), and time sync protocols (e.g., PTP) 
  • Hands-on experience with HPC infrastructure, AI clusters (NVIDIA CUDA, Slurm, Kubernetes), and GPU server management 
  • Proficiency in scripting and automation (Python, Bash) and in IaC and CI/CD tools (Terraform, Ansible, Jenkins)
  • Familiarity with virtualization and orchestration platforms (vSphere, Nutanix AHV, Docker, Kubernetes) 
  • Strong understanding of security frameworks and compliance standards (ISO 27001, NIST 800-53, PCI DSS)
  • Practical knowledge of performance observability, tracing, and telemetry at scale
  • Experience working in Agile or DevSecOps environments, with strong stakeholder engagement skills 

Our Benefit Offerings:

    • Medical, Dental, and Vision coverage available
    • Employer-Paid Short-Term Disability and Life Insurance
    • 401(k) and Roth 401(k) retirement plan options with a competitive company match
    • Generous Paid Holidays

 
