SDE IV - GPU Engineer
Glance AI is an AI commerce platform shaping the next wave of e-commerce with inspiration-led shopping: less about searching for what you want and more about discovering who you could be. Operating in 140 countries, Glance AI transforms every screen into a stage for instant, personal, and joyful discovery, where inspiration becomes something you can explore, feel, and shop in the moment.
Its proprietary models, seamlessly integrated with Google’s most advanced AI platforms, Gemini and Imagen on Vertex AI, deliver hyper-realistic, deeply personal shopping experiences across fashion, beauty, travel, accessories, home décor, pets, and more. With an open architecture designed for effortless adoption across hardware and software ecosystems, Glance AI is building a platform that can become a staple in everyday consumer technology.
Glance AI partners with the world’s leading smartphone makers, connected TV manufacturers, telecom providers and global brands, meeting people where they are: on mobile, smart TVs and brand websites. Part of the InMobi Group, a global technology and advertising leader reaching over 2 billion devices and serving more than 30,000 enterprise brands worldwide, Glance AI is backed by Google, Jio Platforms and Mithril Capital.
About the Role
As a GPU Systems Engineer, you’ll lead design and optimization efforts across our GPU inference stack.
You will architect the libraries and runtime systems that enable Stable Diffusion, multimodal transformers, and emerging video generation models to run efficiently at scale.
You’ll guide cross-functional teams, influence hardware selection, and set the technical vision for GPU optimization practices across the company.
Key Responsibilities
- Architect high-performance inference runtimes, kernel dispatchers, and memory planners for large diffusion and transformer workloads.
- Lead investigations into cross-GPU performance bottlenecks, communication overheads, and scheduling inefficiencies.
- Drive multi-GPU parallelism strategies: model, pipeline, and tensor parallelism.
- Establish company-wide GPU optimization standards, tooling, and SLIs.
- Collaborate with research to design scalable implementations of novel architectures.
- Mentor engineers in profiling, tuning, and low-level optimization.
- Partner with hardware vendors and infra teams to maximize cluster utilization.
Required Qualifications
- 5+ years of experience in high-performance computing, GPU runtime systems, or ML infrastructure.
- Proven expertise in CUDA / Triton / C++, with deep understanding of GPU scheduling, occupancy, register usage, and tensor cores.
- Experience building and maintaining distributed inference or training systems.
- Ability to design abstractions balancing flexibility and performance.
- Strong knowledge of NCCL, NVLink, PCIe, and interconnects.
- Familiarity with profiling automation and performance dashboards.
- Excellent technical leadership and mentoring capabilities.
Preferred Qualifications
- Background in compiler-aided optimization (TVM, XLA, MLIR, Triton).
- Experience tuning Stable Diffusion or transformer inference pipelines.
- Exposure to heterogeneous compute backends (AMD ROCm, TPU, ASICs).
- Experience working with hardware–software co-design initiatives.
- Open-source or research contributions in GPU optimization.