Member of Technical Staff - Pretraining / Inference Optimization
Black Forest Labs is a cutting-edge startup pioneering generative image and video models. Our team, which invented Stable Diffusion, Stable Video Diffusion, and FLUX.1, is currently seeking a strong researcher / engineer to work closely with our research team on pretraining and inference optimization.
Role:
- Finding ideal training strategies (parallelism, precision trade-offs) for a variety of model sizes and compute loads
- Profiling, debugging, and optimizing single- and multi-GPU operations with tools such as Nsight or stack-trace viewers
- Reasoning about the speed and quality trade-offs of quantization for model inference (a toy sketch follows this list)
- Developing and improving low-level kernel optimizations for state-of-the-art inference and training
- Contributing new ideas that bring us closer to the hardware limits of the GPU
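To make the quantization bullet above concrete, here is a toy sketch of the quality side of the trade-off: symmetric per-tensor int8 weight quantization and its effect on matmul error. All names are illustrative, not our production code, and the speed side (int8 tensor cores, halved memory traffic) is not measured here.

```python
import torch

def quantize_int8(w: torch.Tensor):
    # symmetric per-tensor quantization: one fp scale for the whole tensor
    scale = w.abs().max() / 127
    q = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
    return q, scale

w = torch.randn(4096, 4096)
x = torch.randn(16, 4096)
q, scale = quantize_int8(w)
w_hat = q.float() * scale              # dequantize for the quality comparison
rel_err = (x @ w.T - x @ w_hat.T).norm() / (x @ w.T).norm()
print(f"relative matmul error from int8 weights: {rel_err:.4f}")
```

Finer-grained scales (per-channel, per-group) shrink this error at a small bookkeeping cost, which is exactly the kind of trade-off this role weighs against throughput.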
Ideal experience:
- Staying current with the latest, most effective techniques for optimizing inference and training workloads
- Optimizing for both memory-bound and compute-bound operations
- Understanding the GPU memory hierarchy and compute capabilities
- Understanding efficient attention algorithms in depth
- Implementing both forward and backward Triton kernels and verifying their correctness while accounting for floating-point error (see the Triton sketch after this list)
- Using pybind11, for example, to integrate custom-written kernels into PyTorch (see the binding sketch after this list)
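For a flavor of the Triton item, here is a minimal forward kernel with a tolerance-based correctness check; a backward kernel would be verified the same way. Kernel and helper names are made up for illustration.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def square_fwd_kernel(x_ptr, y_ptr, n_elements, BLOCK: tl.constexpr):
    # each program instance handles one BLOCK-sized chunk of the flat tensor
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    tl.store(y_ptr + offsets, x * x, mask=mask)

def square(x: torch.Tensor) -> torch.Tensor:
    y = torch.empty_like(x)
    n = x.numel()
    square_fwd_kernel[(triton.cdiv(n, 1024),)](x, y, n, BLOCK=1024)
    return y

x = torch.randn(1 << 20, device="cuda", dtype=torch.float16)
# fp16 rounding makes bitwise equality the wrong test; compare with tolerances
torch.testing.assert_close(square(x), x * x, rtol=1e-3, atol=1e-3)
```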
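And for the binding item, a sketch using torch.utils.cpp_extension.load_inline, which compiles the C++ source and generates the pybind11 glue automatically (a real extension would dispatch a hand-written CUDA kernel instead of this trivial op):

```python
import torch
from torch.utils.cpp_extension import load_inline

# trivial C++ op standing in for a custom kernel launcher
cpp_src = """
torch::Tensor scaled_add(torch::Tensor a, torch::Tensor b, double alpha) {
    return a + alpha * b;
}
"""

ext = load_inline(
    name="scaled_add_ext",
    cpp_sources=cpp_src,
    functions=["scaled_add"],  # names to expose through pybind11
)

a, b = torch.randn(4), torch.randn(4)
assert torch.allclose(ext.scaled_add(a, b, 0.5), a + 0.5 * b)
```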
Nice to have:
- Experience with diffusion and autoregressive models
- Experience in low-level CUDA kernel optimizations