Member of Technical Staff - Image / Video Generation
We're the team behind Latent Diffusion, Stable Diffusion, and FLUX, foundational technologies that changed how the world creates images and video. Our generative models power tools used by millions of creators, developers, and businesses worldwide. Our FLUX models are among the most advanced in the world, and we're just getting started.
Headquartered in Freiburg, Germany, with a growing presence in San Francisco, we're scaling fast while staying true to what makes us different: research excellence, open science, and building technology that expands human creativity.
What You'll Work On
You'll train large-scale diffusion models for image and video generation, exploring new approaches while maintaining the rigor that helps us distinguish meaningful progress from incremental tweaks. This isn't about following established recipes—it's about running the experiments that clarify which architectural choices matter and which don't.
You'll be the person who:
- Trains large-scale diffusion transformer models for image and video data, working at the scale where intuitions break and empirical evidence matters
- Rigorously ablates design choices—running experiments that isolate variables, control for confounds, and produce insights you can actually trust—then communicating those results to shape our research direction
- Reasons about the speed-quality tradeoffs of neural network architectures in production settings where both constraints matter simultaneously
- Fine-tunes diffusion models for specialized applications like image and video upscalers, inpainting/outpainting models, and other tasks where general-purpose models aren't enough
Questions We're Wrestling With
- Which architectural choices actually matter for image and video quality, and which are just expensive distractions?
- How do you design ablation studies that isolate the signal from the noise at billion-parameter scale?
- What are the real speed-quality tradeoffs for different architectures—and how do they change with scale?
- When does fine-tuning a foundation model work better than training from scratch, and why?
- How do you evaluate generative models in ways that correlate with what users actually care about?
- Which training techniques (FSDP configurations, precision strategies, parallelism approaches) matter for model quality versus just training speed?
These aren't solved problems—they're questions we're actively figuring out through rigorous experimentation.
What We're Looking For
You've trained large-scale diffusion models and developed strong intuitions about what matters. You know that at research scale, every design choice has tradeoffs, and the only way to know which ones are worth making is through careful ablation. You're comfortable debugging distributed training issues and presenting research findings to the team.
You likely have:
- Hands-on experience training large-scale diffusion models for image and video data, with practical knowledge of common failure modes and what matters most in training
- Experience fine-tuning diffusion models for specialized applications—upscalers, inpainting, outpainting, or other tasks where understanding the domain matters as much as understanding the architecture
- Deep understanding of how to effectively evaluate image and video generative models—knowing which metrics correlate with quality and which are just convenient proxies
- Strong proficiency in PyTorch, transformer architectures, and the full ecosystem of modern deep learning
- Solid understanding of distributed training techniques—FSDP, low precision training, model parallelism—because our models don't fit on one GPU and training decisions impact research outcomes
We'd be especially excited if you:
- Have experience writing forward and backward Triton kernels and verifying their correctness within floating-point tolerance
- Bring proficiency with profiling, debugging, and optimizing single and multi-GPU operations using tools like Nsight or stack trace viewers
- Know the performance characteristics of different architectural choices at scale
- Have published research that contributed to how people think about generative models
What We're Building Toward
We're not just training models—we're working to better understand what matters in generative AI through rigorous experimentation. Each ablation study helps uncover assumptions we didn't know we were making. Each architecture decision teaches us more about the tradeoffs that matter. Each training run at scale adds insights that don't show up at smaller scales. If that sounds more compelling than following established approaches, we should talk.
Create a Job Alert
Interested in building your career at Black Forest Labs? Get future opportunities sent straight to your email.
Apply for this job
