Member of Technical Staff - Model Serving / API Backend
Remote | Germany | USA
Black Forest Labs is a cutting-edge startup pioneering generative image and video models. Our team invented Stable Diffusion, Stable Video Diffusion, and FLUX.1. We are currently looking for a strong candidate to join us in developing and improving our API / model serving backend and services.
Role:
- Develop and maintain robust APIs for serving machine learning models (a minimal example sketch follows this list)
- Transform research models into production-ready demos and MVPs
- Optimize model inference for improved performance and scalability
- Implement and manage user preference data acquisition systems
- Ensure high availability and reliability of model serving infrastructure
- Collaborate with ML researchers to rapidly prototype and deploy new models
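For illustration only, here is a minimal sketch of the kind of model-serving API described above. It assumes FastAPI and a placeholder generate_image() function; both the framework choice and the function are assumptions for this sketch, not details from the posting.

    # Minimal model-serving endpoint sketch (assumption: FastAPI; generate_image is a placeholder).
    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()

    class GenerationRequest(BaseModel):
        prompt: str
        seed: int | None = None

    def generate_image(prompt: str, seed: int | None) -> str:
        # Placeholder for real model inference; returns an identifier for the result.
        return f"image-for:{prompt}"

    @app.post("/v1/generate")
    def generate(req: GenerationRequest) -> dict:
        # Run inference and return a reference to the generated asset.
        result = generate_image(req.prompt, req.seed)
        return {"status": "ok", "result": result}

A server like this would typically be run with uvicorn and sit behind the batching, monitoring, and scaling layers mentioned elsewhere in this posting.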
Ideal Experience:
- Strong proficiency in Python and its ecosystem for machine learning, data analysis, and web development
- Extensive experience with RESTful API development and deployment for ML tasks
- Familiarity with containerization and orchestration technologies (e.g., Docker, Kubernetes)
- Knowledge of cloud platforms (AWS, GCP, or Azure) for deploying and scaling ML services
- Proven track record in rapid ML model prototyping using tools like Streamlit or Gradio (see the sketch after this list)
- Experience with distributed task queues and scalable model serving architectures
- Understanding of monitoring, logging, and observability best practices for ML systems
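As a rough sketch of the rapid-prototyping work mentioned above, here is a minimal Gradio demo; the generate() function is a hypothetical stand-in for a real text-to-image model.

    # Minimal Gradio prototype sketch; gr.Interface wires a Python function to a web UI.
    import gradio as gr
    from PIL import Image

    def generate(prompt: str) -> Image.Image:
        # Placeholder: return a blank image instead of calling a real model.
        return Image.new("RGB", (512, 512), color="gray")

    demo = gr.Interface(
        fn=generate,
        inputs=gr.Textbox(label="Prompt"),
        outputs=gr.Image(label="Result"),
    )

    if __name__ == "__main__":
        demo.launch()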
Nice to have:
- Experience with frontend development frameworks (e.g., Vue.js, Angular, React)
- Familiarity with MLOps practices and tools
- Knowledge of database systems and data streaming technologies
- Experience with A/B testing and feature flagging in production environments
- Understanding of security best practices for API development and ML model serving
- Experience with real-time inference systems and low-latency optimizations
- Knowledge of CI/CD pipelines and automated testing for ML systems
- Expertise in ML inference optimizations, including techniques such as:
  - Reducing initialization time and memory requirements
  - Implementing dynamic batching (see the sketch after this list)
  - Utilizing reduced precision and weight quantization
  - Applying TensorRT optimizations
  - Performing layer fusion and model compilation
  - Writing custom CUDA code for performance enhancements
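For the dynamic batching item above, here is an illustrative asyncio sketch; it is an assumption about one possible approach, not a description of our stack. Requests queue up and are flushed as a single batch when either the batch fills or a short wait budget expires.

    # Illustrative dynamic-batching sketch; run_model is a placeholder for one batched forward pass.
    import asyncio

    MAX_BATCH = 8
    MAX_WAIT_S = 0.01

    def run_model(prompts):
        # Placeholder batched inference.
        return [f"image-for:{p}" for p in prompts]

    async def batcher(queue):
        while True:
            prompt, fut = await queue.get()
            batch = [(prompt, fut)]
            loop = asyncio.get_running_loop()
            deadline = loop.time() + MAX_WAIT_S
            # Keep pulling requests until the batch is full or the wait budget expires.
            while len(batch) < MAX_BATCH and (remaining := deadline - loop.time()) > 0:
                try:
                    batch.append(await asyncio.wait_for(queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            # Run one batched call and resolve each waiting request with its result.
            for (_, f), result in zip(batch, run_model([p for p, _ in batch])):
                f.set_result(result)

    async def infer(queue, prompt):
        fut = asyncio.get_running_loop().create_future()
        await queue.put((prompt, fut))
        return await fut

    async def main():
        queue = asyncio.Queue()
        asyncio.create_task(batcher(queue))
        print(await asyncio.gather(*(infer(queue, f"prompt-{i}") for i in range(20))))

    asyncio.run(main())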
Apply for this job