Member of Technical Staff, Multimedia
About Us:
Here at Fireworks, we’re building the future of generative AI infrastructure. Fireworks offers the generative AI platform with the highest-quality models and the fastest, most scalable inference. We’ve been independently benchmarked to have the fastest LLM inference and have been getting great traction with innovative research projects, like our own function calling and multi-modal models. Fireworks is funded by top investors, like Benchmark and Sequoia, and we’re an ambitious, fun team composed primarily of veterans from PyTorch and Google Vertex AI.
The Role:
We are looking for a highly motivated Member of Technical Staff with expertise in speech and audio modeling to join our research and engineering team. This role will contribute to advancing our capabilities across key product areas including automatic speech recognition (ASR), text-to-speech (TTS), end-of-utterance detection, diarization, and speech-to-speech systems. You’ll be responsible for both conducting cutting-edge research and contributing production-level code to deploy models at scale.
Key Responsibilities:
- Conduct research in speech and speech-language model modalities, including data collection, training, and experimentation.
- Design, train, and implement machine learning models for ASR, TTS, and other audio-related applications.
- Write high-quality, maintainable code in Python for both experimental pipelines and production deployment.
- Collaborate with cross-functional teams to bring research innovations into real-world products.
- Contribute to building scalable infrastructure to support training and inference for speech models.
- Analyze model performance, run evaluations, and drive improvements based on empirical findings.
Minimum Qualifications:
- Bachelor's degree in Computer Science, Electrical Engineering, or a related field.
- At least 5 years of experience in machine learning, with a focus on speech/audio processing.
- Proficiency in Python and experience working with machine learning frameworks (e.g., PyTorch or TensorFlow).
- Demonstrated ability to write production-quality code and work across research and engineering domains.
- Familiarity with core speech processing techniques (e.g., ASR, TTS, or related).
Preferred Qualifications:
- Master’s or PhD in a relevant technical field with research experience in speech or multimodal modeling.
- Experience deploying speech models into production environments.
- Familiarity with end-to-end speech-to-speech systems and end-of-utterance detection.
- Experience pre-training and/or fine-tuning models with a focus on speech/audio processing.
- Contributions to open-source projects or published research in top-tier conferences.
- Experience working in fast-paced, cross-disciplinary teams.
Why Fireworks AI?
- Solve Hard Problems: Tackle challenges at the forefront of AI infrastructure, from low-latency inference to scalable model serving.
- Build What’s Next: Work with bleeding-edge technology that impacts how businesses and developers harness AI globally.
- Ownership & Impact: Join a fast-growing, passionate team where your work directly shapes the future of AI—no bureaucracy, just results.
- Learn from the Best: Collaborate with world-class engineers and AI researchers who thrive on curiosity and innovation.
Fireworks AI is an equal-opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all innovators.