Staff Machine Learning Engineer, Data Collections AI & ML
At PitchBook, a Morningstar company, we are always looking forward. We continue to innovate, evolve, and invest in ourselves to bring out the best in everyone. We’re deeply collaborative and thrive on the excitement, energy, and fun that reverberates throughout the company.
Our extensive learning programs and mentorship opportunities help us create a culture of curiosity that pushes us to always find new solutions and better ways of doing things. The combination of a rapidly evolving industry and our high ambitions means there’s going to be some ambiguity along the way, but we excel when we challenge ourselves. We’re willing to take risks, fail fast, and do it all over again in the pursuit of excellence.
If you have a good attitude and are willing to roll up your sleeves to get things done, PitchBook is the place for you.
About the Role:
The Data Collection AI/ML team builds intelligent systems that scale and improve PitchBook’s data extraction, enrichment, and validation processes. The team applies advanced ML, including classification, entity/relationship extraction, LLM-based parsing, OCR, and anomaly detection, to ensure high accuracy, coverage, and timeliness of our proprietary datasets.
The Staff MLE role is a force multiplier for the team, partnering with technical leadership to set best practices and design reusable ML architectures that support rapid innovation and operational excellence.
As a Staff Machine Learning Engineer on the Data Collection AI/ML team, you will serve as the senior technical expert responsible for designing, architecting, and deploying advanced AI and machine learning systems that power PitchBook’s data collection, extraction, and enrichment workflows. You will play a pivotal role in elevating the technical bar of the organization by setting engineering standards, driving architectural decisions, and supporting teams to build scalable, production-grade ML systems.
Your work will focus on automating and enhancing PitchBook’s ingestion and data quality pipelines across a wide variety of structured and unstructured sources, drawing from domain areas such as document understanding, OCR, natural language processing, entity resolution, multimodal modeling, retrieval systems, and LLM-driven extraction. You will collaborate closely with Engineering, Product, and Data Operations partners to translate business requirements into robust, high-impact AI solutions.
This role is ideal for someone who thrives as a deeply technical IC and wants to push the boundaries of document AI and data extraction technology, shape long-term architectural direction, and materially influence the future of data automation at PitchBook.
In addition to driving product impact, this role offers an opportunity to shape PitchBook’s growing presence and technical reputation in the AI and ML space. We are looking for individuals who are active contributors to the broader AI community through peer-reviewed research, technical publications, or open-source initiatives. Candidates who have authored conference papers or patents and who are excited to explore the frontiers of generative AI, LLMs, and applied NLP will be well-positioned to help us both advance our internal capabilities and deepen trust with our customers through thought leadership.
Primary Job Responsibilities:
- Serve as the key technical leader shaping system design, ML architectures, model lifecycles, and scalable infrastructure for data extraction, document understanding, and structured data enrichment
- Architect reusable frameworks and services for LLM-powered extraction, entity recognition and resolution models, and multimodal document processing
- Partner with engineering leaders to ensure our systems meet the highest standards of reliability, performance, and cost efficiency
- Design and build state-of-the-art ML models using transformers, LLMs, generative models, graph-based approaches, and OCR/Document AI frameworks
- Identify opportunities to advance automation and accuracy across our ingestion stack, including entity linking, relationship inference, classification, and anomaly detection
- Translate emerging research into practical, production-ready capabilities
- Contribute to PitchBook’s growing technical reputation through experimentation, publication, or open-source work
- Work closely with Product, Engineering, and Data Operations to ensure AI systems integrate smoothly into human-in-the-loop workflows and downstream pipelines
- Provide technical expertise during prioritization discussions, roadmap planning, and long-term strategic design
- Elevate engineering excellence through code reviews, design reviews, and technical guidance for ML engineers and scientists
- Act as a multiplier by shaping best practices for experimentation, model evaluation, responsible AI, and scalable ML engineering
- Guide teams across the organization toward cohesive, reusable, and standards-aligned architectures
- Own the lifecycle of mission-critical ML systems from data preparation to deployment, monitoring, and continuous improvement
- Ensure strong standards for model governance, explainability, and data integrity across the AI/ML stack
- Partner with ML Ops and Platform Engineering teams, along with other partner engineering groups, to maintain high availability, reliability, and robustness for production ML systems
Skills and Qualifications:
- Bachelor’s or Master’s degree in Computer Science, Mathematics, Data Science, or a related technical discipline (Master’s degree preferred)
- 8+ years of experience in machine learning, data science, or AI-focused engineering, with at least 4 years of experience leading technical teams
- Proven success delivering AI-driven data extraction, enrichment, or document understanding systems at scale. Hands-on experience with parameter-efficient fine-tuning methods and expertise in document classification optimization preferred
- Deep expertise in natural language processing, document AI, OCR, entity resolution, and large-scale data automation
- Strong understanding of modern ML frameworks and infrastructure (e.g., PyTorch, TensorFlow, Hugging Face, LangChain, MLflow)
- Demonstrated ability to define and execute multi-year AI roadmaps with measurable business impact
- Strong knowledge of cloud-native architecture, distributed computing, and scalable model deployment
- Excellent communication, collaboration, and influencing skills, including experience presenting to executive and cross-functional leadership
- A track record of fostering technical excellence and innovation across global, multidisciplinary teams
- Experience in fintech, data platforms, or large-scale information extraction systems preferred
- Contributions to the AI/ML research community (e.g., publications, patents, or open-source projects) are strongly preferred
Benefits + Compensation at PitchBook:
Physical Health
- Comprehensive health benefits
- Additional medical wellness incentives
- STD, LTD, AD&D, and life insurance
Emotional Health
- Paid sabbatical program after four years
- Paid family and paternity leave
- Annual educational stipend
- Ability to apply for tuition reimbursement
- CFA exam stipend
- Robust training programs on industry and soft skills
- Employee assistance program
- Generous allotment of vacation days, sick days, and volunteer days
Social Health
- Matching gifts program
- Employee resource groups
- Subsidized emergency childcare
- Dependent Care FSA
- Company-wide events
- Employee referral bonus program
- Quarterly team building events
Financial Health
- 401k match
- Shared ownership employee stock program
- Monthly transportation stipend
*Please be aware the above PitchBook benefit and perk offerings are subject to corresponding plan and policy documents and may change during the course of your employment.
Compensation
- Annual base salary: $260,000-$325,000
- Target annual bonus percentage: 20%
Working Conditions:
At the heart of our company is a belief in the power of in-person collaboration. Being together in the office fuels our creativity, strengthens our connections, and drives the innovation that sets us apart. Our culture is built on spontaneous moments—those hallway conversations, whiteboard brainstorms, and shared celebrations in each of our global offices—that simply can’t be replicated remotely. This role is expected to be in the office 5 days a week.
The job conditions for this position are in a standard office setting. Employees in this position use a PC and phone on an ongoing basis throughout the day. Limited corporate travel may be required to remote offices or other business meetings and events.
Life At PB:
We are consistently recognized as a Best Place to Work and our culture is at the heart of our success. It’s our fundamental belief that people do and create great things and that people are the cornerstone of prosperity. We believe that proactively seeking out different points of view, listening to others, learning, and reflecting on what we’ve heard creates a sense of belonging within PitchBook and strengthens the PitchBook community.
We are excited to get to know you and your background. Concerned that you might not meet every requirement? We encourage you to still apply as you might be the right candidate for the role or other roles at PitchBook.
