
Director of Engineering, AI & ML, Data Collections

New York, New York, United States

At PitchBook, a Morningstar company, we are always looking forward. We continue to innovate, evolve, and invest in ourselves to bring out the best in everyone. We’re deeply collaborative and thrive on the excitement, energy, and fun that reverberates throughout the company. 

Our extensive learning programs and mentorship opportunities help us create a culture of curiosity that pushes us to always find new solutions and better ways of doing things. The combination of a rapidly evolving industry and our high ambitions means there’s going to be some ambiguity along the way, but we excel when we challenge ourselves. We’re willing to take risks, fail fast, and do it all over again in the pursuit of excellence.

If you have a good attitude and are willing to roll up your sleeves to get things done, PitchBook is the place for you. 

About the Role:

The Data Collection AI/ML team sits at the intersection of automation, AI, and data quality. The team’s mission is to accelerate and scale PitchBook’s data coverage by applying advanced ML models to identify, extract, and validate entities, relationships, and key insights from vast collections of structured and unstructured sources, including filings, PDFs, and market documents. 

As the Director of the Data Collection AI/ML team, you will partner closely with the Data Operations, Data Collections Product, and Engineering teams to deliver seamless, intelligent data pipelines that transform raw inputs into high-quality, actionable data. 

In this role, you will lead the strategy, vision, and execution of AI and ML initiatives focused on automating PitchBook’s data extraction, enrichment, and validation workflows. Your organization will be responsible for building intelligent systems that make PitchBook’s data collection as accurate, comprehensive, and timely as possible by applying cutting-edge document AI, OCR, and entity resolution techniques to financial documents and other proprietary data sources. 

In this highly visible leadership role, you will manage a global team of 15+ data scientists and machine learning engineers, driving both innovation and operational excellence. You will define the roadmap for automation and enrichment across multiple data domains and deliver scalable, production-grade AI systems that directly power PitchBook’s data ingestion pipelines and underpin platform accuracy. 

Your leadership will guide the design and deployment of AI-driven extraction and enrichment models, including document classification, named entity recognition, relationship extraction, and quality validation systems. You will also play a key role in shaping cross-functional data and AI strategy in collaboration with Product, Data Operations, and the client-facing Insights AI/ML teams, ensuring consistency, reliability, and alignment with enterprise data objectives. 

This role demands a visionary yet hands-on leader — one who can balance strategic direction with technical credibility, and who can influence across teams and executive levels while ensuring the continued professional development of a globally distributed technical team. 

In addition to driving product impact, this role offers an opportunity to shape PitchBook’s growing presence and technical reputation in the AI and ML space. We are looking for individuals who are active contributors to the broader AI community through peer-reviewed research, technical publications, or open-source initiatives. Candidates who have authored conference papers or patents and who are excited to explore the frontiers of generative AI, LLMs, and applied NLP will be well-positioned to help us both advance our internal capabilities and deepen trust with our customers through thought leadership.

Primary Job Responsibilities:

  • Define and execute the AI & ML strategy for data collection, extraction, and enrichment automation aligned with PitchBook’s long-term data strategy 
  • Partner with senior leadership to identify high-impact opportunities for AI-driven automation and cost reduction in data collection workflows
  • Establish success metrics and operational KPIs for automation accuracy, throughput, and coverage improvement 
  • Lead, hire, and develop a high-performing global team of data scientists and ML engineers; define team structure, roles, and growth paths that align with organizational goals 
  • Foster a culture of innovation, accountability, inclusion, and continuous improvement across distributed offices
  • Champion hiring, mentorship, and professional development initiatives to grow internal AI/ML talent 
  • Elevate engineering excellence through code reviews, design reviews, and technical guidance for ML engineers and scientists
  • Act as a multiplier by shaping best practices for experimentation, model evaluation, responsible AI, and scalable ML engineering
  • Guide teams across the organization toward cohesive, reusable, and standards-aligned architectures
  • Collaborate closely with Engineering, Product Management, and Data Operations to ensure the successful operationalization of AI/ML solutions into data pipelines and collection processes 
  • Partner with data quality and enrichment teams to align ML outputs with domain-specific validation frameworks
  • Serve as a trusted technical and strategic advisor to stakeholders across Product, Engineering, and Data Operations 
  • Oversee the end-to-end lifecycle of ML systems, from research and experimentation to deployment, monitoring, and optimization  
  • Ensure high availability, reliability, and performance of production AI/ML systems 
  • Implement and maintain strong standards of data integrity, security, and compliance in all models
  • Support the vision and values of the company through role modeling and encouraging desired behaviors 
  • Participate in various company initiatives and projects as requested 

Skills and Qualifications:

  • Bachelor’s or Master’s degree in Computer Science, Mathematics, Data Science, or a related technical discipline (Master’s degree preferred) 
  • 12+ years of experience in machine learning, data science, or AI-focused engineering, including 7+ years leading technical teams; experience managing managers and geographically distributed teams is strongly preferred 
  • Proven success delivering AI-driven data extraction, enrichment, or document understanding systems at scale
  • Deep expertise in natural language processing, document AI, OCR, entity resolution, and large-scale data automation, including optimizing large document workflows and reducing latency in retrieval-based architectures 
  • Familiarity with agentic AI protocols (e.g., MCP, A2A) and orchestration of multi-agent systems 
  • Strong understanding of modern ML frameworks and infrastructure (e.g., PyTorch, TensorFlow, Hugging Face, LangChain) 
  • Demonstrated ability to define and execute multi-year AI roadmaps with measurable business impact
  • Strong knowledge of cloud-native architecture, distributed computing, and scalable model deployment
  • Excellent communication, collaboration, and influencing skills — including experience presenting to executive and cross-functional leadership 
  • A track record of fostering technical excellence and innovation across global, multidisciplinary teams
  • Experience in fintech, data platforms, or large-scale information extraction systems preferred
  • Contributions to the AI/ML research community (e.g., publications, patents, or open-source projects) are strongly preferred
  • Must be authorized to work in the United States without the need for visa sponsorship now or in the future

Benefits + Compensation at PitchBook:

Physical Health            

  • Comprehensive health benefits
  • Additional medical wellness incentives 
  • STD, LTD, AD&D, and life insurance

Emotional Health 

  • Paid sabbatical program after four years
  • Paid family and paternity leave 
  • Annual educational stipend
  • Ability to apply for tuition reimbursement
  • CFA exam stipend 
  • Robust training programs on industry and soft skills 
  • Employee assistance program
  • Generous allotment of vacation days, sick days, and volunteer days 

Social Health 

  • Matching gifts program
  • Employee resource groups
  • Subsidized emergency childcare  
  • Dependent Care FSA
  • Company-wide events
  • Employee referral bonus program  
  • Quarterly team building events

Financial Health 

  • 401k match
  • Shared ownership employee stock program 
  • Monthly transportation stipend  

*Please be aware the above PitchBook benefit and perk offerings are subject to corresponding plan and policy documents and may change during the course of your employment.

Compensation

  • Annual base salary: $260,000-$340,000
  • Target annual bonus percentage: 25%

Working Conditions:

At the heart of our company is a belief in the power of in-person collaboration. Being together in the office fuels our creativity, strengthens our connections, and drives the innovation that sets us apart. Our culture is built on spontaneous moments—those hallway conversations, whiteboard brainstorms, and shared celebrations in each of our global offices—that simply can’t be replicated remotely. This role is expected to be in the office 5 days a week.

This position is based in a standard office setting. Employees in this position use a PC and phone on an ongoing basis throughout the day. Limited corporate travel to remote offices or other business meetings and events may be required.

We are excited to get to know you and your background. Concerned that you might not meet every requirement? We encourage you to apply anyway, as you might be the right candidate for this role or for other roles at PitchBook.


Voluntary Self-Identification

For government reporting purposes, we ask candidates to respond to the below self-identification survey. Completion of the form is entirely voluntary. Whatever your decision, it will not be considered in the hiring process or thereafter. Any information that you do provide will be recorded and maintained in a confidential file.

As set forth in PitchBook Data’s Equal Employment Opportunity policy, we do not discriminate on the basis of any protected group status under any applicable law.


If you believe you belong to any of the categories of protected veterans listed below, please indicate by making the appropriate selection. As a government contractor subject to the Vietnam Era Veterans Readjustment Assistance Act (VEVRAA), we request this information in order to measure the effectiveness of the outreach and positive recruitment efforts we undertake pursuant to VEVRAA. Classification of protected categories is as follows:

A "disabled veteran" is one of the following: a veteran of the U.S. military, ground, naval or air service who is entitled to compensation (or who but for the receipt of military retired pay would be entitled to compensation) under laws administered by the Secretary of Veterans Affairs; or a person who was discharged or released from active duty because of a service-connected disability.

A "recently separated veteran" means any veteran during the three-year period beginning on the date of such veteran's discharge or release from active duty in the U.S. military, ground, naval, or air service.

An "active duty wartime or campaign badge veteran" means a veteran who served on active duty in the U.S. military, ground, naval or air service during a war, or in a campaign or expedition for which a campaign badge has been authorized under the laws administered by the Department of Defense.

An "Armed forces service medal veteran" means a veteran who, while serving on active duty in the U.S. military, ground, naval or air service, participated in a United States military operation for which an Armed Forces service medal was awarded pursuant to Executive Order 12985.


Voluntary Self-Identification of Disability

Form CC-305
OMB Control Number 1250-0005
Expires 04/30/2026

Why are you being asked to complete this form?

We are a federal contractor or subcontractor. The law requires us to provide equal employment opportunity to qualified people with disabilities. We have a goal of having at least 7% of our workers as people with disabilities. The law says we must measure our progress towards this goal. To do this, we must ask applicants and employees if they have a disability or have ever had one. People can become disabled, so we need to ask this question at least every five years.

Completing this form is voluntary, and we hope that you will choose to do so. Your answer is confidential. No one who makes hiring decisions will see it. Your decision to complete the form and your answer will not harm you in any way. If you want to learn more about the law or this form, visit the U.S. Department of Labor’s Office of Federal Contract Compliance Programs (OFCCP) website at www.dol.gov/ofccp.

How do you know if you have a disability?

A disability is a condition that substantially limits one or more of your “major life activities.” If you have or have ever had such a condition, you are a person with a disability. Disabilities include, but are not limited to:

  • Alcohol or other substance use disorder (not currently using drugs illegally)
  • Autoimmune disorder, for example, lupus, fibromyalgia, rheumatoid arthritis, HIV/AIDS
  • Blind or low vision
  • Cancer (past or present)
  • Cardiovascular or heart disease
  • Celiac disease
  • Cerebral palsy
  • Deaf or serious difficulty hearing
  • Diabetes
  • Disfigurement, for example, disfigurement caused by burns, wounds, accidents, or congenital disorders
  • Epilepsy or other seizure disorder
  • Gastrointestinal disorders, for example, Crohn's Disease, irritable bowel syndrome
  • Intellectual or developmental disability
  • Mental health conditions, for example, depression, bipolar disorder, anxiety disorder, schizophrenia, PTSD
  • Missing limbs or partially missing limbs
  • Mobility impairment, benefiting from the use of a wheelchair, scooter, walker, leg brace(s) and/or other supports
  • Nervous system condition, for example, migraine headaches, Parkinson’s disease, multiple sclerosis (MS)
  • Neurodivergence, for example, attention-deficit/hyperactivity disorder (ADHD), autism spectrum disorder, dyslexia, dyspraxia, other learning disabilities
  • Partial or complete paralysis (any cause)
  • Pulmonary or respiratory conditions, for example, tuberculosis, asthma, emphysema
  • Short stature (dwarfism)
  • Traumatic brain injury

PUBLIC BURDEN STATEMENT: According to the Paperwork Reduction Act of 1995 no persons are required to respond to a collection of information unless such collection displays a valid OMB control number. This survey should take about 5 minutes to complete.