Member of Technical Staff, Data
About FirstPrinciples:
FirstPrinciples is a non-profit organization building an autonomous AI Physicist to understand the nature of reality: the underlying structure, governing principles, and fundamental laws of our universe. We're developing an intelligent system that can explore theoretical frameworks, reason across disciplines, and generate novel insights to tackle the deepest unsolved problems in physics. By combining AI, symbolic reasoning, and autonomous research capabilities, we're developing a platform that goes beyond analyzing existing knowledge to actively contribute to physics research. Our goal is to accelerate progress on the questions that have captivated humanity for centuries.
Job Description:
FirstPrinciples is seeking a skilled and detail-oriented Member of Technical Staff, Data to play a crucial role in our data pipeline development. In this position, you will lead projects to design and implement data extraction processes from various structured and unstructured sources, create robust parsing mechanisms, and develop sophisticated logic to extract meaningful features from raw data. Working in an agile environment, you'll iteratively refine extraction methods based on ongoing feedback.
Key Responsibilities:
Project Leadership:
- Investigate and evaluate new data sources.
- Create comprehensive extraction plans and strategies for each data source.
- Lead the full lifecycle of data extraction projects from planning to implementation.
- Work closely with peers and managers to iterate quickly and refine various approaches.
- Progressively scale extraction processes from small test batches to full implementation.
Data Source Integration:
- Develop and maintain parsers for diverse data sources including APIs, databases, web content, PDFs, and scientific literature.
- Create reliable ETL processes to ensure data quality and consistency, including LLM-based extraction pipelines.
- Design and refine prompts for LLMs to extract structured information from unstructured data sources, including text, images, and other multimodal inputs.
- Implement error handling and logging systems to maintain data pipeline reliability.
Feature Engineering:
- Identify and extract valuable features from complex raw data sets.
- Develop logic and algorithms to transform unstructured information into structured, analyzable formats.
- Create reproducible processes for data normalization and standardization.
Pipeline Architecture:
- Design scalable data transformation workflows.
- Optimize parsing procedures for performance and accuracy.
- Document data lineage and transformation processes for transparency.
Collaboration:
- Work closely with cross-functional teams to understand feature requirements.
- Coordinate with the engineering team to integrate data pipelines into broader systems.
- Communicate technical concepts clearly to non-technical stakeholders.
- Engage directly with third-party data vendors to obtain technical specifications and integration details.
- Demonstrate ability to work effectively both as part of a collaborative team and independently on self-directed tasks.
Qualifications:
- Educational Background: Bachelor's, Master's, or PhD degree in computer science, data science, information systems, physics, or a related field.
- Experience: 1-5+ years of experience in data transformation, ETL processes, or similar roles.
- Project Management Skills:
  - Experience managing small to medium-sized data projects from conception to completion.
  - Demonstrated ability to create technical plans and roadmaps for data extraction.
  - Experience working in agile environments with iterative development cycles.
- Technical Skills:
  - Proficiency in Python and/or similar languages for data processing.
  - Experience with data parsing libraries and frameworks.
  - Knowledge of data storage systems and formats (SQL, JSON, etc.).
  - Familiarity with regular expressions and text processing techniques.
  - Experience with prompt engineering for LLMs and AI-assisted data extraction.
- Analytical Skills: Strong problem-solving abilities and attention to detail.
- Communication: Ability to document processes clearly and communicate technical concepts.
- Bonus Skills:
  - Experience with natural language processing.
  - Familiarity with cloud-based data processing.
  - Demonstrated passion for physics and for making scientific knowledge accessible and impactful.
Application Process:
- Interested candidates are invited to submit their resume, a cover letter detailing their qualifications and vision for the role, and references. Please include "Member of Technical Staff, Data" in the cover letter.
Join us at FirstPrinciples and be a part of a transformative journey where science drives progress and unlocks the potential of humanity.
