Principal Data Engineer
About Us
Wizard is revolutionizing the shopping experience using the power of generative AI and rich messaging technologies to build a personalized shopping assistant for every consumer. We scour the entire internet of products and ratings across brands and retailers to find the best products for every consumer’s personalized needs. Using an effortless text-based interface, Wizard AI is always just a text away. The future of shopping is here. Shop smarter with Wizard.
The Role
We are seeking a Principal Data Engineer with deep expertise in Spark to lead the design and evolution of Wizard's data infrastructure. This is a senior-level, hands-on technical role, ideal for someone passionate about building scalable data systems, mentoring engineers, and helping shape data strategy. As a thought leader on our data engineering team, you will architect systems that support high-performance batch and real-time data processing, power advanced analytics, and drive our AI team forward.
Key Responsibilities:
- Own the architecture and strategic direction of scalable, distributed data infrastructure across cloud platforms
- Design and build a data compilation system to normalize, match, and merge products, reviews, and editorial data from thousands of data sources
- Use the latest NLP, LLMs, and embedding models to generate the highest quality datasets with automated data auditing and reporting
- Implement real-time and batch data processing systems to power AI/ML use cases
- Collaborate with engineering, AI, and product teams to ensure data availability and reliability
- Develop backend data solutions that support microservices architecture and a rapidly scaling product environment
- Manage and extend integrations with third-party e-commerce platforms to expand Wizard’s data ecosystem
- Mentor and support data engineers, establishing best practices
You
- 8+ years of software development and data engineering experience, with demonstrated ownership of production-grade data infrastructure
- Bachelor's degree in Computer Science or a related field, or equivalent practical experience
- Deep expertise in building ETL pipelines with Apache Spark, Databricks, or Hadoop (required)
- Strong understanding of distributed computing and modern data modeling techniques for scalable systems
- Expert in Python with experience implementing software engineering best practices
- Hands-on experience with both relational (MySQL / PostgreSQL) and NoSQL (MongoDB, DynamoDB, Cassandra) databases
- Excellent communicator and collaborator, with a passion for mentoring, knowledge-sharing, and team growth
Nice to Have:
- Experience working in early-stage, high-growth environments
- Familiarity with MLOps pipelines and integrating ML models into data workflows
- Passionate about problem-solving, with a proactive approach to finding innovative solutions
The expected salary for this role is $235,000 to $285,000, depending on experience and level.
Please note that you will only be considered for this position if you meet the minimum technical requirements. We offer a remote-friendly environment; however, employees must reside in the United States and hold, or be eligible to obtain, the legal right to work in the country.
Benefits
- Early-stage startup with massive growth potential and the ability to grow as Wizard grows
- Competitive compensation packages, including equity
- Health
- Comprehensive, high-quality medical coverage
- Dental & vision insurance
- OneMedical memberships for you and dependents
- Spring Health platform for mental healthcare personalized to your needs
- XP Health eyewear benefits ($180, 3x per year)
- Rightway Health Guide
- Wealth
- 401(k) Plan
- Life & Disability insurance covered by Wizard
- Work/Life
- Flexible PTO and sick time to take care of yourself and your family
- 12 paid holidays
- 16 weeks parental leave for primary and secondary caregivers