Software Engineer Intern - Fuzzy Distinct

At Dataiku, we're not just adapting to the AI revolution, we're leading it. Since our beginning in Paris in 2013, we've been pioneering the future of AI with a platform that makes data actionable and accessible. With over 1,000 teammates across 25 countries and backed by a renowned set of investors, we're the architects of Everyday AI, enabling data experts and domain experts to work together to build AI into their daily operations, from advanced analytics to Generative AI. 


Internship goal

Augment Dataiku data preparation by improving features on data records 

Detailed description

Today, Dataiku boasts a robust data preparation framework that functions admirably to process a vast amount of data, helping users to have clean databases with the right data (and only the right data) inside them. However, we believe that with your help, we can take it a step further!

In a world where databases can be filled in by real humans, data is not always clean. Errors happen, typos are made, and sometimes you want to merge two database tables that contain the same information, but not quite in the same format. For example, “Dataiku”, “dataiku”, and “data\niku” all refer to the same company, but will be treated as different entries in your database.

The goal of this internship is to improve the capabilities of our “distinct” processor to support fuzzy matching (that is, matching data that looks almost the same). You will help our customers clean up their databases, detect duplicated information, and reduce it to a single line.
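To give a feel for the problem, here is a minimal Python sketch of one possible approach to a fuzzy distinct. It is an illustration only, not how the Dataiku “distinct” processor works: the function names, the greedy clustering strategy, and the 0.85 similarity threshold are all assumptions made for this example, and it relies only on the standard library's difflib module.

```python
from difflib import SequenceMatcher


def normalize(value: str) -> str:
    """Lowercase and drop all whitespace, so "data\niku" becomes "dataiku"."""
    return "".join(value.lower().split())


def similar(a: str, b: str, threshold: float = 0.85) -> bool:
    """True when the normalized strings are close enough to be merged.

    The threshold is an arbitrary value chosen for this example.
    """
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold


def fuzzy_distinct(values: list[str], threshold: float = 0.85) -> list[list[str]]:
    """Greedy clustering (hypothetical strategy): each value joins the first
    cluster whose representative (its first member) it resembles, otherwise
    it starts a new cluster."""
    clusters: list[list[str]] = []
    for value in values:
        for cluster in clusters:
            if similar(cluster[0], value, threshold):
                cluster.append(value)
                break
        else:
            # No existing cluster was close enough.
            clusters.append([value])
    return clusters


rows = ["Dataiku", "dataiku", "data\niku", "Dataikuu", "Paris"]
clusters = fuzzy_distinct(rows)

# Keep one line per cluster: the first value seen acts as the representative.
distinct_rows = [cluster[0] for cluster in clusters]
print(distinct_rows)
```

In this sketch the four spellings of “Dataiku” collapse into one cluster and “Paris” stays on its own. A real component would also have to decide which representative to keep, how to scale beyond pairwise comparison, and how to show the clusters to the user.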

Why Engineering at Dataiku? 

Dataiku’s on-premises, cloud, or SaaS-deployed platform connects many data science technologies, and our technology stack reflects our commitment to quality and innovation. We integrate the best of data and AI tech, selecting tools that truly enhance our product. From the latest LLMs to our dedication to open source communities, you'll work with a dynamic range of technologies and contribute to the collective knowledge of global tech innovators. You can find out even more about working in Engineering at Dataiku by taking a look here.

How you'll make an impact

  • Get familiar with Dataiku and its data preparation recipes as well as database schemas.

  • Help design a new component that detects duplicate data

  • Develop the User Interface that helps the user understand the clusters of data

  • Help our users to reduce their data overload!


Technologies you'll work with

  • Python and Java for the backend side

  • JavaScript/Angular for the frontend part


What are you waiting for?
At Dataiku, you'll be part of a journey to shape the ever-evolving world of AI. We're not just building a product; we're crafting the future of AI. If you're ready to make a significant impact in a company that values innovation, collaboration, and your personal growth, we can't wait to welcome you to Dataiku! And if you’d like to learn even more about working here, you can visit our Dataiku LinkedIn page.
Our practices are rooted in the idea that everyone should be treated with dignity, decency and fairness. Dataiku also believes that a diverse identity is a source of strength and allows us to optimize across the many dimensions that are needed for our success. Therefore, we are proud to be an equal opportunity employer. All employment practices are based on business needs, without regard to race, ethnicity, gender identity or expression, sexual orientation, religion, age, neurodiversity, disability status, citizenship, veteran status or any other aspect which makes an individual unique or protected by laws and regulations in the locations where we operate. This applies to all policies and procedures related to recruitment and hiring, compensation, benefits, performance, promotion and termination and all other conditions and terms of employment. If you need assistance or an accommodation, please contact us at:

