
Language Data Manager

Bengaluru

About Karya:

Why was Karya on the cover of TIME magazine, highlighted by Satya Nadella, and chosen by Google as its partner for Project Vaani?

In part, because Karya is on a mission to provide AI-enabled earning and learning opportunities to economically underserved communities, thereby building a pathway out of poverty for them. Karya achieves this while also delivering high-quality, timely, and price-competitive data to its clients.

Karya’s workers earn at least 20 times the Indian minimum wage. Through our one-of-a-kind digital work platform, we have delivered over 40 million digital tasks and are poised to positively impact over 100,000 workers by the end of the year. In the coming years, our goal is to rapidly scale our impact by bringing economic opportunities to millions of underserved users in India.

We are looking for a Language Data Manager to join our data team that manages and oversees the company’s language datasets. This role will be crucial in ensuring the proper collection, organization, formatting, and storage of linguistic data. In addition to dataset management, the role will also involve significant analysis of the language data to support our research activities. The ideal candidate will have experience with language data management, including data processing and annotation, and experience with scripting tools.

Key Responsibilities:

- Manage and maintain the company’s language datasets, ensuring they are accurate, well-structured, and accessible to relevant teams.
- Oversee the collection, annotation, and processing of linguistic data for various projects.
- Develop and implement best practices for data processing, cleaning, and formatting specific to language datasets.
- Work closely with language experts and technical teams to ensure linguistic data meets the necessary quality standards.
- Support the development of tools and workflows that streamline language data preparation and analysis.
- Provide regular updates and reports on the status and quality of language datasets to stakeholders.
- Assist in the creation of metadata and documentation for datasets to ensure they are well-documented and reusable.

Must-Have Skills & Qualifications:

- Experience in managing and processing language datasets, including working with large-scale text or speech corpora.
- Proficiency in programming languages like Python, R, or similar for data processing.
- Familiarity with linguistic annotation standards and techniques.
- Experience with tools and platforms commonly used for language data management (e.g., linguistic annotation tools, NLP libraries).
- Strong problem-solving skills and attention to detail.
- Excellent communication skills and the ability to collaborate with cross-functional teams.
- Bachelor's degree in Linguistics, Computer Science, Data Science, or a related field.

 

Nice-to-Have Skills:

- Experience working with speech or text-based datasets for NLP and AI applications.
- Familiarity with machine learning models and datasets used in language technologies.
- Experience with cloud-based platforms and tools for data management.

People matter at Karya. These are some of the perks and benefits we have created for our team:

- Flexible vacation and leave policy
- Flexible work options
- Insurance as per industry standards
- Access to industry stalwarts and networking opportunities

Qualified applicants will receive consideration without regard to their race, colour, religion, sex, sexual orientation, gender identity, or disability.

Karya invites all qualified, interested applicants to apply for career opportunities. If you are a person with a disability and need a reasonable accommodation to search for or apply for a career opportunity, reach out to us at hr@karya.in.
