ML / AI Data Engineer (Contract)
About us:
Working at Tech Holding isn't just a job; it's an opportunity to be a part of something bigger. We are a full-service consulting firm that was founded on the premise of delivering predictable outcomes and high-quality solutions to our clients. Our founders and team members have industry experience and have held senior positions in a wide variety of companies – from emerging startups to large Fortune 50 firms – and we have taken our combined experiences and developed a unique approach that is supported by the principles of deep expertise, integrity, transparency, and dependability.
This role focuses on building and optimizing large-scale video and multimodal data systems, enabling high-throughput ingestion, processing, and model training across distributed cloud environments.
Key Responsibilities
- Design, deploy, and scale ML and data processing pipelines across cloud infrastructure.
- Build systems to ingest, process, and serve 250,000+ hours of multimodal data (video, audio, metadata).
- Architect and optimize GPU-based compute environments (e.g., NVIDIA Tesla clusters) for distributed training and inference.
- Develop high-throughput backend systems for video ingestion from desktop and mobile platforms.
- Implement distributed processing workflows, including job scheduling, fault tolerance, and resource allocation.
- Design and build human-in-the-loop and automated annotation systems to ensure data quality and scalability.
- Translate ML and multimodal research into scalable, production-grade cloud architectures.
- Optimize pipelines for performance, reliability, and cost efficiency across compute, storage, and networking layers.
- Collaborate with ML, data, and engineering teams to deliver end-to-end data workflows.
Qualifications
- 5+ years of experience in data engineering, ML pipelines, or distributed systems.
- Strong experience building scalable data pipelines for large datasets (video/audio preferred).
- Hands-on experience with cloud platforms (AWS, Azure, or GCP).
- Experience working with GPU-based environments and distributed computing.
- Strong programming skills in Python, Scala, or similar languages.
- Experience with data processing frameworks (Spark, Ray, Kafka, Airflow, or similar).
- Understanding of ML workflows, training pipelines, and inference systems.
- Experience designing fault-tolerant, high-availability systems.
- Strong knowledge of data storage systems (data lakes, object storage, distributed file systems).
- Ability to handle high-throughput, large-scale data ingestion and processing.
- Experience with multimodal AI (video, audio, NLP) systems.
- Familiarity with annotation tools and data labeling workflows.
- Experience with containerization and orchestration (Docker, Kubernetes).
- Knowledge of cost optimization strategies for large-scale cloud workloads.
Tech Holding is proud to be an Equal Opportunity Employer and is committed to fostering a diverse and inclusive workplace. We welcome applicants from all backgrounds and experiences, and we consider qualified applicants without regard to race, color, religion, gender, sexual orientation, gender identity, national origin, disability, veteran status, or any other legally protected characteristic. If you require accommodation in the application process, please contact our HR team.