Tech Lead, Data Engineering
About the Company
At 21.co Technologies, our mission centers on building scalable bridges into the world of cryptocurrency. By creating DeFi accessibility through traditional financial standards, we bring ourselves one step closer to the equitable financial future we all believe in.
About the Role
We are seeking a highly motivated and skilled Data Engineer with a focus on MLOps and Large Language Models (LLMs) to join our team as Technical Lead and help us design, build, and maintain robust data pipelines and infrastructure. As a Data Engineer with expertise in LLMs, you will be responsible for ensuring data is accessible, reliable, and optimally structured to support analytics, machine learning, and LLM-driven applications. You will work on cutting-edge technologies and collaborate closely with cross-functional teams, enabling you to make a significant impact on our data-driven and AI-focused strategies.
This role integrates core data engineering principles with MLOps practices to support the full lifecycle of LLM-driven applications, from data preparation to production monitoring. It also offers opportunities for growth, innovation, and learning in a dynamic and fast-paced environment.
Responsibilities and Scope
- Design and maintain scalable data pipelines tailored to LLM requirements, including preprocessing unstructured text data from various sources, implementing chunking strategies, and optimizing embedding generation for vector databases.
- Build and manage data infrastructure, including data warehouses, data lakes, and streaming solutions, specifically optimized for LLM workflows.
- Deploy LLMs into production environments using containerization (Docker) and orchestration tools (Kubernetes).
- Automate CI/CD pipelines for model versioning, A/B testing, and rollback procedures, ensuring seamless updates to fine-tuned models.
- Optimize data systems for performance, reliability, and scalability, particularly for real-time inference in applications such as chatbots or document analysis.
- Implement MLOps-driven model deployment and monitoring, tracking key metrics such as inference latency, token usage costs, and output quality drift.
- Manage vector databases (e.g., Qdrant, Pinecone, FAISS) and design indexing strategies for Retrieval-Augmented Generation (RAG) architectures.
- Collaborate with data scientists, analysts, and other stakeholders to understand data and LLM requirements and deliver solutions.
- Create and maintain documentation for all data-related processes, procedures, and workflows, including LLM-specific pipelines and deployments.
- Research and stay up to date with the latest trends, technologies, and best practices in data engineering, MLOps, and LLM technologies.
- Mentor junior engineers, conduct technical reviews, and provide active guidance.
- Contribute to technical roadmap planning and architectural decision-making, and lead technical initiatives.
- Implement data governance best practices, and establish and enforce data quality standards across teams and projects.
- Identify and mitigate technical risks in data infrastructure and LLM deployments.
What You Will Need To Be Great In This Role
- 8+ years of experience as a Data Engineer with 3+ years focused on MLOps.
- Strong proficiency in Python, SQL, and data orchestration tools (e.g., Airflow).
- Experience with cloud platforms like AWS (SageMaker), Google Cloud Platform (Vertex AI), or Azure Machine Learning for managed LLM deployments.
- Familiarity with data warehouse solutions such as Snowflake or BigQuery.
- Experience with big data technologies like Spark, Hadoop, or Kafka.
- Understanding of data modeling and schema design (e.g., dimensional modeling).
- Proficiency with version control systems like Git.
- Excellent problem-solving and debugging skills.
- Strong communication skills and the ability to work collaboratively with cross-functional teams.
- Experience working in Agile development environments.
- Hands-on experience with Hugging Face Transformers, LangChain for prompt engineering, and LlamaIndex for document indexing.
- Portfolio demonstrating production LLM applications with performance metrics.
- Previous experience leading technical teams or mentoring engineers.
Our Stack
- Languages: Python, SQL, Go
- Tools: Apache Airflow, Kafka (MSK, Redpanda), LangChain, LangSmith
- Cloud Platforms: AWS (S3, Databricks)
- Databases: Postgres, MongoDB, Vector Databases (Qdrant)
- Version Control: Git
Preferred
- Experience with containerization tools like Docker and orchestration platforms like Kubernetes.
- Familiarity with modern data streaming tools (e.g., Kafka, Kinesis).
- Familiarity with Natural Language Processing (NLP) and LLMs.
- Familiarity with chunking & data transformation for LLMs.
- Familiarity with Vector Databases / Embedding Stores.
- Hands-on experience with real-time analytics or machine learning pipelines.
- Exposure to or interest in data visualization tools like Tableau, Looker, or Streamlit.
- Experience with specialized LLM techniques and RAG.
- Experience implementing OpenTelemetry for distributed tracing and integrating it with Better Stack/Grafana dashboards.
This role is based in Zurich, and you will be expected to work from our Zurich office in a hybrid capacity (Monday to Wednesday in the office).