Senior Data Engineer

ABOUT US

Africa has an untapped talent pool that is growing exponentially, with a projected workforce that will exceed those of India and China by 2035. Fuelled by this, our mission is to create work opportunities for millions of Africa’s youth and solve the world’s shortage of technology talent.

Our company, SAND, has 450+ staff members from 48+ countries, with plans to acquire several companies around the world and become a multi-billion-dollar global technology services provider employing over 100,000 people across the globe.

We have two brands represented in our ecosystem:

  • With ALX, we're cultivating the fastest-growing pool of technology talent globally, training aspiring professionals in software engineering, cloud computing, data science, and Salesforce across over 60 countries. ALX fosters a vibrant community of top tech talent, igniting transformative careers by connecting that talent with extraordinary opportunities for impact.
  • Through Sand Technologies, we support enterprises and scale-ups around the world to develop world-class technology products, build great technology teams, generate more revenue, and deliver outstanding customer experiences. Our clients include one of the largest cloud computing providers in the world, as well as Bestseller A/S (Denmark), Create Prime (New Zealand), Stanbic Bank (Kenya), and Tamara (Dubai).

With varying levels of expertise in software development, data, cloud, machine learning, artificial intelligence, UX design, web development, and more, we provide unparalleled opportunities to technology talent worldwide while reshaping industries, disrupting traditional business models, and creating new opportunities for innovation and growth.

We do hard things!

ABOUT THE ROLE

Sand Technologies focuses on cutting-edge cloud-based data projects, leveraging tools such as Databricks, dbt, Docker, Python, SQL, and PySpark, to name a few. We work across a variety of data architectures, such as data mesh, lakehouse, data vault, and data warehouse. Our data engineers create pipelines that support our data scientists and power our front-end applications, which means we do data-intensive work for both OLTP and OLAP use cases. Our environments are primarily cloud-native, spanning AWS, Azure, and GCP, but we also work on systems running exclusively on self-hosted open-source services. We strive for a strong code-first, data-as-a-product mindset at all times, where testing, reliability, and a keen eye on performance are non-negotiable.
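
To give a flavour of this code-first approach, below is a minimal PySpark pipeline sketch: ingest raw events, transform them, and write an analysis-ready table. The bucket paths, column names, and aggregation are hypothetical, and a Spark runtime (e.g. Databricks) with access to the storage is assumed; this is an illustration, not a description of our production code.

```python
# A minimal, hypothetical sketch of a code-first data pipeline.
# Paths and column names are invented; assumes a Spark runtime
# (e.g. Databricks) with access to the referenced storage.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_daily").getOrCreate()

# Ingest: read raw events from a lake location (path is illustrative).
raw = spark.read.json("s3://example-bucket/raw/orders/")

# Transform: keep valid rows and aggregate to a daily, analysis-ready table.
daily = (
    raw.filter(F.col("amount") > 0)
       .withColumn("order_date", F.to_date("created_at"))
       .groupBy("order_date", "country")
       .agg(F.sum("amount").alias("revenue"), F.count("*").alias("orders"))
)

# Load: write a partitioned table for downstream analytics (OLAP).
daily.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3://example-bucket/curated/orders_daily/"
)
```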

JOB SUMMARY

A Senior Data Engineer has the primary role of designing, building, and maintaining scalable data pipelines and infrastructure to support data-intensive applications and analytics solutions. In this role, you will be responsible not only for developing data pipelines but also for designing data architectures and overseeing data engineering projects. You will work closely with cross-functional teams and contribute to the strategic direction of our data initiatives.

RESPONSIBILITIES

  1. Data Pipeline Development: Lead the design, implementation, and maintenance of scalable data pipelines for ingesting, processing, and transforming large volumes of data from various sources, using tools such as Databricks, Python, and PySpark.
  2. Data Architecture: Architect scalable and efficient data solutions using the appropriate architecture design, opting for modern architectures where possible.
  3. Data Modeling: Design and optimize data models and schemas for efficient storage, retrieval, and analysis of structured and unstructured data.
  4. ETL Processes: Develop, optimize, and automate ETL workflows to extract data from diverse sources, transform it into usable formats, and load it into data warehouses, data lakes, or lakehouses (a minimal orchestration sketch follows this list).
  5. Big Data Technologies: Utilize big data technologies such as Spark, Kafka, and Flink for distributed data processing and analytics.
  6. Cloud Platforms: Deploy and manage data solutions on cloud platforms such as AWS, Azure, or Google Cloud Platform (GCP), leveraging cloud-native services for data storage, processing, and analytics.
  7. Data Quality and Governance: Implement and oversee data governance, quality, and security measures.
  8. Monitoring, Optimization and Troubleshooting: Monitor data pipelines and infrastructure performance, identify bottlenecks and optimize for scalability, reliability, and cost-efficiency. Troubleshoot and fix data-related issues.
  9. DevOps: Build and maintain basic CI/CD pipelines, commit code to version control, and deploy data solutions.
  10. Collaboration: Collaborate with cross-functional teams, including data scientists, analysts, and software engineers, to understand requirements, define data architectures, and deliver data-driven solutions.
  11. Documentation: Create and maintain technical documentation, including data architecture diagrams, ETL workflows, and system documentation, to facilitate understanding and maintainability of data solutions.
  12. Best Practices: Stay current with emerging technologies and best practices in data engineering, cloud architecture, and DevOps.
  13. Mentoring: Mentor and guide junior and mid-level data engineers.
  14. Technology Selection: Evaluate and recommend technologies, frameworks, and tools that best suit project requirements and architecture goals.
  15. Performance Optimization: Optimize software performance, scalability, and efficiency through architectural design decisions and performance tuning.
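
As promised under responsibility 4, here is a minimal sketch of what automating an ETL workflow can look like, using Apache Airflow's TaskFlow API (Airflow is one of the tools named under Desirable Languages/Tools below). The DAG id, schedule, and task bodies are hypothetical stubs, assuming Airflow 2.x; they are not our actual workflows.

```python
# A hypothetical sketch of an automated ETL workflow, assuming Apache
# Airflow 2.x and its TaskFlow API. Task bodies are stubbed for brevity.
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def orders_etl():
    @task
    def extract() -> list[dict]:
        # Pull raw records from a source system (stubbed here).
        return [{"order_id": 1, "amount": 42.0}]

    @task
    def transform(rows: list[dict]) -> list[dict]:
        # Drop invalid rows; real transformations would be far richer.
        return [r for r in rows if r["amount"] > 0]

    @task
    def load(rows: list[dict]) -> None:
        # Write to a warehouse/lakehouse table (stubbed here).
        print(f"loading {len(rows)} rows")

    load(transform(extract()))


orders_etl()
```

Keeping extract, transform, and load as separate tasks makes each step independently retryable and observable, which is what the monitoring and troubleshooting responsibilities above rely on.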

QUALIFICATIONS

  • Proven experience as a Senior Data Engineer, or in a similar role, with hands-on experience building and optimizing data pipelines and infrastructure, and designing data architectures.
  • Proven experience working with big data and the tools used to process it.
  • Strong problem-solving and analytical skills with the ability to diagnose and resolve complex data-related issues.
  • Excellent understanding of data engineering principles and practices.
  • Excellent communication and collaboration skills to work effectively in cross-functional teams and communicate technical concepts to non-technical stakeholders.
  • Ability to adapt to new technologies, tools, and methodologies in a dynamic and fast-paced environment.
  • Ability to write clean, scalable, robust code using Python or similar programming languages; a background in software engineering is a plus (see the short example after this list).
  • Knowledge of data governance frameworks and practices.
  • Understanding of machine learning workflows and how to support them with robust data pipelines.
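
As a small illustration of the "clean, robust code" expectation above, here is a hypothetical pure transform function with a pytest-style unit test; the function name and validation rule are invented for the example.

```python
# A hypothetical example of a small, testable transform, in the spirit of
# the "clean, robust code" qualification above; not company code.
def normalise_country(code: str) -> str:
    """Normalise free-form country codes to upper-case two-letter values."""
    cleaned = code.strip().upper()
    if len(cleaned) != 2 or not cleaned.isalpha():
        raise ValueError(f"invalid country code: {code!r}")
    return cleaned


def test_normalise_country() -> None:
    # Run with pytest; pure functions like this are trivially unit-testable.
    assert normalise_country(" ke ") == "KE"
```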

DESIRABLE LANGUAGES/TOOLS

  • Proficiency in programming languages such as Python, Java, Scala, or SQL for data manipulation and scripting.
  • Strong understanding of data modelling concepts and techniques, including relational and dimensional modelling.
  • Experience in big data technologies and frameworks such as Databricks, Spark, Kafka, and Flink.
  • Experience in using modern data architectures, such as lakehouse.
  • Experience with CI/CD pipelines, version control systems like Git, and containerization (e.g., Docker).
  • Experience with ETL tools and technologies such as Apache Airflow, Informatica, or Talend.
  • Strong understanding of data governance and best practices in data management.
  • Experience with cloud platforms and services such as AWS, Azure, or GCP for deploying and managing data solutions.
  • SQL for database management and querying.
  • Apache Spark for distributed data processing.
  • Apache Spark Streaming, Kafka, or similar for real-time data streaming.
  • Experience using data tools in at least one cloud service (AWS, Azure, or GCP), e.g. S3, EMR, Redshift, Glue, Azure Data Factory, Databricks, BigQuery, Dataflow, or Dataproc.

Would you like to join us as we work hard, have fun, and make history?
