Contract Data Engineer
Contract Duration: 6 weeks
Hours: Part-time, 15–20 hours/week
Start Date: ASAP
Reports To: Chief Data Scientist, Backstroke.com
About the Role
Backstroke.com is seeking a part-time contract Data Engineer to support critical data engineering work powering our predictive modeling efforts. In this 6-week engagement, you’ll bring raw data into production-grade pipelines, improve data reliability and observability, and help maintain a large-scale dataset used for machine learning and embedding-based predictive models.
This role is hands-on and execution-focused: you’ll work closely with the Chief Data Scientist to accelerate modeling throughput and strengthen the stability and usability of our data foundation.
Key Responsibilities
- Ingest raw data into production data pipelines used for data science modeling (batch and/or near real-time as needed)
- Build and enhance AWS-based data workflows, leveraging best practices for scalability and security
- Set up alerts and notifications in AWS to monitor pipeline health, failures, latency, and data quality issues
- Create and manage a database layer that stores transformed data, including embeddings used for predictive models
- Support management of a large-scale dataset, including data movement, cleaning, normalization, and consistency for modeling use
Required Qualifications
- Strong experience as a Data Engineer supporting machine learning or data science teams
- Deep working knowledge of AWS services (or comparable equivalents), such as:
  - S3, IAM, Lambda, CloudWatch, SNS, EventBridge
  - Glue, ECS/EKS, Step Functions (nice to have)
- Experience building data pipelines (e.g., Python, SQL, Spark, dbt, Airflow, Dagster, Prefect, or similar tools)
- Experience designing and maintaining databases for ML workflows, including embedding stores and feature-like datasets
- Comfort working with large datasets and ensuring performance, reliability, and correctness
- Ability to work independently, communicate clearly, and deliver quickly in a contractor environment
Preferred / Nice-to-Have
- Familiarity with vector databases and embedding storage patterns (e.g., pgvector, OpenSearch, Pinecone, FAISS)
- Exposure to MLOps concepts (feature pipelines, training dataset versioning, model monitoring)
- Experience with data quality tooling (e.g., Great Expectations, Monte Carlo, custom checks)
Deliverables & Outcomes (6-Week Goals)
- Reliable ingestion of raw data into modeling pipelines
- Monitoring and alerting for critical pipeline workflows in AWS
- Operational database/storage system for embedding-ready transformed data
- Improved processes for handling and cleaning a large dataset used in predictive models
- Clear documentation of pipeline architecture and handoff notes for the internal team
Working Style
You’ll collaborate directly with the Chief Data Scientist and contribute to a fast-moving, data-driven team. We value pragmatic engineering, clear documentation, and systems that are reliable and easy to operate.