SENIOR DATA ENGINEER - BEES DATA
About us
AB InBev is the leading global brewer and one of the world’s top 5 consumer product companies. With over 500 beer brands, we’re number one or two in many of the world’s top beer markets, including North America, Latin America, Europe, Asia, and Africa.
About AB InBev Growth Group
Created in 2022, the Growth Group unifies our business-to-business (B2B), direct-to-consumer (DTC), Sales & Distribution, and Marketing teams. By bringing together global tech and commercial functions, the Growth Group allows us to fully leverage data and drive digital transformation and organic growth for AB InBev around the world.
In addition to supporting well-known global beer brands like Corona, Budweiser and Michelob Ultra, the Growth Group is home to a robust suite of digital products, including our B2B digital commerce platform BEES, on-demand delivery services Ze Delivery and TaDa Delivery, and table-top beer keg PerfectDraft.
We are an exceptional team, focused on understanding and supporting consumer and customer needs, harnessing new technology, and scaling growth opportunities.
What you do
- Implement and maintain individual components of the data platform—for example, ingestion jobs, dbt models, Spark transformations, CDC tasks, matching rules, or deduplication logic.
- Make implementation decisions within a component: schema mapping, transformation logic, join strategy, and similar choices bounded to that unit of work.
- Fix defects in transformations, ingestion jobs, or entity resolution logic when issues are identified.
- Ensure component outputs match the expected schema, data contracts, and downstream expectations.
- Improve a component’s performance, data quality checks, or reliability when gaps or incidents require it.
- Follow existing ETL and MDM standards and team patterns rather than inventing parallel approaches.
- Apply security and compliance expectations to your components: handle sensitive and personal data according to classification, retention, and minimization rules; avoid logging, samples, or exports that over-collect or expose regulated fields beyond what the use case requires.
- Use approved identity, access, and secrets patterns for jobs and services (for example, role-based access, managed identities, or vault-backed credentials)—not hard-coded secrets or ad hoc shared accounts.
- Support auditability of changes and data movement as the team defines it (for example, clear job ownership, metadata, lineage hooks, or evidence packs for controls) so security and compliance reviews can trace what the pipeline does.
Requirements and qualifications
- Bachelor's degree in Computer Science, Computer Engineering, Information Systems, Systems Analysis and Development, or similar.
- Intermediate English.
- Code quality: write clear, readable, modular code; follow team naming and formatting conventions; avoid unnecessary duplication in your own changes; prefer changes that can be understood without a verbal walkthrough.
- Verification: add required unit or transformation-level tests; validate schema assumptions and basic data quality conditions; ensure changes do not break existing behavior.
- Delivery: submit well-structured pull requests that include a clear description of the change, its context and expected impact, and evidence of testing.
- Stack (typical): Python, SQL, and data processing with PySpark and/or Scala as used in the team’s pipelines.
- Pipelines: practical experience building or maintaining batch/stream components with orchestration (for example, Apache Airflow, Databricks Workflows, or similar) and version control (Git).
- Data work: comfortable with transformation, cleansing, aggregation, and basic performance tuning of SQL and Spark workloads, appropriate to their data volume and complexity.
- Cloud: familiarity with the services of a major provider (AWS, Azure, or Google Cloud) that the team uses to deploy and run jobs.
- Security baseline for data engineering: follow least-privilege IAM and service principals for pipelines; prefer encryption in transit and at rest where the platform provides it; keep dependencies and images within approved channels and address high-severity findings from scanners or security tooling when they affect your components.
- Compliance-aware delivery: when a change touches regulated data, new integrations, or new exports, document data purpose, flows, and safeguards in the PR or linked ticket so risk and compliance partners can assess impact without guesswork.
How you do it
- Code review: review peers’ pull requests regularly (for example, daily when the team is active); call out obvious bugs, missing tests, and unclear logic constructively.
- Responsiveness: address review feedback promptly and take responsibility for correcting mistakes in your changes.
- Judgment: do not merge or push unstable or partially validated code; escalate when validation or scope is unclear.
- Communication: explain technical tradeoffs briefly in writing (PRs, tickets) so reviewers and stakeholders can follow intent and risk.
- Security partnership: flag policy gaps, risky shortcuts, or unclear ownership of sensitive data early; engage Security & Compliance or privacy stakeholders when requirements are ambiguous instead of improvising controls.
- Integrity of controls: do not bypass access reviews, change windows, segregation-of-duty, or other mandated controls to “ship faster”; propose a compliant path or escalate.
Nice to have
- Hands-on experience with transformation tooling and data contracts in a shared warehouse.
- Experience with APIs or event interfaces used for data exchange between systems.
- Experience with infrastructure-as-code or CI/CD (for example, Azure DevOps, Terraform, GitHub Actions) for job deployment.
- Familiarity with data governance tooling (catalog, quality, policy tags) or vulnerability / secret scanning in CI for data repos and pipelines.