AWS Data Engineer (Architect)
About Us
Capco, a Wipro company, is a global technology and management consulting firm. We were awarded Consultancy of the Year at the British Bank Awards and ranked among the Top 100 Best Companies for Women in India 2022 by Avtar & Seramount. With a presence across 32 cities around the globe, we support 100+ clients across the banking, financial services and energy sectors. We are recognized for our deep transformation execution and delivery.
WHY JOIN CAPCO?
You will work on engaging projects with the largest international and local banks, insurance companies, payment service providers and other key players in the industry. These projects will transform the financial services industry.
MAKE AN IMPACT
We bring innovative thinking, delivery excellence and thought leadership to help our clients transform their business. Together with our clients and industry partners, we deliver disruptive work that is changing energy and financial services.
#BEYOURSELFATWORK
Capco has a tolerant, open culture that values diversity, inclusivity, and creativity.
CAREER ADVANCEMENT
With no forced hierarchy at Capco, everyone has the opportunity to grow as we grow, taking their career into their own hands.
DIVERSITY & INCLUSION
We believe that diversity of people and perspectives gives us a competitive advantage.
Job Title: Senior Data Engineer - AWS
Location: Bangalore
Experience: 12+ Years
The data management platform is required to gather key data elements (payments, notifications and status) from Kafka topics and store that data in raw form in AWS S3 (likely using S3 Tables/Iceberg). From the raw form, the data needs to be curated via Glue to be available for queries via Athena. The target concept is an S3 Tables/Iceberg-based Data Lakehouse.
From this Data Lakehouse, a number of data products will be created, including payments and statements support, enquiry and analysis capabilities.
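To make the target flow concrete, here is a minimal PySpark sketch of the first hop: landing raw payment-status events from a Kafka topic into an Iceberg table on S3 that Glue and Athena can then curate and query. The topic, bucket, catalog, table and field names are placeholders, and the job assumes the Iceberg runtime and Spark Kafka connector are on the classpath; it is an illustrative sketch, not the project's actual pipeline.

```python
# Illustrative sketch only: Kafka -> raw S3/Iceberg landing step.
# All names (brokers, topic, bucket, catalog, table, fields) are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = (
    SparkSession.builder
    .appName("raw-payments-ingest")
    # Iceberg catalog backed by the AWS Glue Data Catalog, warehoused in S3
    .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lake.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.lake.warehouse", "s3://example-lakehouse/raw/")
    .getOrCreate()
)

# Assumed minimal payload shape for a payment status event
event_schema = StructType([
    StructField("payment_id", StringType()),
    StructField("status", StringType()),
    StructField("event_time", TimestampType()),
])

raw_events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # placeholder brokers
    .option("subscribe", "payments.status")             # placeholder topic
    .load()
    .select(from_json(col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

# Append raw events to an Iceberg table; Glue curation jobs read from here
query = (
    raw_events.writeStream
    .format("iceberg")
    .outputMode("append")
    .option("checkpointLocation", "s3://example-lakehouse/checkpoints/payments_raw/")
    .toTable("lake.raw.payments_status")
)
query.awaitTermination()
```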
The engagement has these main phases:
- Analysis of the As-Is, particularly data gaps in current offerings
- Data Product definition in partnership with product owners
- Definition of the To-Be HLSD
- Multiple LLSDs
- Recruit and lead the implementation team
- Deployment and integration
The target delivery dates are during 2026, with a start date in 2025.
Email from Manish Tiwari below.
Thanks Nathan. Our base job requirement is around this; we will interview all profiles shared.
Data Hands-On Architect — Payments Data Products (BDD)
Mission
Build the data backbone for Bankline Direct. Ingest ISO 20022 lifecycle events (VPM/PMN/PSN) into an AWS lakehouse, deliver sub-10 ms status queries for channel ops, and provide 7+ years of analytics with strong governance, observability, and cost control.
Own / Deliver
- Data products (To-Be): Channel Ops Warehouse (~30-day high-perf layer) and Channel Analytics Lake (7+ yrs). Expose status and statements APIs with clear SLAs.
- Platform architecture: S3/Glue/Athena/Iceberg lakehouse, Redshift for BI/ops. QuickSight for PO/ops dashboards. Lambda/Step Functions for stream processing orchestration.
- Streaming & ingest: Kafka (K4/K5/Confluent) and AWS MSK/Kinesis; connectors/CDC to DW/Lake. Partitioning, retention, replay, idempotency. EventBridge for AWS-native event routing.
- Event contracts: Avro/Protobuf, Schema Registry, compatibility rules, versioning strategy (see the schema-evolution sketch after this list).
- As-Is → To-Be: Inventory APIs/File/SWIFT feeds and stores (Aurora Postgres, Kafka). Define migration waves, cutover runbooks.
- Governance & quality: Data-as-a-product ownership, lineage, access controls, quality rules, retention.
- Observability & FinOps: Grafana/Prometheus/CloudWatch for TPS, success rate, lag, spend per 1M events. Runbooks + actionable alerts.
- Scale & resilience: Tens of millions of payments/day, multi-AZ/region patterns, pragmatic RPO/RTO.
- Security: Data classification, KMS encryption, tokenization where needed, least-privilege IAM, immutable audit.
- Hands-on build: Python/Scala/SQL; Spark/Glue; Step Functions/Lambda; IaC (Terraform); CI/CD (GitLab/Jenkins); automated tests.
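For the event-contracts item above, the sketch below shows one backward-compatibility pattern: a v2 reader schema that adds a defaulted field can still decode events written with the v1 schema. The record and field names are assumptions for illustration, not the actual VPM/PMN/PSN payloads, and the example uses fastavro locally rather than a hosted Schema Registry.

```python
# Illustrative sketch only: backward-compatible Avro schema evolution.
# Record/field names are placeholders, not the real event contracts.
import io
import fastavro

payment_status_v1 = fastavro.parse_schema({
    "type": "record", "name": "PaymentStatus", "namespace": "example.payments",
    "fields": [
        {"name": "payment_id", "type": "string"},
        {"name": "status", "type": "string"},
    ],
})

# v2 adds an optional field with a default, so v2 readers can still decode v1 events
payment_status_v2 = fastavro.parse_schema({
    "type": "record", "name": "PaymentStatus", "namespace": "example.payments",
    "fields": [
        {"name": "payment_id", "type": "string"},
        {"name": "status", "type": "string"},
        {"name": "reason_code", "type": ["null", "string"], "default": None},
    ],
})

buf = io.BytesIO()
fastavro.schemaless_writer(buf, payment_status_v1, {"payment_id": "pay-123", "status": "ACSC"})
buf.seek(0)

# Decode a v1-encoded event with the v2 reader schema; reason_code takes its default
decoded = fastavro.schemaless_reader(buf, payment_status_v1, payment_status_v2)
print(decoded)  # {'payment_id': 'pay-123', 'status': 'ACSC', 'reason_code': None}
```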
Must-Have Skills
- Streaming & EDA: Kafka (Confluent) and AWS MSK/Kinesis/Kinesis Firehose; outbox, ordering, replay, exactly-once/at-least-once semantics. EventBridge for event routing and filtering.
- Schema management: Avro/Protobuf + Schema Registry (compatibility, subject strategy, evolution).
- AWS data stack: S3/Glue/Athena, Redshift, Step Functions, Lambda; Iceberg-ready lakehouse patterns. Kinesis→S3→Glue streaming pipelines; Glue Streaming; DLQ patterns.
- Payments & ISO 20022: PAIN/PACS/CAMT, lifecycle modeling, reconciliation/advices; API/File/SWIFT channel knowledge.
- Governance: Data-mesh mindset; ownership, quality SLAs, access, retention, lineage.
- Observability & FinOps: Build dashboards, alerts, and cost KPIs; troubleshoot lag/throughput at scale.
- Delivery: Production code, performance profiling, code reviews, automated tests, secure by design.
Data Architecture Fundamentals (Must-Have)
- Logical data modeling: Entity-relationship diagrams, normalization (1NF through Boyce-Codd/BCNF), denormalization trade-offs; identify functional dependencies and key anomalies.
- Physical data modeling: Table design, partitioning strategies, indexes; SCD types; dimensional vs. transactional schemas; storage patterns for OLTP vs. analytics.
- Normalization & design: Normalize to 3NF/BCNF for OLTP; understand when to denormalize for queries; trade-offs between 3NF, Data Vault, and star schemas.
- CQRS (Command Query Responsibility Segregation): Separate read/write models; event sourcing and state reconstruction; eventual consistency patterns; when CQRS is justified vs. overkill.
- Event-Driven Architecture (EDA): Event-first design; aggregate boundaries and invariants; publish/subscribe patterns; saga orchestration; idempotency and at-least-once delivery (see the sketch after this list).
- Bounded contexts & domain modeling: Core/supporting/generic subdomains; context maps (anti-corruption layers, shared kernel, conformist, published language); ubiquitous language.
- Entities, value objects & repositories: Domain entity identity; immutability for value objects; repository abstraction over persistence; temporal/versioned records.
- Domain events & contracts: Schema versioning (Avro/Protobuf); backward/forward compatibility; event replay; mapping domain events to Kafka topics and Aurora tables.
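As a concrete illustration of the idempotency point in the EDA item above, the following sketch shows how a consumer can tolerate at-least-once delivery by de-duplicating on a unique event ID. The event and handler names are hypothetical, and the in-memory set stands in for what would be a durable processed-events store in production.

```python
# Illustrative sketch only: idempotent handling under at-least-once delivery.
# Names are placeholders; the in-memory set stands in for a durable store.
from dataclasses import dataclass

@dataclass(frozen=True)
class PaymentStatusEvent:
    event_id: str      # unique per event, assigned by the producer
    payment_id: str
    status: str

class IdempotentStatusHandler:
    def __init__(self) -> None:
        self._processed: set[str] = set()        # durable table in production
        self.current_status: dict[str, str] = {}

    def handle(self, event: PaymentStatusEvent) -> bool:
        """Apply the event once; return False if it was already processed."""
        if event.event_id in self._processed:
            return False                          # duplicate delivery, safe to ignore
        self.current_status[event.payment_id] = event.status
        self._processed.add(event.event_id)
        return True

# Replaying the same event (as at-least-once delivery allows) has no further effect.
handler = IdempotentStatusHandler()
event = PaymentStatusEvent("evt-001", "pay-123", "ACSC")
assert handler.handle(event) is True
assert handler.handle(event) is False
```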
Nice-to-Have
- QuickSight/Tableau; Redshift tuning; ksqlDB/Flink; Aurora Postgres internals.
- Edge/API constraints (Apigee/API-GW), mTLS/webhook patterns.
Operating Model
- Act as data product owner/architect with CX/Payments/CPO.
- Maintain backlog, SLAs, contracts.
- Publish runbooks, SLOs, and quarterly cost/quality reports.