Senior Staff Engineer - Kafka
About Nubank
Nu is the world’s largest digital banking platform outside of Asia, serving over 105 million customers across Brazil, Mexico, and Colombia. The company has been leading an industry transformation by leveraging data and proprietary technology to develop innovative products and services. Guided by its mission to fight complexity and empower people, Nu caters to customers’ complete financial journey, promoting financial access and advancement with responsible lending and transparency. The company is powered by an efficient and scalable business model that combines low cost to serve with growing returns. Nu’s impact has been recognized in multiple awards, including Time 100 Companies, Fast Company’s Most Innovative Companies, and Forbes World’s Best Banks. Learn more: https://international.nubank.com.br/careers/
About the Role
We are opening this position in the US to find an engineer with a highly specialized skill set that is critical to our current needs. The messaging team is responsible for maintaining and operating a high-throughput messaging platform built on Apache Kafka. This system processes trillions of messages every month and underpins mission-critical, low-latency systems used across clusters. This role is critical for the future success of the product.
As Nubank rapidly expands to support billions of customers across multiple countries, over 100 financial products, and 100+ business platforms, running on infrastructure spanning multiple AWS accounts, we face significant scalability and operational challenges. Our current infrastructure includes approximately 200 Kafka clusters, 1,500 brokers, and more than 200k topics, and it processes 300 TB of data daily. This role will directly address challenges such as:
- Operating and managing over 200 Kafka clusters efficiently.
- Supporting and monitoring over 1 million Kafka topics.
- Handling over 1PB of data flowing through Kafka daily, ensuring minimal latency and high reliability.
- Optimizing resource utilization and controlling costs in a smart, sustainable way.
- Extending asynchronous communication patterns beyond Kafka to meet future needs.
- Maintaining performance and reliability, and containing blast radius, as message volume grows exponentially.
Our challenges:
In this role, you will be responsible for tackling the key challenges associated with our rapidly expanding, high-throughput Apache Kafka messaging platform. Your main activities and goals will include:
- Operating and managing a large-scale Kafka infrastructure, including hundreds of clusters and thousands of brokers.
- Ensuring high reliability and minimal latency for systems processing petabytes of data daily.
- Developing strategies to efficiently support and monitor over a million Kafka topics.
- Optimizing resource utilization and implementing cost-effective solutions for our Kafka ecosystem across multiple AWS accounts.
- Driving the evolution of asynchronous communication patterns, potentially extending beyond Kafka to meet future needs.
- Implementing solutions to maintain performance and reliability and contain potential issues as message volume grows exponentially.
We are looking for a person who has
- Essential skills and experience in maintaining and operating high-throughput messaging platforms built on Apache Kafka.
- Proven expertise with Kafka in large-scale, mission-critical environments.
- Experience with infrastructure spanning multiple AWS accounts.
- Experience in companies that manage similar complex, large-scale data/messaging systems.
- A specialized skill set related to the challenges outlined above.
Benefits
- Remote work, with quarterly trips to Sao Paulo to build relationships with coworkers.
- Top Tier Medical Insurance
- Top Tier Dental and Vision Insurance
- 20 days of time off, 14 company holidays, and a great culture that emphasizes work-life balance.
- Life Insurance and AD&D
- Extended maternity and paternity leaves
- Nucleo - Our learning platform of courses
- NuLanguage - Our language learning program
- NuCare - Our mental health and wellness assistance program
- 401K
- Savings Plans - Health Savings Account and Flexible Spending Account
Role Location
- Our Nu Way of Working: Our work model is hybrid, with cycles of two to three months depending on the business area or expertise. For every eight or twelve weeks of remote work, one week will be at the office.