Job Application for Principal Engineer

Company Overview

Arcesium is a global financial technology firm that solves complex data-driven challenges faced by some of the world’s most sophisticated financial institutions. We constantly innovate our platform and capabilities to meet tomorrow’s challenges, anticipate the risks our clients encounter, and design advanced solutions to help our clients achieve transformational business outcomes.

Financial technology is a high-growth industry as change and innovation continue to disrupt the status quo and prompt major transformation. Arcesium is at a particularly interesting time in our own growth as we look to leverage our successfully established market position and expand operations in pursuit of strategic new business opportunities. We value intellectual curiosity, proactive ownership, and collaboration with colleagues, and we empower you to meaningfully contribute from day one and accelerate your professional development.

We are looking for an exceptional engineer to provide expert-level technical leadership for our Database Reliability Engineering (DBRE) platform. This is a hands-on individual contributor role that owns the architectural direction for our most complex database reliability challenges (high availability, disaster recovery, observability, and platform automation) across thousands of SQL Server, Aurora PostgreSQL, and Snowflake environments running mission-critical workloads for the world’s most sophisticated financial institutions.

What you’ll do:

Drive architectural direction for the database platform across SQL Server, Aurora PostgreSQL, and Snowflake — covering high availability, disaster recovery, replication, backup and recovery, capacity, performance, and security.
Own complex, cross-cutting initiatives such as cross-region disaster recovery, platform refresh orchestration, alerting redesign, and cost optimization, taking each from problem statement through to a deployed, owned solution.
Lead by example with exemplary code, design documents, RFCs, and runbooks, setting the standard for technical writing, code quality, and operational rigor across the DBRE team.
Reduce operational toil by engineering automation across provisioning, refresh, patching, scaling, failover, and decommissioning — treating manual operations as bugs to be eliminated.
Lead alert engineering to drive sustainable reductions in alert volume while improving signal quality, partnering with application teams on alert ownership, attribution, and SLA design.
Drive incident response and root-cause analysis for the most complex production incidents, and convert RCAs into platform-level improvements that prevent recurrence.
Define reliability KPIs (availability, MTTR, alert sustainability, SLA adherence) and build the dashboards and reporting cadence to track them.
Partner with application engineering, infrastructure, and SRE teams on schema design, query performance, data lifecycle, and shared reliability patterns, and engage senior leadership on strategy, multi-quarter roadmaps, and budget trade-offs.

What you’ll need:

A bachelor’s or master’s degree in computer science, engineering, or a related field, with 9+ years of professional engineering experience, including significant time in a principal-level or equivalent individual contributor role.
Deep, hands-on expertise in at least one major relational database platform (SQL Server or PostgreSQL), including replication, HA/DR architectures, performance tuning, query optimization, and internals.
Strong working knowledge of cloud infrastructure (AWS preferred): VPC networking, EC2, EBS, FSx, IAM, RDS/Aurora, and cross-region replication.
Strong programming skills in at least one of Python, PowerShell, Go, or T-SQL — capable of writing production-quality automation, not just scripts.
A proven track record designing and delivering large-scale reliability initiatives (HA/DR, observability, automation platforms) with measurable outcomes.
Experience leading complex incident response, root-cause analysis, and post-incident improvement programs in 24x7 environments.
Experience with observability platforms (Datadog, Prometheus, Grafana), modern alerting design, infrastructure-as-code (Terraform, CloudFormation), and CI/CD pipelines (GitLab CI, Jenkins).
Exceptional verbal and written communication skills, with the ability to produce clear design documents and executive-level summaries and to influence stakeholders across engineering, infrastructure, and business teams.
Experience across multiple database platforms (SQL Server / PostgreSQL / Snowflake / Aurora) and familiarity with financial-services data domains is a bonus.

Arcesium's Personal Data Privacy Notice for Candidates is linked here.

Recruiting Security
Emails from genuine Arcesium recruiters who are employees of the company will always come from the @arcesium.com domain. In some cases, you may also be contacted by independent search firms engaged to recruit on our behalf; emails from their employees should always come from their firm's applicable domain. We'll never ask for your banking information or any payment as part of the recruiting process. If something seems off or you're contacted by an unexpected third party, please reach out to us at careers@arcesium.com (US/UK), careers-india@arcesium.com (India), or careers-europe@arcesium.com (Portugal/Sweden).

Arcesium is an equal opportunity employer.
