Job Application for Lead Site Reliability Engineer (SRE) at Optimal Market Technologies, LLC

About Us:

Optimal Market Technologies is a FINRA-registered broker-dealer with an innovative platform of wholesale options and execution solutions designed to meet the needs of today’s dynamic market participants. Optimal can quickly and seamlessly enhance its platform, adapt to regulatory changes, and offer a more efficient implementation that scales as clients grow. Optimal is disrupting the US Options Wholesale landscape by introducing a Competition for Order Flow model through which market makers compete for retail flow on our platform based on best execution quality. Additional offerings include an Institutional Alternative Trading System (ATS) for discreet execution and price improvement and Execution Services for institutional-grade routing and algorithmic execution designed to balance price capture, speed, and market impact across trading objectives.

This is an opportunity to work for a fintech start-up and to be an integral part of growing a broker-dealer from the ground up. As a small, hardworking team, we pride ourselves on our entrepreneurial culture and cross-team collaboration. We recently raised funding from several strategic partners and are in a very exciting stage of growth.

Position: Lead Site Reliability Engineer (SRE)
Company: Optimal Market Technologies, LLC
Location: Chicago or NYC, hybrid
Salary: $150,000 - $250,000 base salary + discretionary bonus (commensurate with experience)

Position Overview:

We're seeking a Lead Site Reliability Engineer to oversee our production systems administration. We are growing from hands-on, individual-knowledge work to an engineering-run discipline: automated, reliable, and built on published standards. You will build and lead our systems administration function, professionalize how we run our infrastructure, and reduce key-person risk, partnering with the development team to keep the firm running reliably and moving fast. This is a hands-on role; you will build and operate what you put in place. You will report to the CTO.

Our environment:
We run an automated trading system with single-digit-millisecond latency requirements on bare-metal Linux. We use Azure for development environments, storage, and offline studies, not for production execution. Systems are written in C++, Python, and SQL. We are actively modernizing and upgrading technologies (e.g. CentOS 7 to RHEL 9), with legacy and new systems running in parallel. You will lead the rollout of a stream of technology changes.

Primary Responsibilities:

Infrastructure and systems administration:

Own how production runs across colocation and the cloud: deployment, capacity, and failover.
Build and lead the systems administration function: mentor existing staff, set how the function works, and hire as we grow.
Set and publish the engineering standards and strategy for how we run production.
Hands-on Linux and network administration; automate routine work through Infrastructure as Code.
Manage vendors and service agreements; advise on build-vs-contract-out.
Own infrastructure security: hardening, access control, recoverable backups, and security incident response.

Production support, incident response, resilience, and performance:

Assist first-line production support, reducing reliance on the development team.
Be accountable for production stability: track what breaks and why, and turn repeat firefighting into automation that prevents it.
Own incident response, on-call, and post-incident review; coverage is market-hours plus a support rotation.
Own recovery runbooks and recovery drills.
Automate client self-service for common issues and access to their own data, reducing manual support work.
Partner with the development team on deployments, performance tracking, and capacity planning.

What We're Looking For:

Strong scripting and automation skills.
Strong hands-on Linux and network administration.
Expertise with Infrastructure as Code (we are open on which tools).
Experience using AI tools, ideally Claude Code.
A track record owning production support and incident response.
Experience managing and developing technical staff.
The ability to bring structure, standards, and strategy to a function as it grows and matures.
Strong communication; effective with senior stakeholders and a small team.

Preferred Qualifications:

Experience at a start-up, or building a new line or function inside a larger firm; comfortable under resource constraints and automation-first by instinct.
Experience in real-time critical systems.

Familiarity with any of: FIX Protocol, PostgreSQL, middleware (ideally AERON), observability and monitoring tooling.
Microsoft Azure, including Azure Virtual Desktop (AVD).
Vendor management.

Optimal is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.
