Peach Pilot: Principal QA Engineer (AI Systems & Platform)

Fully Remote (US-Based)  |  Atlanta, GA a Plus  |  Full-Time  |  Early Engineering Team

The Company

Most AI companies sell tools. We transform how businesses run.

Peach Pilot builds a platform that ingests everything about how a company operates — every system, every process, every signal — and constructs a Company Brain: a living knowledge graph that connects people, decisions, and outcomes across the entire organization. We deploy 92 pre-built AI agents that work together across every business function, governed by humans at every critical step. The system gets smarter with every interaction.

We don't sell software licenses. We embed into a client's operation, learn their business in weeks, show them what's broken backed by their own data, and redesign their highest-impact business functions with AI. Our first vertical is insurance. Our first client engagement is already scoped and funded.

Peach Pilot is co-founded by Mario Montag (Predikto, acquired by a Fortune 50; McKinsey, PwC) and JP James (Hive Financial Assets, Georgia Tech, TITAN 100). We have a working platform with live infrastructure and a proven data-to-insights methodology.

The Role

This is a founding QA hire. You will build and own the QA function from the ground up: writing test code, designing evaluation pipelines, and setting the quality bar before anything reaches a client. We are not looking for someone who manages spreadsheets and delegates everything. We are looking for someone who can do the work, knows what good looks like, and builds a foundation that scales.

At Peach Pilot, quality is not just about whether buttons work. You are validating whether AI-generated analysis and agent recommendations are accurate enough to show to a CFO or CEO. One wrong finding in a Company X-Ray — the core deliverable that drives every client engagement — can break trust that took weeks to build. You are the last line of defense before our platform reaches a client's desk.

The Challenge: QA for AI Is a Different Problem

Traditional QA assumes deterministic outputs. AI agents don't give you that. You will be building a quality function from scratch in an environment where:

  • 92 AI agents coordinate across business functions — agent outputs must be accurate, auditable, and aligned with human-in-the-loop governance at every critical step.
  • Multi-model routing (Claude, GPT, and others) means the same input can produce different outputs depending on which model handled it — and all of them need to meet the same quality bar.
  • The Company X-Ray is our highest-stakes deliverable: a detailed analysis of a client's operations backed by their own data. Every finding must be reliable before it goes in front of a leadership team.
  • Your end users are CEOs and operations leaders who have never used a terminal. A confusing output or a wrong recommendation doesn't just create a bug ticket — it kills adoption.

What You Will Own & Build

Build the QA Foundation (First 90 Days)

  • Establish the testing framework from zero: unit, integration, end-to-end, and AI-specific evaluation pipelines using Playwright and Vitest.
  • Define quality standards, test coverage requirements, and documentation practices in partnership with the Lead Engineer.
  • Audit the existing platform and identify the highest-risk surfaces before the next client deployment.
  • Define the team structure you will need — onshore vs. offshore mix, roles, and a hiring roadmap — and begin executing against it.

AI Agent & Knowledge Graph Testing

  • Design evaluation frameworks for non-deterministic LLM outputs — including prompt regression testing, model drift detection, and output quality scoring.
  • Build automated test suites for the agent orchestration layer, including governance agent audit trail integrity and human-override behavior.
  • Validate the Company Brain (Memgraph + Qdrant) for data accuracy, retrieval quality, and failure modes under real enterprise data conditions — including entity resolution across systems and temporal data patterns.
  • Test the Analysis Engine pipeline that surfaces Company X-Ray findings — ensuring that insights are not just technically accurate but reliable enough to present to a client.
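To make the eval-pipeline idea above concrete, here is a minimal, illustrative regression gate for non-deterministic LLM outputs: instead of exact-match assertions, each response is scored against a weighted rubric, and the suite fails when the score drops below a threshold. The model call is stubbed, and every name here (`RubricCheck`, `score_response`, the rubric itself) is hypothetical, not Peach Pilot's actual framework.

```python
# Illustrative output-quality gate for non-deterministic LLM responses.
# Responses are scored against a weighted rubric rather than compared to
# exact strings; a regression is a score drop, not a string mismatch.
from dataclasses import dataclass
import re

@dataclass
class RubricCheck:
    name: str
    pattern: str   # regex the response must satisfy
    weight: float

def score_response(response: str, rubric: list[RubricCheck]) -> float:
    """Weighted fraction of rubric checks the response satisfies."""
    total = sum(c.weight for c in rubric)
    earned = sum(c.weight for c in rubric
                 if re.search(c.pattern, response, re.IGNORECASE))
    return earned / total if total else 0.0

# Hypothetical rubric for an X-Ray finding about claims-processing delays.
rubric = [
    RubricCheck("cites a metric", r"\d+(\.\d+)?\s*(%|days|hours)", 0.5),
    RubricCheck("names the process", r"claims", 0.3),
    RubricCheck("makes a recommendation", r"recommend", 0.2),
]

def check_regression(response: str, threshold: float = 0.8) -> bool:
    return score_response(response, rubric) >= threshold

# Stubbed model responses. A real suite would replay stored prompts
# against each model and prompt version and track scores over time.
good = "Claims intake averages 14 days; we recommend automating triage."
bad = "Things could maybe be improved somewhere."
```

Run across model updates and prompt revisions, this kind of gate is what catches the drift and prompt-sensitivity regressions described above.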

Platform & Integration Testing

  • Own end-to-end testing of the data ingestion pipelines that connect to client systems — CRM, email, calls, calendars, documents, financial systems — through Nango's 700+ connector integration layer.
  • Test multi-model routing logic to confirm cost-optimized task allocation behaves correctly across LLM providers via LiteLLM.
  • Validate streaming response handling, latency thresholds, and graceful degradation when a model is unavailable or slow.
  • Own file ingestion pipeline testing (Word, Excel, PowerPoint, PDF) including encryption, formatting edge cases, and audit trail continuity.
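As a sketch of the graceful-degradation testing described above, the snippet below fault-injects a failing primary provider and asserts that routing falls back cleanly. The router and providers are stand-ins assuming a priority-ordered fallback design; a real suite would exercise the LiteLLM routing layer itself.

```python
# Illustrative fallback-routing test. Names (ProviderDown, route) are
# stand-ins, not the actual routing layer under test.

class ProviderDown(Exception):
    pass

def route(prompt, providers):
    """Try providers in priority order; return (provider_name, reply)."""
    errors = {}
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderDown as exc:
            errors[name] = str(exc)
    raise RuntimeError(f"all providers failed: {errors}")

# Fault injection: the primary "times out", the fallback answers.
def flaky_primary(prompt):
    raise ProviderDown("timeout")

def healthy_fallback(prompt):
    return f"echo: {prompt}"

providers = [("claude", flaky_primary), ("gpt", healthy_fallback)]
name, reply = route("ping", providers)
```

The same shape extends to latency thresholds (raise on a timer) and total-outage behavior (assert the all-providers-failed path surfaces a clean error rather than a hang).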

Build and Lead the QA Team

  • Recruit, hire, and onboard QA engineers as the team grows — setting clear expectations, working standards, and a bar for technical excellence from day one.
  • Mentor junior and mid-level QA engineers, building their ability to own test domains independently.
  • Act as the quality culture carrier across the full engineering team — QA is not a department, it is everyone's responsibility.
  • Report directly to the Lead Engineer and participate in product planning to ensure quality is designed in, not bolted on.

Who You Are

  • 7+ years of QA engineering experience, with at least 3 years in a senior or lead capacity where you shaped process and standards — not just executed them.
  • You have tested AI/LLM-powered applications. You understand prompt sensitivity, output variance, and how to build eval pipelines that catch regressions across model updates.
  • You write test code. Python is your primary tool. You have built and maintained CI/CD-integrated test suites and you don't wait for someone to file a bug to find one.
  • Hands-on experience with Playwright and Vitest in a production environment — and you've built automation frameworks from scratch, not just inherited them.
  • Comfortable testing complex API chains, async/streaming responses, and multi-service workflows. Data pipelines and knowledge graph outputs don't intimidate you.
  • You have built a QA function from the ground up in an early-stage environment. You know when to move fast and when to go deep.
  • You test for confusion and trust failure — not just broken functionality. Your end users are non-technical executives, and you advocate for them.

The Stack You'll Test Against

  • AI/LLM: Anthropic Claude, OpenAI GPT, LiteLLM (multi-model routing), custom agent orchestration with reinforcement learning
  • Backend: Python (FastAPI), async agent runtime, Pydantic
  • Data & Graph: Memgraph, Neo4j, Qdrant, PostgreSQL, Redis
  • Frontend: React/Next.js, TypeScript, Tailwind CSS
  • Integrations: Nango (700+ connectors)
  • Infrastructure: Google Cloud Platform (Cloud Run, GCE, Firebase), Azure (Cosmos DB, AI Search), GitHub Actions CI/CD, Docker
  • Testing: Playwright, Vitest

We are cloud-agnostic across GCP and Azure. The right hire will help shape how we deploy and scale across both.

Even Better If

  • You have experience with LLM evaluation frameworks (e.g., LangSmith, PromptFlow, or custom eval pipelines).
  • You have tested agent frameworks or orchestration layers in a production environment.
  • You have a background in a regulated industry (insurance, finance, healthcare) where audit trail integrity is non-negotiable.
  • You have worked alongside Forward Deployed or solutions engineering teams and understand field deployment risk.

What Makes This Different

You are building the QA function and the team — not inheriting either. Your decisions will define how this company ships software and delivers AI-generated insights to clients for years to come.

You will work directly with the founding engineering team. Your findings shape the roadmap, not a backlog queue. Real enterprise data, real client deployments, real consequences.

Compensation & Benefits

  • Compensation: $140,000 – $180,000, depending on experience, plus benefits
  • Structure: Full-Time
  • Location: Fully Remote — US Based (Atlanta, GA candidates welcome)

The clincher: Tell us about a quality failure — one you caught before it shipped, or one that got through. What did you build or change after it, and how did you make sure your team could catch the next one without you?

Application Questions

How are you actively using AI tools in your QA work — whether that's building eval pipelines, using tools like Claude, ChatGPT, or OpenClaw to generate or review test cases, automating QA workflows, or testing AI-generated outputs? Be specific about what you built or automated and what the outcome was.

The team is based in Atlanta. Are you comfortable and willing to have some working-hours overlap with the team?

Have you built a QA function from scratch at an early-stage company? If yes, briefly describe what you built and what the team looked like when you left it.

Do you require any type of immigration sponsorship now or in the near future?
