Job Application for AI Systems QA Engineer at MeltPlan

MeltPlan is building the “planning engine” for the $14 trillion construction industry, an AI system designed specifically to optimize decisions before construction begins. While design software optimizes use and aesthetics and construction software optimizes execution and control, MeltPlan is building the missing layer: software that optimizes decisions and tradeoffs upstream, before scope is locked, procurement begins, and change orders become inevitable. MeltPlan’s long-term goal is to help teams make construction “boring” by making planning more intense: surfacing constraints and tradeoffs early, aligning stakeholders before plans are frozen, and reducing the need for late-stage redlines, rework, and change orders.

MeltPlan is founded by operators who have built at scale. Kanav previously co-founded Innovaccer, a $3Bn healthtech company focused on making US healthcare more affordable and accessible. He’s now applying that systems-level thinking to construction. He’s joined by Tanmaya Kala, former Project Executive at DPR Construction, who led large commercial, healthcare, and life sciences projects. We combine deep tech scale with real construction execution.

What This Role Really Is:

We are seeking a detail-oriented and technically strong AI QA Engineer to ensure the quality, reliability, and performance of Large Language Model (LLM)-based systems. In this role, you will design and execute test strategies, validate model outputs, and build evaluation frameworks to improve the accuracy, safety, and overall performance of AI-driven applications. We particularly value candidates with hands-on experience developing evaluation frameworks (evals) for AI systems, along with strong expertise in comprehensive system testing and quality assurance practices. You are responsible for making MeltPlan work in the real world.

What You'll Do:

Design, develop, and execute evaluation frameworks (Evals) for Large Language Models (LLMs) and AI systems.
Perform end-to-end system testing, regression testing, and performance testing for AI-driven applications.
Validate model outputs for accuracy, consistency, safety, hallucination detection, and edge cases.
Build automated test pipelines and quality benchmarks for AI systems.
Collaborate closely with AI/ML engineers, product teams, and platform engineers to improve system reliability.
Analyze failures, identify root causes, and provide actionable feedback to improve model behavior.
Develop datasets, prompts, and testing scenarios to measure model performance across multiple use cases.
Monitor production performance and continuously improve evaluation metrics and testing standards.
Ensure compliance with responsible AI and quality assurance best practices.

What We're Looking For:

Bachelor’s degree in Computer Science, Engineering, or a related field.
5–8 years of experience in QA/testing, preferably in AI/ML or data-driven systems.
Strong experience in AI/LLM evaluation frameworks and system testing.
Hands-on experience with automated testing methodologies and QA processes.
Familiarity with prompt engineering, AI benchmarking, and model validation techniques.
Experience working with Python and testing frameworks.
Understanding of LLM behaviors, hallucinations, prompt injection risks, and AI safety concepts.
Exposure to tools/frameworks such as OpenAI Evals, LangSmith, DeepEval, Promptfoo, or similar platforms is preferred.
Strong analytical and debugging skills with attention to detail.
Excellent collaboration and communication skills.
Familiarity with Large Language Models and Generative AI concepts.
Experience with API testing tools (e.g., Postman) and automation frameworks.
Understanding of NLP concepts such as tokenization, embeddings, and text generation.
Strong analytical and problem-solving skills.
Experience testing AI/ML models or data pipelines.
Experience with prompt engineering and prompt testing.
Familiarity with cloud platforms (AWS, GCP, or Azure).
Exposure to AI safety, bias detection, and model governance.

Bonus if you have:

Experience working in construction or on project sites.
Startup experience.
Experience working with Generative AI or conversational AI products.
Knowledge of CI/CD pipelines and automation workflows.
Prior experience in performance testing and monitoring distributed systems.
Understanding of AI product lifecycle and production deployment environments.

We’re not looking for someone who waits for clean requirements. We’re looking for someone who thrives in the mess and turns it into systems.

Why MeltPlan:

Massive industry, real-world impact
High ownership from day one
Small team, zero bureaucracy
Competitive comp + meaningful equity
