Contract: Senior AI Engineer
Upwork ($UPWK) is the world’s work marketplace. We serve everyone from one-person startups to over 30% of the Fortune 100 with a powerful, trust-driven platform that enables companies and talent to work together in new ways that unlock their potential.
Last year, more than $3.8 billion of work was done through Upwork by skilled professionals who are gaining more control by finding work they are passionate about and innovating their careers.
This is an engagement through Upwork’s Hybrid Workforce Solutions (HWS) Team, a global group of professionals who support Upwork’s business. HWS team members are located all over the world.
This hybrid engagement supports the development and production hardening of Natural Language Query (NLQ) systems, AI agents, and related applications powering talk-to-data experiences. The work focuses on improving NLQ accuracy and semantic understanding while building robust, scalable, and observable AI systems suitable for enterprise production environments.
This engagement requires senior-level software engineering depth, combined with AI evaluation and semantic modeling expertise, to translate research concepts into reliable, deployable systems.
Work/Project Scope:
- Design, implement, and maintain production-grade AI services supporting NLQ and AI agent workflows.
- Evaluate NLQ system accuracy using quantitative and qualitative methods (precision/recall, semantic correctness, result equivalence).
- Build automated evaluation pipelines and regression test suites integrated into CI/CD workflows.
- Design and validate ontology-driven semantic models and knowledge graphs to improve the accuracy and consistency of natural-language answers.
- Analyze NLQ failures and define structured error taxonomies, instrumentation, and logging to drive continuous improvement.
- Develop and expose well-designed APIs for NLQ services, evaluation systems, and semantic layers.
- Contribute to system architecture decisions including service boundaries, scalability, latency, and reliability tradeoffs.
- Support cloud-native deployment, monitoring, and operational readiness of AI services.
Must Haves (Required Skills):
- Strong software engineering background building production systems (Python, APIs, distributed services).
- Experience evaluating NLQ or LLM-based systems, including precision/recall and semantic correctness.
- Hands-on expertise designing automated test frameworks and evaluation pipelines.
- Experience with cloud-native architectures, CI/CD, and production deployment of AI systems.
- Knowledge of semantic modeling, ontologies, or knowledge graphs in data-driven applications.
- Proven ability to design scalable, reliable systems bridging AI models, data platforms, and applications.
Upwork is an Equal Opportunity Employer committed to recruiting and retaining a diverse and inclusive workforce. We do not discriminate based on race, religion, color, national origin, gender (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender identity, gender expression, age, status as a protected veteran, status as an individual with a disability, genetic information, or other legally protected characteristics under federal, state, or local law.
Please note that a criminal background check may be required once a conditional job offer is made. Qualified applicants with arrest or conviction records will be considered in accordance with applicable law, including the California Fair Chance Act and local Fair Chance ordinances. The Company is committed to conducting an individualized assessment and giving all individuals a fair opportunity to provide relevant information or context before making any final employment decision.
To learn more about how Upwork processes and protects your personal information as part of the application process, please review our Global Job Applicant Privacy Notice.
