Principal Edge AI Runtime & Compiler Engineer
Company Overview
Ambiq is on a mission to enable intelligence everywhere — powering the AI edge revolution with the world's lowest-power semiconductor solutions.
Built on our proprietary sub- and near-threshold technology, our chips deliver multi-fold improvements in energy efficiency without costly process scaling. Since 2010, we've shipped over 290 million units to customers building smarter wearables, medical devices, IoT products, and AI-powered edge applications.
Our cross-functional teams span design, research, development, production, marketing, sales, and operations across Austin, Hsinchu, Shanghai, Shenzhen, and Singapore. We move fast, tackle hard problems, and create space for people to grow through complex, meaningful work that shapes the future of technology.
We're looking for self-motivated, creative problem-solvers who are eager to push technological limits and make a real impact in energy efficiency.
At Ambiq, we live by five values: Innovate. Collaborate. Focus. Learn. Achieve.
If that's you, join us — the intelligence everywhere revolution starts here.
Scope
Ambiq is seeking a Principal Edge AI Runtime & Compiler Engineer to build and optimize the embedded software stack that powers real-time, battery-powered on-device AI. You will develop Ambiq’s AI runtimes, performance-critical operator libraries, and profiling/debug tooling that enable customers to deploy efficient inference on resource-constrained devices.
While the cloud has been the default home for AI, the next frontier is distributing intelligence everywhere—directly onto real-world devices. Edge AI enables real-time responsiveness, stronger privacy, lower bandwidth cost, and reliable operation even without connectivity. This role helps accelerate the shift to on-device intelligence across a rapidly growing ecosystem of health and fitness wearables, smart glasses, industrial IoT, and always-on sensors.
You’ll work closely with platform, silicon, and applied ML teams—and directly with customers—to ensure our software extracts maximum performance and energy efficiency from Ambiq hardware while remaining easy to integrate into production embedded toolchains.
Responsibilities
- Develop and support Ambiq’s embedded AI runtimes (HeliaRT, our fork/extension of TensorFlow Lite for Microcontrollers, and HeliaAOT) with a focus on portability, correctness, performance, and usability.
- Implement and optimize ML operator kernels and embedded libraries for on-chip acceleration (DSP, vector, NPU), including HeliaDSP and HeliaCore components.
- Build and maintain on-device profiling and performance analysis tools, including translating PMU counter data into actionable insights.
- Drive improvements in latency, memory footprint, and energy (e.g., joules/inference) through compute/bandwidth and memory-hierarchy analysis.
- Develop benchmark harnesses, microbenchmarks, and regression tests to ensure numerical correctness and prevent performance regressions.
- Enable seamless customer integration across embedded environments and toolchains (bare metal, FreeRTOS, Zephyr).
- Improve memory planning/runtime efficiency and manage upstream/fork health; publish and maintain customer-facing assets (docs, guides, examples, benchmarks).
Qualifications
- BS in Electrical/Computer Engineering, Computer Science, or a related field, plus 5+ years of relevant experience (or equivalent practical experience). An MS is a plus, especially in embedded systems, compilers, computer architecture, or ML systems.
- Strong experience designing, developing, and testing embedded software in C/C++.
- Strong debugging discipline with an emphasis on correctness, reproducibility, and performance regression prevention.
- Solid understanding of compute, memory, and bandwidth/cache effects on deterministic latency and energy efficiency in constrained systems.
- Ability to interpret hardware/software documentation (datasheets, reference manuals; schematics a plus).
- Proficiency with Git (or equivalent version control).
- Efficient use of AI-assisted development tools to improve productivity while maintaining engineering rigor.
- Experience with embedded development workflows: Arm toolchains, GDB + J-Link/SEGGER-class probes, and CMake/Make build systems.
Nice to have:
- Python for profiling/analysis tooling, automation, and developer utilities.
- Rust experience.
- Familiarity with TensorFlow Lite for Microcontrollers (or similar embedded inference runtimes).
- Experience optimizing for embedded acceleration targets (e.g., DSP, vector extensions, NPU).
- Familiarity with CMSIS-NN, CMSIS-DSP, and/or Vela (or similar).
- Working knowledge of quantized inference (e.g., int8, per-channel) and embedded debugging implications.
- Packaging/integration (e.g., CMSIS-Pack/CPM.cmake) and/or lab measurement experience (power/debug with basic bench tools).
Must be currently authorized to work in the United States for any employer. We do not sponsor or take over sponsorship of employment visas (now or in the future) for this role.