DevOps Engineer
Later is the world’s most intelligent influencer marketing company, built to give brands the confidence to create unforgettable campaigns. By combining real creator relationships, trusted intelligence, and expert guidance, Later removes fear and guesswork from one of marketing’s most visible investments.
Built on a native, AI-powered platform and more than a decade of proprietary data—including billions of social interactions, impressions, and $2.4B+ in verified influencer-driven purchases—Later helps teams understand what will work before they launch.
By combining trusted insight with expert guidance, Later removes guesswork from influencer marketing, enabling brands to choose the right creators, execute fully managed campaigns, and drive meaningful growth across awareness, engagement, and revenue. Trusted by leading enterprise brands including Nike, Wayfair, Unilever, and Southwest Airlines, Later bridges creativity and performance so campaigns don’t just look good—they deliver results. Learn more at later.com.
About this position:
We’re looking for a DevOps Engineer to help support and grow Later’s cloud infrastructure, DevOps practices, and emerging MLOps capabilities.
This role reports to the Infrastructure team and is primarily focused on DevOps, AWS, Kubernetes, CI/CD, and Terraform, while also helping support the infrastructure needs of our Data teams. You’ll work closely with senior engineers, platform teams, data scientists, and product engineers to help build the systems that support application delivery, data workflows, model experimentation, and ML deployment.
This is a great role for someone who has a solid DevOps foundation and wants to grow deeper into cloud infrastructure, Kubernetes, GitOps, and MLOps. You’ll help maintain reliable infrastructure, improve deployment workflows, automate repeatable tasks, and support the systems that allow engineering and data teams to move faster and more safely.
What you'll be doing:
Strategy
- Support the development and execution of the infrastructure roadmap across DevOps and MLOps, aligned with product and data/AI growth plans.
- Partner with engineering, data, and ML teams to ensure scalability, reliability, security, and automation are built into both application infrastructure and machine learning workflows.
- Help evaluate and adopt DevOps and MLOps tools that improve system efficiency, observability, developer experience, model deployment, and operational reliability.
- Contribute to platform standards that make infrastructure, CI/CD, data pipelines, and ML systems more repeatable, secure, and easier to operate.
- Support the evolution of cloud-native practices that enable faster product delivery while also preparing the platform for future AI and ML initiatives.
Technical/Execution
- Build and manage infrastructure to deploy ML models into production reliably using CI/CD pipelines, Flask-based APIs, and orchestration tools (e.g., Airflow, Kubeflow, or Argo Workflows).
- Automate training pipelines, model registry, validation, deployment, and rollback strategies using tools such as Amazon SageMaker Interface and Postman for testing.
- Build systems to monitor model performance, latency, data drift, and resource usage using Amazon CloudWatch, Prometheus, and Grafana.
- Design and maintain tools and systems to support model versioning, experiment tracking (e.g., MLflow, Amazon SageMaker Studio Notebooks), and reproducible training workflows.
- Operate across GCP and AWS to manage training/inference infrastructure, BigQuery datasets, and GPU workloads.
- Use tools like Terraform or CloudFormation to manage cloud infrastructure in a scalable, repeatable manner.
- Work with Data Scientists, Analysts, Platform Engineers, and Product Engineers to support their end-to-end ML workflows.
Team/Collaboration
- Partner with Product and Data teams to streamline CI/CD workflows, GitOps practices, and deployment processes that support both application delivery and data/ML workflows.
- Work closely with Data teams to support reliable infrastructure for data pipelines, model experimentation, training workflows, and production ML deployments.
- Collaborate with Product teams to understand platform needs and ensure infrastructure decisions support product reliability, scalability, and delivery speed.
- Share documentation, runbooks, and best practices that help Product and Data teams deploy, monitor, and troubleshoot systems with more autonomy.
- Support a collaborative DevOps and MLOps culture by helping teams adopt automation, observability, and repeatable deployment patterns.
Research/Best Practices
- Stay current with cloud-native, DevOps, data infrastructure, and MLOps trends, identifying tools, patterns, and practices that can improve reliability, automation, scalability, and delivery speed.
- Continuously evaluate infrastructure, CI/CD pipelines, data workflows, model deployment processes, performance, cost, and efficiency, recommending improvements that align with product, engineering, and data team goals.
- Contribute to documentation, runbooks, incident reviews, and post-mortem processes to strengthen operational learning across both DevOps and MLOps practices.
- Help define and share best practices for infrastructure-as-code, GitOps, observability, secure deployments, data pipeline reliability, and ML workflow automation.
- Look for opportunities to simplify systems, reduce manual work, and improve the developer and data team experience through better tooling, automation, and platform standards.
What success looks like:
First 30 Days — Learn, Support, and Understand
- In the first 30 days, the candidate will focus on learning Later’s cloud infrastructure, Kubernetes environments, AWS services, CI/CD workflows, and data/ML platform components.
- They will begin supporting existing deployment pipelines, observability tools, infrastructure documentation, and operational workflows. They will also start becoming familiar with GPU-enabled workloads, SageMaker environments, notebooks, model training jobs, and current ML deployment processes.
- By the end of 30 days, the candidate should understand how our DevOps and MLOps systems are structured, where support is needed, and how Product and Data teams use the platform.
- You are recognized internally as a trusted partner who supports operational excellence, improves platform reliability, and helps raise technical standards across DevOps and MLOps practices.
First 60 Days — Contribute, Automate, and Improve
- By 60 days, the candidate will begin taking ownership of smaller infrastructure and automation tasks across Kubernetes, AWS, CI/CD, and ML platform workflows.
- They will help improve deployment pipelines for applications, infrastructure, and ML workflows, making them more consistent, automated, and easier for teams to use. They will also support GPU node provisioning, monitoring, and optimization for training, experimentation, and inference workloads.
- The candidate will contribute to SageMaker support, including notebooks, model training jobs, access controls, monitoring, and repeatable deployment patterns. They will also help improve documentation, runbooks, alerting, and troubleshooting processes.
First 90 Days — Own, Optimize, and Scale
- By 90 days, the candidate should be contributing with more independence across DevOps and MLOps workflows.
- They will help improve reliability, scalability, security, and observability across Kubernetes clusters, AWS infrastructure, GPU-enabled workloads, SageMaker environments, and data/ML platform components.
- They will support production-ready ML workflows by helping improve experimentation, model deployment, monitoring, rollback, troubleshooting, and operational readiness. They will also help ensure security and compliance requirements are integrated into infrastructure, CI/CD, data pipelines, SageMaker workflows, and platform operations.
- By the end of 90 days, the candidate should be recognized as a trusted partner to Product and Data teams, helping reduce operational overhead through automation, documentation, alerting, GPU resource management, and repeatable platform standards.
What you bring:
- 2–5 years of hands-on DevOps or cloud engineering experience in production environments.
- Expertise with Kubernetes (EKS), Helm, and microservices.
- 1–2 years of experience with AWS and SageMaker.
- 1–2 years of experience creating data pipelines.
- 1–2 years of experience with LLM deployments on Kubernetes.
- 1–2 years of experience running Kubernetes clusters with GPU nodes.
- Proven track record using Terraform for scalable, auditable Infrastructure as Code.
- A collaborative, solution-oriented mindset and a passion for automation and continuous improvement.
How you work:
- Driven by Impact: You deliver results that matter—prioritizing high-value work, meeting deadlines, and adapting quickly while keeping outcomes clear.
- Strategic & Customer-Centric: You anticipate risks and opportunities, connect decisions to long-term growth, and build trust through proactive insights.
- Curious & Growth-Oriented: You seek knowledge, ask sharp questions, and apply learnings fast—challenging the status quo with a mindset of improvement.
- Collaborative & Resilient: You thrive in change by staying resourceful, solution-focused, and positive—removing roadblocks, sharing insights, and keeping morale high.
- Accountable & Honest: You own your work, hold yourself and others to a high bar, and use transparent feedback to drive growth.
- Emotionally Intelligent: You build trust through empathy and collaboration, foster inclusion, and inspire others with grit, optimism, and integrity.
Our approach to compensation:
We take a market-based & data-driven approach to compensation. We leverage data from trusted third-party compensation sources to help us understand the market value of a role based on function, level, geographic location, and scope. We evaluate compensation bi-annually, including performance and market-related factors.
Our salaries are benchmarked against market Total Cash Compensation for the geographic location of our job posting. Compensation for some roles is structured as On Target Earnings (OTE = base + commission/variable) while for others it is structured as Salary only.
To comply with local legislation and ensure transparency, we share salary ranges on all job postings. Skills, experience, and other factors help determine the final salary we offer, which may vary from the original range posted.
Additionally, all permanent team members are eligible to participate in various benefits plans as part of their overall compensation package.
Salary Range:
$120,000 - $140,000 CAD
#LI-Hybrid
Where we work:
We have offices in Boston, MA; Vancouver, BC; Chicago, IL; and Vancouver, WA. For select positions, we are open to hiring fully remote candidates. We post our positions in the location(s) where we are open to having the successful candidate be located.
Diversity, inclusion, and accessibility:
At Later, we are committed to fostering a culture rooted in an inclusion-first mindset at every level of the company, embracing the importance of hiring and building teams for culture add rather than culture fit. We openly build and maintain unbiased hiring, pay, and promotion practices to create a foundation for an equitable workplace, paving the way for systemic change.
We are committed to creating a diverse environment and are proud to be an equal opportunity employer. All applications will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, national origin, disability, or age. Please let us know if you require any accommodations or support during the recruitment process.