
Senior Infrastructure Engineer
Dataiku is The Universal AI Platform™, giving organizations control over their AI talent, processes, and technologies to unleash the creation of analytics, models, and agents. Providing no-, low-, and full-code capabilities, Dataiku meets teams where they are today, allowing them to begin building with AI using their existing skills and knowledge.
Our mission at Dataiku is to enable our customers to bring large-scale data analytics and AI technologies into a centralized and easy-to-use platform. To reach that goal, we are looking for highly motivated engineers to design, scale and maintain our internal and customer-facing infrastructure.
Our current technical stack runs mainly on AWS, with some Azure and GCP bits. The tools we use the most are Terraform, Ansible, Kubernetes and Python.
Some expected outcomes for this role:
- Update, scale and maintain our configuration management stack.
- Update and maintain our multi-Cloud Infrastructure as Code repository to keep our fleet monitored, up-to-date, reliable and secure.
- Update, scale and maintain our Kubernetes clusters. This includes adding high availability services, monitoring and building self-healing mechanisms.
- Expand and maintain our CI/CD pipelines to automatically and safely deploy both customer-facing and internal applications as well as artifacts.
- Improve the monitoring we already have in place by scaling the current infrastructure, collecting additional metrics and managing our alerting pipeline.
- Develop automation tools that help us scale the infrastructure without scaling the various teams that depend on it, specifically to propagate user identities and access control and to better manage our Cloud resources.
- Apply Cloud security best practices to keep our infrastructure secure and resilient.
What you need to be successful:
- You have experience with at least one configuration management tool: Chef, Ansible, Puppet, SaltStack…
- You have experience with Terraform to manage Cloud resources.
- You are familiar with UNIX-like systems and have troubleshooting experience as well as knowledge of shell scripting.
- You have experience running production load on AWS, including monitoring, high availability designs and services such as Lambda, RDS, load balancing, CDN, VPC …
- You have experience in designing CI/CD pipelines to save engineering time and increase deployment reliability.
- You have worked in environments where automation is key.
- You have knowledge of application development, ideally in Python.
- You like solving technical problems and you are curious about how things work under the hood.
- You care about documenting the solutions you implement, as well as monitoring the health of those solutions by creating and maintaining actionable alerts.
- You don't hesitate to ask questions when you don't know, and you treat your colleagues with respect, kindness, and transparency.
Bonus:
- Knowledge of the Grafana Labs monitoring stack.
- Knowledge of web application design.
- Some experience with Golang and JavaScript/TypeScript.