Senior DevOps AI Engineer
Obsidian Security is the leading SaaS security platform, trusted by global enterprises like Snowflake, T-Mobile, and Algolia. We protect 200+ organizations across North America, Europe, the Middle East, Southeast Asia, Australia, and New Zealand, including many of the world’s largest Fortune 1000 and Global 2000 companies.
Founded in 2017 and backed by top investors like Greylock, Obsidian was built to close a critical gap: securing SaaS apps where business happens—Microsoft 365, Salesforce, and hundreds more. The company does this by offering a complete SaaS security platform to reduce risk, detect and respond to threats, and prevent breaches at the source. Obsidian was built by leaders who redefined endpoint and identity security at CrowdStrike, Okta, Cylance, and Carbon Black. Now, they’re transforming how SaaS is secured.
As AI drives rapid SaaS growth and complexity, agentic AI tools gain privileged access to sensitive data through integrations, creating new risks that most security tools miss. Obsidian uniquely detects anomalous OAuth token activity and manages integration risks. Recognizing that SaaS security needs to evolve, Obsidian enables growing organizations to start with a lightweight, prevention-focused browser extension and expand coverage over time. Major announcements are on the horizon.
With global momentum, a growing partner ecosystem including SentinelOne, Databricks, and Google Cloud, and a major fundraise ahead, Obsidian is scaling rapidly toward long-term growth and IPO readiness.
About the Team
DevOps focuses on providing an end-to-end service to turn software into live services. We work closely with Engineering, QE, and Customer Support teams to continuously improve engineering productivity and service reliability. We are also building Sherlock, an AI-powered SRE agent that automates incident investigation, root cause analysis, and runbook execution — and we need engineers who can both keep the infrastructure running and push the frontier of what AI-driven operations can do.
About the Role
Based in Sydney, Australia, this is a hybrid role for someone who thrives in both worlds: a hands-on infrastructure engineer who can own GCP/AWS cloud operations at scale, and a backend engineer capable of building the AI agent layer that makes Sherlock intelligent and self-improving. You will own core DevOps responsibilities while also contributing to — and eventually leading — Sherlock’s knowledge capture pipeline, investigation state machine, accuracy benchmarking, and Phase 4 capability expansions.
What You’ll Do — Infrastructure & DevOps
- Build and maintain infrastructure across GCP and AWS, including Compute Engine, GCS, GKE, Cloud SQL, Cloud DNS, VPC, Pub/Sub, Elasticsearch, ScyllaDB, Databricks, Kafka, Sentry, Dagster, Airflow, Vault, Consul, Kong, and more.
- Own infrastructure automation with Terraform/Terragrunt, Ansible, and Helm charts.
- Drive microservice delivery via Helm charts, GitLab CI/CD pipelines, and ArgoCD.
- Partner with Engineering on capacity planning, performance tuning, and production maintenance.
- Partner with InfoSec to address production security issues.
- Take on-call shifts and contribute to incident response.
- Address tough scalability, stability, and observability problems.
What You’ll Do — AI SRE Agent (Sherlock)
- Knowledge Capture agent: post-approval LLM summarisation, embedding generation, and structured writes to Jira, Notion, and pgvector.
- Investigation state machine application layer: status transitions, retry logic, and dead-letter handling.
- Accuracy metric (semantic diff) and speed metric — the signals that drive all prompt improvement decisions.
- Regression test framework: replay 50+ historical investigations and gate prompt changes.
- Phase 4 implementations: Customer Impact agent, Runbook Executor agent, and Zoom transcription ingestion into the Fact-Finding context.
About You — Must-Have
- 5+ years of DevOps/SRE experience in GCP and/or AWS.
- Expert in Terraform/Terragrunt, Ansible, Kubernetes, Helm charts, and GitLab CI/CD.
- Proven ability to design deployment architecture and maintain high-scale, multi-layer web services on public cloud.
- Strong experience with Kubernetes service mesh/ingress, autoscaling, and version upgrades.
- 4+ years of backend engineering in Python.
- LLM API experience: tool use, structured output, multi-turn conversations (Anthropic, OpenAI, Bedrock, or Vertex).
- Solid async Python: asyncio, task queues, worker patterns.
- Test-driven development — you write tests before or alongside code, not after.
- Comfort reading and writing SQL; PostgreSQL preferred.
- Computer science or related engineering degree.
- Full working rights in Australia.
About You — Highly Desired
- Multi-agent system design: coordinator-dispatcher patterns, registry-driven agent selection, tool-use orchestration across specialist agents.
- pgvector or other vector search experience.
- Slack API / Bolt framework for Python.
- Jira and Notion API integrations.
- Familiarity with Kafka, Elasticsearch, ScyllaDB, Databricks, Dagster, Sentry, and Kong.
- Prior work on internal DevOps or SRE tooling.
- Ability to diagnose system performance or functional issues from metrics and logs.