
Tenant Health & Monitoring Engineer
Armis, the cyber exposure management & security company, protects the entire attack surface and manages an organization’s cyber risk exposure in real time. In a rapidly evolving, perimeter-less world, Armis ensures that organizations continuously see, protect and manage all critical assets - from the ground to the cloud. Armis secures Fortune 100, 200 and 500 companies as well as national governments, state and local entities to help keep critical infrastructure, economies and society safe and secure 24/7.
Armis is a privately held company headquartered in California.
The Tenant Health & Monitoring Engineer is a hands-on technical role responsible for proactively monitoring customer tenants, identifying anomalies, and resolving issues across performance, data quality, collector stability, and integration health. This role plays a critical part in Armis’ shift from reactive issue handling to proactive detection, reducing Customer Team & Support escalations, improving the customer experience, and enhancing the overall health of the Armis platform.
The Engineer will work closely with other teams to contribute to early detection and automation efforts.
This is a technical, operational role ideal for candidates with backgrounds in a Network Operations Center (NOC), Site Reliability (troubleshooting, monitoring, and incident response), Technical Support, or monitoring teams.
Responsibilities:
- Monitor tenant performance using Grafana, dashboards, reports, and internal logs.
- Detect early signs of tenant degradation, including CPU/RAM spikes, container resets, throttling, and abnormal behavior.
- Track and verify collector stability (online status, high CPU/RAM, disk usage, integration presence).
- Monitor integration health (password/credential issues, connectivity failures, SPAN flow issues, network mapper gaps).
- Identify data quality anomalies, duplication issues, and missing metadata trends.
- Perform log analysis, SSH-based troubleshooting, and telemetry validation to determine root causes.
- Resolve issues directly when possible and escalate to engineering through the appropriate path when required.
- Follow established workflows, runbooks, SLAs, and escalation paths.
- Provide updates and documentation in internal ticketing systems with high accuracy and clarity.
- Support the team in refining monitoring processes, identifying gaps, and improving playbooks.
- Generate daily or weekly summaries of monitored tenants, key findings, emerging patterns, and risks, and report as necessary.
- Collaborate with other teams to resolve complex issues.
- Assist in documenting troubleshooting procedures, new workflows, and technical runbooks.
Key Skills:
- Proficiency with monitoring/observability tools (Grafana, Mode, or similar).
- Hands-on troubleshooting experience with Linux systems, logs, and network fundamentals.
- Ability to interpret telemetry signals and correlate them with tenant behavior and system impact.
- Familiarity with SPAN traffic, integrations, and collector-based architectures (preferred).
- Excellent attention to detail in monitoring, documentation, and analysis.
- Capable of prioritizing and managing multiple issues in a high-volume environment.
- Strong sense of ownership and urgency, with a drive to identify problems before they impact customers.
- Ability to explain technical issues in a structured, logical manner.
Minimum requirements:
- 3+ years in a NOC, Technical Operations, SRE, Support Engineering, MSP Monitoring, or similar role.
- Hands-on troubleshooting experience with Linux, logs, network paths, and system telemetry.
- Experience responding to monitoring alerts, dashboards, or operational incidents.
- Familiarity with observability tools (Grafana, Prometheus, Splunk, Elastic, etc.).
- Ability to interpret complex telemetry (CPU, RAM, DB performance, throughput, packet loss, retries, API failure patterns).
- Strong analytical abilities; capable of building dashboards, defining KPIs, and identifying systemic issues across tenants.
- Experience supporting enterprise environments and working with cross-functional technical teams.
- Demonstrates a continuous-improvement mindset, believes that what was good enough yesterday must be improved today, and consistently pushes for higher standards, optimization, and refinement.
Preferred:
- Direct NOC experience in monitoring network, cloud, or infrastructure environments.
- Familiarity with Armis components like collectors, related integrations, SPAN traffic, or similar architectures.
- Understanding of cybersecurity fundamentals and enterprise network topologies.
The choices you make in your career journey matter. You want to do interesting work in an important field while also having time to live your life, which is why we place so much value in your life-work balance. Armis sets you up for success with comprehensive health benefits, discretionary time off, paid holidays including monthly me days, and a highly inclusive and diverse workplace. Put your unique experiences and perspective to work in an environment where they will enable you to thrive, grow, and live your life with integrity.
Armis is proud to be an equal opportunity employer. We never discriminate based on race, ethnicity, color, ancestry, national origin, religion, sex, sexual orientation, gender identity, age, disability, veteran status, genetic information, marital status or any other legally protected (or not) status. In compliance with federal law, all persons hired will be required to submit satisfactory proof of identity and legal authorization.