
Senior System Admin - Linux Tools

Mumbai, Maharashtra, India

Job Title: Linux Infrastructure Engineer – HPC & Cloud

Location: Kurla, Mumbai

Type: Onsite, 5 days a week


About Neysa:

At Neysa, we believe a great online experience should just work: intuitively, seamlessly, and powerfully, without making you read the entire manual. Our mission is to craft systems that feel natural and help users accomplish tasks efficiently. We’re driven by the idea that in a hyper-connected world, technology should enable, not distract. That’s why we’re building platforms and infrastructure that empower users while quietly handling complexity in the background.


Position Overview: 

Neysa AI is seeking a skilled HPC Linux System Administrator to manage and optimize our high-performance computing infrastructure. In this role, you’ll be responsible for deploying, configuring, and maintaining scalable Linux-based HPC systems that power our AI workloads.

You’ll ensure system performance, reliability, and security across compute clusters, storage, and networking. This role is ideal for someone with deep Linux expertise, experience in HPC environments, and a passion for supporting cutting-edge AI research and development.


Key Responsibilities:

  • Linux Systems Administration
    • Install, configure, harden, and maintain Linux systems (RHEL, CentOS, Ubuntu).
    • Manage system upgrades, patch cycles, kernel tuning, and storage configuration.
  • Automation & Provisioning
    • Create and manage infrastructure-as-code (IaC) using Ansible, Terraform, and shell/Python scripts.
    • Provision bare-metal and virtual infrastructure using Foreman, MAAS, or Cobbler.
  • Monitoring & Observability
    • Set up and optimize tools like Prometheus, Grafana, Zabbix, Nagios, or Telegraf.
    • Generate insights into infrastructure and service performance to detect and resolve anomalies proactively.
  • Security & Compliance
    • Enforce security best practices including SELinux, firewalls, and regular vulnerability assessments.
    • Configure secure access controls (LDAP, SSSD, PAM) and audit policies.
  • Containerization & Orchestration
    • Deploy and manage scalable workloads using Docker and Kubernetes.
    • Design CI/CD workflows and infrastructure using Jenkins, GitLab CI, or ArgoCD.
  • GPU & HPC Technologies
    • Configure and optimize GPU clusters using NVIDIA cards and CUDA libraries.
    • Set up GPUDirect RDMA and NVLink for ultra-low latency data transfer in distributed AI/ML environments.
    • Benchmark HPC and GPU systems.
    • Tune performance for parallel workloads and manage Slurm or PBS batch schedulers.
  • Virtualization & Cloud Integration
    • Administer virtualization platforms such as KVM, VMware, and Proxmox.
    • Manage hybrid and public cloud infrastructure via AWS, Azure, or Google Cloud.
    • Implement cloud orchestration and auto-scaling infrastructure for compute-intensive workloads.
  • Collaboration & Mentorship
    • Actively collaborate with DevOps, engineering, and research teams to align system design with workload demands.
    • Mentor junior team members and lead knowledge-sharing initiatives.
  • Documentation & Reporting
    • Maintain clear documentation for procedures, system configurations, and architecture diagrams.
    • Create reports on uptime, security compliance, system health, and capacity planning.

Requirements & Qualifications:

  • Deep expertise in Linux system administration and performance tuning.
  • Strong scripting skills in Bash, Python, or Perl.
  • Solid understanding of TCP/IP, DNS, DHCP, firewalls, and general network principles.
  • Hands-on experience with Ansible, Terraform, or similar tools.
  • Familiarity with Grafana, Prometheus, Zabbix, and log monitoring stacks (e.g., ELK, Loki).

Good-to-have skills:

  • Experience with GPU-accelerated workloads (NVIDIA, CUDA, GPUDirect RDMA).
  • Knowledge of Slurm, PBS, or HPC job schedulers.
  • Background in DevOps practices, including GitOps, CI/CD pipelines, and Infrastructure-as-Code.
  • Prior experience working with large-scale, high-availability systems.
  • Analytical mindset with a knack for debugging complex systems.
  • Excellent communication and mentoring skills.
  • Empathy and patience when dealing with diverse users, whether tech-savvy or not.
  • Ability to weigh system design trade-offs and make pragmatic choices.
