
Site Reliability Engineer - Big Data
Verisign helps enable the security, stability, and resiliency of the internet. We are a trusted provider of internet infrastructure services for the networked world and deliver unmatched performance in domain name system (DNS) services.
We are a mission-focused, values-driven company where each individual can contribute to building a stronger, more secure internet. We offer a dynamic and flexible work environment with competitive benefits and the ability to grow your career.
Within Verisign, our team is responsible for building and managing the Verisign Data Platform, which enables the creation of large-scale, high-throughput (millions of requests per second) data products and services that deliver actionable operational and business intelligence. To help us advance the platform, we are looking for a highly skilled mid-level Site Reliability Engineer (SRE). This role will play a critical part in ensuring the stability, performance, and security of our data platforms.
The ideal candidate cares deeply about big data systems and automation, is fluent in Infrastructure-as-Code and CI/CD, and is eager to learn as needed. The successful candidate should understand the fundamentals, including core computer science concepts, operating systems, networking, file systems, and databases, and have hands-on experience managing large-scale distributed systems. Acquiring these competencies typically requires the equivalent of a bachelor’s degree and 6 or more years of practical work experience. We are also open to other career paths.
The candidate will be involved in all aspects of the data platform, including ideation, design, implementation, deployment, customer onboarding, and support. This requires regular collaboration with the Data Engineering, Infrastructure, Engineering, Security, and Operations teams. As a member of the team, the candidate is expected to take ownership of the data platform, interacting regularly with internal customers and proactively identifying, prioritizing, and delivering on their common data platform needs.
Key Responsibilities:
- Architect, design, deploy, monitor, and operate large-scale data platforms such as Hadoop, Kafka, Spark, and Druid, running both on physical servers and on top of Kubernetes
- Participate in technical designs and proofs of concept for software solutions that combine open-source components, COTS (commercial off-the-shelf) components, and custom-developed components
- Deploy and manage production releases with minimal supervision
- Automate cluster provisioning (CI/CD, Infrastructure-as-Code), scaling, and monitoring using Ansible, Python, Jenkins, Terraform and other relevant tools
- Build and deploy containerized applications using Docker and Kubernetes
- Troubleshoot complex issues in large, distributed environments
- Upgrade large-scale data platforms (including patching and deploying releases) to improve system capabilities and security while minimizing customer impact
- Perform occasional operations support functions, including problem isolation and resolution
- Participate in the on-call rotation to monitor the health of the production systems and respond to incidents or customer needs
- Ensure platform SLOs are met by collecting, visualizing, and alerting on relevant telemetry
- Support data platform customers and continuously improve the monitoring, performance, and functionality of the clusters
- Stay up to date with industry best practices and standards for data platforms, with a focus on hybrid cloud environments
The candidate must have:
- Bachelor’s degree in computer science or a related technical field, or equivalent combination of education and experience
- 5+ years of experience managing big data platforms (Hadoop, Spark, Kafka, Druid)
- Excellent understanding of Linux configuration and administration
- Strong automation experience: not just developing automation, but knowing why we automate and what to automate
- Strong understanding of infrastructure-as-code
- Strong written and verbal communication skills, with the ability to describe complex issues clearly and succinctly
- Familiarity with networking protocols and systems
Desired Skills, Experience, and Attributes:
- Experience with a high-level scripting language such as Python
- Experience with Red Hat Enterprise Linux and/or FreeBSD
- Experience with network troubleshooting using tools such as ping, traceroute, and dig
- Deployment automation experience using tools such as Ansible
- Experience working with teams that use Kanban and/or Scrum
- Experience with Docker or Kubernetes in a production environment
- Experience with OpenStack in a production environment
- Experience administering Unix systems in a large-scale environment
- Experience using Jenkins in a continuous integration and continuous delivery (CI/CD) environment
This position is based in our Reston, VA office and offers a flexible, hybrid work schedule.
The pay range is $108,900 - $147,300.
The anticipated annual base salary range for this position is noted above; however, the base pay offered may vary depending on job-related knowledge, skills, and experience. Verisign offers a discretionary bonus based on individual and company performance, and certain roles may be eligible for discretionary stock awards.
Verisign is an equal opportunity employer. That means we recruit, hire, compensate, train, promote, transfer, and administer all terms and conditions of employment without regard to their race, color, religion, national origin, sex, sexual orientation, gender identity, age, protected veteran status, disability, or other protected categories under applicable law.
Additional Information:
Our Careers Page
Our Benefits Summary
Verisign in the Community
Our EEO Statement
Our Privacy Notice for Job Applicants/Candidates
Reasonable Accommodations
Staffing agency policy: No fees will be paid for unsolicited resumes submitted to Verisign or our employees by third parties.