Manager, Network Operations Engineering

About Tekion:

Positively disrupting an industry that has not seen any innovation in over 50 years, Tekion has challenged the paradigm with the first and fastest cloud-native automotive platform that includes the revolutionary Automotive Retail Cloud (ARC) for retailers, Automotive Enterprise Cloud (AEC) for manufacturers and other large automotive enterprises and Automotive Partner Cloud (APC) for technology and industry partners. Tekion connects the entire spectrum of the automotive retail ecosystem through one seamless platform. The transformative platform uses cutting-edge technology, big data, machine learning, and AI to seamlessly bring together OEMs, retailers/dealers and consumers. With its highly configurable integration and greater customer engagement capabilities, Tekion is enabling the best automotive retail experiences ever. Tekion employs close to 3,000 people across North America, Asia and Europe.

Job Description:

We are seeking an experienced and proficient Engineering Manager, NOC to join our team. This role demands a minimum of 8 years of experience in NOC, Production Support, or Command Center roles within a 24 x 7 operational environment. The ideal candidate will possess advanced knowledge of observability platforms and ITIL processes, along with strong analytical and troubleshooting skills.


Roles and Responsibilities:
  • Own NOC team operations: hiring great talent, building teams from the ground up, and mentoring, growing, and retaining people.
  • Lead the monitoring and maintenance of system health using observability platforms such as AppDynamics, Dynatrace, Datadog, or New Relic.
  • Provide expert consultation, design, and implementation of APM, Real User Monitoring, Synthetic Monitoring, Infrastructure Monitoring, and Log Management modules.
  • Oversee incident, problem, change, and release management processes as per ITIL standards.
  • Manage and drive major incident bridge calls and post-incident reviews (PIRs).
  • Conduct root cause analysis and troubleshooting using tools like New Relic and Kibana.
  • Develop and maintain monitoring alerts and dashboards.
  • Resolve production issues across various services and stack levels.
  • Ensure compliance with Service Level Objectives (SLOs) and Service Level Agreements (SLAs).
  • Develop monitoring solutions to detect symptoms and prevent outages.
  • Automate operational processes to enhance system efficiency and reduce manual tasks.
  • Take responsibility for on-call rotations to immediately address potential issues or disruptions.
  • Work rotational shifts (Day/Night/Weekends) as needed.

Qualifications and Education Requirements:
  • 5+ years of experience in NOC/Production Support/Command Center roles.
  • 3+ years of experience as an Engineering Manager, including people and performance management.
  • Extensive experience establishing organisation-wide processes such as Incident Management, and evangelising them.
  • Advanced knowledge of observability platforms (AppDynamics, Dynatrace, Datadog, New Relic).
  • Extensive hands-on experience across multiple platforms.
  • In-depth understanding of ITIL processes and best practices.
  • Superior problem-solving and troubleshooting skills.
  • Proficiency in log monitoring solutions (Sumologic, Splunk) is highly desirable.
  • Extensive experience with cloud service providers (Azure, AWS) is preferred.

Preferred Skills:
  • Expertise in cloud computing platforms (AWS, Azure).
  • Significant experience in technical support or operations roles, ideally in a SaaS environment.
  • Ability to create comprehensive technical documentation and knowledge bases.
  • Proven experience in monitoring and maintaining large-scale systems.
  • Advanced knowledge of monitoring and logging tools (Prometheus, Grafana, ELK Stack).
  • Experience in Dev support is highly beneficial.
  • Proficiency in command center technologies and software (e.g., SIEM, NMS, ITSM tools).
  • Experience with crisis management and disaster recovery planning.
  • Ability to work under pressure and make quick, informed decisions.
  • Strong analytical skills to interpret data and trends.
  • Ability to work collaboratively with cross-functional teams.
  • Excellent organizational skills and attention to detail.
  • Experience in developing and managing budgets.
  • Familiarity with network and systems architecture.
  • Knowledge of cybersecurity principles and practices.
  • Availability to work flexible hours, including nights, weekends, and holidays, as needed.

Tekion is proud to be an Equal Employment Opportunity employer. We do not discriminate based upon race, religion, color, national origin, gender (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender identity, gender expression, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics.

California residents can review Tekion's California Privacy Policy.
