Job Application for Intermediate Site Reliability Engineer, Database Operations at GitLab

GitLab is an open-core software company that develops the most comprehensive AI-powered DevSecOps Platform, used by more than 100,000 organizations. Our mission is to enable everyone to contribute to and co-create the software that powers our world. When everyone can contribute, consumers become contributors, significantly accelerating human progress. Our platform unites teams and organizations, breaking down barriers and redefining what's possible in software development. Thanks to products like Duo Enterprise and Duo Agent Platform, customers get AI benefits at every stage of the SDLC.

The same principles built into our products are reflected in how our team works: we embrace AI as a core productivity multiplier, with all team members expected to incorporate AI into their daily workflows to drive efficiency, innovation, and impact. GitLab is where careers accelerate, innovation flourishes, and every voice is valued. Our high-performance culture is driven by our values and continuous knowledge exchange, enabling our team members to reach their full potential while collaborating with industry leaders to solve complex problems. Co-create the future with us as we build technology that transforms how the world develops software.

Int. Site Reliability Engineer: Database Operations

An overview of this role

Site Reliability Engineers (SREs) are responsible for keeping all user-facing services and other GitLab production systems running smoothly. SREs are a blend of pragmatic operators and software craftspeople who apply sound engineering principles, operational discipline, and mature automation to our environments and the GitLab codebase. We specialize in systems, whether that means networking, the Linux kernel, or a more specific interest in scaling, algorithms, or distributed systems.

The Database Operations team’s mission is to build, run, own, and evolve the entire lifecycle of the PostgreSQL database engine for GitLab.com. The team focuses on the reliability, scalability, evolution, performance, and security of the database engine and its supporting services. Where appropriate, the team aims to build its services on top of Reliability::Foundations services and cloud vendor managed products to reduce complexity, improve efficiency, and deliver new capabilities more quickly.

GitLab.com is a unique site and it brings unique challenges: it’s the biggest GitLab instance in existence. In fact, it’s one of the largest single-tenancy open-source SaaS sites on the internet. The experience of our team feeds back into other engineering groups within the company, as well as to GitLab customers running self-managed installations.

Responsibilities

Automate every operational task; this is a core requirement for this role. Examples include package updates, configuration changes across all environments, and tools for automatic provisioning of user-facing services.
Respond to platform emergencies, alerts, and escalations from Customer Support.
Ensure systems exist to manage software life cycles (e.g., operating systems) with a minimum of manual effort.
Develop a fully automated multi-environment observability stack based on the existing SaaS system, and extend it to predict capacity needs based on the usage patterns.
Plan for new service roll-outs, expansion and capacity management of existing services, and work with users to optimize their resource consumption.

As an SRE you will:

Work on database reliability and performance aspects for GitLab.com from within the SRE team as well as work on shipping solutions with the product.
Analyze solutions and implement best practices for our PostgreSQL database clusters and their components.
Work on observability of relevant database metrics and make sure we reach our database objectives.
Work with peer SREs to roll out changes to our production environment and help mitigate database-related production incidents.
Provide on-call support in rotation with the team.
Provide database expertise to engineering teams (for example through reviews of database migrations, queries and performance optimizations).
Work on automation of database infrastructure and help engineering succeed by providing self-service tools.
Use the GitLab product to run GitLab.com as a first resort and improve the product as much as possible.
Plan the growth of GitLab's database infrastructure.
Design, build and maintain core database infrastructure components that allow GitLab to scale to support hundreds of thousands of concurrent users.
Support and debug database production issues across services and levels of the stack.
Make monitoring and alerting trigger on symptoms, not on outages.
Document every action so your learnings turn into repeatable actions and then into automation.

You may be a fit for this role if you:

Have primary experience running PostgreSQL in high-growth, large production environments using both self-managed deployments (VM, Kubernetes with modern PostgreSQL Operators) and DBaaS offerings.
Have hands-on experience using data from PostgreSQL internals to design, build and troubleshoot systems.
Have primary experience with infrastructure automation, orchestration, and configuration management (Chef, Ansible, Puppet, Terraform).
Have a solid understanding of SQL and PL/pgSQL.
Have significant experience working in a large SaaS distributed systems production environment.
Share our values, and work in accordance with those values.
Have excellent written and verbal English communication skills, with an urge to collaborate and communicate asynchronously.
Have an urge to document all the things so you don't need to learn the same thing twice, and an urge for delivering quickly and iterating fast.
Have a proactive, go-for-it attitude. When you see something broken, you can't help but fix it.
Have solid data modeling and data structure design skills.
Bonus: Solid programming skills as a (former) backend engineer, preferably with Ruby and/or Go.
Bonus: Experience with ClickHouse or another modern OLAP database.

Projects you could work on:

Review, analyze and implement solutions regarding database administration (e.g., backups, performance tuning)
Work with Ansible, Terraform, Chef, and other tools to build mature automation (for example, automating the setup of new replicas or the testing and monitoring of backups).
Implement self-service tools for our engineers using GitLab ChatOps.
Provide technical assistance and support to other teams on database and database-related application design methodologies, system resources, and application tuning.
Review database-related changes from engineering teams (e.g., database migrations).
Recommend query and schema changes to optimize the performance of database queries.
Jump on a production incident to mitigate database-related issues on GitLab.com.
Participate actively in the infrastructure design and scalability considerations focusing on data storage aspects.
Make sure we know how to take the next step to scale the database.
Design and develop specifications for future database requirements including enhancements, upgrades, and capacity planning; evaluate alternatives; and make appropriate recommendations.

Intermediate Site Reliability Engineer Criteria

Technical:

Expertise in at least 1 area of SRE work, with general knowledge of all areas.
Capable of mentoring Junior team members.
Contributes small improvements to the GitLab codebase to resolve issues.

Execution:

Identifies projects that result in substantial cost savings or revenue.
Identifies changes for the product architecture from the reliability, performance, and availability perspective with a data-driven approach.
Proactively works on efficiency and capacity planning to set clear requirements and reduce system resource usage, making GitLab cheaper to run for all our customers.
Identifies parts of the system that do not scale, provides immediate palliative measures, and drives long-term resolution of these incidents.
Identifies Service Level Indicators (SLIs) that will align the team to meet the availability and latency objectives.

Collaboration and Communication:

Ability to thrive in a fully remote, asynchronous work environment that places a high emphasis on documentation and written communication.
Develop expertise in a domain and radiate that knowledge.
Participate in blameless RCAs on incidents and outages, looking for answers that will prevent the incident from ever happening again.

Influence and Maturity:

Lead Junior SREs by setting the example.
Develop ownership of a major part of the infrastructure.
Trusted to de-escalate conflicts inside the team.

Performance Indicators

Site Reliability Engineers have the following job-family performance indicators:

Country Hiring Guidelines: GitLab hires new team members in countries around the world. All of our roles are remote; however, some roles may carry specific location-based eligibility requirements. Our Talent Acquisition team can help answer any questions about location after starting the recruiting process.

Privacy Policy: Please review our Recruitment Privacy Policy. Your privacy is important to us.

GitLab is proud to be an equal opportunity workplace and is an affirmative action employer. GitLab’s policies and practices relating to recruitment, employment, career development and advancement, promotion, and retirement are based solely on merit, regardless of race, color, religion, ancestry, sex (including pregnancy, lactation, sexual orientation, gender identity, or gender expression), national origin, age, citizenship, marital status, mental or physical disability, genetic information (including family medical history), discharge status from the military, protected veteran status (which includes disabled veterans, recently separated veterans, active duty wartime or campaign badge veterans, and Armed Forces service medal veterans), or any other basis protected by law. GitLab will not tolerate discrimination or harassment based on any of these characteristics. See also GitLab’s EEO Policy and EEO is the Law. If you have a disability or special need that requires accommodation, please let us know during the recruiting process.

First Name

Last Name

Phone

Location (City)

Resume/CV*

Accepted file types: pdf, doc, docx, txt, rtf

Cover Letter

Accepted file types: pdf, doc, docx, txt, rtf

LinkedIn Profile

What's the name you'd prefer us to use throughout the interview process?

Are you subject to any employment agreements and/or post-employment restrictions with your current employer or a past employer?

It is important to us to create an accessible and inclusive interview experience. Please let us know if there are any adjustments we can make to assist you during the hiring and interview process.

Please choose the country in which you will be located if hired by GitLab.

Will you now or in the future require sponsorship for a visa to remain in your current location?

Do you have experience with Postgres at scale?

Do you have experience with Chef or Ansible (or a similar tool)?

Do you have commercial experience with Terraform?

Voluntary Self-Identification

For government reporting purposes, we ask candidates to respond to the below self-identification survey. Completion of the form is entirely voluntary. Whatever your decision, it will not be considered in the hiring process or thereafter. Any information that you do provide will be recorded and maintained in a confidential file.

As set forth in GitLab’s Equal Employment Opportunity policy, we do not discriminate on the basis of any protected group status under any applicable law.

Gender

Are you Hispanic/Latino?

If you believe you belong to any of the categories of protected veterans listed below, please indicate by making the appropriate selection. As a government contractor subject to the Vietnam Era Veterans Readjustment Assistance Act (VEVRAA), we request this information in order to measure the effectiveness of the outreach and positive recruitment efforts we undertake pursuant to VEVRAA. Classification of protected categories is as follows:

A "disabled veteran" is one of the following: a veteran of the U.S. military, ground, naval or air service who is entitled to compensation (or who but for the receipt of military retired pay would be entitled to compensation) under laws administered by the Secretary of Veterans Affairs; or a person who was discharged or released from active duty because of a service-connected disability.

A "recently separated veteran" means any veteran during the three-year period beginning on the date of such veteran's discharge or release from active duty in the U.S. military, ground, naval, or air service.

An "active duty wartime or campaign badge veteran" means a veteran who served on active duty in the U.S. military, ground, naval or air service during a war, or in a campaign or expedition for which a campaign badge has been authorized under the laws administered by the Department of Defense.

An "Armed forces service medal veteran" means a veteran who, while serving on active duty in the U.S. military, ground, naval or air service, participated in a United States military operation for which an Armed Forces service medal was awarded pursuant to Executive Order 12985.

Veteran Status

Voluntary Self-Identification of Disability

Form CC-305

Page 1 of 1

OMB Control Number 1250-0005

Expires 04/30/2026

Why are you being asked to complete this form?

We are a federal contractor or subcontractor. The law requires us to provide equal employment opportunity to qualified people with disabilities. We have a goal of having at least 7% of our workers as people with disabilities. The law says we must measure our progress towards this goal. To do this, we must ask applicants and employees if they have a disability or have ever had one. People can become disabled, so we need to ask this question at least every five years.

Completing this form is voluntary, and we hope that you will choose to do so. Your answer is confidential. No one who makes hiring decisions will see it. Your decision to complete the form and your answer will not harm you in any way. If you want to learn more about the law or this form, visit the U.S. Department of Labor’s Office of Federal Contract Compliance Programs (OFCCP) website at www.dol.gov/ofccp.

How do you know if you have a disability?

A disability is a condition that substantially limits one or more of your “major life activities.” If you have or have ever had such a condition, you are a person with a disability. Disabilities include, but are not limited to:

Alcohol or other substance use disorder (not currently using drugs illegally)
Autoimmune disorder, for example, lupus, fibromyalgia, rheumatoid arthritis, HIV/AIDS
Blind or low vision
Cancer (past or present)
Cardiovascular or heart disease
Celiac disease
Cerebral palsy
Deaf or serious difficulty hearing
Diabetes
Disfigurement, for example, disfigurement caused by burns, wounds, accidents, or congenital disorders
Epilepsy or other seizure disorder
Gastrointestinal disorders, for example, Crohn's Disease, irritable bowel syndrome
Intellectual or developmental disability
Mental health conditions, for example, depression, bipolar disorder, anxiety disorder, schizophrenia, PTSD
Missing limbs or partially missing limbs
Mobility impairment, benefiting from the use of a wheelchair, scooter, walker, leg brace(s) and/or other supports
Nervous system condition, for example, migraine headaches, Parkinson’s disease, multiple sclerosis (MS)
Neurodivergence, for example, attention-deficit/hyperactivity disorder (ADHD), autism spectrum disorder, dyslexia, dyspraxia, other learning disabilities
Partial or complete paralysis (any cause)
Pulmonary or respiratory conditions, for example, tuberculosis, asthma, emphysema
Short stature (dwarfism)
Traumatic brain injury

Disability Status

PUBLIC BURDEN STATEMENT: According to the Paperwork Reduction Act of 1995 no persons are required to respond to a collection of information unless such collection displays a valid OMB control number. This survey should take about 5 minutes to complete.
