Data Engineer

Collingwood, Victoria, Australia

IN A NUTSHELL

Bellroy’s Data team is searching for a Data Engineer to help us make better decisions by getting our (many) data pipelines flowing smoothly and sensibly into a well-architected data platform.

We need your help to build, improve and maintain the infrastructure required to stream and enrich data from a variety of internal and external sources. Together, we will enhance our cloud-based data platform to get the right info, infer useful things and make better decisions. Then test, and keep improving. In areas where the data platform is already fully functional, you’ll lend your sharp logic to improve internal processes, automate systems, and optimise, well, everything. And down the track, we’d love you to get involved with some Bayesian analysis, machine learning and other interesting data-related projects.

If you get excited by the idea of providing the right data to inform great decisions, and you want that data to be accessible, understandable and trustworthy, then this could be the job for you. If you bring your experience, smarts and detail-oriented brain to help us, we’ll offer a world-class team to learn from, the tools you need to do your thing, and the support you need to flourish.

 

IF YOU WERE HERE IN THE LAST FEW WEEKS YOU MIGHT HAVE:

  • Reviewed our overall data systems (supported by very competent sysadmins) to make sure everything was in order and to look for larger-scale improvements
  • Built out a handful of new pipelines to bring more of our core business data into our data platform within our GCP and AWS environments
  • Chased a handful of data validation alerts raised by our pipelines, and taken the time to get to the root cause of each of them, then either delegated the fix to an appropriate someone else or fixed them yourself
  • Worked outside the data team, with our developers, flexing your database and query optimisation skills to decide whether to fix a performance issue they’re having at the database level, or insist that the fix should be in the code (and that’s fun: they’re an excellent bunch)
  • Provided an ad-hoc analysis (working with our analysts) to someone who requested it, integrating a one-off data source
  • Talked with our Data Manager about some of our mid-term plans, and how we’ll support them with data
  • Re-engineered an existing data flow to handle an order-of-magnitude growth in the volume of data passing through it

 

THESE ARE SOME QUALITIES YOU MUST POSSESS:

  • At least five years’ experience in a data engineering role
  • Advanced working knowledge of SQL and experience in ETL and streaming using a workflow management tool such as Apache NiFi (our chosen tool)
  • Experience with building and optimising data pipelines
  • Experience with collecting data from a variety of sources including APIs (good APIs, bad APIs, and ugly APIs)
  • Strong analytical skills and an ability to perform root cause analysis
  • Training in at least one of Computer Science, Statistics, Informatics, Information Systems or another relevant quantitative field (or demonstrable skill in one of those areas and the story of how you built that skill without formal training)
  • Very high precision – you need to know how to verify that your work is correct (even when dealing with unreliable data, where uncertainty may be inherent)
  • Bonus points for more relevant experience, such as with the languages and frameworks used in our projects (e.g., Ruby on Rails, Python, R, Haskell, TypeScript/JavaScript), PostgreSQL, AWS Aurora, Google BigQuery, Pub/Sub, project management and machine learning

 

LOCATION AND HOURS

This role is full-time and based in our Collingwood office with work-from-home flexibility. 

Start Day: We're ready when you are!

Apply for this job

Resume/CV (required). Accepted file types: pdf, doc, docx, txt, rtf

Cover Letter (required). Accepted file types: pdf, doc, docx, txt, rtf


  • Given the 4 tables in the sqlite3 db bellroy_question_1.sqlite3 in the repo (invoices, invoice_lines, products and orders), please write a SQL SELECT statement that prepares a dataset for a sales report, returning the columns month, style_code, color_name and revenue.
  • Assume the db in the repo is only a small sample of the data your query will process; submit a SQL query that you expect to work on the whole dataset.
  • Comment freely on our schema and anything else you notice. (We mention this because we suspect that if you’re the candidate we’re looking for, you’ll be at least a little horrified by something in this database.)
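
To give a feel for the shape of result the first question asks for, here is a minimal sketch in Python using the standard sqlite3 module. It does not use the schema of bellroy_question_1.sqlite3: every column definition, the join keys, the choice of invoice date as the month source, the cents-based price column and the sample rows are assumptions made up for illustration. The orders table is left out because its relationship to the other tables is not described here.

```python
import sqlite3

# HYPOTHETICAL schema and data: none of these column names or values come
# from bellroy_question_1.sqlite3; they exist only to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (
    id INTEGER PRIMARY KEY,
    style_code TEXT NOT NULL,
    color_name TEXT NOT NULL
);
CREATE TABLE invoices (
    id INTEGER PRIMARY KEY,
    invoice_date TEXT NOT NULL          -- assumed ISO-8601 date string
);
CREATE TABLE invoice_lines (
    id INTEGER PRIMARY KEY,
    invoice_id INTEGER NOT NULL REFERENCES invoices(id),
    product_id INTEGER NOT NULL REFERENCES products(id),
    quantity INTEGER NOT NULL,
    unit_price_cents INTEGER NOT NULL   -- assumed integer cents
);
INSERT INTO products VALUES (1, 'STYLE_A', 'Black'), (2, 'STYLE_A', 'Tan');
INSERT INTO invoices VALUES (1, '2023-01-15'), (2, '2023-01-20'), (3, '2023-02-03');
INSERT INTO invoice_lines VALUES
    (1, 1, 1, 2, 8900),
    (2, 1, 2, 1, 8900),
    (3, 2, 1, 1, 8900),
    (4, 3, 2, 3, 8900);
""")

# Aggregate line-level revenue per calendar month, style and colour.
# Grouping on the full expressions (not aliases) keeps the query portable.
QUERY = """
SELECT strftime('%Y-%m', i.invoice_date)              AS month,
       p.style_code                                   AS style_code,
       p.color_name                                   AS color_name,
       SUM(il.quantity * il.unit_price_cents) / 100.0 AS revenue
FROM invoice_lines AS il
JOIN invoices AS i ON i.id = il.invoice_id
JOIN products AS p ON p.id = il.product_id
GROUP BY strftime('%Y-%m', i.invoice_date), p.style_code, p.color_name
ORDER BY 1, 2, 3
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Summing integer cents and dividing once at the end avoids accumulating floating-point rounding error across many lines. A real answer would also need to check the actual tables for things this sketch assumes away, such as duplicate product rows, missing foreign keys or inconsistent date formats.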