Data Engineering Fundamentals
Executive Summary
Data engineering is the backbone of effective analytics and data science. Without timely, high-quality data, even the best models and insights fall flat. In this series, I share my reflections as a data scientist and analytics freelancer on:
- The core principles of data engineering inspired by Joe Reis and Matt Housley’s foundational work.
- Practical tools and best practices that I currently prefer, emphasising automation, disciplined SQL, and sensible tool choices.
- The importance of robust data modelling and the critical role of domain-savvy data architects.
- Emerging architectural paradigms like data mesh and data fabric, and guidance on when to consider them.
This series is written from a Python/SQL-first perspective, focusing on reproducibility, version control, and avoiding vendor lock-in. I’m not affiliated with any vendors mentioned, and this reflects my personal experience and preferences.
Part 1: Core Principles of Data Engineering
Explore the foundational dimensions of data engineering, inspired by Reis and Housley’s Fundamentals of Data Engineering. Understand why timely, quality data is essential and learn about key axes like batch vs streaming, the 5 Vs, data accessibility, granularity, and change management.
Part 2: Practical Tools and Better Practices
A deep dive into the tools and practices that make data engineering effective today:
- Why automation and observability matter — Prefect.io vs Airflow
- Disciplined SQL workflows with SQLMesh
- Using DuckDB for local development and MVPs
- The growing importance of synthetic data (e.g. Faker)
- Avoiding premature scaling with heavyweight platforms
- New tools to watch, such as MotherDuck
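To give a flavour of two of these ideas in combination — synthetic data plus a local analytical database — here is a minimal sketch. It uses only the Python standard library, with `random` standing in for Faker and `sqlite3` standing in for DuckDB (which offers the same local workflow with a richer analytical SQL dialect); the `orders` table and its columns are invented for illustration.

```python
import random
import sqlite3

random.seed(42)  # seed for reproducible synthetic data

# Faker-style synthetic rows, built here with stdlib random as a stand-in
FIRST_NAMES = ["Alice", "Bob", "Chen", "Dara", "Elif"]
REGIONS = ["EMEA", "APAC", "AMER"]

rows = [
    (i, random.choice(FIRST_NAMES), random.choice(REGIONS),
     round(random.uniform(10, 500), 2))
    for i in range(1, 101)
]

# In-memory database for local development / an MVP
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE orders (id INTEGER, customer TEXT, region TEXT, amount REAL)"
)
con.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)

# The kind of quick sanity-check query you run constantly during local dev
for region, total in con.execute(
    "SELECT region, ROUND(SUM(amount), 2) FROM orders "
    "GROUP BY region ORDER BY region"
):
    print(region, total)
```

Swapping `sqlite3` for `duckdb` (and `random` for Faker) keeps the same shape while adding columnar performance and direct querying of Parquet/CSV files.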
Part 3: Data Modelling and the Role of the Data Architect
Discuss the universal star schema approach to simplify and standardise analytics data models. Highlight the indispensable value of data architects with deep domain knowledge who ensure data solutions align with real business needs. Also, reflect on the messy reality of corporate data versus the clean datasets often used in tutorials.
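As a concrete illustration of the star schema pattern, the sketch below builds one fact table surrounded by two dimension tables and runs the join-and-slice query the shape is designed for. It uses Python's stdlib `sqlite3` as a stand-in for a real warehouse, and all table and column names are invented for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Dimension tables: descriptive attributes, one row per entity
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT, segment TEXT);
CREATE TABLE dim_date     (date_id INTEGER PRIMARY KEY, iso_date TEXT, quarter TEXT);

-- Fact table: one row per business event, foreign keys into each dimension
CREATE TABLE fact_sales (
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    date_id     INTEGER REFERENCES dim_date(date_id),
    amount      REAL
);

INSERT INTO dim_customer VALUES (1, 'Acme Ltd', 'Enterprise'), (2, 'Beta GmbH', 'SMB');
INSERT INTO dim_date     VALUES (10, '2025-01-15', 'Q1'), (11, '2025-04-02', 'Q2');
INSERT INTO fact_sales   VALUES (1, 10, 1200.0), (2, 10, 300.0), (1, 11, 950.0);
""")

# The analytics pattern a star schema enables: join facts to dimensions,
# then group by any descriptive attribute
for segment, quarter, total in con.execute("""
    SELECT c.segment, d.quarter, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_customer c USING (customer_id)
    JOIN dim_date d USING (date_id)
    GROUP BY c.segment, d.quarter
    ORDER BY c.segment, d.quarter
"""):
    print(segment, quarter, total)
```

The value of the shape is that every analytical question becomes the same query skeleton: join the fact table to whichever dimensions you need, then group and aggregate — which is exactly what makes it easy to standardise across teams.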
Part 4: Emerging Architectures — Data Mesh and Data Fabric
Understand the differences between data mesh and data fabric architectures, their organisational and technical implications, and when to consider each. Learn about hybrid approaches that combine the best of both worlds to balance agility and governance.
Part 5: Navigating Constraints — Greenfield vs. Brownfield
Explore the reality of greenfield vs. brownfield data engineering projects. Learn how to assess technical landscapes, data quality, and organisational context to make informed choices and favour reversible decisions.
Selected References
Selected references that informed this series of articles.
- 15 Data Engineering Best Practices to Follow in 2025 (lakefs.io)
- 10 Essential Data Engineering Tools To Use in 2025 (infomineo.com)
- Small Data Engineering tools/techniques, r/dataengineering (reddit.com)
- Data Engineering Best Practices (nexla.com)
- 14 Essential Data Engineering Tools to Use in 2025 (datacamp.com)
- Data Engineering: Components, Skills & Best Practices [2025 Guide] (dagster.io)
- Is Data Modelling Still Important In Modern Data Architecture? (matillion.com)
- The Role of Data Architecture and Data Modelling Strategy (cuelogic.com)