
Data Engineering Fundamentals

Exec Summary:

Data engineering is the backbone of effective analytics and data science. Without timely, high-quality data, even the best models and insights fall flat. In this series, I share my reflections as a data scientist and analytics freelancer on:

  • The core principles of data engineering inspired by Joe Reis and Matt Housley’s foundational work.
  • Practical tools and best practices that I currently prefer, emphasising automation, disciplined SQL, and sensible tool choices.
  • The importance of robust data modelling and the critical role of domain-savvy data architects.
  • Emerging architectural paradigms like data mesh and data fabric, and guidance on when to consider them.

This series is written from a Python/SQL-first perspective, focusing on reproducibility, version control, and avoiding vendor lock-in. I’m not affiliated with any vendors mentioned, and this reflects my personal experience and preferences.

Part 1: Core Principles of Data Engineering

Explore the foundational dimensions of data engineering, inspired by Reis and Housley’s Fundamentals of Data Engineering. Understand why timely, quality data is essential and learn about key axes like batch vs streaming, the 5 Vs, data accessibility, granularity, and change management.

Read Part 1 →


Part 2: Practical Tools and Better Practices

A deep dive into the tools and practices that make data engineering effective today:

  • Why automation and observability matter — Prefect.io vs Airflow
  • Disciplined SQL workflows with SQLMesh
  • Using DuckDB for local development and MVPs
  • The growing importance of synthetic data (e.g. Faker)
  • Avoiding premature scaling with heavyweight platforms
  • New tools to watch, such as MotherDuck

Read Part 2 →


Part 3: Data Modelling and the Role of the Data Architect

Discuss the universal star schema approach to simplify and standardise analytics data models. Highlight the indispensable value of data architects with deep domain knowledge who ensure data solutions align with real business needs. Also, reflect on the messy reality of corporate data versus the clean datasets often used in tutorials.
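To make the star schema idea concrete, here is a minimal sketch using Python's standard-library sqlite3 (the schema and data are hypothetical, for illustration only): one fact table of sales measures keyed into two dimension tables, queried with the characteristic join-and-aggregate pattern.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Dimension tables: descriptive attributes, one row per entity.
cur.execute("CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT)")
cur.execute("CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, day TEXT, month TEXT)")

# Fact table: numeric measures plus foreign keys into each dimension.
cur.execute("""
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id    INTEGER REFERENCES dim_date(date_id),
        quantity   INTEGER,
        revenue    REAL
    )
""")

cur.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                [(1, "Widget", "Hardware"), (2, "Gadget", "Hardware")])
cur.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                [(1, "2025-01-01", "2025-01"), (2, "2025-01-02", "2025-01")])
cur.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                [(1, 1, 3, 30.0), (2, 1, 1, 25.0), (1, 2, 2, 20.0)])

# Analytics query: join the fact to its dimensions and aggregate.
rows = cur.execute("""
    SELECT p.category, d.month, SUM(f.revenue) AS revenue
    FROM fact_sales f
    JOIN dim_product p ON f.product_id = p.product_id
    JOIN dim_date d ON f.date_id = d.date_id
    GROUP BY p.category, d.month
""").fetchall()
print(rows)  # → [('Hardware', '2025-01', 75.0)]
```

Every analytics question follows the same shape (join fact to dimensions, group, aggregate), which is what makes the pattern easy to standardise across teams.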

Read Part 3 →


Part 4: Emerging Architectures — Data Mesh and Data Fabric

Understand the differences between data mesh and data fabric architectures, their organisational and technical implications, and when to consider each. Learn about hybrid approaches that combine the best of both worlds to balance agility and governance.

Read Part 4 →


Part 5: Navigating Constraints — Greenfield vs. Brownfield

Explore the reality of greenfield vs. brownfield data engineering projects. Learn how to assess technical landscapes, data quality, and organisational context to make informed choices, and to favour reversible decisions where possible.

Read Part 5 →

Selected References

Selected references that informed this series of articles.

  1. 15 Data Engineering Best Practices to Follow in 2025 (lakefs.io)
  2. 10 Essential Data Engineering Tools To Use in 2025 (infomineo.com)
  3. Small Data Engineering tools/techniques, r/dataengineering (reddit.com)
  4. Data Engineering Best Practices | Nexla
  5. 14 Essential Data Engineering Tools to Use in 2025 (datacamp.com)
  6. Data Engineering: Components, Skills & Best Practices [2025 Guide] | Dagster
  7. Is Data Modelling Still Important In Modern Data Architecture? (matillion.com)
  8. The Role of Data Architecture and Data Modelling Strategy (cuelogic.com)