Reflecting on Data Engineering as a Data Scientist & Analytics Freelancer

Exec Summary:

Joe Reis and Matt Housley’s book, Fundamentals of Data Engineering, has become a foundational reference in the field. Its enduring value lies in its focus on principles—rather than tools—recognizing that while technologies evolve, the core challenges and dimensions of data engineering remain constant.

“Data analytics without timely provision of quality data is misguided at best.”

Key Data Engineering Axes

Their framework, echoed in my own experience, highlights several critical axes for data engineering:

  • Batch vs. Streaming: The choice between processing data in scheduled batches or in real time is fundamental, impacting architecture, latency, and complexity.
  • The 5 Vs:
    • Volume (scale of data)
    • Velocity (speed of data generation and processing)
    • Variety (diversity of data sources and formats)
    • Veracity (data quality and reliability)
    • Value (usefulness of the data for business outcomes)
  • Data Accessibility: Ensuring the right people can access the right data at the right time.
  • Data Modelling: Star and snowflake schemas, fact and dimension tables, and the importance of conformed and role-playing dimensions for consistency and flexibility.
  • Granularity: The level of detail in your data, which affects storage, performance, and analytical power.
  • Change Management: Handling slowly changing dimensions and evolving data structures.
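To make the change-management point a little more concrete, here is a minimal sketch of one common way to handle a slowly changing dimension, usually called "Type 2": instead of overwriting an attribute, the current row is closed and a new row is added, so history is kept. The `CustomerRow` class, its fields, and the `apply_scd2` function are hypothetical names chosen for illustration. They are not taken from the book, and a real warehouse would typically do this in SQL or a transformation tool rather than in plain Python.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class CustomerRow:
    """One version of a customer in a hypothetical dimension table."""
    customer_id: int                  # natural (business) key
    city: str                         # the attribute whose history we track
    valid_from: date
    valid_to: Optional[date] = None   # None marks the current version


def apply_scd2(rows: list[CustomerRow], customer_id: int,
               new_city: str, change_date: date) -> list[CustomerRow]:
    """Return a new row list with a Type 2 change applied.

    The current row for the customer is closed at change_date and a new
    current row is appended. If the city is unchanged, or the customer is
    unknown, the rows are returned as they were (an assumption of this sketch).
    """
    result = []
    changed = False
    for row in rows:
        is_current = row.customer_id == customer_id and row.valid_to is None
        if is_current and row.city != new_city:
            # Close the old version instead of overwriting it.
            result.append(replace(row, valid_to=change_date))
            changed = True
        else:
            result.append(row)
    if changed:
        # Open the new current version.
        result.append(CustomerRow(customer_id, new_city, change_date))
    return result


# Example: a customer moves from Berlin to Munich.
dim = [CustomerRow(1, "Berlin", date(2023, 1, 1))]
dim = apply_scd2(dim, 1, "Munich", date(2024, 6, 1))
for row in dim:
    print(row)
```

After the call, the table holds two rows for the same customer: the Berlin row with an end date, and an open-ended Munich row. That is what lets a fact recorded in 2023 still be analysed against the city the customer lived in at the time.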

Read Part 2: Practical Tools and Thoughtful Choices →