RoDE: Part 1 - Core Principles of Data Engineering.
Reflecting on Data Engineering as a Data Scientist & Analytics Freelancer
Exec Summary:
Joe Reis and Matt Housley’s book, Fundamentals of Data Engineering, has become a foundational reference in the field. Its enduring value lies in its focus on principles—rather than tools—recognizing that while technologies evolve, the core challenges and dimensions of data engineering remain constant.
“Data analytics without timely provision of quality data is misguided at best.”
Key Data Engineering Axes
Their framework, echoed in my own experience, highlights several critical axes for data engineering:
- Batch vs. Streaming: The choice between processing data in scheduled batches or in real time is fundamental, impacting architecture, latency, and complexity.
- The 5 Vs:
  - Volume (scale of data)
  - Velocity (speed of data generation and processing)
  - Variety (diversity of data sources and formats)
  - Veracity (data quality and reliability)
  - Value (usefulness of the data for business outcomes)
- Data Accessibility: Ensuring the right people can access the right data at the right time.
- Data Modelling: Star and snowflake schemas, fact and dimension tables, and the importance of conformed and role-playing dimensions for consistency and flexibility.
- Granularity: The level of detail in your data, which affects storage, performance, and analytical power.
- Change Management: Handling slowly changing dimensions and evolving data structures.
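To make the change-management axis a little more concrete, here is a minimal sketch of one common way to handle a slowly changing dimension, usually called "Type 2": instead of overwriting an attribute, the current row is closed off and a new row is appended, so history is preserved. The names below (`CustomerRow`, `apply_scd2`, the `city` attribute, the sample dates) are my own illustrative assumptions rather than anything taken from the book, and a real warehouse would typically do this in SQL or a transformation tool rather than in plain Python.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class CustomerRow:
    """One version of a customer in a hypothetical dimension table."""
    customer_id: int          # natural (business) key
    city: str                 # the attribute whose history we track
    valid_from: date          # first day this version applies
    valid_to: Optional[date]  # None means "still current"


def apply_scd2(history: List[CustomerRow], customer_id: int,
               new_city: str, change_date: date) -> List[CustomerRow]:
    """Return a new history list with a Type 2 change applied.

    If the customer's current row already has new_city, nothing changes.
    Otherwise the current row is closed at change_date and a new current
    row is appended. Unknown customers simply get a first row.
    """
    updated: List[CustomerRow] = []
    found_current = False
    changed = False
    for row in history:
        is_current = row.customer_id == customer_id and row.valid_to is None
        if is_current:
            found_current = True
            if row.city != new_city:
                # Close the old version instead of overwriting it.
                updated.append(replace(row, valid_to=change_date))
                changed = True
                continue
        updated.append(row)
    if changed or not found_current:
        updated.append(CustomerRow(customer_id, new_city, change_date, None))
    return updated


# Usage: a customer moves city; both versions remain queryable.
history = [CustomerRow(1, "Leeds", date(2023, 1, 1), None)]
history = apply_scd2(history, 1, "York", date(2024, 6, 1))
for row in history:
    print(row)
```

The design choice worth noticing is that nothing is ever updated in place except the `valid_to` marker, which is what lets an analyst later ask "where did this customer live at the time of the order?" The cost is extra rows and slightly more complex joins, which ties back to the granularity and volume axes above.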