2026-01-22 - DDIA_1

Takeaways:

  • there is a distinction between a data engineer and an analytics engineer:
    • data eng: connects the operational systems to the analytics systems
    • analytics eng: transform the data for business and data scientists' needs
  • point query: look up a small number of records - not looking for trends across the data
  • real-time analytics systems: OLAP DBs designed for analytical workloads but also user-facing. example: ClickHouse
    • traditional OLAP leans more on batched updates: better at fast, high-throughput writes than at fast reads
  • data warehouses have been around since the 80s
    • initially as a way to centralize data from multiple business areas
    • separated the analytics workload from the day-to-day one (don't bring down prod just to answer a business question)
    • ETL (extract-transform-load): the process of getting data into a data warehouse
      • sometimes it's ELT (extract-load-transform): load first, transform inside the warehouse
      • aka data pipelines
    • HTAP (hybrid transactional analytical processing)
      • both OLTP and OLAP in one
  • data lake: files not tables
    • it's not a step beyond the data warehouse, but it can be a stop on the way to the warehouse
    • make data available to machine learning
    • feature engineering: transforming the data to prioritize model performance
    • more likely to live on something like S3
  • sushi principle: raw data is better
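
To make the point-query vs. analytics split and the ETL step concrete, here is a minimal sketch using Python's built-in sqlite3. Everything in it is made up for illustration: the `orders` and `fact_sales` tables, the sample rows, and the helper names `point_query`, `etl`, and `revenue_per_day`. Two in-memory SQLite databases stand in for the operational system and the warehouse; a real setup would use separate systems.

```python
import sqlite3

# Hypothetical operational (OLTP) database with a made-up `orders` table.
oltp = sqlite3.connect(":memory:")
oltp.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer TEXT, amount_cents INTEGER, day TEXT)"
)
oltp.executemany(
    "INSERT INTO orders (id, customer, amount_cents, day) VALUES (?, ?, ?, ?)",
    [
        (1, "alice", 1200, "2026-01-20"),
        (2, "bob", 800, "2026-01-20"),
        (3, "alice", 500, "2026-01-21"),
    ],
)


def point_query(conn, order_id):
    """Point query: look up one record by key, no trend across the data."""
    return conn.execute(
        "SELECT customer, amount_cents FROM orders WHERE id = ?", (order_id,)
    ).fetchone()


def etl(source, warehouse):
    """Extract from the operational DB, transform, load into the warehouse."""
    # Extract: read the raw rows out of the operational system.
    rows = source.execute(
        "SELECT customer, amount_cents, day FROM orders"
    ).fetchall()
    # Transform: convert cents to whole currency units for analysts.
    transformed = [(customer, cents / 100, day) for customer, cents, day in rows]
    # Load: write the transformed rows into the (assumed) warehouse table.
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS fact_sales (customer TEXT, amount REAL, day TEXT)"
    )
    warehouse.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", transformed)
    warehouse.commit()


def revenue_per_day(warehouse):
    """Analytical query: aggregate over many records to see a trend."""
    return warehouse.execute(
        "SELECT day, SUM(amount) FROM fact_sales GROUP BY day ORDER BY day"
    ).fetchall()


warehouse = sqlite3.connect(":memory:")
etl(oltp, warehouse)

print(point_query(oltp, 2))        # one record, fetched by key
print(revenue_per_day(warehouse))  # one aggregated row per day
```

The analytical query only touches the warehouse copy, so a heavy aggregation never competes with prod traffic. Swapping the transform and load steps (load the raw rows, then transform with SQL inside the warehouse) would turn this into the ELT variant.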