2026-01-22 - DDIA_1
Takeaways:
- there is a distinction between a data engineer and an analytics engineer:
- data eng: connects the operational systems to the analytics systems
- analytics eng: transforms the data for the needs of the business and of data scientists
- point query: looks up a small number of records by key, rather than looking for trends across the data
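
To make the point query idea concrete, here is a minimal sketch that contrasts it with an analytical query. It uses Python's built-in `sqlite3` only for convenience; the `orders` table, its columns, and the two helper functions are made up for illustration and are not from the book.

```python
import sqlite3

# In-memory database with a small, hypothetical orders table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (id, region, amount) VALUES (?, ?, ?)",
    [(1, "east", 10.0), (2, "west", 25.0), (3, "east", 5.0), (4, "west", 40.0)],
)


def point_query(order_id):
    """Point query: fetch one record by its key."""
    return conn.execute(
        "SELECT id, region, amount FROM orders WHERE id = ?", (order_id,)
    ).fetchone()


def revenue_by_region():
    """Analytical query: scan every row and aggregate to see a trend."""
    return conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    ).fetchall()
```

The first function touches a single row through the primary key, while the second has to read the whole table, which is the access pattern OLAP systems are built around.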
- real-time analytics systems: OLAP DBs designed for analytical workloads that are also user-facing. example: ClickHouse
- traditional OLAP: updated in batches, built for high-throughput bulk loads rather than fast user-facing reads
- data warehouses have been around since the 80s
- initially a way to centralize data from multiple business areas
- separates the analytics workload from day-to-day operations (don't bring down prod just to answer a business question)
- ETL (extract-transform-load): the process of getting data into a data warehouse
- sometimes it's ELT (extract-load-transform), where the transform step runs after loading
- aka data pipelines
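
A rough sketch of the three ETL steps, assuming a toy setup: the source rows, the `fact_orders` table, and every function name here are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical rows as they might come out of an operational (OLTP) system.
SOURCE_ROWS = [
    {"order_id": 1, "amount_cents": 1050, "country": "us"},
    {"order_id": 2, "amount_cents": 2500, "country": "DE"},
    {"order_id": 3, "amount_cents": None, "country": "us"},  # incomplete record
]


def extract():
    """Extract: read the rows from the source system."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Transform: drop incomplete rows, convert cents to units, normalize country."""
    cleaned = []
    for row in rows:
        if row["amount_cents"] is None:
            continue
        cleaned.append(
            (row["order_id"], row["amount_cents"] / 100, row["country"].upper())
        )
    return cleaned


def load(conn, rows):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(order_id INTEGER PRIMARY KEY, amount REAL, country TEXT)"
    )
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn):
    load(conn, transform(extract()))


warehouse = sqlite3.connect(":memory:")  # stand-in for a real warehouse
run_etl(warehouse)
```

In an ELT variant the raw rows would be loaded first, and the cleanup in `transform` would instead run as a query inside the warehouse.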
- HTAP (hybrid transactional analytical processing)
- both OLTP and OLAP in one system
- data lake: files not tables
- it's not a step beyond the data warehouse, but it can be a stop on the way to the warehouse
- makes data available to machine learning
- feature engineering: transforming raw data into features that improve model performance
- more likely to live on something like S3
- sushi principle: raw data is better
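
As an illustration of feature engineering on raw data, the sketch below keeps the events in their raw, file-like form and derives per-user features from them on demand. The event format, the field names, and the chosen features are all assumptions made up for this example.

```python
import json
from collections import defaultdict

# Hypothetical raw events, one JSON document per line, as they might sit
# in a file in a data lake. The raw form is kept untouched.
RAW_EVENTS = """\
{"user": "a", "event": "view", "price": 0}
{"user": "a", "event": "purchase", "price": 30}
{"user": "b", "event": "view", "price": 0}
{"user": "a", "event": "purchase", "price": 10}
"""


def build_features(raw_text):
    """Derive per-user features for a model from the raw event lines."""
    stats = defaultdict(lambda: {"views": 0, "purchases": 0, "spend": 0})
    for line in raw_text.splitlines():
        if not line.strip():
            continue
        event = json.loads(line)
        user = stats[event["user"]]
        if event["event"] == "view":
            user["views"] += 1
        elif event["event"] == "purchase":
            user["purchases"] += 1
            user["spend"] += event["price"]

    features = {}
    for name, s in stats.items():
        features[name] = {
            "purchase_count": s["purchases"],
            "total_spend": s["spend"],
            # Guard against division by zero for users with no views.
            "purchases_per_view": s["purchases"] / s["views"] if s["views"] else 0.0,
        }
    return features
```

Because the raw events are never overwritten, a different set of features can be derived later from the same file, which is one reading of why raw data is better.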