Description
Executive summary: Opinionated scaffolds for ingestion → transformation → validation → orchestration, turning blank folders into reliable analytics pipelines.
What’s inside (modules):
– Airflow DAG Templates: idempotent loads, late-arriving data handling, retries with exponential backoff, SLAs & alerts.
– dbt Project: staging/core/marts layers; snapshotting; unique, not-null, and referential-integrity tests.
– Great Expectations Suites: data quality checks (value ranges, schema drift, row counts); Data Docs publishing.
– Sample Datasets & Profiles: Parquet/CSV samples; pandas profiling notebooks; synthetic generators.
– Ops Toolkit: lineage notes, partitioning strategies, backfill playbook, cost-control tips.
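The retries-with-backoff behavior in the DAG templates maps onto Airflow's built-in task arguments (`retries`, `retry_delay`, `retry_exponential_backoff`); as a minimal stdlib sketch of the same pattern — the function and parameter names here are illustrative, not taken from the package:

```python
import time

def retry_with_backoff(task, retries=3, base_delay=1.0, sleep=time.sleep):
    """Run `task`, retrying on failure with exponentially growing delays."""
    for attempt in range(retries + 1):
        try:
            return task()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the original error
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```

In an actual DAG the equivalent is declarative, e.g. `default_args={"retries": 3, "retry_exponential_backoff": True}` on the DAG or operator.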
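The Great Expectations suites above assert things like value ranges and row counts against a batch; as a rough stdlib illustration of what such a suite checks (a hypothetical helper, not the GE API):

```python
def validate(rows, min_rows=1, bounds=None):
    """Return a list of failed checks for a batch of dict rows.

    `bounds` maps column name -> (low, high) inclusive range.
    """
    failures = []
    if len(rows) < min_rows:
        failures.append(f"row_count {len(rows)} < {min_rows}")
    for col, (low, high) in (bounds or {}).items():
        for i, row in enumerate(rows):
            v = row.get(col)
            if v is None or not (low <= v <= high):
                failures.append(f"row {i}: {col}={v!r} outside [{low}, {high}]")
    return failures
```

In GE terms these correspond to expectations such as row-count and column-value-range checks, with results rendered into Data Docs rather than returned as strings.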
Technical specifications:
– Artifacts: .py DAGs, .sql models, .yml configs, .ipynb notebooks, .parquet/.csv data.
– Compat: Python 3.10+, Airflow 2.x, dbt Core, Great Expectations, Postgres/BigQuery/Snowflake.
Setup & integration:
– Bring up Airflow via Docker Compose; configure the warehouse connection; run dbt seed, dbt run, and dbt test; schedule Great Expectations checkpoints from the DAGs.
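The dbt steps above can be scripted to fail fast; a minimal sketch that shells out to dbt in order (the `runner` hook is illustrative, included so the sequence can be exercised without dbt installed):

```python
import subprocess

# Conventional dbt bootstrap order: load seeds, build models, then test them.
DBT_STEPS = [["dbt", "seed"], ["dbt", "run"], ["dbt", "test"]]

def run_steps(steps=DBT_STEPS, runner=subprocess.run):
    """Execute each command in order; check=True stops at the first failure."""
    for cmd in steps:
        runner(cmd, check=True)
```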
Security & compliance:
– Env-segregated connections; optional PII tokenization; audit tables for run metadata.
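The audit tables mentioned above can be as simple as one row of run metadata per pipeline execution; a sqlite3 sketch, assuming a hypothetical table name and columns (the real schema ships with the package):

```python
import sqlite3
from datetime import datetime, timezone

def record_run(conn, dag_id, status, rows_loaded):
    """Append one run-metadata row to an audit table (created on first use)."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS pipeline_runs (
               dag_id TEXT, run_at TEXT, status TEXT, rows_loaded INTEGER)"""
    )
    conn.execute(
        "INSERT INTO pipeline_runs VALUES (?, ?, ?, ?)",
        (dag_id, datetime.now(timezone.utc).isoformat(), status, rows_loaded),
    )
    conn.commit()
```

The same insert would typically run as a final task in each DAG, pointed at the warehouse instead of SQLite.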
KPIs & ROI:
– Bootstrap time cut from days to hours; fewer data incidents; faster analyst onboarding.
Included files:
/airflow/dags/*, /dbt/*, /great_expectations/*, /data/*, /notebooks/*, /docs/*
