dbt Project Overview
Our dbt (Data Build Tool) project serves as the core analytics ETL pipeline for Pair Team, transforming healthcare data from multiple sources into analysis-ready marts for business intelligence. The project lives in the analytics-etl repository and orchestrates the transformation of raw healthcare data into actionable insights that power our Looker dashboards and operational reporting.
Architecture
The project follows a three-tier medallion architecture that processes data from four primary sources:
Data Sources:
- Arc: Primary care platform data including patient charts, communications, and ECM enrollment profiles
- Elation: Electronic health record (EHR) data with clinical visits, procedures, medications, and lab results
- Zus: Healthcare data aggregation platform providing conditions, encounters, and clinical observations
- Regal: Contact center operations data tracking calls, tasks, and agent activity
Transformation Layers:
- Connectors (models/connectors/): Source-specific staging models that normalize and clean raw data from each platform, using the naming pattern stage_{source}__{table_name}
- Intermediate (models/[domain]/intermediate/): Business logic transformations that join and enrich data across sources, following the pattern int__{domain}_{description}
- Marts (models/[domain]/marts/): Final analytics-ready models materialized as tables for performance, consumed directly by Looker
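To illustrate the connector-layer naming convention, a staging model for a hypothetical Arc patients table might look like the sketch below (the source, table, and column names are illustrative, not taken from the actual project):

```sql
-- models/connectors/arc/stage_arc__patients.sql
-- Hypothetical staging model following the stage_{source}__{table_name} pattern.
with source as (

    select * from {{ source('arc', 'patients') }}

),

renamed as (

    select
        id as patient_id,
        first_name,
        last_name,
        created_at as enrolled_at
    from source

)

select * from renamed
```

The CTE-based select/rename structure shown here is the conventional dbt staging pattern: one model per raw table, limited to renaming, casting, and light cleanup.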
Key Business Domains
The project organizes models into domain-specific schemas that align with business functions:
- Operations: Patient enrollment tracking, churn analysis, engagement metrics, and social needs assessment
- Care Plan: Intervention management and care coordination workflows
- Care Gaps: HEDIS quality measures and care gap identification for preventive care
- Communications: Patient-provider interaction analytics and communication effectiveness
- Population Health: Clinical readings, vital signs, and health outcome tracking
- Risk Stratification: High-risk patient identification and prioritization
- Finance: Billing, authorization tracking, and revenue cycle management
Development Workflow
Engineers work in isolated development environments where models materialize under DBT_DEV_{username} schemas in Snowflake. The workflow emphasizes:
- Local testing with dbt build --select +model_name+ to validate model logic and dependencies
- SQL linting via SQLFluff with the Snowflake dialect and pre-commit hooks to enforce code quality
- Comprehensive documentation: all mart models require column descriptions and data quality tests
- Pull request reviews before merging to main, where models automatically deploy to production
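The documentation and data quality test requirement for mart models is expressed in dbt YAML properties files. A minimal sketch (file path, model, and column names are hypothetical):

```yaml
# models/operations/marts/_marts__models.yml  (illustrative path and names)
version: 2

models:
  - name: patient_enrollment
    description: One row per patient enrollment event.
    columns:
      - name: patient_id
        description: Unique identifier for the enrolled patient.
        tests:
          - not_null
          - unique
      - name: enrolled_at
        description: Timestamp when the patient enrolled.
        tests:
          - not_null
```

Generic tests like not_null and unique run as part of dbt build, so the local-testing step above exercises them alongside model compilation.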
Orchestration
Production models are managed via GitHub Actions: the production environment runs nightly at 3 AM Pacific, executing a full-refresh build of nearly all models. Models materialize under the DBT_PROD schema and connect to Looker for business intelligence consumption. Two models use specialized strategies: stage_zus__observation runs incrementally for performance, and readings_bp_snapshot maintains historical snapshots to filter outdated clinical readings. If the schema of either of these models changes, run dbt against the production target profile with the --full-refresh flag.
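An incremental model such as stage_zus__observation typically follows dbt's standard incremental pattern, sketched below (the unique key and column names are assumptions, not the project's actual definition):

```sql
-- models/connectors/zus/stage_zus__observation.sql  (illustrative sketch)
{{
    config(
        materialized='incremental',
        unique_key='observation_id'
    )
}}

select
    observation_id,
    patient_id,
    observed_at,
    value
from {{ source('zus', 'observation') }}

{% if is_incremental() %}
  -- On incremental runs, only process rows newer than what is already loaded.
  where observed_at > (select max(observed_at) from {{ this }})
{% endif %}
```

This is why a schema change requires --full-refresh: incremental runs only append or merge new rows, so the existing table must be rebuilt from scratch to pick up new or altered columns.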
Technical Stack
The project leverages several dbt packages to extend functionality:
- dbt_utils: Core utility macros and tests
- the_tuva_project: Open-source healthcare analytics models for standardized clinical and claims data transformation
- audit_helper: Data comparison tools for migration and refactoring validation
- dbt_artifacts: Run metadata logging for observability and monitoring
- dbt_project_evaluator: Automated best practice validation for project structure and quality
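These dependencies are declared in packages.yml. A sketch with the public package namespaces (the version ranges below are illustrative; pin to the versions the project actually uses):

```yaml
# packages.yml  (version ranges are illustrative)
packages:
  - package: dbt-labs/dbt_utils
    version: [">=1.0.0", "<2.0.0"]
  - package: tuva-health/the_tuva_project
    version: [">=0.8.0", "<1.0.0"]
  - package: dbt-labs/audit_helper
    version: [">=0.9.0", "<1.0.0"]
  - package: brooklyn-data/dbt_artifacts
    version: [">=2.0.0", "<3.0.0"]
  - package: dbt-labs/dbt_project_evaluator
    version: [">=0.8.0", "<1.0.0"]
```

Running dbt deps installs the declared packages into the project before builds.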
Development requires Python 3.x, SSH-based Snowflake authentication, and the Power User for dbt VS Code extension for enhanced productivity with inline query execution and visual lineage exploration.
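Local Snowflake credentials live in a dbt profile. Assuming the key-based authentication mentioned above maps to Snowflake key-pair auth in dbt-snowflake, a development profile might look like this (profile name, account, role, database, warehouse, and paths are all placeholders):

```yaml
# ~/.dbt/profiles.yml  (sketch; all identifiers are placeholders)
analytics_etl:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: <account_identifier>
      user: <username>
      private_key_path: <path_to_private_key>
      role: <dev_role>
      database: <database>
      warehouse: <warehouse>
      schema: DBT_DEV_<username>
      threads: 4
```

The schema value mirrors the DBT_DEV_{username} convention described in the development workflow, keeping each engineer's models isolated.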