Dashboards
Embedded visibility for on-call teams
Project overview
ETL built for Austin MetroBike transparency
This project implements an ETL pipeline that processes historical bike-sharing data from Austin (2013–present). The goal is to enable stakeholders to make informed decisions about bike usage patterns, station performance, demand distribution, and vehicle and subscription trends.
The pipeline extracts raw trip data, transforms it to compute useful metrics (e.g., trips per kiosk, trips per hour, top stations, and breakdowns by subscription and bike type), and loads the results into a format suitable for visualization and analysis.
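The transform step boils down to deriving time dimensions from each trip's checkout timestamp and aggregating over them. Below is a minimal PySpark sketch of that idea; the file path and column names (checkout_datetime, checkout_kiosk) are assumptions for illustration, not the dataset's actual schema.

```python
# Hedged sketch of the transform step; column names are assumed, not actual.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("metrobike-etl-sketch").getOrCreate()

# Extract: read the raw trip data landed by the acquisition step.
trips = spark.read.csv("data/raw/trips.csv", header=True, inferSchema=True)

# Transform: derive the time dimensions the stakeholder metrics group by.
trips = (
    trips
    .withColumn("ts", F.to_timestamp("checkout_datetime"))
    .withColumn("hour", F.hour("ts"))
    .withColumn("day_of_week", F.date_format("ts", "EEEE"))
    .withColumn("month", F.month("ts"))
    .withColumn("year", F.year("ts"))
)

# Example measure: total trips per start kiosk by day of week, month, and year.
trips_by_kiosk = (
    trips.groupBy("checkout_kiosk", "day_of_week", "month", "year")
         .agg(F.count("*").alias("trip_count"))
)
```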
Source datasets are publicly available:
- Austin MetroBike Trips
- Austin MetroBike Kiosk Locations
Stakeholder requirements
What teams asked for
- Trips by Start Kiosk: Total trips per kiosk aggregated by day of week, month, and year.
- Demand Distribution Over Time: Hourly trips segmented by day of week, month, and year.
- Geospatial Analysis of Stations: Identify and visualize the top 20 kiosks by trip count per day of week and month (see the sketch after this list).
- Trips by Subscription Type: Total trips made by each membership/pass type.
- Trips by Vehicle Type: Total trips by bike type (Classic vs. Electric).
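The top-20 requirement is a ranking problem, which maps naturally onto a window function. Continuing the sketch above (so checkout_kiosk, day_of_week, and month remain assumed column names on the trips DataFrame):

```python
# Hedged sketch of the "top 20 kiosks" requirement via a ranking window.
from pyspark.sql import Window
from pyspark.sql import functions as F

# One row per kiosk per (day of week, month), counting trips.
kiosk_by_dow_month = (
    trips.groupBy("checkout_kiosk", "day_of_week", "month")
         .agg(F.count("*").alias("trip_count"))
)

# Rank kiosks within each (day of week, month) slice and keep the top 20.
w = Window.partitionBy("day_of_week", "month").orderBy(F.desc("trip_count"))
top_kiosks = (
    kiosk_by_dow_month
    .withColumn("rank", F.row_number().over(w))
    .filter(F.col("rank") <= 20)
)
```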
Proposed data warehouse
Blueprints before build
Schematic design

Conceptual design covering fact and dimension entities aligned to trip events, kiosks, vehicles, and subscriptions.
Logical design

Logical schema drawn in DBDiagram to guide implementation and enforce relational integrity.
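To make the fact/dimension split concrete, here is a hedged sketch of dimension extraction with surrogate keys, continuing the assumed trips DataFrame from the earlier sketch; the table and column names are illustrative, not the project's actual logical model.

```python
# Hedged sketch of star-schema shaping; names are illustrative placeholders.
from pyspark.sql import functions as F

# Dimension: one row per kiosk, with a generated surrogate key.
dim_kiosk = (
    trips.select("checkout_kiosk").distinct()
         .withColumn("kiosk_id", F.monotonically_increasing_id())
)

# Fact: trip events reference the kiosk dimension through the surrogate key,
# which is what lets the logical schema enforce relational integrity.
fact_trips = (
    trips.join(dim_kiosk, on="checkout_kiosk", how="left")
         .select("kiosk_id", "hour", "day_of_week", "month", "year")
)
```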
Project architecture
What happened under the hood
Below is a high-level architecture of the ETL pipeline, tracing the operational flow from raw data acquisition to the dashboards teams use every day.

- Data acquisition: Automated via run_etl.sh; raw datasets land in data/raw.
- ETL processing: etl_main.py with Apache Spark performs feature engineering, dimension extraction, and measure computation.
- Data storage: Fact and dimension tables are written into PostgreSQL (see the load sketch after this list).
- Visualization: Grafana dashboards provide interactive filtering and exploration.
- Containerization: Docker orchestrates Spark, PostgreSQL, and Grafana services.
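As a hedged sketch of the load step, the Spark DataFrames can be written to PostgreSQL over JDBC; the URL, database name, credentials, and table name below are placeholders, assuming the PostgreSQL container is reachable as postgres on the Docker network and the PostgreSQL JDBC driver is on Spark's classpath.

```python
# Hedged sketch of the load step; connection details are placeholders.
(
    fact_trips.write
    .format("jdbc")
    .option("url", "jdbc:postgresql://postgres:5432/metrobike")
    .option("dbtable", "fact_trips")
    .option("user", "etl")
    .option("password", "etl")
    .option("driver", "org.postgresql.Driver")
    .mode("overwrite")
    .save()
)
```

Grafana then queries these tables directly, so each stakeholder metric becomes a filterable panel rather than a one-off report.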
Ready to explore
Let's bring the same rigor to your data story.
Together we can compose pipelines, metrics, and dashboards that make your operations faster, clearer, and easier to trust.