Scenario 5 - Live dashboards

Problem statement#

As a data scientist, I need to create a dashboard that visualizes up-to-date insights.

Goals#

  • Fetch weather data for the city of Brussels from the Open-Meteo API
  • Store the data in a SQL database
  • Schedule the data fetching to run on a daily basis
  • Build a dashboard to display the weather data

Tools & Capabilities#

To meet the use case goals, the following tools from the portal will be leveraged:

| Tool | Description | Key capability |
| --- | --- | --- |
| Jupyter Notebook | The Jupyter Notebook is a web application for creating and sharing documents that contain code, visualizations, and text. It can be used for data science, statistical modeling, machine learning, and much more. | Fetch the data from the API; connect to PostgreSQL; store the data into dedicated tables |
| PostgreSQL | PostgreSQL is a powerful, open-source object-relational database system with over 30 years of active development that has earned it a strong reputation for reliability, feature robustness, and performance. | Load and store the data |
| Airflow | Airflow is a platform created by the community to programmatically author, schedule and monitor workflows. | Schedule the execution of the data fetching/storing script |
| Apache Superset | Apache Superset is a modern data exploration and visualization platform. It is fast, lightweight, intuitive, and loaded with options that make it easy for users of all skill sets to explore and visualize their data, from simple line charts to highly detailed geospatial charts. | Connect the source data; create visualizations for the weather dashboard |

Use case guide#

This document guides the user through Scenario 5 - Live dashboards by presenting a high-level overview of the main steps. As discussed in the use case description, the goal is to provide an automated weather forecast dashboard for the city of Brussels that fetches data from an open API on a daily basis.

  • Step 1: Initialize the resources. Launch the required instances - Jupyter, PostgreSQL, Airflow and Superset - from the Service Catalog section of the Portal, and verify in the My Services section that their status is ACTIVE, confirming that each deployed instance is ready to be used.
  • Step 2: Retrieve the PostgreSQL host address. Copy the PostgreSQL host address from the My Services section in the format <postgresSQLHost>:5432 for later use.
  • Step 3: Set up the data pipeline - data fetching & DAG scripts. Download the commented data fetching script (UAT_Scenario_5.py) and the DAG script (UAT_Scenario_5_DAG.py). Edit the data fetching script to define the DB engine parameters, replacing the placeholders with the actual PostgreSQL password and host address. Then upload both scripts to the "dags" folder of the JupyterLab instance, accessed from the My Services section. Do not run the scripts in JupyterLab, as this would interfere with the Airflow-JupyterLab connection configuration. Illustrative sketches of both scripts are provided after this list.
  • Step 4: Initiate the data pipeline - Airflow. Access the Airflow instance from the My Services section and log in with the provided credentials. In the DAGs section, activate the scheduled "meteo_data_fetching" DAG and trigger a run manually. Upon successful completion, verify that two new tables (hourly data and metadata) have been created and populated in the PostgreSQL database.
  • Step 5: Create the dashboard. Access the Superset instance and connect the PostgreSQL database (see the connection note after this list). Add the datasets from the connected database (hourly_data and metadata). Create the charts: Weekly temperature forecast; Temperature vs. Perceived Temperature; Weekly Average Temperature with trendline; Min/Max temperature; Cloud coverage; Pressure level; Rain volume; and a Heatmap - Cloud coverage vs. Rain volume. Finally, create a dashboard titled "Brussels - Meteo", add tabs for Weather Data and Charts, populate the tabs with the respective charts, and finalize the dashboard setup.
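
For orientation, the following is a minimal sketch of what the data fetching script referenced in Step 3 could look like. It is not the actual UAT_Scenario_5.py: the Open-Meteo hourly variable names, the connection placeholders and the fetch_and_store helper are illustrative assumptions; only the table names (hourly_data and metadata) follow the ones referenced in Steps 4 and 5.

```python
# Minimal sketch of a data fetching script in the spirit of UAT_Scenario_5.py.
# Assumes the requests, pandas and SQLAlchemy libraries plus a PostgreSQL
# driver (e.g. psycopg2) are available in the JupyterLab/Airflow environment.
import requests
import pandas as pd
from sqlalchemy import create_engine

# Open-Meteo forecast endpoint and coordinates of Brussels; the hourly
# variable names below are illustrative and can be adapted.
API_URL = "https://api.open-meteo.com/v1/forecast"
PARAMS = {
    "latitude": 50.85,
    "longitude": 4.35,
    "hourly": "temperature_2m,apparent_temperature,cloudcover,surface_pressure,rain",
    "timezone": "Europe/Brussels",
}

# Placeholders to be replaced with the actual values (see Steps 2 and 3).
DB_USER = "<user>"
DB_PASSWORD = "<password>"
DB_HOST = "<postgresSQLHost>"   # host address without the :5432 suffix
DB_NAME = "<database>"


def fetch_and_store():
    """Fetch the Brussels forecast and write it to PostgreSQL."""
    response = requests.get(API_URL, params=PARAMS, timeout=30)
    response.raise_for_status()
    payload = response.json()

    # Hourly measurements go into one table, the remaining scalar fields
    # (coordinates, timezone, elevation, ...) into a one-row metadata table.
    hourly_df = pd.DataFrame(payload["hourly"])
    metadata_df = pd.DataFrame(
        [{key: value for key, value in payload.items() if not isinstance(value, dict)}]
    )

    engine = create_engine(
        f"postgresql://{DB_USER}:{DB_PASSWORD}@{DB_HOST}:5432/{DB_NAME}"
    )
    hourly_df.to_sql("hourly_data", engine, if_exists="replace", index=False)
    metadata_df.to_sql("metadata", engine, if_exists="replace", index=False)


if __name__ == "__main__":
    fetch_and_store()
```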
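
The DAG script can be sketched along the same lines. This is an assumption-laden illustration rather than the actual UAT_Scenario_5_DAG.py: it presumes Airflow 2.x and reuses the hypothetical fetch_and_store function from the sketch above; only the "meteo_data_fetching" DAG id follows Step 4.

```python
# Minimal sketch of a DAG definition in the spirit of UAT_Scenario_5_DAG.py.
# Assumes Airflow 2.x and that the fetching script above (defining
# fetch_and_store) sits in the same "dags" folder, so it can be imported.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

from UAT_Scenario_5 import fetch_and_store

with DAG(
    dag_id="meteo_data_fetching",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",  # one run per day, as required by the use case
    catchup=False,               # do not backfill past days
) as dag:
    PythonOperator(
        task_id="fetch_and_store_weather_data",
        python_callable=fetch_and_store,
    )
```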
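
Regarding the database connection in Step 5, Superset registers databases via a SQLAlchemy URI; for the PostgreSQL instance used here this would typically take the form `postgresql://<user>:<password>@<postgresSQLHost>:5432/<database>`, reusing the host address copied in Step 2 (the exact user and database names depend on the deployed instance).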