Scenario 1 - Data visualization

Problem statement

As a data scientist, I need to extract insights and visualizations from the provided dataset to support the storytelling of the “energy balance in the EU” report.

Goals

  • Load the data
  • Query the data
  • Explore the different visualization opportunities
  • Select the best-fit visualization techniques
  • Create the visualizations

Tools & Capabilities

To meet the use case goals, the following tools from the portal are used:

| Tool | Description | Key capability |
| --- | --- | --- |
| pgAdmin | pgAdmin is the most popular and feature-rich open-source administration and development platform for PostgreSQL, the most advanced open-source database in the world. | Data load |
| PostgreSQL | PostgreSQL is a powerful, open-source object-relational database system with over 30 years of active development that has earned it a strong reputation for reliability, feature robustness, and performance. | Data query |
| Apache Superset | Apache Superset is a modern data exploration and visualization platform. It is fast, lightweight, intuitive, and loaded with options that make it easy for users of all skill sets to explore and visualize their data, from simple line charts to highly detailed geospatial charts. | Data pre-processing (null values, outliers, …); explore the different visualization opportunities; select the best-fit visualization technique; create the visualizations |

Use case guide

This document guides the user through Scenario 1 - Data visualization by presenting a high-level overview of the main steps. As stated in the use case description, the goal is to build relevant visualizations in Apache Superset, using data stored in a PostgreSQL database as the source.

  • Step 1: Initialize the resources. Launch three instances (pgAdmin, PostgreSQL, and Apache Superset) from the Service Catalog section of the Portal, then check in the My Services section that each one has the status ACTIVE, which means it is ready to use.
  • Step 2: Configure pgAdmin and connect the PostgreSQL instance. Copy the PostgreSQL host address from the My Services section, log in to the pgAdmin instance, open the Browser tab, and register the PostgreSQL server by filling in the configuration modal with the PostgreSQL host, port, maintenance database, username, and password as defined in the configuration.
  • Step 3: Load the data. In pgAdmin, create a table, define its schema with SQL queries (adding columns as needed), and import the data from the source CSV file into the PostgreSQL database.
  • Step 4: Configure Apache Superset and connect the PostgreSQL database. Access your Apache Superset instance from the My Services section, log in using the configured credentials, and establish a connection to the PostgreSQL database by providing the necessary PostgreSQL credentials. Then, add the dataset from the PostgreSQL database to Apache Superset, specifying the relevant schema and table, and configure the dataset's temporal dimension property for enhanced data visualizations.
  • Step 5: Create a dashboard and populate it with charts. In the Charts section of Apache Superset, create Pie charts, Bar charts, Box plots, and World map charts: for each one, select the dataset and chart type, configure the columns and metrics, add filters, and set the chart properties. Then create a dashboard, add the charts to it, and arrange them as desired.
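
Steps 3 and 4 can be illustrated with a minimal Python sketch that only builds the text you would paste into the tools; it does not connect to anything. The table name `energy_balance`, the column layout, the CSV path, and the credentials are all hypothetical placeholders, since the guide does not specify the dataset's schema. The connection string follows the general SQLAlchemy form for PostgreSQL, which may be useful if your Superset connection dialog asks for a URI instead of separate fields.

```python
from typing import Mapping
from urllib.parse import quote_plus

# Hypothetical column layout: the real columns depend on the source CSV file.
COLUMNS = {
    "country": "TEXT",
    "year": "INTEGER",
    "product": "TEXT",
    "value_ktoe": "DOUBLE PRECISION",
}


def create_table_sql(table: str, columns: Mapping[str, str]) -> str:
    """Build a CREATE TABLE statement to run in the pgAdmin Query Tool (Step 3)."""
    cols = ",\n".join(f'    "{name}" {sql_type}' for name, sql_type in columns.items())
    return f'CREATE TABLE IF NOT EXISTS "{table}" (\n{cols}\n);'


def copy_sql(table: str, columns: Mapping[str, str], csv_path: str) -> str:
    """Build a COPY statement that imports a CSV file with a header row (Step 3).

    COPY ... FROM reads a file on the database server; when the CSV is on
    your own machine, the pgAdmin import dialog is the alternative.
    """
    col_list = ", ".join(f'"{name}"' for name in columns)
    return (
        f'COPY "{table}" ({col_list}) '
        f"FROM '{csv_path}' WITH (FORMAT csv, HEADER true);"
    )


def superset_uri(user: str, password: str, host: str, port: int, database: str) -> str:
    """Build a SQLAlchemy-style PostgreSQL URI for the Superset connection (Step 4).

    The user name and password are URL-encoded so that characters such as
    '@' or '/' do not break the URI.
    """
    return (
        f"postgresql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


if __name__ == "__main__":
    # All values below are placeholders, not settings from the portal.
    print(create_table_sql("energy_balance", COLUMNS))
    print(copy_sql("energy_balance", COLUMNS, "/tmp/energy_balance.csv"))
    print(superset_uri("analyst", "p@ss/word", "db.example.org", 5432, "energy"))
```

Replace the placeholders with the host, port, database, and credentials shown for your PostgreSQL instance in the My Services section, and adjust the column types to match the CSV before running the generated SQL.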