Use case description - Scenario 4 - Big data analytics

Use case description - Problem statement

As a data scientist, I need to perform data analytics operations on a large amount of data.

Use case goals

  • Load data from MinIO
  • Read the data
  • Perform data warehousing operations/queries on the data
  • Show results locally
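
The four goals above can be sketched in a notebook as follows. This is a minimal illustration rather than the portal's actual setup: it assumes PySpark is installed, that the Hadoop S3A connector (`hadoop-aws`) is on the Spark classpath, and that the data is stored as Parquet. The helper names (`minio_s3a_conf`, `build_session`, `run_query`), the endpoint, the credentials, the bucket, and the SQL query are all hypothetical placeholders.

```python
from typing import Dict


def minio_s3a_conf(endpoint: str, access_key: str, secret_key: str,
                   use_ssl: bool = False) -> Dict[str, str]:
    """Return Spark settings that point the Hadoop S3A connector at MinIO."""
    return {
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        # MinIO is usually addressed with path-style URLs (host/bucket/key).
        "spark.hadoop.fs.s3a.path.style.access": "true",
        "spark.hadoop.fs.s3a.connection.ssl.enabled": str(use_ssl).lower(),
    }


def build_session(conf: Dict[str, str], app_name: str = "big-data-analytics"):
    """Create (or reuse) a SparkSession with the given settings."""
    # Imported lazily so the helpers above work without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


def run_query(spark, path: str, view: str, sql: str):
    """Read a Parquet dataset, expose it as a temporary view, and run SQL on it."""
    df = spark.read.parquet(path)
    df.createOrReplaceTempView(view)
    return spark.sql(sql)


# Example notebook usage (every value below is a placeholder):
# conf = minio_s3a_conf("http://minio.example:9000", "ACCESS_KEY", "SECRET_KEY")
# spark = build_session(conf)
# result = run_query(
#     spark, "s3a://my-bucket/sales/", "sales",
#     "SELECT region, SUM(amount) AS total FROM sales GROUP BY region",
# )
# result.show(10)  # print the first rows locally in the notebook
```

Keeping the connection settings in a plain dictionary makes them easy to inspect before a session is started. `show()` prints only a bounded number of rows on the driver, which suits the "show results locally" goal for large datasets.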

Tools & Capabilities

To meet the use case goals, the following tools from the portal will be leveraged:

| Tool | Description | Key capability |
| --- | --- | --- |
| Jupyter notebook + Spark | The Jupyter Notebook is a web application for creating and sharing documents that contain code, visualizations, and text. It can be used for data science, statistical modeling, machine learning, and much more. In this scenario it is used to run Spark jobs. | Trigger Spark execution; perform advanced analytics |
| MinIO | MinIO offers high-performance, S3-compatible object storage. Native to Kubernetes, MinIO is the only object storage suite available on every public cloud, every Kubernetes distribution, the private cloud, and the edge. MinIO is software-defined and is 100% open source under GNU AGPL v3. | Load and store the data |
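
As an illustration of the "load and store the data" capability, the sketch below uploads a local file to a MinIO bucket with the `minio` Python client. It assumes that package is installed. The helper names (`make_client`, `store_file`), the endpoint, the credentials, and the bucket and object names are hypothetical placeholders, not values from the portal.

```python
def make_client(endpoint: str, access_key: str, secret_key: str,
                secure: bool = False):
    """Build a MinIO client; endpoint is host:port without a URL scheme."""
    # Imported lazily so store_file can be used with any compatible client.
    from minio import Minio

    return Minio(endpoint=endpoint, access_key=access_key,
                 secret_key=secret_key, secure=secure)


def store_file(client, bucket: str, object_name: str, file_path: str) -> str:
    """Upload a local file to MinIO, creating the bucket if it is missing.

    Returns the s3a:// URI under which Spark could later read the object.
    """
    if not client.bucket_exists(bucket_name=bucket):
        client.make_bucket(bucket_name=bucket)
    client.fput_object(bucket_name=bucket, object_name=object_name,
                       file_path=file_path)
    return f"s3a://{bucket}/{object_name}"


# Example usage (every value below is a placeholder):
# client = make_client("minio.example:9000", "ACCESS_KEY", "SECRET_KEY")
# uri = store_file(client, "raw-data", "sales/2024.parquet", "/tmp/2024.parquet")
```

Passing the client in as an argument keeps `store_file` independent of how the connection is created, so the same helper works against a test double or a differently configured deployment.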