Use case description - Scenario 4 - Big data analytics
Use case description - Problem statement
As a data scientist, I need to perform data analytics operations on a large amount of data.
Use case goals
- Load the data from MinIO
- Read the data into Spark
- Run data warehousing operations and queries on the data
- Show the results locally
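The goals above map onto a short PySpark workflow in a notebook. The sketch below assumes the data is stored as Parquet, the `hadoop-aws` (S3A) connector JAR is already on Spark's classpath, and the endpoint, credentials, bucket, view name and query are hypothetical placeholders. The function names are illustrative and not part of the portal.

```python
"""Sketch: read data from MinIO with PySpark and query it with Spark SQL.

Assumptions (hypothetical, not taken from the portal): Parquet data,
hadoop-aws on the classpath, placeholder endpoint/credentials/paths.
"""


def minio_spark_conf(endpoint: str, access_key: str, secret_key: str) -> dict:
    """Return Spark settings that point Hadoop's S3A connector at MinIO."""
    return {
        # The "spark.hadoop." prefix forwards these keys to the Hadoop configuration.
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        # MinIO is typically addressed as endpoint/bucket, not bucket.endpoint.
        "spark.hadoop.fs.s3a.path.style.access": "true",
        "spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
    }


def build_session(conf: dict):
    """Create (or reuse) a SparkSession configured with the given settings."""
    # Imported here so the config helper above can be used without Spark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("minio-analytics")  # hypothetical app name
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


def run_query(spark, path: str, view: str, query: str):
    """Load data from MinIO, register it as a SQL view, and run a query."""
    df = spark.read.parquet(path)     # load and read the data from MinIO
    df.createOrReplaceTempView(view)  # make it queryable from Spark SQL
    return spark.sql(query)           # warehousing-style query; returns a DataFrame
```

In a notebook cell the pieces could be chained as `run_query(build_session(minio_spark_conf("http://minio:9000", key, secret)), "s3a://analytics/sales/", "sales", "SELECT region, SUM(amount) AS total FROM sales GROUP BY region").show()`, where `show()` prints the first rows of the result locally.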
Tools & capabilities
The following portal tools are used to meet the use case goals:
| Tool | Description | Key capabilities |
|---|---|---|
| Jupyter Notebook + Spark | Jupyter Notebook is a web application for creating and sharing documents that contain code, visualizations, and text. It can be used for data science, statistical modeling, machine learning, and more. In this scenario it is used to run Spark. | Trigger Spark execution; perform advanced analytics |
| MinIO | MinIO provides high-performance, S3-compatible object storage. It is Kubernetes-native and runs on public clouds, Kubernetes distributions, private clouds, and at the edge. MinIO is software-defined and 100% open source under the GNU AGPL v3. | Load and store the data |
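To "load and store the data", a dataset first has to land in a MinIO bucket. A minimal sketch, assuming the `minio` Python SDK is installed and using hypothetical endpoint, credentials, bucket and file names, could upload a local file and return the `s3a://` URI that Spark would later read:

```python
"""Sketch: store a local dataset in MinIO so Spark can read it via S3A.

Assumptions (hypothetical): the `minio` SDK is installed; endpoint,
credentials, bucket and object names are placeholders.
"""


def s3a_uri(bucket: str, object_name: str) -> str:
    """Build the s3a:// URI that Spark's S3A connector uses for an object."""
    return f"s3a://{bucket}/{object_name.lstrip('/')}"


def upload_dataset(endpoint: str, access_key: str, secret_key: str,
                   bucket: str, local_path: str, object_name: str,
                   secure: bool = False) -> str:
    """Upload local_path to MinIO (creating the bucket if needed) and return its s3a URI."""
    # Imported here so s3a_uri can be used without the SDK installed.
    from minio import Minio

    # The SDK expects a bare "host:port" endpoint; `secure` selects HTTPS.
    client = Minio(endpoint=endpoint, access_key=access_key,
                   secret_key=secret_key, secure=secure)
    if not client.bucket_exists(bucket_name=bucket):
        client.make_bucket(bucket_name=bucket)
    client.fput_object(bucket_name=bucket,
                       object_name=object_name.lstrip("/"),
                       file_path=local_path)
    return s3a_uri(bucket, object_name)
```

Note the different endpoint conventions: the SDK takes a bare `host:port` plus a `secure` flag, while the S3A endpoint setting in Spark can include the scheme (for example `http://minio:9000`).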