Azure Relational Database#
- User needs: Data storage, Data query
- User profiles: Data Analysts, Data Scientists, Data Engineers
- User assumed knowledge: How to connect to databases
Relational databases can be used to store data collections in a structured format. This is different from Azure Data Lake Gen 2, where data is stored in an unstructured manner. This means that, before you can add data to a table in SQL, you have to define the format of the data. The data resides in tables, which constitute of columns and rows.
The Azure part of the EC Data Platform offers three different SQL relational database options: Microsoft SQL Server, MySQL and PostgreSQL.
Choosing the right relational database type#
A big difference between the three services is that MySQL and PostgreSQL use an open-source licensing model, while Microsoft SQL Server uses a commercial licensing model. The impact on the pricing is around 37% in favor of the open-source models.
Even though the raw performance seems to be best for MySQL, SQL server has the benefit in robustness, security and integration with the Azure platform.
Robustness in the sense that the database can tolerate more perturbations, without impacting the underlying data or the availability of the system. This related to, e.g. cancelling queries that have been executed, performance impact during backup, restore procedure, among others. Security mostly relates to the built-in MS baseline security analyzer tool, of which no counterpart in MySQL exists.
Finally, the more mature integration with Azure has an impact on the management and security. Security in the fact that automated security alerts and recommendations can be activated, as well as more logs that can be generated. Management mostly relates to the integration with Azure Active Directory and the use of Microsoft tools such as Azure Data Studio and Microsoft SQL Server Management Studio.
The biggest difference between MySQL and Microsoft SQL Server on the one hand, and PostgreSQL on the other hand, is that PostgreSQL offers User-defined functions and dynamic actions out of the box. A good, but not completely accurate, comparison would be to say that MySQL is the open-source version of SQL Server, while PostgreSQL is the open-source version of Oracle databases.
A simplified decision tree, if you are still in doubt: - Are you currently using user-defined functions, and are in need for a procedural query language? Are you currently using PostgreSQL and do not want to switch? If yes, choose PostgreSQL. If not, follow the question below. - Are you willing to pay 40% more for Azure Active Directory integration, additional security logs, the use of T-SQL and the use of tools like Azure Data Studio and Microsoft SQL Server Management Studio? If yes, choose MS SQL Server. If not, use MySQL.
For more information around the difference between MySQL and MS SQL Server, there are a lot of resources available on the internet.
Microsoft SQL Server#
Connecting to your Microsoft SQL Server#
In order to connect to your database, you need to be inside the private network of the EC Data Platform. This can be done by connecting to your Amazon WorkSpaces, Azure Windows Virtual Desktop or by using a VPN connection. Please see the section on Accessing a Data Science Lab environment for more info. The Azure Windows Virtual Dektop connection is preferred in this case as it is the most user friendly way.
Once connected to the EC Data Platform, we need to use a tool to connect to the database. In this example we will be using Azure Data Studio, but you can also use Microsoft SQL Server Management Studio (or any other preferred SQL database tool).
Step 1: Open Azure Data Studio (this will be installed in the latest version of Amazon WorkSpaces and Azure Windows Virtual Desktop). If this you are using an older version of the Amazon WorkSpaces, navigate to this link and download and install via the Windows User Installer (recommended). After installation, open Azure Data Studio.
Step 2: Once Azure Data Studio is opened, click in the upper left corner on “Connections”. Then click on the “New connection” icon.
Step 3: If your deployment enabled Azure Active Directory integration, the first thing you should do is authenticate with Azure Active Directory. This can be done by setting the Authentication type to “Azure Active Directory – Universal with MFA support”. Next, in the account option, click on “Add an account…”
If you have Internet Explorer as your default browser, a popup might appear indicating that the https://login.microsoftonline.com
site is blocked.
To solve this, go to your internet explorer settings and click on Internet Options.
Next, go to Security, click on “Trusted sites” and click “Sites”.
In the textbox, past the login url of Microsoft: https://login.microsoftonline.com
and click on Add
Click on Ok and Close to exit and try again to “Add an account…” in Azure Data Studio (you might have to completely restart Azure Data Studio). If there is still an issue after this occurs, please change your default web browser.
If your deployment did not enable Azure Active Directory integration, you can log in using the Authentication type “SQL Login” by providing your User name and Password.
Step 4: In the web browser that pops up, sign into your Azure Active Directory account. After successful authentication, your Azure Data Studio should have added your account:
Step 5: Next, provide the SQL database endpoint that was provided to you as the server value. For me, this is:
testpproject-dev-db-sql-server.database.windows.net
Step 6: Additionally, specify the Database to which you have been provided access to. By default, the database that is specified will be <Default>
, however, in most cases you will not have access to the Default/master database. Therefore, I specify the database that was provided to me from deployment: testproject-uat-db-sql-database
Step 7: Finally, click on Connect to connect to the database. If the connection went successful, you should see a similar view as the below Figure.
Adding Azure Active Directory (AAD) users to access the Microsoft SQL Sever#
If you have chosen for AAD integration, once the SQL database is created only the Azure Active Directory user that was assigned as the admin has access to the database. In order to provide access to other AAD users, this admin should allow these users (or groups) to access the database. For more advanced use-cases, roles can also be assigned to these users or groups to provide granular permission on the actions that can be executed by the users.
In order to add AAD users to access the database, the following SQL query should be executed where the account name of the user is provided (including the domain):
CREATE USER [ncattoir-test@ec-dataplatform.eu] FROM EXTERNAL PROVIDER;
After executing this query, you should see that the query was successfully executed:
This will allow the selected user to log in using the same process as described in Connecting to your Microsoft SQL Server. Additionally, the admin account is also able to add groups AAD groups using the following query (please consult with the EC Data Platform administrators for group names):
CREATE USER [testUatGroup] FROM EXTERNAL PROVIDER;
Finally, in advanced use cases it might be necessary to add an AAD application to access the database. This can be done by executing the following command:
CREATE USER [appName] FROM EXTERNAL PROVIDER;
If you have not chosen for AAD integration, additional users can be created by executing the following query. It is the responsibility of the admin to create users and distribute the passwords:
CREATE LOGIN ncattoir WITH PASSWORD = '8fdKJl3$nlNv3049jsKK';
Granting permissions to the user of the Microsoft SQL Server#
Once the users have been created in the previous step, these users need to be granted permissions in order to read, write or perform other operations on the SQL Database. This can be done based on two approaches: assign a role to the user or set the permissions to the user directly.
Assigning a role to the user:
- To view a list of all roles and the included permissions in these roles, we refer to the Microsoft documentation on this link
- In order to assign a role to user, we use the SQL statement within a database:
- ALTER ROLE <role_name> ADD MEMBER <database_principal>
- Example (on the database):
ALTER ROLE db_owner ADD MEMBER [ncattoir-test@ec-dataplatform.eu]
- In order to remove a role from user, we use the SQL statement within a database:
ALTER ROLE <role_name> DROP MEMBER <database_principal>
- Example:
ALTER ROLE db_owner DROP MEMBER ncattoir-test@ec-dataplatform.eu
- Additionally, roles could be created, server roles can be assigned and many more. This is, however, outside of the scope of this document. You can certainly find a lot of information on this in the Microsoft documentation online.
Assigning permissions to the user:
- To view the Microsoft documentation on which permissions can be assigned in a database, refer to the Microsoft documentation on this link
- In order to add a permission to a user, use the following SQL statement:
- GRANT <permission(s)> TO <database_principal>
- Example:
GRANT CREATE TABLE, INSERT, SELECT TO [ncattoir-test@ec-dataplatform.eu]
- In order to remove a permission from a user, use the following SQL statement:
REVOKE <permission(s)> TO <database_principal>
- Example:
REVOKE CREATE TABLE TO [ncattoir-test@ec-dataplatform.eu]
MySQL#
Connecting to your MySQL Server#
In order to connect to your database, you need to be inside the private network of the EC Data Platform. This can be done by connecting to your Amazon WorkSpaces, Azure Windows Virtual Desktop or by using a VPN connection. Please see the section on Accessing a Data Science Lab environment for more info.
Once connected to the EC Data Platform, we need to use a tool to connect to the database. In this example we will be using MySQL Workbench, but you can also use other MySQL tools as you prefer.
Step 1: Open MySQL Workbench (this is installed in the latest version of Amazon WorkSpaces and Azure Windows Virtual Desktop).
Step 2: Once MySQL Workbench is opened, click in the upper left corner on “Database”. Then click on the “Connect to Database…” option.
Step 3: In the “Manage Server Connections” plane, click on New.
Next, provide the following parameters:
-
Connection name: \
-
Connection Method: Standard (TCP/IP)
-
Parameters:
-
Hostname: \
- Example:
testppproject-dev-db-mysql.mysql.database.azure.com
- Example:
-
Username: \
@\ - Example:
nicolas@testppproject-dev-db-mysql
- Example:
-
Password: Click on “Store in Vault” and provide the password that was provided to you
-
-
SSL:
- Use SSL: Require
Next, click on Test Connection in the lower right of the “Manage Server Connections” plane. If all went successful, you should see a popup indicating a successful test.
Finally, close the “Manage Server Connections” plane.
Step 4: In the upper left corner, select Database and select the “Connect to Database…” option.
Step 5: Next, select the connection that you created in Step 3 and click on “Ok”
The management plane of the MySQL Database should appear:
Adding users to access the MySQL database#
MySQL does not offer AAD integration in the EC Data Platform. This means that user accounts need to be created in the MySQL database.
In order to add users to access the database, the following SQL queries should be executed, by replacing the “newuser” with the preferred username and “user_password” with the preferred user password:
CREATE USER 'newuser'@'%' IDENTIFIED BY 'user_password';
(Note: the ‘%’ is used as a wildcard for the source IP address that the user is connecting from. If ‘%’ is used, the user can log in from any IP address)
After executing this query, you should see that the query was successfully executed:
This will allow the selected user to log in using the same process as described in Connecting to your MySQL Server.
Granting permissions to the users of the MySQL databases#
Once the users have been created in the previous step, these users need to be granted permissions in order to read, write or perform other operations on the MySQL Database. This can be done by setting the permissions to the user directly. Additionally, this can also be done by assigning permissions to created roles and assigning these roles to users. For this last way of working, please refer to the MySQL documentation around creating roles.
Assigning permissions to the user:
-
To view the MySQL documentation on which permissions can be assigned in a database, refer to the MySQL documentation on this link
-
In order to add a permission to a user, use the following SQL statement:
-
GRANT <permission(s)> ON <schema name>.<table name> TO <database_username>
-
Note that the wildcard character: ‘*’ can be used to refer to all tables or all schemas
-
Note also that not all permissions can be granted on the default “sys” schema
-
Example:
-
GRANT SELECT ON *.* TO newuser
-
In order to remove a permission from a user, use the following SQL statement:
-
REVOKE <permission(s)> ON <schema name>.<table name> FROM <database_username>
-
Example:
-
REVOKE SELECT ON new_schema.* FROM newuser
PostgreSQL#
Connecting to your PostgreSQL Server#
In order to connect to your database, you need to be inside the private network of the EC Data Platform. This can be done by connecting to your Amazon WorkSpaces or by using a VPN connection. Please see the section on Accessing a Data Science Lab environment for more info.
Once connected to the EC Data Platform, we need to use a tool to connect to the database. In this example we will be using pgAdmin 4, but you can also use other MySQL tools as you prefer.
Step 1: Open pgAdmin 4 Workbench (this is installed in the latest version of Amazon WorkSpaces).
Step 2: Once pgAdmin is opened, right-click in the upper left corner on “Servers”. Then follow “Create” and “Server…”.
Step 3: A popup will open, where you can provide the following details.
In the “General” tab:
- Name: \
Next, provide the following parameters in the “Connection” tab:
- Host name/address: \- Example:
nicolas@testppproject-dev-db-postgres`
- Password: \
Finally, change the SSL mode in the “SSL” tab and click on “Save”: - SSL mode: Require
Step 4: If the connection details were correct, you will see the database under the “Servers” in the left plane.
Queries can now be executed, by right clicking the database and selecting “Query tool”
Adding users to access the database#
PostgreSQL does not offer AAD integration in the EC Data Platform. This means that user accounts need to be created in the PostgreSQL database.
In order to add users to access the database, the following queries should be executed, by replacing the “newuser” with the preferred username and “user_password” with the preferred user password:
CREATE USER newuser WITH PASSWORD ' user_password';
After executing this query, you should see that the query was successfully executed:
This will allow the selected user to log in using the same process as described in Connecting to your PostgreSQL Server.
Granting permissions to the users of the PostgreSQL databases#
Once the users have been created in the previous step, these users need to be granted permissions in order to read, write or perform other operations on the PostgreSQL Database. This can be done by setting the permissions to the user directly. Additionally, this can also be done by assigning permissions to created roles and assigning these roles to users. For this last way of working, please refer to the PostgreSQL documentation around creating roles.
Assigning permissions to the user:
- To view the PostgreSQL documentation on which permissions can be assigned in a database, refer to the PostgreSQL documentation on this link
- In order to add a permission to a user in a table, use the following SQL statement:
- GRANT \ TO \
GRANT INSERT ON test TO newuser
- Note that the “test” table had to be created for this use
FROM \
REVOKE INSERT ON test FROM newuser