
Linked Data Service Offering

The server is deployed using a preconfigured Virtuoso AMI that, once installed, provides immediate access to a wealth of data access, integration, and data management functionality. After the deployment procedure, the Virtuoso server must be installed before the service is available for use. The features include:

  • XML document storage & creation
  • Web page hosting
  • Web services creation & hosting
  • WebDAV compliant web store
  • Content replication & synchronization
  • Transparent access to heterogeneous data
  • Mail delivery & retrieval services
  • NNTP aggregation & serving

The Linked Data service offering comprises two environments: a test environment and a production environment.

The test environment includes storage, a data ingestion pipeline, and a VM hosting OpenLink Virtuoso licensed software. Its purpose is to let you test new use cases without risking an overload of the production server. Once it has been established that your tasks will not overload the server, you can proceed with your day-to-day workloads on the production server.

Like the test environment, the production environment includes storage and a data ingestion pipeline. It also includes a cluster of VMs hosting OpenLink Virtuoso licensed software. In general, the production environment has a more robust architecture and more servers, so it is more performant and less prone to failure.

Service offering considerations and constraints

The current constraints and considerations for the service offering concern the number of tenants, concurrent users, memory allocation, and data load.

Number of tenants

Both the production and the test environment have a maximum number of shared nodes. The production environment consists of three nodes, while the test environment is hosted on a single node. This first constraint can be lifted by upgrading the currently implemented license. However, the requestor and DIGIT must first reach an agreement on the purchase of such a license.

Concurrent Users/Queries

DIGIT’s Linked Data production environment includes two load-balanced read-only instances that will be used to distribute query load. Each instance allows unrestricted connections.
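As an illustration of how a client might send a query to the load-balanced read-only instances, the following minimal sketch builds a SPARQL 1.1 Protocol GET request with the Python standard library. The endpoint URL is a hypothetical placeholder, not an address published by the service; use the one you receive from DIGIT.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint URL (assumption): replace with the address provided to you.
SPARQL_ENDPOINT = "https://linked-data.example.org/sparql"


def build_sparql_request(query: str, endpoint: str = SPARQL_ENDPOINT) -> Request:
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    # The protocol passes the query text in the URL-encoded "query" parameter.
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request("SELECT * WHERE { ?s ?p ?o } LIMIT 10")
# To execute the query, pass the request to urllib.request.urlopen(request).
```

Because the instances sit behind a load balancer, the client only needs a single endpoint address; which instance answers a given query is decided on the server side.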

Memory Allocation

To ensure the best possible performance, we constrain the total graph size per tenant so that it fits comfortably in the working memory of the server. The sizing of these instances brings us to the second constraint: the allocated memory ultimately determines the maximum number of triples. A resize of the machines can be requested, but the machines are currently shared with other users. An agreement must therefore be reached with the other users, or the end user must purchase a dedicated environment (see point 1).

Ingestion of Triples

DIGIT’s Linked Data offering includes a custom-built ingestion mechanism through which you can upload linked data in a number of supported file formats. Data must be uploaded to an AWS S3 bucket, which triggers the automatic ingestion mechanism to store the data in a database on AWS EBS. This architecture lies at the root of our third constraint: data files are loaded asynchronously into the Virtuoso cluster.
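The upload step could be scripted along the following lines. This is a minimal sketch, assuming the third-party boto3 library and the API keys issued for the service; the bucket name and key prefix are hypothetical placeholders, and the naming scheme the ingestion mechanism actually expects is not specified here.

```python
from pathlib import Path


def upload_dataset(s3_client, path: Path, bucket: str, prefix: str = "") -> str:
    """Upload one RDF file to S3 and return the object key it was stored under."""
    key = f"{prefix}{path.name}"
    # upload_file(Filename, Bucket, Key) is the standard boto3 S3 client call.
    s3_client.upload_file(str(path), bucket, key)
    return key


def make_client(access_key: str, secret_key: str):
    """Create an S3 client from the API keys you received (requires boto3)."""
    import boto3  # third-party dependency: pip install boto3

    return boto3.client(
        "s3",
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )


# Example usage (bucket and prefix are assumptions, not real service values):
# client = make_client("YOUR_ACCESS_KEY", "YOUR_SECRET_KEY")
# upload_dataset(client, Path("people.ttl"), "your-tenant-bucket", "incoming/")
```

Since loading is asynchronous, a successful upload only means that the file has reached S3. The triples become queryable once the ingestion mechanism has processed the file.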

You will need to be able to provide the linked data in file format.

Pull-based data integration services will be offered at a later stage.

You will receive API keys to perform the uploads to AWS S3 programmatically. To ensure proper performance, we limit the triple delta to 200 million triples per week. This constraint does not apply to the initial loading of your dataset.
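Before uploading, it can help to check a file against the weekly limit. The sketch below counts statements in N-Triples text, where each non-blank, non-comment line holds one triple, and compares the running total with the 200 million limit. How the service itself measures the weekly delta is not specified, so treating it as a simple count of uploaded triples is an assumption, and the function names are illustrative only.

```python
from typing import Iterable

WEEKLY_TRIPLE_LIMIT = 200_000_000  # weekly limit stated above


def count_ntriples(lines: Iterable[str]) -> int:
    """Count N-Triples statements: one per non-blank, non-comment line."""
    return sum(
        1
        for line in lines
        if line.strip() and not line.lstrip().startswith("#")
    )


def fits_weekly_budget(
    already_loaded: int, new_triples: int, limit: int = WEEKLY_TRIPLE_LIMIT
) -> bool:
    """Return True if loading new_triples keeps this week's total within the limit."""
    return already_loaded + new_triples <= limit


sample = [
    "<http://example.org/a> <http://example.org/p> <http://example.org/b> .\n",
    "# a comment line\n",
    "\n",
    '<http://example.org/a> <http://example.org/name> "A" .\n',
]
new_count = count_ntriples(sample)
within_budget = fits_weekly_budget(already_loaded=150_000_000, new_triples=new_count)
```

For a real file, pass an open file handle to `count_ntriples` so that the file is streamed line by line rather than read into memory. Other serializations such as Turtle or RDF/XML cannot be counted line by line and would need an RDF parser.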