Onboarding Linked Data Platform

The Linked Data service offered by DIGIT is part of the EC Data Platform. It currently runs on a Virtuoso Universal Server cluster. The offering is based on a particular license and hosting configuration on AWS.

To request the service, please email your answers to the following questions:

  • How many concurrent connection threads do you expect to use at peak?
  • (If applicable) How many concurrent connection threads do you expect on average?
  • What is the estimated number of triples you will load?
  • What is the estimated maximum number of triples you will add/edit per week?
  • Do you have other requirements/questions?

The current constraints concern the number of tenants, concurrent users, memory allocation, and data load. In particular:

  • There will be a maximum of 3 tenants per cluster.

The rest of the parameters are described below. More info on the specifics of the Virtuoso component can be found on the dedicated Virtuoso page.

1. Concurrent Users/Queries

DIGIT's Linked Data offering includes two load-balanced read-only instances that will be used to distribute query load. This determines the first constraint:

  • Each instance allows ten concurrent connections. Two of those are reserved for internal processes, which means 2 x 8 = 16 concurrent query pools are available.

If your use-case consistently exceeds ten queries running against the read instances at any given time, we will not be able to accommodate you at this time. Will this limit restrict your potential usage? If not, what is the estimated number of concurrent query pools you will be using? If possible, provide a prospective peak and average figure.
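The arithmetic behind this constraint can be sketched in a few lines of Python. This is only an illustration for estimating your own figures, not part of the service: the function and constant names are hypothetical, and comparing the average load against the "ten queries" threshold is an assumption about how that sentence is meant.

```python
# Figures taken from the text above.
READ_INSTANCES = 2
CONNECTIONS_PER_INSTANCE = 10
RESERVED_PER_INSTANCE = 2
# Assumption: the "consistently exceeds ten queries" threshold is
# compared against the expected average (sustained) load.
SUSTAINED_QUERY_LIMIT = 10


def usable_query_slots() -> int:
    """Connections left for tenant queries across all read instances."""
    return READ_INSTANCES * (CONNECTIONS_PER_INSTANCE - RESERVED_PER_INSTANCE)


def assess_query_load(peak: int, average: int) -> dict:
    """Compare expected peak and average concurrency with the stated limits."""
    slots = usable_query_slots()
    return {
        "usable_slots": slots,
        "peak_fits": peak <= slots,
        "average_fits": average <= SUSTAINED_QUERY_LIMIT,
    }


if __name__ == "__main__":
    # Example figures only; replace with your own estimates.
    print(assess_query_load(peak=14, average=6))
```

The peak and average values passed in correspond to the two figures requested in the onboarding questions.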

2. Memory Allocation

To ensure the best possible performance, we constrain the total graph size per tenant to fit comfortably in the working memory of the server. The sizing of these instances brings us to the second constraint:

  • We foresee a maximum of 1 billion triples per tenant.

This limit can be raised on an ad-hoc basis, depending on the nature of your use-case (for example whether it spans multiple graphs).

3. Ingestion of Triples

DIGIT's Linked Data offering includes a custom-built ingestion mechanism, through which you will be able to upload linked data in a number of supported file formats. This architecture lies at the root of our third constraint:

  • Data files will be loaded asynchronously into the Virtuoso cluster.

You will need to be able to provide the linked data in file format.

  • Pull-based data integration services will be offered at a later stage.

You will receive API keys to perform these uploads programmatically. To ensure proper performance, we also limit the weekly triple delta. This means that while we support graphs of up to 1 billion triples, as per constraint 2:

  • We limit the number of triples added/edited per week to 200 million triples.

This constraint does not apply to the initial loading of your dataset.
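To check a planned loading schedule against constraints 2 and 3, a rough sketch such as the following may help. It is not an official tool: the function name is hypothetical, and it assumes the worst case in which every weekly delta consists of newly added triples, so that edits also count towards the total graph size. The initial load is exempt from the weekly limit, as stated above.

```python
# Limits taken from the text above.
MAX_TRIPLES_PER_TENANT = 1_000_000_000
MAX_WEEKLY_DELTA = 200_000_000


def check_ingestion_plan(initial_triples: int, weekly_deltas: list[int]) -> list[str]:
    """Return a list of problems with the plan; an empty list means it fits.

    Assumption: each weekly delta is treated as net-new triples (worst case).
    The initial load is not subject to the weekly delta limit.
    """
    problems = []
    total = initial_triples
    if total > MAX_TRIPLES_PER_TENANT:
        problems.append(f"initial load of {total:,} triples exceeds the tenant maximum")
    for week, delta in enumerate(weekly_deltas, start=1):
        if delta > MAX_WEEKLY_DELTA:
            problems.append(f"week {week}: delta of {delta:,} exceeds the weekly limit")
        total += delta
        if total > MAX_TRIPLES_PER_TENANT:
            problems.append(f"week {week}: total of {total:,} exceeds the tenant maximum")
    return problems


if __name__ == "__main__":
    # Example figures only; replace with your own estimates.
    for problem in check_ingestion_plan(500_000_000, [100_000_000, 250_000_000]):
        print(problem)
```

If the plan exceeds the tenant maximum only slightly, keep in mind that the limit in constraint 2 may be adjusted on an ad-hoc basis.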