A LINQ provider for RDF files

The next provider I plan to upload here allows querying an RDF file. From the provider writer’s perspective there is a fundamental difference from the previous “Web page” provider: this provider uses IQueryable instead of IEnumerable and transforms .NET expression trees into query objects that can be executed to perform the query.
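To make the IQueryable side a bit more concrete, here is a minimal sketch of what an expression tree looks like from the provider's point of view. It only uses the standard System.Linq.Expressions types; the class and method names (ExpressionTreeDemo, Describe) are made up for this illustration and are not part of the RDF provider.

```csharp
using System;
using System.Linq.Expressions;

public static class ExpressionTreeDemo
{
    // Hypothetical helper: inspects a filter instead of running it.
    // It assumes the lambda body is a single binary comparison.
    public static string Describe(Expression<Func<string, bool>> filter)
    {
        var body = (BinaryExpression)filter.Body;
        return body.NodeType + ": " + body.Left + " vs " + body.Right;
    }

    public static void Main()
    {
        // Because the target type is Expression<...>, the compiler emits a
        // tree describing the lambda rather than executable code for it.
        Expression<Func<string, bool>> filter = s => s.Length > 3;

        Console.WriteLine(Describe(filter)); // GreaterThan: s.Length vs 3

        // The same tree can still be compiled and executed when needed.
        Func<string, bool> compiled = filter.Compile();
        Console.WriteLine(compiled("RDF"));  // False
    }
}
```

An IEnumerable-based provider only ever sees the compiled delegate; an IQueryable-based provider receives the tree and is free to translate it into something else before anything is executed.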

 

“RDF” stands for “Resource Description Framework”: RDF files express properties of and relations between “resources” like web pages, articles, or in fact any objects that can be characterized by a URI – and since you can define URIs for everything, RDF can describe anything.

You can find more material about RDF at https://planetrdf.com/guide/. Here are some examples of RDF files, describing

- parts of the above Web site at https://planetrdf.com/guide/rss.rdf (RSS 1.0 is an RDF format)

- recordings of the Kronos Quartet at https://musicbrainz.org/mm-2.1/artist/f5586dfa-7031-4af0-8042-19b6a1170389/6

- geographical / political / statistical data about Germany at https://www.daml.org/2003/09/factbook/gm

 

The basic data structure used by an RDF file is a list of “(subject, predicate, object)” triples, where the subject and predicate have to be “resources”, while the object can be either a resource or a literal value (which we will just interpret as a string).
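As a rough sketch, the triple structure described above could be modelled like this. The types Node, NodeKind and Triple are hypothetical and exist only for this illustration (they are not the API of any RDF parser), and the example.org URIs are placeholders.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical types for illustration; not taken from any RDF library.
public enum NodeKind { Resource, Literal }

public sealed class Node
{
    public NodeKind Kind { get; }
    public string Val { get; }   // the URI of a resource, or the text of a literal

    public Node(NodeKind kind, string val) { Kind = kind; Val = val; }
}

public sealed class Triple
{
    public Node Subject { get; }
    public Node Predicate { get; }
    public Node Object { get; }

    public Triple(Node subject, Node predicate, Node obj)
    {
        // Subject and predicate must be resources; only the object may be a literal.
        if (subject.Kind != NodeKind.Resource || predicate.Kind != NodeKind.Resource)
            throw new ArgumentException("Subject and predicate must be resources.");
        Subject = subject;
        Predicate = predicate;
        Object = obj;
    }
}

public static class TripleDemo
{
    public static void Main()
    {
        // Placeholder URIs, made up for this example.
        var graph = new List<Triple>
        {
            new Triple(
                new Node(NodeKind.Resource, "http://example.org/countries#GM"),
                new Node(NodeKind.Resource, "http://example.org/ont#name"),
                new Node(NodeKind.Literal, "Germany"))
        };
        Console.WriteLine(graph[0].Object.Val);
    }
}
```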

This RDF structure can be stored in different ways; the most common is an XML format. There is an open source RDF parser called “Drive” (written in C#) which reads such an RDF/XML file and exposes its content as triples; you can download it at https://www.driverdf.org/.

To see how the Germany factbook file above stores the names of the German states, the LINQ provider lets you write the following query:

 

...
RdfXmlDoc rdfGermany = new RdfXmlDoc(fileGermany);
ISolver solver = new SimpleSolver.Solver();
Rdf rdf = new Rdf(solver);
rdf.LoadRdf(rdfGermany);

// Namespace URIs are compared as plain strings; the RDF namespace uses the http scheme.
string nsRdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
...
string isOfType = nsRdf + "type";
string germany = nsCountries + "GM";
string hasAdminDiv = nsFactbookOnt + "administrativeDivision";
string hasName = nsFactbookOnt + "name";
string germanState = rdfGermany.BaseNamespace + "#State";

// x ranges over the administrative divisions of Germany that are states,
// y over their names.
IQueryable<string> q = from x in rdf
                       from y in rdf
                       where rdf.A(germany, hasAdminDiv, x)
                          && rdf.A(x, isOfType, germanState)
                          && rdf.A(x, hasName, y)
                       select y.Val + " [" + x.Val + "]";

 

RDF and the additional layers on top of it (RDFS, OWL, OWL Rules) specify how additional triples can be inferred from the assertions in the RDF (OWL, …) files. This example does not use inference: the rdf.A(…) expressions consider only the assertions in the RDF file itself.
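To illustrate what “only the assertions themselves” means, here is a small in-memory sketch using plain LINQ to Objects. It is not how the provider works internally: the class AssertionDemo, its A method and the shortened identifiers (“GM”, “BY”, …) are placeholders invented for this example, and the triples are sample data rather than the contents of the factbook file.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AssertionDemo
{
    // Hypothetical stand-in for the triple store; identifiers are shortened placeholders.
    static readonly HashSet<(string S, string P, string O)> Triples =
        new HashSet<(string S, string P, string O)>
        {
            ("GM", "administrativeDivision", "BY"),
            ("GM", "administrativeDivision", "HE"),
            ("BY", "type", "State"),
            ("HE", "type", "State"),
            ("BY", "name", "Bayern"),
            ("HE", "name", "Hessen"),
        };

    // True only if exactly this triple was asserted; nothing is inferred.
    static bool A(string s, string p, string o) => Triples.Contains((s, p, o));

    public static List<string> StateNames()
    {
        // Every value occurring anywhere in a triple is a candidate binding for x and y.
        var nodes = Triples.SelectMany(t => new[] { t.S, t.P, t.O }).Distinct().ToList();

        var q = from x in nodes
                from y in nodes
                where A("GM", "administrativeDivision", x)
                   && A(x, "type", "State")
                   && A(x, "name", y)
                orderby y
                select y + " [" + x + "]";
        return q.ToList();
    }

    public static void Main()
    {
        foreach (var line in StateNames())
            Console.WriteLine(line);   // Bayern [BY], then Hessen [HE]
    }
}
```

This version tests every (x, y) pair against the where clause, which is fine for six triples but scales badly. Avoiding exactly this kind of brute-force enumeration is one reason to hand the query to a provider as an expression tree.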

 

In the next blog entry I will describe the translation from LINQ queries to “RdfExpression”s, which can be executed to search the RDF structure efficiently for the required matches.

 

(To be continued…)