Parsing RDF In Perl With RDF::Core

It is often said that parsing RDF files is hard. It’s not. What most people really find hard is turning the RDF/XML into a series of triples and extracting data from them.

When stored in RDF/XML, triples can be written in a variety of ways but still mean the same thing. This means that a plain XML parser may not always give you the same results for equivalent data. To solve this, we use an RDF parser to get the triples for us. An RDF parser is usually built on top of an existing XML parser, with enough added logic to turn the data in an XML file into the correct list of triples.

There are various RDF parsers for Perl. These include RDF::Core, RDF::Simple, RDF::Redland and several others.

Parsing is all well and good, and each of the above can give us a list of triples. What we really need is a way to query that list and extract the data we want.

In this article I’m going to use RDF::Core.

RDF::Core is a bit of a beast and doesn’t have a very Perl-ish interface.

I’m going to parse my FOAF file and extract a list of my friends’ FOAF files.

FOAF means Friend of a Friend, and it allows you to describe relationships between people in a machine-readable RDF format. Details of it can be found on the FOAF project website.

So how do we do this using RDF::Core?

Firstly we need to set it up. We’ll need to create a model and define the storage method we want. RDF::Core allows data to be stored in a database, DB_File or memory. In this example I’ll just use memory as it’s the simplest.

my $store = RDF::Core::Storage::Memory->new;
my $model = RDF::Core::Model->new(Storage => $store);

Now we need to create a parser and parse the RDF file. To do this we pass in the Model, BaseURI, Source and SourceType of the RDF data. In this example I’m going to use the file foaf.rdf, so I’ll have to set the SourceType parameter to file. The BaseURI is used to resolve relative URIs, so we’ll use the FOAF namespace URI http://xmlns.com/foaf/0.1/ for this. Once we have our parser built, we just call its parse method.

my $parser = RDF::Core::Model::Parser->new(
    Model      => $model,
    BaseURI    => 'http://xmlns.com/foaf/0.1/',
    Source     => './foaf.rdf',
    SourceType => 'file'
);
$parser->parse;

We now have a list of triples in our memory store. That’s great, but we want to find out if any of my friends have FOAF files.

If you look at the example in my previous article on RDF triples, you’ll know that my friends’ details are referenced by the #knows predicate. So we need to find all the triples that have that as a predicate and get their objects. Once we have these, we can look for all triples that have one of these objects as their subject and also have the predicate #seeAlso. The objects of those triples will be the addresses of my friends’ FOAF files.

Refer to my article What Is An RDF Triple for a detailed explanation and commented example of the above.

RDF::Core needs all resources to be created as RDF::Core::Resource objects. We’ll need to create resources for #seeAlso and #knows if we are to query with them, so let’s do that now.

my $seealso = RDF::Core::Resource->new('http://www.w3.org/2000/01/rdf-schema#seeAlso');
my $knows = RDF::Core::Resource->new('http://xmlns.com/foaf/0.1/knows');

We can use the getStmts method in the model to get triples matching a given subject, predicate or object. For each of the three positions we can either pass in an RDF::Core::Resource object with the value we want to check against, or undef if we want to match everything. This returns an enumerator that we can use to get each matching triple in turn.

To start with we need to get a list of any triples that match our #knows predicate. This is done like this.

my $knows_enum = $model->getStmts(undef, $knows, undef);
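If the undef wildcards seem a little abstract, here is a plain Perl sketch of the idea. It is only an illustration of the matching rule, not how RDF::Core implements getStmts. The match_triples function, the prefixed names and the example.org address are all made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustration only: triples held as [subject, predicate, object] array refs.
# @triples and match_triples() are hypothetical, not part of RDF::Core.
my @triples = (
    [ '#me',    'foaf:knows',   '#alice' ],
    [ '#me',    'foaf:knows',   '#bob' ],
    [ '#alice', 'rdfs:seeAlso', 'http://example.org/alice/foaf.rdf' ],
    [ '#me',    'foaf:name',    'Me' ],
);

# Return every triple whose parts equal the given values.
# An undef in the pattern matches anything in that position.
sub match_triples {
    my ($triples, @pattern) = @_;
    return grep {
        my $t = $_;
        # keep the triple only if no defined pattern part differs from it
        !grep { defined($pattern[$_]) && $pattern[$_] ne $t->[$_] } 0 .. 2;
    } @$triples;
}

# Same shape as getStmts(undef, $knows, undef): any subject, any object.
my @knows = match_triples(\@triples, undef, 'foaf:knows', undef);
print "$_->[0] knows $_->[2]\n" for @knows;
```

Filling in more of the pattern narrows the result, which is exactly what we'll do with the subject later on.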

Next we need to enumerate over any triples we have.

my $statement = $knows_enum->getFirst;
while (defined $statement) {
    my $knows_object = $statement->getObject;
    ## code to handle each statement goes here
    $statement = $knows_enum->getNext;
}
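The getFirst/getNext loop is a little clunky if you're used to foreach. One option, sketched here, is a small helper that drains an enumerator into an ordinary list. Both enum_to_list and the MockEnum class are my own inventions for illustration; MockEnum just stands in for the enumerator so the sketch runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: drain any object that offers getFirst/getNext
# into a plain Perl list, so callers can use an ordinary foreach loop.
sub enum_to_list {
    my ($enum) = @_;
    my @items;
    for (my $item = $enum->getFirst; defined $item; $item = $enum->getNext) {
        push @items, $item;
    }
    return @items;
}

# MockEnum is a stand-in so the sketch runs without a model.
# It is not part of RDF::Core; it only mimics the two methods used above.
{
    package MockEnum;
    sub new      { my ($class, @data) = @_; return bless { data => [@data], pos => 0 }, $class }
    sub getFirst { my $self = shift; $self->{pos} = 0; return $self->getNext }
    sub getNext  { my $self = shift; return $self->{data}[ $self->{pos}++ ] }
}

my @statements = enum_to_list(MockEnum->new('first', 'second', 'third'));
print "$_\n" for @statements;
```

With a real model you could pass the result of getStmts straight to enum_to_list and loop over the list with foreach, at the cost of holding every matching statement in memory at once.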

Now we have a list of triples whose objects are the subjects of the triples we want to query. We need to match the triples that have this value as their subject and #seeAlso as their predicate.

We get the triples using code very similar to the code just used. If we insert the following code into the while loop, we can extract this information.

my $seealso_enum = $model->getStmts($knows_object, $seealso, undef);
my $seealso_object = $seealso_enum->getFirst;
if (defined $seealso_object) {
    print $seealso_object->getObject->getLabel, "\n";
}

As we only wanted the first #seeAlso value per #knows triple, we don’t have to worry about going through every object in the enumerator, only the first.
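If you did want every #seeAlso value rather than just the first, the same getFirst/getNext loop works on the inner enumerator. Here's a rough sketch that builds a tiny model by hand with addStmt, so it doesn't need a FOAF file. The example.org addresses and the seealso_labels function are made up for illustration, and I'm assuming it is fine to call the enumerator's close method once we're done with it.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use RDF::Core::Model;
use RDF::Core::Storage::Memory;
use RDF::Core::Resource;
use RDF::Core::Statement;

my $model = RDF::Core::Model->new(Storage => RDF::Core::Storage::Memory->new);

my $seealso = RDF::Core::Resource->new('http://www.w3.org/2000/01/rdf-schema#seeAlso');
my $knows   = RDF::Core::Resource->new('http://xmlns.com/foaf/0.1/knows');

# Made-up test data: one friend with two seeAlso links.
my $me     = RDF::Core::Resource->new('http://example.org/me');
my $friend = RDF::Core::Resource->new('http://example.org/friend');
$model->addStmt(RDF::Core::Statement->new($me, $knows, $friend));
for my $name (qw(foaf extra)) {
    my $file = RDF::Core::Resource->new("http://example.org/$name.rdf");
    $model->addStmt(RDF::Core::Statement->new($friend, $seealso, $file));
}

# Hypothetical helper: return the labels of every seeAlso object for
# one subject, sorted so the order does not depend on the storage.
sub seealso_labels {
    my ($model, $subject) = @_;
    my @labels;
    my $enum = $model->getStmts($subject, $seealso, undef);
    for (my $st = $enum->getFirst; defined $st; $st = $enum->getNext) {
        push @labels, $st->getObject->getLabel;
    }
    $enum->close;
    @labels = sort @labels;
    return @labels;
}

print "$_\n" for seealso_labels($model, $friend);
```

Inside the article's while loop you would call the helper with $knows_object as the subject in place of the single getFirst check.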

Putting it all together, we have a simple script to parse a FOAF file and print out the addresses of our friends’ FOAF files.

#!/usr/bin/perl -w
## Extract a list of seeAlso triples from a FOAF file
## Robert Price - http://www.robertprice.co.uk/
use strict;
use RDF::Core::Model;
use RDF::Core::Storage::Memory;
use RDF::Core::Model::Parser;
use RDF::Core::Resource;
## create our model and storage for triples.
my $store = RDF::Core::Storage::Memory->new;
my $model = RDF::Core::Model->new(Storage => $store);
## create our parser and parse the Source file.
my $parser = RDF::Core::Model::Parser->new(
    Model      => $model,
    BaseURI    => 'http://xmlns.com/foaf/0.1/',
    Source     => './foaf.rdf',
    SourceType => 'file'
);
$parser->parse;
## create a resource for seeAlso and knows triples.
my $seealso = RDF::Core::Resource->new('http://www.w3.org/2000/01/rdf-schema#seeAlso');
my $knows = RDF::Core::Resource->new('http://xmlns.com/foaf/0.1/knows');
## create an enumerator with all the knows triples.
my $knows_enum = $model->getStmts(undef, $knows, undef);
## enumerate over each knows triple.
my $statement = $knows_enum->getFirst;
while (defined $statement) {
    ## get the object of the current triple.
    my $knows_object = $statement->getObject;
    ## look for triples with that object as their subject and a
    ## predicate of seealso
    my $seealso_enum = $model->getStmts($knows_object, $seealso, undef);
    my $seealso_obj = $seealso_enum->getFirst;
    ## if it has a seealso triple, show the value of it.
    if (defined $seealso_obj) {
        print $seealso_obj->getObject->getLabel, "\n";
    }
    ## get the next knows statement.
    $statement = $knows_enum->getNext;
}

I hope this has started to demystify RDF parsing with Perl.