Parsing RDF In Perl With RDF::Simple

In this article I’ll describe how to parse and extract data from an RDF file using Jo Walsh’s RDF::Simple::Parser module in Perl.

RDF::Simple::Parser does what it says on the tin: it provides a simple way to parse RDF. Unfortunately, that simplicity can make it hard to extract data. All it returns from a successful parse of the RDF file is what Jo calls a “bucket-o-triples”. This is just an array of arrays. The outer array is a list of all the triples. Each inner array is one triple, broken down so that the Subject is in position 0, the Predicate is in position 1 and the Object is in position 2.

Let’s define these as constants in Perl as they’re not going to be changing.

use constant SUBJECT => 0;
use constant PREDICATE => 1;
use constant OBJECT => 2;
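
As a quick illustration of that layout, here is a minimal sketch that uses a hand-built list in place of real parser output. The triple values are borrowed from the FOAF example further down this article. The @sample variable and the describe_triple helper are names I have made up for illustration; they are not part of RDF::Simple, and the exact strings the real parser produces for blank nodes may differ.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use constant SUBJECT   => 0;
use constant PREDICATE => 1;
use constant OBJECT    => 2;

## a hand-built stand-in for the parser's return value:
## a list of array references, one per triple.
my @sample = (
    [ 'genid:me', 'http://xmlns.com/foaf/0.1/knows', 'genid:ARP40722' ],
    [ 'genid:ARP40722', 'http://www.w3.org/2000/01/rdf-schema#seeAlso', 'http://www.iamcal.com/foaf.xml' ],
);

## hypothetical helper: turn one triple into a readable line.
sub describe_triple {
    my ($triple) = @_;
    return join ' ', $triple->[SUBJECT], $triple->[PREDICATE], $triple->[OBJECT];
}

print describe_triple($_), "\n" foreach @sample;
```

Using the constants as array indexes keeps the later code readable, as $triple->[OBJECT] says more than $triple->[2].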

I’m going to use my usual example of parsing my FOAF file, and I’ll be extracting the addresses of my friends’ FOAF files from it. See the example in What Is An RDF Triple for a full breakdown of this.

We’ll define the two predicates we need to look for as constants.

use constant KNOWS_PREDICATE => 'http://xmlns.com/foaf/0.1/knows';
use constant SEEALSO_PREDICATE => 'http://www.w3.org/2000/01/rdf-schema#seeAlso';

We need to load in the FOAF file, so we’ll take advantage of File::Slurp’s read_file function to do this and put the contents in a variable called $file.

my $file = read_file('./foaf.rdf');

Before we can use RDF::Simple::Parser, we need to create an instance of it. I’ll set the base address to www.robertprice.co.uk in this case.

my $parser = RDF::Simple::Parser->new(base => 'http://www.robertprice.co.uk/');

Now we have the instance, we can pass in our FOAF file for parsing and get back our triples.

my @triples = $parser->parse_rdf($file);

Let’s take a quick look at my FOAF file to get an example triple.

I know Cal Henderson, and this is represented in my FOAF file as…

<foaf:knows>
<foaf:Person>
<foaf:nick>Cal</foaf:nick>
<foaf:name>Cal Henderson</foaf:name>
<foaf:mbox_sha1sum>2971b1c2fd1d4f0e8f99c167cd85d522a614b07b</foaf:mbox_sha1sum>
<rdfs:seeAlso rdf:resource="http://www.iamcal.com/foaf.xml"/>
</foaf:Person>
</foaf:knows>

Using the RDF validator we can get the list of triples represented in this piece of RDF.


Triple | Subject | Predicate | Object
1 | genid:ARP40722 | http://www.w3.org/1999/02/22-rdf-syntax-ns#type | http://xmlns.com/foaf/0.1/Person
2 | genid:ARP40722 | http://xmlns.com/foaf/0.1/nick | "Cal"
3 | genid:ARP40722 | http://xmlns.com/foaf/0.1/name | "Cal Henderson"
4 | genid:ARP40722 | http://xmlns.com/foaf/0.1/mbox_sha1sum | "2971b1c2fd1d4f0e8f99c167cd85d522a614b07b"
5 | genid:ARP40722 | http://www.w3.org/2000/01/rdf-schema#seeAlso | http://www.iamcal.com/foaf.xml
6 | genid:me | http://xmlns.com/foaf/0.1/knows | genid:ARP40722

The parts we are interested in are triples 5 and 6. We can see that triple 6 has the same Predicate value as our KNOWS_PREDICATE constant, and triple 5 has the Predicate value of our SEEALSO_PREDICATE constant. What links the two is that triple 6’s Object is the same as triple 5’s Subject.

We know that if we search for triples whose Predicate matches our KNOWS_PREDICATE constant, we’ll get the triples that describe people I know. We can use Perl’s grep function to get these triples, then iterate over them in a foreach loop.

foreach my $known (grep { $_->[PREDICATE] eq KNOWS_PREDICATE } @triples) {

We are only interested in the triples that have the same Subject as the matching triple’s Object. Again, we can use grep to get these out so we can iterate over them.

foreach my $triple (grep { $_->[SUBJECT] eq $known->[OBJECT] } @triples) {

Now we just need to make sure that the triple’s Predicate matches our SEEALSO_PREDICATE constant, and if it does, we can print out its Object, which holds the address we want.

if ($triple->[PREDICATE] eq SEEALSO_PREDICATE) {
print $triple->[OBJECT], "\n";
}
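
If you want to try the matching logic without a FOAF file or the parser installed, the following is a minimal sketch that runs the same nested grep loops over the six triples from the table above, typed in by hand. The friend_foaf_urls helper is a hypothetical name of my own, and I am assuming the hand-typed strings look like what the parser would return; the real module may represent literals and blank node identifiers differently.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use constant SUBJECT   => 0;
use constant PREDICATE => 1;
use constant OBJECT    => 2;

use constant KNOWS_PREDICATE   => 'http://xmlns.com/foaf/0.1/knows';
use constant SEEALSO_PREDICATE => 'http://www.w3.org/2000/01/rdf-schema#seeAlso';

## the six triples from the table above, typed in by hand
## as a list of array references.
my @triples = (
    [ 'genid:ARP40722', 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type', 'http://xmlns.com/foaf/0.1/Person' ],
    [ 'genid:ARP40722', 'http://xmlns.com/foaf/0.1/nick', 'Cal' ],
    [ 'genid:ARP40722', 'http://xmlns.com/foaf/0.1/name', 'Cal Henderson' ],
    [ 'genid:ARP40722', 'http://xmlns.com/foaf/0.1/mbox_sha1sum', '2971b1c2fd1d4f0e8f99c167cd85d522a614b07b' ],
    [ 'genid:ARP40722', 'http://www.w3.org/2000/01/rdf-schema#seeAlso', 'http://www.iamcal.com/foaf.xml' ],
    [ 'genid:me', 'http://xmlns.com/foaf/0.1/knows', 'genid:ARP40722' ],
);

## hypothetical helper: return the seeAlso addresses of every node
## that appears as the object of a knows triple.
sub friend_foaf_urls {
    my @all = @_;
    my @urls;
    foreach my $known (grep { $_->[PREDICATE] eq KNOWS_PREDICATE } @all) {
        foreach my $triple (grep { $_->[SUBJECT] eq $known->[OBJECT] } @all) {
            push @urls, $triple->[OBJECT]
                if $triple->[PREDICATE] eq SEEALSO_PREDICATE;
        }
    }
    return @urls;
}

print "$_\n" foreach friend_foaf_urls(@triples);
```

Returning the addresses as a list, rather than printing inside the loop, makes it easier to reuse them later, for example to fetch each friend’s FOAF file.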

Let’s put this all together into a working example…

#!/usr/bin/perl -w
use strict;
use File::Slurp;
use RDF::Simple::Parser;
## constants defining position of triple components in
## RDF::Simple triple lists.
use constant SUBJECT => 0;
use constant PREDICATE => 1;
use constant OBJECT => 2;
## some known predicates.
use constant KNOWS_PREDICATE => 'http://xmlns.com/foaf/0.1/knows';
use constant SEEALSO_PREDICATE => 'http://www.w3.org/2000/01/rdf-schema#seeAlso';
## read in my foaf file and put it in $file.
my $file = read_file('./foaf.rdf');
## create a new parser, using my domain as a base.
my $parser = RDF::Simple::Parser->new(base => 'http://www.robertprice.co.uk/');
## parse my foaf file, and return a list of triples.
my @triples = $parser->parse_rdf($file);
## iterate over a list of triples matching the KNOWS_PREDICATE value.
foreach my $known (grep { $_->[PREDICATE] eq KNOWS_PREDICATE } @triples) {
    ## iterate over a list of triples that have the same subject
    ## as the object of one of our KNOWS_PREDICATE triples.
    foreach my $triple (grep { $_->[SUBJECT] eq $known->[OBJECT] } @triples) {
        ## find triples that match the SEEALSO_PREDICATE
        if ($triple->[PREDICATE] eq SEEALSO_PREDICATE) {
            ## print out the object, which should be the address
            ## of my friend's FOAF file.
            print $triple->[OBJECT], "\n";
        }
    }
}

The example will load in the FOAF file, parse it and print out the addresses of any of my friends’ FOAF files that are defined by the seeAlso predicate.
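
The nested grep calls scan the whole triple list once for every person I know, which is fine for a small FOAF file but gets slower as the list grows. One possible alternative, sketched below, is to group the triples by Subject in a hash first, so that finding everything said about a friend is a single lookup. This is my own variation rather than anything RDF::Simple provides: index_by_subject and friend_foaf_urls_indexed are hypothetical helper names, and @sample is hand-typed data standing in for the parser’s output.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use constant SUBJECT   => 0;
use constant PREDICATE => 1;
use constant OBJECT    => 2;

use constant KNOWS_PREDICATE   => 'http://xmlns.com/foaf/0.1/knows';
use constant SEEALSO_PREDICATE => 'http://www.w3.org/2000/01/rdf-schema#seeAlso';

## hypothetical helper: group triples by subject so that finding
## everything said about one node is a single hash lookup.
sub index_by_subject {
    my %by_subject;
    push @{ $by_subject{ $_->[SUBJECT] } }, $_ foreach @_;
    return \%by_subject;
}

## hypothetical helper: same result as the nested loops,
## but using the subject index instead of a second grep over everything.
sub friend_foaf_urls_indexed {
    my @all   = @_;
    my $index = index_by_subject(@all);
    my @urls;
    foreach my $known (grep { $_->[PREDICATE] eq KNOWS_PREDICATE } @all) {
        ## triples about this friend, or an empty list if there are none.
        my $about_friend = $index->{ $known->[OBJECT] } || [];
        push @urls, map  { $_->[OBJECT] }
                    grep { $_->[PREDICATE] eq SEEALSO_PREDICATE } @$about_friend;
    }
    return @urls;
}

## hand-typed sample data standing in for the parser's output.
my @sample = (
    [ 'genid:ARP40722', 'http://xmlns.com/foaf/0.1/name', 'Cal Henderson' ],
    [ 'genid:ARP40722', SEEALSO_PREDICATE, 'http://www.iamcal.com/foaf.xml' ],
    [ 'genid:me', KNOWS_PREDICATE, 'genid:ARP40722' ],
);

print "$_\n" foreach friend_foaf_urls_indexed(@sample);
```

The same index would also make it cheap to pull out other properties of a friend, such as their name, without another pass over the full list.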