In this article I’ll describe how to parse and extract data from an RDF file using Jo Walsh’s RDF::Simple::Parser module in Perl.
RDF::Simple::Parser does what it says on the tin: it provides a simple way to parse RDF. Unfortunately, that simplicity can make it harder to extract data. All it returns from a successful parse of the RDF file is what Jo calls a “bucket-o-triples”. This is just an array of arrays. The outer array is a list of all the triples. Each inner array holds one triple, broken down so the Subject is in position 0, the Predicate is in position 1 and the Object is in position 2.
Let’s define these as constants in Perl as they’re not going to be changing.
use constant SUBJECT => 0;
use constant PREDICATE => 1;
use constant OBJECT => 2;
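To make the shape of the “bucket-o-triples” concrete, here is a minimal sketch that builds one by hand and reads it back using the position constants. The example.org subject and the two sample values are hypothetical stand-ins rather than real parser output.

```perl
use strict;
use warnings;

use constant SUBJECT   => 0;
use constant PREDICATE => 1;
use constant OBJECT    => 2;

# Hypothetical sample data, hand-written in the shape described above:
# an outer list whose elements are three-element array references.
my @sample_triples = (
    [ 'http://example.org/#me', 'http://xmlns.com/foaf/0.1/name', 'Example Person' ],
    [ 'http://example.org/#me', 'http://xmlns.com/foaf/0.1/nick', 'example' ],
);

# Each element is one triple; the constants name its three positions.
foreach my $t (@sample_triples) {
    printf "%s -- %s --> %s\n", $t->[SUBJECT], $t->[PREDICATE], $t->[OBJECT];
}
```

Using named constants rather than bare 0, 1 and 2 keeps the later grep expressions readable.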
I’m going to use my usual example of parsing my FOAF file, and I’ll be extracting the addresses of my friends’ FOAF files from it. See the example in What Is An RDF Triple for a full breakdown of this.
We’ll define the two predicates we need to look for as constants.
use constant KNOWS_PREDICATE => 'http://xmlns.com/foaf/0.1/knows';
use constant SEEALSO_PREDICATE => 'http://www.w3.org/2000/01/rdf-schema#seeAlso';
We need to load in the FOAF file, so we’ll take advantage of File::Slurp’s read_file function to do this and put its contents in a variable called $file.
my $file = read_file('./foaf.rdf');
Before we can use RDF::Simple::Parser, we need to create an instance of it. I’ll set the base address to www.robertprice.co.uk in this case.
my $parser = RDF::Simple::Parser->new(base => 'http://www.robertprice.co.uk/');
Now we have the instance, we can pass in our FOAF file for parsing and get back our triples.
my @triples = $parser->parse_rdf($file);
Let’s take a quick look at my FOAF file to get an example triple.
I know Cal Henderson, and this is represented in my FOAF file as…
<foaf:knows>
<foaf:Person>
<foaf:nick>Cal</foaf:nick>
<foaf:name>Cal Henderson</foaf:name>
<foaf:mbox_sha1sum>2971b1c2fd1d4f0e8f99c167cd85d522a614b07b</foaf:mbox_sha1sum>
<rdfs:seeAlso rdf:resource="http://www.iamcal.com/foaf.xml"/>
</foaf:Person>
</foaf:knows>
Using the RDF validator we can get the list of triples represented in this piece of RDF.
Triple | Subject | Predicate | Object |
---|---|---|---|
1 | genid:ARP40722 | http://www.w3.org/1999/02/22-rdf-syntax-ns#type | http://xmlns.com/foaf/0.1/Person |
2 | genid:ARP40722 | http://xmlns.com/foaf/0.1/nick | "Cal" |
3 | genid:ARP40722 | http://xmlns.com/foaf/0.1/name | "Cal Henderson" |
4 | genid:ARP40722 | http://xmlns.com/foaf/0.1/mbox_sha1sum | "2971b1c2fd1d4f0e8f99c167cd85d522a614b07b" |
5 | genid:ARP40722 | http://www.w3.org/2000/01/rdf-schema#seeAlso | http://www.iamcal.com/foaf.xml |
6 | genid:me | http://xmlns.com/foaf/0.1/knows | genid:ARP40722 |
The triples we are interested in are 5 and 6. Triple 6 has the same Predicate value as our KNOWS_PREDICATE constant, and triple 5 has the same Predicate value as our SEEALSO_PREDICATE constant. What links the two is that the Object of triple 6 is the Subject of triple 5.
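The link between the two triples can be checked directly. This is a small sketch with triples 5 and 6 copied by hand from the validator table. The genid values are the validator’s labels, and the identifiers RDF::Simple::Parser generates for the same node may well look different.

```perl
use strict;
use warnings;

use constant SUBJECT   => 0;
use constant PREDICATE => 1;
use constant OBJECT    => 2;

# Triples 5 and 6 from the validator table, written out by hand.
# The genid values are the validator's labels, kept for illustration only.
my $triple5 = [
    'genid:ARP40722',
    'http://www.w3.org/2000/01/rdf-schema#seeAlso',
    'http://www.iamcal.com/foaf.xml',
];
my $triple6 = [
    'genid:me',
    'http://xmlns.com/foaf/0.1/knows',
    'genid:ARP40722',
];

# The Object of the "knows" triple is the Subject of the "seeAlso" triple.
my $linked = $triple6->[OBJECT] eq $triple5->[SUBJECT];
print $linked ? "linked\n" : "not linked\n";
```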
We know that if we search for triples whose Predicate matches our KNOWS_PREDICATE constant, we’ll get the triples that describe people I know. We can use Perl’s grep function to select these triples, then iterate over them in a foreach loop.
foreach my $known (grep { $_->[PREDICATE] eq KNOWS_PREDICATE } @triples) {
For each match, we are only interested in the triples whose Subject is the same as the matching triple’s Object. Again, we can use grep to pick these out so we can iterate over them.
foreach my $triple (grep { $_->[SUBJECT] eq $known->[OBJECT] } @triples) {
Now we just need to make sure that the triple’s Predicate matches our SEEALSO_PREDICATE constant, and if it does, we can print out the value of its Object.
if ($triple->[PREDICATE] eq SEEALSO_PREDICATE) {
print $triple->[OBJECT], "\n";
}
Let’s put this all together into a working example…
#!/usr/bin/perl -w
use strict;
use File::Slurp;
use RDF::Simple::Parser;
## constants defining position of triple components in
## RDF::Simple triple lists.
use constant SUBJECT => 0;
use constant PREDICATE => 1;
use constant OBJECT => 2;
## some known predicates.
use constant KNOWS_PREDICATE => 'http://xmlns.com/foaf/0.1/knows';
use constant SEEALSO_PREDICATE => 'http://www.w3.org/2000/01/rdf-schema#seeAlso';
## read in my foaf file and put it in $file.
my $file = read_file('./foaf.rdf');
## create a new parser, using my domain as a base.
my $parser = RDF::Simple::Parser->new(base => 'http://www.robertprice.co.uk/');
## parse my foaf file, and return a list of triples.
my @triples = $parser->parse_rdf($file);
## iterate over a list of triples matching the KNOWS_PREDICATE value.
foreach my $known (grep { $_->[PREDICATE] eq KNOWS_PREDICATE } @triples) {
    ## iterate over a list of triples that have the same subject
    ## as the object of one of our KNOWS_PREDICATE triples.
    foreach my $triple (grep { $_->[SUBJECT] eq $known->[OBJECT] } @triples) {
        ## find triples that match the SEEALSO_PREDICATE
        if ($triple->[PREDICATE] eq SEEALSO_PREDICATE) {
            ## print out the object, which should be the address
            ## of my friend's FOAF file.
            print $triple->[OBJECT], "\n";
        }
    }
}
The example will load in the FOAF file, parse it and print out the addresses of the FOAF files belonging to any friends of mine who have one defined by the seeAlso predicate.
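The nested grep rescans the whole triple list once for every person I know. For a larger file, one possible variation (a sketch, not part of the script above) is to index the triples by Subject first and then look each friend up in a hash. The @triples list here is a hand-written stand-in for the parser’s output so the snippet runs without a FOAF file, and the names %by_subject and @foaf_urls are my own.

```perl
use strict;
use warnings;

use constant SUBJECT   => 0;
use constant PREDICATE => 1;
use constant OBJECT    => 2;

use constant KNOWS_PREDICATE   => 'http://xmlns.com/foaf/0.1/knows';
use constant SEEALSO_PREDICATE => 'http://www.w3.org/2000/01/rdf-schema#seeAlso';

# Hand-written stand-in for the list returned by parse_rdf (an assumption
# made so the sketch is self-contained).
my @triples = (
    [ 'genid:ARP40722', 'http://xmlns.com/foaf/0.1/name', 'Cal Henderson' ],
    [ 'genid:ARP40722', SEEALSO_PREDICATE, 'http://www.iamcal.com/foaf.xml' ],
    [ 'genid:me',       KNOWS_PREDICATE,   'genid:ARP40722' ],
);

# Build a hash mapping each Subject to the list of triples about it.
my %by_subject;
push @{ $by_subject{ $_->[SUBJECT] } }, $_ for @triples;

# For every "knows" triple, look up the friend's own triples in the index
# and keep the Object of any seeAlso triple found there.
my @foaf_urls;
foreach my $known (grep { $_->[PREDICATE] eq KNOWS_PREDICATE } @triples) {
    my $about_friend = $by_subject{ $known->[OBJECT] } || [];
    push @foaf_urls,
        map  { $_->[OBJECT] }
        grep { $_->[PREDICATE] eq SEEALSO_PREDICATE } @$about_friend;
}

print "$_\n" for @foaf_urls;
```

For a FOAF file with a handful of friends the nested grep is perfectly adequate; the index only starts to pay off as the number of triples grows.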