Querying RDF In Perl With RDF::Core

Extracting data from an RDF file seems like an easy job for Perl.

I have discussed how to parse RDF in Perl using the RDF::Core module before. In that example I used the getStmts functionality. This is fine for simple things, but it’s a bit messy. Thankfully RDF::Core provides us with a query language similar to RDQL to help us get the data we want.

Let’s take a quick look at my FOAF file for some example RDF to query. I know Cal Henderson, and this is represented in my FOAF file as…

<foaf:knows>
  <foaf:Person>
    <foaf:nick>Cal</foaf:nick>
    <foaf:name>Cal Henderson</foaf:name>
    <foaf:mbox_sha1sum>2971b1c2fd1d4f0e8f99c167cd85d522a614b07b</foaf:mbox_sha1sum>
    <rdfs:seeAlso rdf:resource="http://www.iamcal.com/foaf.xml"/>
  </foaf:Person>
</foaf:knows>

For a detailed explanation of the above RDF, have a look at my previous article – What Is An RDF Triple?

How could I find Cal’s FOAF file using RDF::Core’s query language? It’s really simple; we’d use the following query.

select ?x->rdfs:seeAlso
from ?x->foaf:nick{?y}
where ?y='Cal'

That means: find any resource ?x that has a foaf:nick property, bind that nickname to ?y, keep only the matches where ?y is 'Cal', and select the object of ?x’s rdfs:seeAlso triple.
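To make the semantics concrete, here is a plain-Perl sketch of what that query asks for. It does not use RDF::Core’s query engine: the triples are hypothetical array refs of [subject, predicate, object], '_:cal' stands in for the blank node behind the foaf:Person element, and prefixed names are used instead of full URIs for brevity.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical, simplified triple store: each triple is [subject, predicate, object].
# '_:cal' stands in for the blank node created for the foaf:Person element.
my @triples = (
    [ '_:me',  'foaf:knows',   '_:cal' ],
    [ '_:cal', 'foaf:nick',    'Cal' ],
    [ '_:cal', 'foaf:name',    'Cal Henderson' ],
    [ '_:cal', 'rdfs:seeAlso', 'http://www.iamcal.com/foaf.xml' ],
);

# Roughly what "select ?x->rdfs:seeAlso from ?x->foaf:nick{?y} where ?y='Cal'" asks for:
# bind ?x to every subject with a foaf:nick, keep those whose nick (?y) equals $nick,
# then return the object of each such ?x's rdfs:seeAlso triple.
sub see_also_for_nick {
    my ($nick, @store) = @_;
    my @subjects = map  { $_->[0] }
                   grep { $_->[1] eq 'foaf:nick' && $_->[2] eq $nick } @store;
    my @results;
    for my $x (@subjects) {
        push @results, map  { $_->[2] }
                       grep { $_->[0] eq $x && $_->[1] eq 'rdfs:seeAlso' } @store;
    }
    return @results;
}

print "$_\n" for see_also_for_nick('Cal', @triples);
```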

Why stop there? Let’s get some more information on Cal.

select ?x->foaf:name, ?x->foaf:nick, ?x->rdfs:seeAlso, ?x->foaf:mbox_sha1sum
from ?x->foaf:nick{?y}
where ?y='Cal'

Now we have Cal’s name, nickname, address of his FOAF file and the checksum of his email address.

Let’s see how to get this all in Perl now.

We need to set up some RDF::Core objects and parse the FOAF file before we can query it. We’ll use this code to do so.

## list of known namespaces we might need.
my $namespaces = {
    'foaf' => 'http://xmlns.com/foaf/0.1/',
    'rdfs' => 'http://www.w3.org/2000/01/rdf-schema#',
    'rdf'  => 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
    'dc'   => 'http://purl.org/dc/elements/1.1/',
};
## create our model and storage for triples.
my $store = RDF::Core::Storage::Memory->new;
my $model = RDF::Core::Model->new(Storage => $store);
## create node factory for query
my $factory = RDF::Core::NodeFactory->new;
## create our parser and parse the Source file.
my $parser = RDF::Core::Model::Parser->new(
    Model      => $model,
    BaseURI    => 'http://xmlns.com/foaf/0.1/',
    Source     => './foaf.rdf',
    SourceType => 'file'
);
$parser->parse;

For a full explanation of the above code, have a look at my previous article – How To Parse RDF In Perl.

We now need to add the new code.

First we need to create an RDF::Core::Evaluator and an RDF::Core::Query object. These two objects handle querying the data held in our $model.

## create an evaluator based on our model, factory and namespaces.
my $evaluator = RDF::Core::Evaluator->new(
    Model      => $model,
    Factory    => $factory,
    Namespaces => $namespaces,
);
## create a query object based on the evaluator.
my $query = RDF::Core::Query->new(Evaluator => $evaluator);

Now that we have a query object, we can pass our query string to its query method.

## run our query and save the results in $results.
my $results = $query->query("select ?x->foaf:name, ?x->foaf:nick, ?x->rdfs:seeAlso, ?x->foaf:mbox_sha1sum from ?x->foaf:nick{?y} where ?y='Cal'");

This executes the query and stores the results in $results as a reference to an array of rows, each of which is itself an array reference. We only need the first row, $results->[0], to get the data we want. It holds a list of RDF::Core::Literal or RDF::Core::Resource objects. As both inherit from RDF::Core::Node, we can use that class’s getLabel method to get the string value of each one and print it out.
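If the query could match more than one person, you may want every row rather than just $results->[0]. The sketch below assumes the result shape described above (an array ref of rows, each an array ref of nodes) and uses a hypothetical FakeNode class that only implements getLabel, so it runs without RDF::Core installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for RDF::Core::Literal / RDF::Core::Resource nodes:
# it only implements getLabel, the one method the loop below calls.
package FakeNode;
sub new      { my ($class, $label) = @_; return bless { label => $label }, $class }
sub getLabel { return $_[0]{label} }

package main;

# Assumed result shape: an array ref of rows, each row an array ref of nodes,
# one node per item in the select clause.
my $results = [
    [ map { FakeNode->new($_) }
      'Cal Henderson', 'Cal', 'http://www.iamcal.com/foaf.xml',
      '2971b1c2fd1d4f0e8f99c167cd85d522a614b07b' ],
];

# Turn every row (not just the first) into one tab-separated line of labels.
sub result_lines {
    my ($rows) = @_;
    my @lines;
    for my $row (@$rows) {
        push @lines, join("\t", map { $_->getLabel } @$row);
    }
    return @lines;
}

print "$_\n" for result_lines($results);
```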

## go over the results and print the data out.
foreach my $result (@{$results->[0]}) {
    print $result->getLabel, "\n";
}

This prints the following:

Cal Henderson
Cal
http://www.iamcal.com/foaf.xml
2971b1c2fd1d4f0e8f99c167cd85d522a614b07b

Here’s the final code in all its glory.

#!/usr/bin/perl -w
use strict;
use RDF::Core::Evaluator;
use RDF::Core::Model;
use RDF::Core::Model::Parser;
use RDF::Core::NodeFactory;
use RDF::Core::Query;
use RDF::Core::Resource;
use RDF::Core::Storage::Memory;
## list of known namespaces we might need.
my $namespaces = {
    'foaf' => 'http://xmlns.com/foaf/0.1/',
    'rdfs' => 'http://www.w3.org/2000/01/rdf-schema#',
    'rdf'  => 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
    'dc'   => 'http://purl.org/dc/elements/1.1/',
};
## create our model and storage for triples.
my $store = RDF::Core::Storage::Memory->new;
my $model = RDF::Core::Model->new(Storage => $store);
## create node factory for query
my $factory = RDF::Core::NodeFactory->new;
## create our parser and parse the Source file.
my $parser = RDF::Core::Model::Parser->new(
    Model      => $model,
    BaseURI    => 'http://xmlns.com/foaf/0.1/',
    Source     => './foaf.rdf',
    SourceType => 'file'
);
$parser->parse;
## create an evaluator based on our model, factory and namespaces.
my $evaluator = RDF::Core::Evaluator->new(
    Model      => $model,
    Factory    => $factory,
    Namespaces => $namespaces,
);
## create a query object based on the evaluator.
my $query = RDF::Core::Query->new(Evaluator => $evaluator);
## run our query and save the results in $results.
my $results = $query->query("select ?x->foaf:name, ?x->foaf:nick, ?x->rdfs:seeAlso, ?x->foaf:mbox_sha1sum from ?x->foaf:nick{?y} where ?y='Cal'");
## go over the results and print the data out.
foreach my $result (@{$results->[0]}) {
    print $result->getLabel, "\n";
}

If you see warnings from RDF::Core about the use of an uninitialised value before the results are printed (when running with -w or use warnings;), don’t worry. This is just a small bug in RDF::Core and won’t affect the running of the code.

Parsing RDF In Perl With RDF::Core

It is often said that parsing RDF files is hard. It’s not. What most people really find hard is turning the RDF/XML into a series of triples and extracting data from them.

When stored in RDF/XML, the same triples can be written in a variety of ways that all mean the same thing. This means a plain XML parser may give you differently structured results for equivalent data. To solve this we use an RDF parser to get the triples for us. An RDF parser is usually built on top of an existing XML parser, with enough logic to turn the data contained in an XML file into a correct list of triples.

There are various RDF parsers for Perl. These include RDF::Core, RDF::Simple, RDF::Redland and several others.

Parsing is all well and good, and each of the above can give us a list of triples, but what we really need is a way to query those triples and extract the data we want.

In this article I’m going to use RDF::Core.

RDF::Core is a bit of a beast and doesn’t have a very Perl’ish interface.

I’m going to parse my FOAF file and extract a list of my friends’ FOAF files.

FOAF means Friend of a Friend, and it allows you to describe relationships between people in a machine-readable RDF format. Details of it can be found on the FOAF project website.

So how do we do this using RDF::Core?

First we need to set it up. We’ll need to create a model and choose the storage method we want. RDF::Core allows data to be stored in a database, a DB_File or memory. In this example I’ll just use memory, as it’s the simplest.

my $store = RDF::Core::Storage::Memory->new;
my $model = RDF::Core::Model->new(Storage => $store);

Now we need to create a parser and parse the RDF file. To do this we pass in the Model, BaseURI, Source and SourceType of the RDF data. In this example I’m going to use the file foaf.rdf, so I’ll have to set the SourceType parameter to file. The BaseURI is used to resolve relative URIs, so we’ll use the FOAF namespace URI http://xmlns.com/foaf/0.1/ for this. Once we have our parser built, we just call its parse method.

my $parser = RDF::Core::Model::Parser->new(
    Model      => $model,
    BaseURI    => 'http://xmlns.com/foaf/0.1/',
    Source     => './foaf.rdf',
    SourceType => 'file'
);
$parser->parse;

We now have a list of triples in our memory store. That’s great, but we want to find out if any of my friends have FOAF files.

If you look at the example in my previous article on RDF triples, you’ll know that my friends’ details are referenced by the #knows predicate. So we need to find all the triples that have that as a predicate and get their objects. Once we have these, we can look for all triples that have one of them as their subject and #seeAlso as their predicate. The objects of these triples will be the addresses of my friends’ FOAF files.

Refer to my article What Is An RDF Triple for a detailed explanation and commented example of the above.

RDF::Core needs all resources to be created as RDF::Core::Resource objects. We’ll need to create resources for #seeAlso and #knows if we are to query with them, so let’s do that now.

my $seealso = RDF::Core::Resource->new('http://www.w3.org/2000/01/rdf-schema#seeAlso');
my $knows = RDF::Core::Resource->new('http://xmlns.com/foaf/0.1/knows');
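Those long URIs are just a namespace followed by a local name. If you would rather write prefixed names, a small helper like the hypothetical expand_qname below can build the strings from a namespace hash like $namespaces in the querying example. It is a sketch, not part of RDF::Core; the result is simply the string you would pass to RDF::Core::Resource->new as in the two lines above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Namespace prefixes for the two resources we need.
my $namespaces = {
    'foaf' => 'http://xmlns.com/foaf/0.1/',
    'rdfs' => 'http://www.w3.org/2000/01/rdf-schema#',
};

# Hypothetical helper: expand a prefixed name such as 'rdfs:seeAlso' into a full
# URI by looking up the prefix and appending the local name.
sub expand_qname {
    my ($qname, $ns) = @_;
    my ($prefix, $local) = split /:/, $qname, 2;
    die "Not a prefixed name: '$qname'\n" unless defined $local;
    die "Unknown prefix '$prefix'\n" unless exists $ns->{$prefix};
    return $ns->{$prefix} . $local;
}

print expand_qname('rdfs:seeAlso', $namespaces), "\n";
print expand_qname('foaf:knows',   $namespaces), "\n";
```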

We can use the model’s getStmts method to get triples matching a given subject, predicate or object. We can either pass in RDF::Core::Resource objects with the values we want to match, or undef to match anything. This returns an enumerator that we can use to get each matching triple in turn.
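The matching rule is easy to picture without RDF::Core at all. The sketch below is a plain-Perl illustration, not RDF::Core’s implementation: triples are array refs, a defined argument must equal the corresponding part of a triple, and undef matches anything. It returns a plain list instead of an enumerator, and the subject URI http://example.org/me is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical triples as [subject, predicate, object] array refs.
my @triples = (
    [ 'http://example.org/me', 'http://xmlns.com/foaf/0.1/knows', '_:cal' ],
    [ '_:cal', 'http://xmlns.com/foaf/0.1/nick', 'Cal' ],
    [ '_:cal', 'http://www.w3.org/2000/01/rdf-schema#seeAlso', 'http://www.iamcal.com/foaf.xml' ],
);

# A defined argument must equal that part of the triple; undef is a wildcard.
sub match_triples {
    my ($subject, $predicate, $object, @store) = @_;
    return grep {
        (!defined $subject   || $_->[0] eq $subject)   &&
        (!defined $predicate || $_->[1] eq $predicate) &&
        (!defined $object    || $_->[2] eq $object)
    } @store;
}

# Analogue of getStmts(undef, $knows, undef): every foaf:knows triple.
my @knows = match_triples(undef, 'http://xmlns.com/foaf/0.1/knows', undef, @triples);
print "$_->[0] knows $_->[2]\n" for @knows;
```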

To start with we need to get a list of any triples that match our #knows predicate. This is done like this.

my $knows_enum = $model->getStmts(undef, $knows, undef);

Next we need to enumerate over any triples we have.

my $statement = $knows_enum->getFirst;
while (defined $statement) {
    my $knows_object = $statement->getObject;
    ## code to handle each statement goes here
    $statement = $knows_enum->getNext;
}

Now we have a list of triples whose objects are the subjects of the triples we want to query. We need to find the triples that have this value as their subject and #seeAlso as their predicate.

We can get these triples using code very similar to the code we just used. If we insert the following code into the while loop, we can extract this information.

my $seealso_enum = $model->getStmts($knows_object, $seealso, undef);
my $seealso_object = $seealso_enum->getFirst;
if (defined $seealso_object) {
    print $seealso_object->getObject->getLabel, "\n";
}

As we only wanted the first #seeAlso value per #knows triple, we don’t have to worry about going through every object in the enumerator, only the first.
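The same two-step walk can be sketched in plain Perl, which may help when checking the logic without a FOAF file to hand. This is an illustration only: the triples are hypothetical array refs rather than RDF::Core statements, and taking the first grep result mirrors calling getFirst on the seeAlso enumerator.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $KNOWS    = 'http://xmlns.com/foaf/0.1/knows';
my $SEE_ALSO = 'http://www.w3.org/2000/01/rdf-schema#seeAlso';

# Hypothetical triples as [subject, predicate, object] array refs.
my @triples = (
    [ '_:me',  $KNOWS,    '_:cal' ],
    [ '_:me',  $KNOWS,    '_:bob' ],    # a friend with no seeAlso
    [ '_:cal', $SEE_ALSO, 'http://www.iamcal.com/foaf.xml' ],
);

# For each knows triple, use its object as the subject of a seeAlso lookup.
sub friend_foaf_files {
    my @store = @_;
    my @files;
    for my $knows (grep { $_->[1] eq $KNOWS } @store) {
        my $friend = $knows->[2];
        # Keep only the first matching seeAlso triple, like getFirst.
        my ($see_also) = grep { $_->[0] eq $friend && $_->[1] eq $SEE_ALSO } @store;
        push @files, $see_also->[2] if defined $see_also;
    }
    return @files;
}

print "$_\n" for friend_foaf_files(@triples);
```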

Putting it all together, we have a simple script that parses a FOAF file and prints out the addresses of our friends’ FOAF files.

#!/usr/bin/perl -w
## Extract a list of seeAlso triples from a FOAF file
## Robert Price - http://www.robertprice.co.uk/
use strict;
use RDF::Core::Model;
use RDF::Core::Storage::Memory;
use RDF::Core::Model::Parser;
use RDF::Core::Resource;
## create our model and storage for triples.
my $store = RDF::Core::Storage::Memory->new;
my $model = RDF::Core::Model->new(Storage => $store);
## create our parser and parse the Source file.
my $parser = RDF::Core::Model::Parser->new(
    Model      => $model,
    BaseURI    => 'http://xmlns.com/foaf/0.1/',
    Source     => './foaf.rdf',
    SourceType => 'file'
);
$parser->parse;
## create a resource for seeAlso and knows triples.
my $seealso = RDF::Core::Resource->new('http://www.w3.org/2000/01/rdf-schema#seeAlso');
my $knows = RDF::Core::Resource->new('http://xmlns.com/foaf/0.1/knows');
## create an enumerator with all the knows triples.
my $knows_enum = $model->getStmts(undef, $knows, undef);
## enumerate over each knows triple.
my $statement = $knows_enum->getFirst;
while (defined $statement) {
    ## get the object of the current triple.
    my $knows_object = $statement->getObject;
    ## look for triples whose subject is the object of the knows
    ## triple and whose predicate is seeAlso.
    my $seealso_enum = $model->getStmts($knows_object, $seealso, undef);
    my $seealso_obj = $seealso_enum->getFirst;
    ## if it has a seeAlso triple, show the value of it.
    if (defined $seealso_obj) {
        print $seealso_obj->getObject->getLabel, "\n";
    }
    ## get the next knows statement.
    $statement = $knows_enum->getNext;
}

I hope this has started to demystify RDF parsing with Perl.

Serving MS Excel Documents From A Perl CGI Script

How can you serve a Microsoft Excel spreadsheet from a Perl CGI script?

It’s actually quite easy, and just a case of sending the right headers.

Assuming we have our raw binary Excel sheet in a variable called $excel, we can use this simple block of code to allow it to be downloaded from a web script with the filename of text.xls.

## send the raw spreadsheet bytes unmodified.
binmode STDOUT;
print "Content-Type: application/vnd.ms-excel\n";
print "Content-Disposition: attachment; filename=text.xls\n\n";
print $excel;

This will prompt the user with a file download box asking where to store text.xls.
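If you serve files like this from several scripts, a small helper keeps the headers consistent. This is a hypothetical sketch rather than part of any module: it uses CRLF line endings, quotes the filename, and lets you pass inline as the disposition. In a real script you would still call binmode STDOUT before printing the binary data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build the CGI header block for an Excel download.
# Each header ends with CRLF, and the extra CRLF ends the header block.
sub excel_headers {
    my ($filename, $disposition) = @_;
    $disposition ||= 'attachment';
    return "Content-Type: application/vnd.ms-excel\r\n"
         . "Content-Disposition: $disposition; filename=\"$filename\"\r\n"
         . "\r\n";
}

# Usage in a CGI script, assuming $excel already holds the spreadsheet bytes:
#   binmode STDOUT;
#   print excel_headers('text.xls'), $excel;
print excel_headers('text.xls');
```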

The key here is the Content-Disposition header. We could change its value to inline to ask the browser to open the Excel document in the browser itself, but that’s not very friendly.

Alternating Table Rows With Template Toolkit And CSS

Whilst looking at a potential recode of the Smash Hits Chart website, I came across the need to make each alternate row in the main chart table a different colour.

As the recode will be using Template Toolkit, this is simple.

The chart is passed as a list, so we have to iterate over it to get each entry out. I do this using a simple [% FOREACH %]. Template Toolkit thoughtfully provides a Template::Iterator object called loop inside the FOREACH block, so we can use its index() method to get the current element’s position in the list. Using this, we just need to check whether the value is odd or even to decide what colour to make the row.

Let’s see some sample code. To change the row’s colour I’m going to use two CSS classes called lite and dark.

<table>
[% FOREACH entry = chart %]
  <tr class="[% IF loop.index % 2 %]lite[% ELSE %]dark[% END %]">
    <td>[% entry.position %]</td>
    <td>[% entry.artist %]</td>
    <td>[% entry.track %]</td>
  </tr>
[% END %]
</table>

The magic is in the expression loop.index % 2. It takes the remainder (modulus) of loop.index divided by 2. This is always either 0 or 1, and as Template Toolkit treats 0 as false and 1 as true, it can be used directly in the IF statement.
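Here is the same parity logic in plain Perl, which may be handy for checking which class each row gets. It assumes, as with Template Toolkit’s loop.index, that the index starts at 0, so the first row is dark. The row_class helper and the chart data are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Index 0, 2, 4, ... leaves remainder 0 (false) and gets 'dark';
# odd indexes leave remainder 1 (true) and get 'lite'.
sub row_class {
    my ($index) = @_;
    return $index % 2 ? 'lite' : 'dark';
}

# Hypothetical chart data standing in for the 'chart' list.
my @chart = ('first entry', 'second entry', 'third entry');
for my $i (0 .. $#chart) {
    printf qq{<tr class="%s"><td>%d</td><td>%s</td></tr>\n},
        row_class($i), $i + 1, $chart[$i];
}
```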