Grayscaling Images With Perl

One thing that caught my interest today was how to convert a colour image into grayscale.

It turns out the basic algorithm is very simple. It’s just a weighted sum of the red, green and blue values…

grey = 0.15 * red + 0.55 * green + 0.30 * blue;

This can be turned into a Perl subroutine using the following code.

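sub grayscale {
    my ($r, $g, $b) = @_;
    my $s = 0.15 * $r + 0.55 * $g + 0.30 * $b;
    ## round to the nearest integer rather than always rounding down
    return int($s + 0.5);
}

Here we pass in the RGB values of the colour we want to turn into gray. We apply the algorithm and return the gray value rounded to the nearest integer (adding 0.5 before calling int() stops the result always being rounded down).

The value we get for gray is used to replace each of the red, green and blue values.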

We can test this subroutine out with the help of the Perl GD module (available for free on CPAN).

#!/usr/bin/perl
use strict;
use warnings;
use GD;
## grayscale subroutine, rounding to the nearest integer
sub grayscale {
    my ($r, $g, $b) = @_;
    my $s = 0.15 * $r + 0.55 * $g + 0.30 * $b;
    return int($s + 0.5);
}
## make sure we read the image as binary data
binmode STDIN;
## create a new GD object with the data passed via STDIN
my $image = GD::Image->new(\*STDIN) or die "Could not read an image from STDIN\n";
## iterate over the number of colours in the colour table
## (this colour table approach suits palette-based images such as GIFs)
for (my $i = 0; $i < $image->colorsTotal(); $i++) {
    ## get the RGB values for the colour at index $i
    my ($r, $g, $b) = $image->rgb($i);
    ## convert the RGB to grayscale
    my $gray = grayscale($r, $g, $b);
    ## remove the original colour from the colour table; the next
    ## allocation normally reuses the freed slot, so pixels at index $i turn gray
    $image->colorDeallocate($i);
    ## add in the new gray
    $image->colorAllocate($gray, $gray, $gray);
}
## make sure we output binary
binmode STDOUT;
## pass the image as a raw GIF to STDOUT
print $image->gif;

This code takes an image piped in from STDIN and outputs a grayscale GIF version of the image to STDOUT.

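If the script were saved as convert.pl, you would run it as ./convert.pl < test.gif > test_result.gif (a single > creates or overwrites the output file, whereas >> would append to an existing one).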

Here’s a conversion I did earlier of a GIF image of Kitt, Bev and Justin at the Emap Performance Awards 2004 using the above Perl code.

Kitt, Bev and Justin in colour

Kitt, Bev and Justin in grayscale
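The weights in the formula at the top of this post are the ones I used. For comparison, the widely used ITU-R BT.601 luma weights are 0.299 for red, 0.587 for green and 0.114 for blue, which weight red more heavily than blue. Below is a minimal sketch, not the code used for the images above, that takes the weights as a parameter and rounds to the nearest integer. The %WEIGHTS table and the grayscale_with subroutine are illustrative names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Named weight sets (red, green, blue). Each set sums to 1.0, so black
# stays black and white stays white.
my %WEIGHTS = (
    original => [0.15, 0.55, 0.30],     # the weights used in this post
    bt601    => [0.299, 0.587, 0.114],  # ITU-R BT.601 luma weights
);

# Convert one RGB colour (0-255 per channel) to a single gray level,
# rounding to the nearest integer instead of truncating with int().
sub grayscale_with {
    my ($weights, $red, $green, $blue) = @_;
    my ($wr, $wg, $wb) = @$weights;
    my $s = $wr * $red + $wg * $green + $wb * $blue;
    return int($s + 0.5);
}

# Compare how pure red comes out under each weight set.
for my $name (sort keys %WEIGHTS) {
    my $gray = grayscale_with($WEIGHTS{$name}, 255, 0, 0);
    print "$name: pure red -> $gray\n";
}
```

With these numbers pure red maps to 38 under the original weights and to 76 under BT.601, so reds come out noticeably darker with the original weights.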

CellTrack’ing Between Colchester And London

I’ve been looking at the CellTrack program for Series 60 phones recently.

This is a native Series 60 Symbian application that records details of the mobile phone cell your phone is currently using. It also lets you annotate each cell if you want.

I downloaded CellTrack for my Nokia 7610 a while ago, and have just installed it on the Nokia 6630.

Screenshot of CellTrack running on a Nokia 6630

On Monday, while the train was running slowly, I had it running and started to annotate stations so I could tell where I was in the evening when it’s dark outside. CellTrack can log the cells it uses to a flat tab-separated file. As I have the software installed on the 6630’s MMC card, the file can be found in the directory E:\Nokia\Others\CellTrack and copied off using the Nokia PC Suite.

Here’s the journey I took on Tuesday morning by train. I turned on CellTrack at Marks Tey station and had it running to just before the train pulled into Stratford station in East London.

Time Cell ID LAC Cell Name Description
07:26:08 12972 629 XXBC97 B Marks tey station
07:27:15 12973 629 XXBC97 C Approaching marks tey
07:27:35 8812 629 XXB881 B Approaching kelvedon
07:28:03 4340 629 XXB434 A no info
07:29:01 4339 629 XXB433 X Kelvedon station
07:29:25 4341 629 XXB434 A Approaching kelvedon
07:31:40 16772 629 XXBG77 B Between witham and kelvedon
07:32:10 16774 629 XXBG77 X Between kelvedon and witham
07:32:43 2084 629 XXB208 X Approaching witham
07:34:09 2086 629 XXB208 F Witham station
07:36:34 382 629 XXB038 B Approaching witham
07:37:15 2086 629 XXB208 F Witham station
07:37:55 7249 629 XXB724 X Hatfield Peveral station
07:38:33 7251 629 XXB725 A Approaching hatfield peveral
07:39:30 13877 629 XXBD87 G Approaching hatfield peveral
07:39:40 13878 629 XXBD87 X Between hatfield peveral and chelmsford
07:39:52 13879 629 XXBD87 X Between hatfield peveral and chelmsford
07:41:17 3910 629 XXB391 A Approaching chelmsford
07:41:37 3912 629 XXB391 B Approaching chelmsford
07:42:07 16055 629 XXBG05 E Chelmsford station
07:43:01 3877 629 XXB387 G Chelmsford station
07:43:52 16057 629 XXBG05 G Approaching chelmsford
07:44:10 3879 629 XXB387 X Approaching chelmsford
07:44:24 5282 629 XXB528 B Approaching chelmsford
07:44:46 16779 629 XXBG77 X Between chelmsford and ingatestone
07:44:58 16778 629 XXBG77 X Approaching chelmsford
07:45:08 16779 629 XXBG77 X Between chelmsford and ingatestone
07:45:31 16780 629 XXBG78 A no info
07:45:49 2073 629 XXB207 C Between chelmsford and ingatestone
07:46:01 367 629 XXB036 G Between chelmsford and ingatestone
07:46:11 12354 629 XXBC35 X Between ingatestone and chelmsford
07:46:25 12355 629 XXBC35 E Between ingatestone and chelmsford
07:47:03 2073 629 XXB207 C Between chelmsford and ingatestone
07:47:21 369 629 XXB036 X Approaching ingatestone
07:47:32 11240 105 XXBB24 A Approaching ingatestone
07:48:14 11242 105 XXBB24 B Ingatestone station
07:48:34 3755 105 XXB375 E Ingatestone station
07:49:14 3756 105 XXB375 F Between ingatestone and shenfield
07:49:30 11239 105 XXBB23 X Between shenfield and ingatestone
07:50:09 16872 105 XXBG87 B Approaching shenfield
07:50:35 16875 105 XXBG87 E Approaching shenfield
07:50:49 3661 105 XXB366 A Approaching shenfield
07:51:42 3662 105 XXB366 B Shenfield station
07:51:54 3663 105 XXB366 C Shenfield station
07:55:03 531957 0 XXB-76 X ?:no info
07:55:25 531957 65535 XXB-76 X ?:no info
07:55:59 0 0 XXB000 A ?:no info
07:56:50 7240 105 XXB724 A no info
07:57:26 3788 105 XXB378 X no info
07:57:52 3789 105 XXB378 X Approaching gidea park
07:58:09 2068 105 XXB206 X no info
07:58:19 16035 105 XXBG03 E Gidea park station
07:59:31 19568 105 XXBJ56 X no info
07:59:45 5057 105 XXB505 G no info
08:00:16 197140 3008 XXB-12 F *:Gidea park station
08:01:09 10925 105 XXBA92 E no info
08:01:26 5058 105 XXB505 X Approaching gidea park
08:01:59 6249 700 XXB624 X Approaching gidea park
08:02:18 1381 700 XXB138 A no info
08:02:30 197214 3009 XXB-69 A no info
08:03:19 4829 700 XXB482 X no info
08:03:23 8611 600 XXB861 A Seven kings station
08:03:49 7748 600 XXB774 X no info
08:04:49 11170 700 XXBB17 A Approaching ilford
08:05:17 9724 600 XXB972 X Manor park station
08:05:39 3325 600 XXB332 E Approaching manor park
08:06:02 9726 600 XXB972 F Manor park station
08:06:16 17536 600 XXBH53 F Approaching forest gate
08:06:44 17535 600 XXBH53 E Forest gate station
08:07:55 1335 600 XXB133 E no info
08:08:19 14197 600 XXBE19 G no info
08:08:38 10334 700 XXBA33 X Maryland station

So what do some of the columns mean? Well, Cell ID is the identifier of the cell itself. LAC is the location area code of the cell. I’m not sure exactly what Cell Name is; the CellTrack site says it comes from the cell broadcast, as I have a service number set. The description is the text I entered to give a rough location for the cell.

As I said before, the log file stores the data in tab-separated format. The fields are recorded in the following order…

  1. Date
  2. Time
  3. Cell ID
  4. LAC
  5. Country
  6. Net
  7. Signal
  8. Signal dBm
  9. Cell Name
  10. Description

This makes it very easy for us to write a data extractor using Perl. Here’s the code I used to generate the table above.

#!/usr/bin/perl
use strict;
use warnings;
## Perl script to parse the CellTrack trace.log file, and split selected
## contents into an HTML table.
## Robert Price - rob@robertprice.co.uk - March 2005
## start the table, and print out a table header.
print "<table>\n";
print " <tr><th>Time</th><th>Cell ID</th><th>LAC</th><th>Cell Name</th><th>Description</th></tr>\n";
## iterate over each line, placing the contents in $line.
while (my $line = <>) {
    ## clean up the data a bit.
    chomp($line);            # lose trailing linefeeds.
    $line =~ s/\r//g;        # lose any rogue carriage returns.
    $line =~ s/\t */\t/g;    # remove leading spaces from each field.
    next unless length $line;    # skip blank lines.
    ## split the data in $line into variables.
    my ($date, $time, $cellid, $lac, $country, $net, $strength, $dBm, $cellname, $description) = split(/\t/, $line);
    ## avoid undefined value warnings when trailing fields are empty.
    for ($cellname, $description) { $_ = '' unless defined $_ }
    ## create a copy of $time (HHMMSS), and format it as HH:MM:SS.
    my $nicetime = $time;
    $nicetime =~ s/(\d{2})(\d{2})(\d{2})/$1:$2:$3/;
    ## print out the data we're interested in.
    print qq{ <tr><td><a name="$time"></a>$nicetime</td><td>$cellid</td><td>$lac</td><td>$cellname</td><td>$description</td></tr>\n};
}
## close the table.
print "</table>\n";

You may have noticed I didn’t bother to print the country or network used. Well, that’s because they are always the same for me. The country is 234 (UK) and the network is 33 (Orange). This may be more interesting when roaming abroad.
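Because each record is a single tab-separated line in a fixed field order, it is easy to turn a line into a hash keyed by field name. From there you can summarise which country and network codes a log covers, which is where roaming would show up. The sketch below is minimal. The field key names and subroutine names are my own, and the sample lines (their dates, signal values and the second country/network pair) are invented for illustration rather than taken from a real CellTrack log.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Field names in the order CellTrack writes them to its log.
my @FIELDS = qw(date time cell_id lac country net signal signal_dbm cell_name description);

# Split one tab-separated log line into a hash reference keyed by field name.
sub parse_celltrack_line {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;                # strip an LF or CRLF line ending
    my @values = split /\t/, $line, -1;  # -1 keeps trailing empty fields
    s/^\s+// for @values;                # drop leading spaces in each field
    my %record;
    @record{@FIELDS} = @values;          # pair field names with values
    return \%record;
}

# Count records per "country/net" pair, e.g. to spot roaming in a log.
sub count_networks {
    my %count;
    for my $line (@_) {
        next unless $line =~ /\S/;       # skip blank lines
        my $rec = parse_celltrack_line($line);
        next unless defined $rec->{net};
        $count{"$rec->{country}/$rec->{net}"}++;
    }
    return \%count;
}

# Hypothetical log lines: dates, signal values and the second network are invented.
my @log = (
    join("\t", '20050301', '072608', '12972', '629', '234', '33', '5', '-85', 'XXBC97 B', 'Marks tey station') . "\r\n",
    join("\t", '20050301', '072715', '12973', '629', '234', '33', '5', '-87', 'XXBC97 C', 'Approaching marks tey') . "\r\n",
    join("\t", '20050302', '101500', '4321',  '77',  '208', '01', '4', '-90', '', ''),
);

my $first = parse_celltrack_line($log[0]);
print "$first->{time} $first->{cell_id} $first->{description}\n";

my $counts = count_networks(@log);
printf "%-7s %d\n", $_, $counts->{$_} for sort keys %$counts;
```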

WSSE Authentication For Atom Using Perl

Atom uses WSSE authentication for posting to and editing weblogs.

Mark Pilgrim explains more about this in layman’s terms in an old XML.com article, Atom Authentication.

This information is passed in an X-WSSE HTTP header (a CGI script sees it in the HTTP_X_WSSE environment variable), for example…

X-WSSE: UsernameToken Username="robertprice", PasswordDigest="l7FbmWdq8gBwHgshgQ4NonjrXPA=", Nonce="4djRSlpeyWeGzcNgatneSA==", Created="2005-2-5T17:18:15Z"

We need 4 pieces of information to create this string.

  1. Username
  2. Password
  3. Nonce
  4. Timestamp

Here a nonce is a cryptographically random string, not the word Clinton Baptiste gets in Phoenix Nights (thanks to Matt Facer for the link). It is sent base64-encoded.

The timestamp is the current time in W3DTF format (for example, 2005-02-05T17:18:15Z).

The nonce, timestamp and password are then combined to form a password digest, which the remote Atom system uses to verify that the request is authentic. Because the server already knows your username and password, it can take the nonce and timestamp passed in the WSSE header, recompute the digest itself and check that it matches the one you sent. The digest is the SHA1 hash of the nonce, timestamp and password joined together, encoded in base64 for transport across the web. SHA1 is a one-way hash rather than encryption, so the password itself is never sent or decrypted.

We can use Perl to create the password digest, as shown in this example code.

use MIME::Base64;
use Digest::SHA1;

my $username = "robertprice";
my $password = "secret password";
my $nonce = "4djRSlpeyWeGzcNgatneSA==";
my $timestamp = "2005-2-5T17:18:15Z";
my $digest = MIME::Base64::encode_base64(Digest::SHA1::sha1($nonce . $timestamp . $password), '');

The password digest is now stored in the variable $digest.

We can also create the HTTP header from this if needed.

print qq{X-WSSE: UsernameToken Username="$username", PasswordDigest="$digest", Nonce="$nonce", Created="$timestamp"\n};

Please note, to use this Perl code, you have to have the MIME::Base64 and Digest::SHA1 modules installed. Both are freely available on CPAN.
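The snippets above use fixed example values, but in practice the nonce and timestamp are generated fresh for each request. Here is a minimal sketch of doing that. It assumes a Unix-like system with /dev/urandom. It also swaps in the core Digest::SHA module, whose sha1 function returns the same raw 20-byte digest as Digest::SHA1::sha1. The make_nonce, w3dtf_now and password_digest names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::SHA qw(sha1);               # core module; same raw sha1() as Digest::SHA1
use MIME::Base64 qw(encode_base64);
use POSIX qw(strftime);

# 16 random bytes from /dev/urandom, base64-encoded.
# Assumes a Unix-like system; Perl's rand() is not suitable for nonces.
sub make_nonce {
    open my $fh, '<:raw', '/dev/urandom' or die "Cannot open /dev/urandom: $!";
    my $got = read($fh, my $bytes, 16);
    close $fh;
    die "Short read from /dev/urandom\n" unless defined $got && $got == 16;
    return encode_base64($bytes, '');
}

# Current UTC time in W3DTF format, e.g. 2005-02-05T17:18:15Z.
sub w3dtf_now {
    return strftime('%Y-%m-%dT%H:%M:%SZ', gmtime);
}

# PasswordDigest = Base64(SHA1(nonce . created . password)), using the
# nonce string exactly as it is sent in the header.
sub password_digest {
    my ($nonce, $created, $password) = @_;
    return encode_base64(sha1($nonce . $created . $password), '');
}

my $username = 'robertprice';
my $password = 'secret password';
my $nonce    = make_nonce();
my $created  = w3dtf_now();
my $digest   = password_digest($nonce, $created, $password);

print qq{X-WSSE: UsernameToken Username="$username", PasswordDigest="$digest", Nonce="$nonce", Created="$created"\n};
```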

Update – 22nd November 2006

Some more recent versions of Atom software expect the digest to be generated from the base64-decoded nonce. Using the variables from the example above, the code would be…


## generate alternative digest
my $alternative_digest = MIME::Base64::encode_base64(Digest::SHA1::sha1(MIME::Base64::decode_base64($nonce) . $timestamp . $password), '');

When using WSSE for password validation, I now always check the incoming digest against both versions of my generated digest, to stay compatible with different versions of Atom-enabled software. One of the best examples of this is the Nokia Lifeblog: older versions expect the nonce to be left encoded, while newer versions expect it to be decoded first.
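A minimal sketch of that server-side check, accepting a digest that matches either variant. It reuses the fixed example nonce, timestamp and password from above, and swaps in the core Digest::SHA module for Digest::SHA1. The subroutine names are illustrative. The sketch only compares digests; it does not check how old the Created timestamp is or whether a nonce has been seen before.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::SHA qw(sha1);
use MIME::Base64 qw(encode_base64 decode_base64);

# Digest over the nonce string exactly as sent (older clients).
sub digest_encoded_nonce {
    my ($nonce, $created, $password) = @_;
    return encode_base64(sha1($nonce . $created . $password), '');
}

# Digest over the base64-decoded nonce bytes (newer clients).
sub digest_decoded_nonce {
    my ($nonce, $created, $password) = @_;
    return encode_base64(sha1(decode_base64($nonce) . $created . $password), '');
}

# Accept the client's digest if it matches either variant.
sub wsse_digest_ok {
    my ($client_digest, $nonce, $created, $password) = @_;
    return 1 if $client_digest eq digest_encoded_nonce($nonce, $created, $password);
    return 1 if $client_digest eq digest_decoded_nonce($nonce, $created, $password);
    return 0;
}

my $nonce    = '4djRSlpeyWeGzcNgatneSA==';
my $created  = '2005-2-5T17:18:15Z';
my $password = 'secret password';

my $old_style = digest_encoded_nonce($nonce, $created, $password);
my $new_style = digest_decoded_nonce($nonce, $created, $password);
print wsse_digest_ok($old_style, $nonce, $created, $password) ? "old-style ok\n" : "old-style rejected\n";
print wsse_digest_ok($new_style, $nonce, $created, $password) ? "new-style ok\n" : "new-style rejected\n";
```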

Plain Scones Recipe

Delicious, simple-to-make scones that go just right with a dollop of fresh whipped cream. This recipe takes only about 20 to 30 minutes to complete.

You will need…

  • 250g self-raising flour
  • 40g butter
  • 150ml milk
  • 1.5 tbsp caster sugar
  • pinch of salt
  1. Heat the oven to gas mark 7 (220 C / 425 F).
  2. Sift flour into a bowl and add the butter.
  3. Rub the butter gently into the flour until it resembles bread crumbs.
  4. Add the salt and sugar.
  5. Slowly mix the milk in using a metal spoon to form a soft dough.
  6. Knead the dough with your hands to bind it.
  7. Roll out the dough so it’s about 2cm thick.
  8. Use a 4cm pastry cutter to cut the little scones out.
  9. Keep reforming and cutting until all the dough has been used.
  10. Place the scones, with a little dusting of flour on top, onto a greased baking sheet and put into the oven.
  11. Remove the scones after 12-15 minutes and cool on a wire rack.
  12. Enjoy with butter, cream and/or jam.

Scones go off very quickly so eat them within a few hours of baking. They are lovely when still warm!

UPDATE: I originally recommended 225g of plain flour, but I have since increased this to 250g as I found the dough too wet to shape sometimes.

Photo of fresh scones #1
Photo of fresh scones #2

Precompiling Templates With Template Toolkit

I’ve been playing about with configuration options in the Template Toolkit to try to improve the performance of a site I maintain.

I’ve been focusing on the caching and compiling options in particular.

By setting the COMPILE_DIR and COMPILE_EXT options, Template Toolkit automatically writes a compiled version of each template it uses to the specified directory. Once a template has been compiled, Template Toolkit will load the compiled version instead of re-parsing the original, as long as the original template hasn’t changed since. This seems to be giving some real speed increases and also reducing the load on the server.

use Template;

my $template = Template->new({
    COMPILE_DIR => '/tmp/compiled_templates',
    COMPILE_EXT => '.ttc',
}) or die Template->error();

Here we are storing our compiled templates in the /tmp/compiled_templates directory. Template Toolkit replicates the directory structure of the original template under this automatically. We’re also saying we want all compiled templates to end in the file extension .ttc.

It definitely seems to be a quick win for improving the performance of Template Toolkit based sites.
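To see the compiled files appear, here is a minimal, self-contained sketch. It writes a throwaway template into a temporary directory, renders it with COMPILE_DIR and COMPILE_EXT set, and then lists the .ttc files Template Toolkit wrote. The temporary directories, the hello.tt template and its name variable are illustrative assumptions, not part of the site’s setup. It uses a template file rather than a string because compiled copies are tied to template files on disk.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use File::Spec;
use File::Temp qw(tempdir);
use Template;

# Scratch directories standing in for a real template tree and cache.
my $src_dir     = tempdir(CLEANUP => 1);
my $compile_dir = tempdir(CLEANUP => 1);

# Write a tiny template file to render.
my $template_file = File::Spec->catfile($src_dir, 'hello.tt');
open my $fh, '>', $template_file or die "Cannot write $template_file: $!";
print {$fh} "Hello, [% name %]!";
close $fh;

my $template = Template->new({
    INCLUDE_PATH => $src_dir,
    COMPILE_DIR  => $compile_dir,
    COMPILE_EXT  => '.ttc',
}) or die Template->error();

my $output = '';
$template->process('hello.tt', { name => 'World' }, \$output)
    or die $template->error();
print "$output\n";

# List the compiled files written under COMPILE_DIR.
my @compiled;
find(sub { push @compiled, $File::Find::name if /\.ttc\z/ }, $compile_dir);
print "compiled: $_\n" for @compiled;
```

The compiled file lands under COMPILE_DIR at a path that mirrors the template’s own location, which is the directory replication described above.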

Parsing RDF In Perl With RDF::Simple

In this article I’ll describe how to parse and extract data from an RDF file using Jo Walsh’s RDF::Simple::Parser module in Perl.

RDF::Simple::Parser does what it says on the tin: it provides a simple way to parse RDF. Unfortunately, that can make it hard to extract data. All it returns from a successful parse of the RDF file is what Jo calls a “bucket-o-triples”. This is just an array of arrays. The outer array holds one entry per triple, and each inner array holds the parts of a single triple, with the Subject in position 0, the Predicate in position 1 and the Object in position 2.

Let’s define these as constants in Perl as they’re not going to be changing.

use constant SUBJECT => 0;
use constant PREDICATE => 1;
use constant OBJECT => 2;

I’m going to use my usual example of parsing my FOAF file, and I’ll be extracting the addresses of my friends’ FOAF files from it. See the example in What Is An RDF Triple for a full breakdown of this.

We’ll define the two predicates we need to look for as constants.

use constant KNOWS_PREDICATE => 'http://xmlns.com/foaf/0.1/knows';
use constant SEEALSO_PREDICATE => 'http://www.w3.org/2000/01/rdf-schema#seeAlso';

We need to load in the FOAF file, so we’ll take advantage of File::Slurp’s read_file function to do this and put it in a variable called $file.

my $file = read_file('./foaf.rdf');

Before we can use RDF::Simple::Parser, we need to create an instance of it. I’ll set the base address to www.robertprice.co.uk in this case.

my $parser = RDF::Simple::Parser->new(base => 'http://www.robertprice.co.uk/');

Now we have the instance, we can pass in our FOAF file for parsing and get back our triples.

my @triples = $parser->parse_rdf($file);

Let’s take a quick look at my FOAF file to get an example triple.

I know Cal Henderson, and this is represented in my FOAF file as…

<foaf:knows>
<foaf:Person>
<foaf:nick>Cal</foaf:nick>
<foaf:name>Cal Henderson</foaf:name>
<foaf:mbox_sha1sum>2971b1c2fd1d4f0e8f99c167cd85d522a614b07b</foaf:mbox_sha1sum>
<rdfs:seeAlso rdf:resource="http://www.iamcal.com/foaf.xml"/>
</foaf:Person>
</foaf:knows>

Using the RDF validator we can get the list of triples represented in this piece of RDF.


Triple Subject Predicate Object
1 genid:ARP40722 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://xmlns.com/foaf/0.1/Person
2 genid:ARP40722 http://xmlns.com/foaf/0.1/nick "Cal"
3 genid:ARP40722 http://xmlns.com/foaf/0.1/name "Cal Henderson"
4 genid:ARP40722 http://xmlns.com/foaf/0.1/mbox_sha1sum "2971b1c2fd1d4f0e8f99c167cd85d522a614b07b"
5 genid:ARP40722 http://www.w3.org/2000/01/rdf-schema#seeAlso http://www.iamcal.com/foaf.xml
6 genid:me http://xmlns.com/foaf/0.1/knows genid:ARP40722

The triples we are interested in are 5 and 6. Triple 6 has the same Predicate value as our KNOWS_PREDICATE constant, and triple 5 has the Predicate value of our SEEALSO_PREDICATE constant. What links the two is that triple 6’s Object is the same as triple 5’s Subject.

We know that if we search for triples with the same predicate as our KNOWS_PREDICATE, we’ll get the triples to do with people I know. We can use Perl’s grep function to get these triples, then iterate over them in a foreach loop.

foreach my $known (grep { $_->[PREDICATE] eq KNOWS_PREDICATE } @triples) {

We are only interested in the triples whose Subject is the same as the matching triple’s Object. Again, we can use grep to get these out so we can iterate over them.

foreach my $triple (grep { $_->[SUBJECT] eq $known->[OBJECT] } @triples) {

Now we just need to make sure that the triple’s Predicate matches our SEEALSO_PREDICATE constant, and if it does, we can print out its Object.

if ($triple->[PREDICATE] eq SEEALSO_PREDICATE) {
    print $triple->[OBJECT], "\n";
}

Let’s put this all together into a working example…

#!/usr/bin/perl -w
use strict;
use File::Slurp;
use RDF::Simple::Parser;
## constants defining position of triple components in
## RDF::Simple triple lists.
use constant SUBJECT => 0;
use constant PREDICATE => 1;
use constant OBJECT => 2;
## some known predicates.
use constant KNOWS_PREDICATE => 'http://xmlns.com/foaf/0.1/knows';
use constant SEEALSO_PREDICATE => 'http://www.w3.org/2000/01/rdf-schema#seeAlso';
## read in my foaf file and put it in $file.
my $file = read_file('./foaf.rdf');
## create a new parser, using my domain as a base.
my $parser = RDF::Simple::Parser->new(base => 'http://www.robertprice.co.uk/');
## parse my foaf file, and return a list of triples.
my @triples = $parser->parse_rdf($file);
## iterate over a list of triples matching the KNOWS_PREDICATE value.
foreach my $known (grep { $_->[PREDICATE] eq KNOWS_PREDICATE } @triples) {
    ## iterate over a list of triples that have the same subject
    ## as the object of one of our KNOWS_PREDICATE triples.
    foreach my $triple (grep { $_->[SUBJECT] eq $known->[OBJECT] } @triples) {
        ## find triples that match the SEEALSO_PREDICATE
        if ($triple->[PREDICATE] eq SEEALSO_PREDICATE) {
            ## print out the object, which should be the address
            ## of my friend's FOAF file.
            print $triple->[OBJECT], "\n";
        }
    }
}

The example will load in the FOAF file, parse it and print out the address of each of my friends’ FOAF files, as given by the seeAlso predicate.
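The nested grep rescans the whole triple list once for every knows triple. That is fine for a single FOAF file, but the work grows with the square of the number of triples. Here is a minimal sketch of an alternative that indexes the triples by subject first. It uses a hand-written list of triples in the same [subject, predicate, object] layout, based on the Cal Henderson triples above, in place of the parser output so that it runs without RDF::Simple. The seealso_of_known name is illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use constant SUBJECT   => 0;
use constant PREDICATE => 1;
use constant OBJECT    => 2;
use constant KNOWS_PREDICATE   => 'http://xmlns.com/foaf/0.1/knows';
use constant SEEALSO_PREDICATE => 'http://www.w3.org/2000/01/rdf-schema#seeAlso';

# Return the seeAlso objects of every node that is the object of a knows triple.
sub seealso_of_known {
    my @triples = @_;
    # Index the triples by subject so each lookup is a single hash access.
    my %by_subject;
    push @{ $by_subject{ $_->[SUBJECT] } }, $_ for @triples;

    my @found;
    for my $known (grep { $_->[PREDICATE] eq KNOWS_PREDICATE } @triples) {
        for my $triple (@{ $by_subject{ $known->[OBJECT] } || [] }) {
            push @found, $triple->[OBJECT]
                if $triple->[PREDICATE] eq SEEALSO_PREDICATE;
        }
    }
    return @found;
}

# Hand-written triples mirroring the validator output for Cal.
my @triples = (
    ['genid:ARP40722', 'http://xmlns.com/foaf/0.1/nick', 'Cal'],
    ['genid:ARP40722', SEEALSO_PREDICATE, 'http://www.iamcal.com/foaf.xml'],
    ['genid:me', KNOWS_PREDICATE, 'genid:ARP40722'],
);

print "$_\n" for seealso_of_known(@triples);
```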

The Current UK Population In JavaScript

One interesting webpage going round the office yesterday was the UK Population Statistics.

Looking at the figures for 2002-2003 we saw the total UK population grow from 59,321,700 people to 59,553,800 people.

That means the population grows by approximately one person every 2 and a quarter minutes (an increase of 232,100 people spread over the 525,600 minutes in a year).

I’ve knocked together a quick JavaScript that can calculate the approximate population of the UK assuming this increase is constant.

We work out the per-minute increase in population. We also know the starting population of the UK on the 1st January 2003, so all we have to do is work out how many minutes have elapsed between then and now, multiply that by the population growth per minute and add it to the starting population.

var startDate = new Date("January 1, 2003 00:00:00");
var perMinuteIncrease = 232100 / (365 * 24 * 60);
var startPop = 59553800;
// return the estimated current population of the UK.
function currentPopulation() {
var currentDate = new Date();
var diffMinutes = Math.floor((currentDate.getTime() - startDate.getTime()) / 60000);
return Math.round(startPop + (diffMinutes * perMinuteIncrease));
}

To get the current population you just call the currentPopulation() function.
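For comparison, here is the same calculation as a Perl sketch. It mirrors the currentPopulation() function but takes the current time as an optional epoch-seconds argument, so the result can be checked. It also treats 1st January 2003 as midnight UTC, whereas the JavaScript Date above uses the browser’s local time zone. The current_population name is illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(floor);
use Time::Local qw(timegm);

# Starting point and growth rate from the 2002-2003 figures.
my $START_EPOCH         = timegm(0, 0, 0, 1, 0, 2003);   # 1 Jan 2003 00:00:00 UTC
my $START_POP           = 59_553_800;
my $PER_MINUTE_INCREASE = 232_100 / (365 * 24 * 60);     # people per minute

# Estimated population at the given epoch time (defaults to now),
# rounded to the nearest whole person like Math.round().
sub current_population {
    my ($now) = @_;
    $now = time unless defined $now;
    my $diff_minutes = floor(($now - $START_EPOCH) / 60);
    return int($START_POP + $diff_minutes * $PER_MINUTE_INCREASE + 0.5);
}

printf "Estimated UK population now: %d\n", current_population();
```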


Querying RDF In Perl With RDFStore

Apart from RDF::Core and Redland, another option for parsing and querying RDF in Perl is RDFStore. This also provides the Perl RDQL::Parser module used by the very useful DBD::RDFStore driver.

Following on from the previous examples showing how to extract information from my FOAF file using RDF::Core (Query RDF In Perl With RDF::Core) and RDF::Redland (Querying RDF In Perl With RDF::Redland), here I’ll re-implement the query using RDFStore.

As a quick recap from the previous articles, here is the bit of RDF we want to extract information from.

<foaf:knows>
<foaf:Person>
<foaf:nick>Cal</foaf:nick>
<foaf:name>Cal Henderson</foaf:name>
<foaf:mbox_sha1sum>2971b1c2fd1d4f0e8f99c167cd85d522a614b07b</foaf:mbox_sha1sum>
<rdfs:seeAlso rdf:resource="http://www.iamcal.com/foaf.xml"/>
</foaf:Person>
</foaf:knows>

The solution used to extract the data from the RDF looks a lot more Perl-like than the previous examples we have seen.

If you have ever queried databases using SQL in Perl, then you have almost certainly come across the powerful DBI module. This abstracts common database operations, making it easy to port your applications between different databases. One of the best things about RDFStore is that it provides a DBD driver, allowing you to use standard DBI methods when querying your RDF data. Unlike other modules that make you create triple stores and node factories yourself, RDFStore hides that from you.

To start with we’ll need to create a database handle using DBI and the DBD::RDFStore modules and store it in the variable $dbh.

my $dbh = DBI->connect("DBI:RDFStore:");

This creates a database on the fly, but we can connect to an existing database on a local or remote server if we so wished.

Now we need to create our RDQL query. It looks very similar to the query we used in the Redland example.

my $query = $dbh->prepare(<<QUERY);
SELECT ?name ?nick ?seeAlso ?mbox_sha1sum
FROM <file:foaf.rdf>
WHERE
(?x <rdf:type> <foaf:Person>),
(?x <foaf:name> ?name)
(?x <foaf:nick> ?nick)
(?x <rdfs:seeAlso> ?seeAlso)
(?x <foaf:mbox_sha1sum> ?mbox_sha1sum)
AND
(?nick eq 'Cal')
USING
foaf for <http://xmlns.com/foaf/0.1/>,
QUERY

Here we’re selecting the values of the name, nick, seeAlso and mbox_sha1sum properties for a Person with the nick Cal. We’ve explicitly set where our triples come from using the FROM clause. In this case, it’s the file foaf.rdf, which contains my FOAF information.

We have the query in the variable $query, so let’s execute it.

$query->execute();

We can use standard DBI methods to fetch the data from our query. Here I’m going to create some bound variables to keep any matching data in.

my ($name, $seeAlso, $mbox_sha1sum, $nick);
$query->bind_columns(\$name, \$nick, \$seeAlso, \$mbox_sha1sum);

Now we just have to fetch each row that matches our query and print them out.

while ($query->fetch()) {
    print $name->toString, "\n";
    print $nick->toString, "\n";
    print $seeAlso->toString, "\n";
    print $mbox_sha1sum->toString, "\n";
}

The values returned are either RDFStore::Literal or RDFStore::Resource objects, so we have to use their toString methods to print them.

To tidy up, we’ll finish our query and disconnect from our database.

$query->finish;
$dbh->disconnect;

That’s it! It really is as simple as that.

Let’s put this all together now to produce our final example code listing.

#!/usr/bin/perl -w
## An example showing how to use RDFStore and RDQL::Parser to
## extract information from a FOAF file.
## Copyright 2004 - Robert Price - http://www.robertprice.co.uk/
use strict;
use DBI;
## create a DBI connection; RDFStore creates a database on the fly.
my $dbh = DBI->connect("DBI:RDFStore:");
## prepare our query.
my $query = $dbh->prepare(<<QUERY);
SELECT ?name ?nick ?seeAlso ?mbox_sha1sum
FROM <file:foaf.rdf>
WHERE
(?x <rdf:type> <foaf:Person>),
(?x <foaf:name> ?name)
(?x <foaf:nick> ?nick)
(?x <rdfs:seeAlso> ?seeAlso)
(?x <foaf:mbox_sha1sum> ?mbox_sha1sum)
AND
(?nick eq 'Cal')
USING
foaf for <http://xmlns.com/foaf/0.1/>,
QUERY
## execute the query.
$query->execute();
## define some holding variables and bind them to our query results.
my ($name, $seeAlso, $mbox_sha1sum, $nick);
$query->bind_columns(\$name, \$nick, \$seeAlso, \$mbox_sha1sum);
## while we have results being returned...
while ($query->fetch()) {
## print out the values.
## As these can be RDFStore::Literal or RDFStore::Resource's we
## need to use the toString method of these objects to print.
print $name->toString, "\n";
print $nick->toString, "\n";
print $seeAlso->toString, "\n";
print $mbox_sha1sum->toString, "\n";
}
## end the query and disconnect.
$query->finish;
$dbh->disconnect;

In conclusion, RDFStore provides a very clean and Perlish interface to querying RDF data. The code implements a DBD module allowing standard DBI methods to be used, making it quick and simple for Perl developers to learn and use effectively.

Querying RDF In Perl With RDF::Core

Extracting data from an RDF file seems like an easy job for Perl.

I have discussed how to parse RDF in Perl using the RDF::Core module before. In that example I used the getStmts functionality. This is fine for simple things, but it’s a bit messy. Thankfully RDF::Core provides us with a query language similar to RDQL to help us get the data we want.

Let’s take a quick look at my FOAF file for some example RDF to query. I know Cal Henderson, and this is represented in my FOAF file as…

<foaf:knows>
<foaf:Person>
<foaf:nick>Cal</foaf:nick>
<foaf:name>Cal Henderson</foaf:name>
<foaf:mbox_sha1sum>2971b1c2fd1d4f0e8f99c167cd85d522a614b07b</foaf:mbox_sha1sum>
<rdfs:seeAlso rdf:resource="http://www.iamcal.com/foaf.xml"/>
</foaf:Person>
</foaf:knows>

For a detailed explanation of the above RDF, have a look at my previous article – What Is An RDF Triple?

How could I find Cal’s FOAF file using RDF::Core’s query language? It’s really simple; we’d use the following query.

select ?x->rdfs:seeAlso
from ?x->foaf:nick{?y}
where ?y='Cal'

That means we want to select the value of the rdfs:seeAlso property of any resource ?x whose foaf:nick is Cal.

Why stop there? Let’s get some more information on Cal.

select ?x->foaf:name, ?x->foaf:nick, ?x->rdfs:seeAlso, ?x->foaf:mbox_sha1sum
from ?x->foaf:nick{?y}
where ?y='Cal'

Now we have Cal’s name, nickname, address of his FOAF file and the checksum of his email address.

Let’s see how to get this all in Perl now.

We need to set up some RDF::Core objects and parse the FOAF file before we can query it. We’ll use this code to do so.

## list of known namespaces we might need.
my $namespaces = {
'foaf' => 'http://xmlns.com/foaf/0.1/',
'rdfs' => 'http://www.w3.org/2000/01/rdf-schema#',
'rdf' => 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
'dc' => 'http://purl.org/dc/elements/1.1/',
};
## create our model and storage for triples.
my $store = RDF::Core::Storage::Memory->new;
my $model = RDF::Core::Model->new(Storage => $store);
## create node factory for query
my $factory = RDF::Core::NodeFactory->new;
## create our parser and parse the source file.
my $parser = RDF::Core::Model::Parser->new(
Model => $model,
BaseURI => 'http://xmlns.com/foaf/0.1/',
Source => './foaf.rdf',
SourceType => 'file'
);
$parser->parse;

For a full explanation of the above code, have a look at my previous article – How To Parse RDF In Perl.

We now need to add the new code.

First we need to create an RDF::Core::Evaluator and an RDF::Core::Query object. These two objects handle the querying of the data held in our $model.

## create an evaluator based on our model, factory and namespaces.
my $evaluator = RDF::Core::Evaluator->new(
Model => $model,
Factory => $factory,
Namespaces => $namespaces,
);
## create a query object based on the evaluator.
my $query = RDF::Core::Query->new(Evaluator => $evaluator);

As we have a query object, we can just pass our query statement to its query method.

## run our query and save the results in $results.
my $results = $query->query("select ?x->foaf:name, ?x->foaf:nick, ?x->rdfs:seeAlso, ?x->foaf:mbox_sha1sum from ?x->foaf:nick{?y} where ?y='Cal'");

This executes the query and returns the results in the $results variable as a reference to an array of rows. We only expect one match, so we use the first row, $results->[0], which is itself an array reference holding the selected values in order. These will be RDF::Core::Literal or RDF::Core::Resource objects. As both inherit from RDF::Core::Node, we can use the getLabel method to return the string value of each one and print it out.

## go over the results and print the data out.
foreach my $result (@{$results->[0]}) {
print $result->getLabel, "\n";
}

This will return us the following data.

Cal Henderson
Cal
http://www.iamcal.com/foaf.xml
2971b1c2fd1d4f0e8f99c167cd85d522a614b07b

Here’s the final code in all its glory.

#!/usr/bin/perl -w
use strict;
use RDF::Core::Evaluator;
use RDF::Core::Model;
use RDF::Core::Model::Parser;
use RDF::Core::NodeFactory;
use RDF::Core::Query;
use RDF::Core::Resource;
use RDF::Core::Storage::Memory;
## list of known namespaces we might need.
my $namespaces = {
'foaf' => 'http://xmlns.com/foaf/0.1/',
'rdfs' => 'http://www.w3.org/2000/01/rdf-schema#',
'rdf' => 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
'dc' => 'http://purl.org/dc/elements/1.1/',
};
## create our model and storage for triples.
my $store = RDF::Core::Storage::Memory->new;
my $model = RDF::Core::Model->new(Storage => $store);
## create node factory for query
my $factory = RDF::Core::NodeFactory->new;
## create our parser and parse the source file.
my $parser = RDF::Core::Model::Parser->new(
Model => $model,
BaseURI => 'http://xmlns.com/foaf/0.1/',
Source => './foaf.rdf',
SourceType => 'file'
);
$parser->parse;
## create an evaluator based on our model, factory and namespaces.
my $evaluator = RDF::Core::Evaluator->new(
Model => $model,
Factory => $factory,
Namespaces => $namespaces,
);
## create a query object based on the evaluator.
my $query = RDF::Core::Query->new(Evaluator => $evaluator);
## run our query and save the results in $results.
my $results = $query->query("select ?x->foaf:name, ?x->foaf:nick, ?x->rdfs:seeAlso, ?x->foaf:mbox_sha1sum from ?x->foaf:nick{?y} where ?y='Cal'");
## go over the results and print the data out.
foreach my $result (@{$results->[0]}) {
print $result->getLabel, "\n";
}

If you see warnings from RDF::Core (when running with -w or use warnings;) about use of an uninitialised value before the results are printed, don’t worry. This is just a small bug in RDF::Core and won’t affect the results.
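To see what the query is actually asking for, here is a plain-Perl sketch of the same selection. It runs over a hand-written list of triples based on the Cal Henderson RDF above: find each node whose foaf:nick is Cal, then collect its name, nick, seeAlso and mbox_sha1sum values. It illustrates the query’s meaning only and is not how RDF::Core evaluates queries. The select_by_nick name and the blank-node labels are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $FOAF = 'http://xmlns.com/foaf/0.1/';
my $RDFS = 'http://www.w3.org/2000/01/rdf-schema#';

# Hand-written [subject, predicate, object] triples for the Cal example.
my @triples = (
    ['_:cal', "${FOAF}nick",         'Cal'],
    ['_:cal', "${FOAF}name",         'Cal Henderson'],
    ['_:cal', "${FOAF}mbox_sha1sum", '2971b1c2fd1d4f0e8f99c167cd85d522a614b07b'],
    ['_:cal', "${RDFS}seeAlso",      'http://www.iamcal.com/foaf.xml'],
    ['_:me',  "${FOAF}knows",        '_:cal'],
);

# For each ?x whose foaf:nick equals $nick, return one row holding the values
# of the requested predicates (undef if missing; last value wins if repeated).
sub select_by_nick {
    my ($nick, @predicates) = @_;
    my @subjects = map  { $_->[0] }
                   grep { $_->[1] eq "${FOAF}nick" && $_->[2] eq $nick } @triples;
    my @rows;
    for my $x (@subjects) {
        my %value;
        $value{ $_->[1] } = $_->[2] for grep { $_->[0] eq $x } @triples;
        push @rows, [ map { $value{$_} } @predicates ];
    }
    return @rows;
}

my @rows = select_by_nick('Cal',
    "${FOAF}name", "${FOAF}nick", "${RDFS}seeAlso", "${FOAF}mbox_sha1sum");
print join("\n", @$_), "\n" for @rows;
```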

Parsing RDF In Perl With RDF::Core

It is often said that parsing RDF files is hard. It’s not. What most people really find hard is turning the RDF/XML into a series of triples and extracting data from them.

When stored in RDF/XML, the same triples can be written in a variety of ways that all mean the same thing. This means using a plain XML parser may not always give you the same results. To solve this we use an RDF parser to get the triples for us. An RDF parser is usually built on top of an existing XML parser, but with enough logic to turn the data contained in an XML file into a list of triples correctly.

There are various RDF parsers for Perl. These include RDF::Core, RDF::Simple, RDF::Redland and several others.

Parsing is all well and good, and each of the above can give us a list of triples. What we really need, though, is a way to query that data to extract what we need.

In this article I’m going to use RDF::Core.

RDF::Core is a bit of a beast and doesn’t have a very Perl’ish interface.

I’m going to parse my FOAF file and extract a list of my friends’ FOAF files.

FOAF means Friend of a Friend and it allows you to describe relationships between people in a machine readable RDF format. Details of it can be found on the FOAF project website.

So how do we do this using RDF::Core?

Firstly we need to set it up. We’ll need to create a model and define the storage method we want. RDF::Core allows data to be stored in a database, a DB_File or memory. In this example I’ll just use memory as it’s the simplest.

my $store = RDF::Core::Storage::Memory->new;
my $model = RDF::Core::Model->new(Storage => $store);

Now we need to create a parser and parse the RDF file. To do this we pass in our Model, along with the BaseURI, Source and SourceType of the RDF data. In this example I’m going to use the file foaf.rdf, so I’ll set the SourceType parameter to file. The BaseURI is used to resolve relative URIs, so we’ll use the FOAF namespace URI http://xmlns.com/foaf/0.1/ for this. Once we have our parser built, we just call its parse method.

my $parser = RDF::Core::Model::Parser->new(
Model => $model,
BaseURI => 'http://xmlns.com/foaf/0.1/',
Source => './foaf.rdf',
SourceType => 'file'
);
$parser->parse;

We now have a list of triples in our memory store. That’s great, but we want to find out if any of my friends have FOAF files.

If you look at the example in my previous article on RDF triples, you’ll see that my friends’ details are referenced by the #knows predicate. So we need to find all the triples that have that as their predicate and get their objects. Once we have these, we can look for all triples that have one of these objects as their subject and #seeAlso as their predicate. The objects of those triples will be the addresses of my friends’ FOAF files.
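The two-step lookup described above is essentially a join on the triples. As a library-free sketch of the idea (not how RDF::Core implements it), here it is over plain Perl array references of [subject, predicate, object]. The example.org people, the _:friend blank node names and the FOAF file URL are all made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

## Predicate URIs used in FOAF files.
my $KNOWS    = 'http://xmlns.com/foaf/0.1/knows';
my $SEE_ALSO = 'http://www.w3.org/2000/01/rdf-schema#seeAlso';

## Hypothetical triples: I know two people, and only one of them
## has a seeAlso pointing at a FOAF file.
my @triples = (
    [ 'http://example.org/me', $KNOWS,    '_:friend1' ],
    [ 'http://example.org/me', $KNOWS,    '_:friend2' ],
    [ '_:friend1',             $SEE_ALSO, 'http://example.org/friend1/foaf.rdf' ],
    [ '_:friend2', 'http://xmlns.com/foaf/0.1/nick', 'Bob' ],
);

## Step 1: collect the object of every #knows triple.
my @friends = map { $_->[2] } grep { $_->[1] eq $KNOWS } @triples;

## Step 2: for each friend, collect the objects of triples with that
## friend as subject and #seeAlso as predicate.
my @foaf_files;
for my $friend (@friends) {
    push @foaf_files, map { $_->[2] }
        grep { $_->[0] eq $friend && $_->[1] eq $SEE_ALSO } @triples;
}

print "$_\n" for @foaf_files;
```

This version keeps every #seeAlso for each friend, whereas the RDF::Core script in this article only takes the first.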

Refer to my article What Is An RDF Triple for a detailed explanation and commented example of the above.

RDF::Core needs all resources to be created as RDF::Core::Resource objects. We’ll need to create resources for #seeAlso and #knows if we are to query with them, so let’s do that now.

my $seealso = RDF::Core::Resource->new('http://www.w3.org/2000/01/rdf-schema#seeAlso');
my $knows = RDF::Core::Resource->new('http://xmlns.com/foaf/0.1/knows');

We can use the model’s getStmts method to get triples matching a given subject, predicate and object. For each position we either pass in an RDF::Core::Resource holding the value we want to match, or undef if we want to match anything. This returns an enumerator that we can use to get each matching triple in turn.
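To make the undef-as-wildcard idea concrete, here’s a pure-Perl sketch of the matching rule. It’s only an illustration under my own assumptions: triples are plain strings in array references, match_stmts is a hypothetical function rather than part of RDF::Core, and it returns a list instead of an enumerator.

```perl
#!/usr/bin/perl
use strict;
use warnings;

## Hypothetical matcher: return every triple matching the pattern,
## where undef in any position matches anything.
sub match_stmts {
    my ($triples, @pattern) = @_;
    return grep {
        my $triple = $_;
        ## keep the triple only if no defined pattern part differs
        !grep { defined $pattern[$_] && $pattern[$_] ne $triple->[$_] } 0 .. 2;
    } @$triples;
}

my $knows = 'http://xmlns.com/foaf/0.1/knows';
my @triples = (
    [ 'http://example.org/me', $knows, '_:a' ],
    [ 'http://example.org/me', $knows, '_:b' ],
    [ '_:a', 'http://xmlns.com/foaf/0.1/nick', 'Cal' ],
);

my @all         = match_stmts(\@triples, undef, undef, undef);
my @knows_stmts = match_stmts(\@triples, undef, $knows, undef);
my @cal         = match_stmts(\@triples, undef, undef, 'Cal');
printf "%d all, %d knows, %d with object Cal\n",
    scalar @all, scalar @knows_stmts, scalar @cal;
```

Passing undef in all three positions returns everything, which is also how you can walk every triple in an RDF::Core model.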

To start with we need a list of the triples that match our #knows predicate. We do that like this:

my $knows_enum = $model->getStmts(undef, $knows, undef);

Next we need to enumerate over any triples we have.

my $statement = $knows_enum->getFirst;
while (defined $statement) {
my $knows_object = $statement->getObject;
## code to handle each statement goes here
$statement = $knows_enum->getNext;
}
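The loop relies on getNext returning undef once the enumerator runs out of statements, which is what ends the while loop (getFirst likewise returns undef if nothing matched). As an illustration of that protocol only, here’s a hypothetical ListEnumerator class in plain Perl that behaves the same way over an array. It is not RDF::Core’s enumerator class.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package ListEnumerator;

## Hypothetical enumerator over a list, mimicking the getFirst/getNext
## calls used with RDF::Core enumerators.
sub new {
    my ($class, @items) = @_;
    return bless { items => [@items], pos => 0 }, $class;
}

## Rewind and return the first item, or undef if the list is empty.
sub getFirst {
    my ($self) = @_;
    $self->{pos} = 0;
    return $self->{items}[0];
}

## Advance and return the next item, or undef once we run off the end.
sub getNext {
    my ($self) = @_;
    $self->{pos}++;
    return $self->{items}[ $self->{pos} ];
}

package main;

## Same loop shape as the RDF::Core code above.
my $enum = ListEnumerator->new('first', 'second', 'third');
my @seen;
my $item = $enum->getFirst;
while (defined $item) {
    push @seen, $item;
    $item = $enum->getNext;
}
print "@seen\n";
```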

Now that we have the object of each #knows triple, we need to find the triples that have this value as their subject and #seeAlso as their predicate.

We can get these triples using code very similar to the code we just used. If we insert the following code into the while loop, we can extract this information.

my $seealso_enum = $model->getStmts($knows_object, $seealso, undef);
my $seealso_object = $seealso_enum->getFirst;
if (defined $seealso_object) {
print $seealso_object->getObject->getLabel, "\n";
}

As we only want the first #seeAlso value for each #knows triple, we don’t have to go through every statement in the enumerator, only the first.

Putting it all together, we have a simple script to parse a FOAF file and print out the addresses of our friends’ FOAF files.

#!/usr/bin/perl -w
## Extract a list of seeAlso triples from a FOAF file
## Robert Price - http://www.robertprice.co.uk/
use strict;
use RDF::Core::Model;
use RDF::Core::Storage::Memory;
use RDF::Core::Model::Parser;
use RDF::Core::Resource;
## create our model and storage for triples.
my $store = RDF::Core::Storage::Memory->new;
my $model = RDF::Core::Model->new(Storage => $store);
## create our parser and parse the Source file.
my $parser = RDF::Core::Model::Parser->new(
Model => $model,
BaseURI => 'http://xmlns.com/foaf/0.1/',
Source => './foaf.rdf',
SourceType => 'file'
);
$parser->parse;
## create a resource for seeAlso and knows triples.
my $seealso = RDF::Core::Resource->new('http://www.w3.org/2000/01/rdf-schema#seeAlso');
my $knows = RDF::Core::Resource->new('http://xmlns.com/foaf/0.1/knows');
## create an enumerator with all the knows triples.
my $knows_enum = $model->getStmts(undef, $knows, undef);
## enumerate over each knows triple.
my $statement = $knows_enum->getFirst;
while (defined $statement) {
## get the object of the current triple.
my $knows_object = $statement->getObject;
## look for triples whose subject is the object of the knows
## triple and whose predicate is seeAlso.
my $seealso_enum = $model->getStmts($knows_object, $seealso, undef);
my $seealso_obj = $seealso_enum->getFirst;
## if it has a seealso triple, show the value of it.
if (defined $seealso_obj) {
print $seealso_obj->getObject->getLabel, "\n";
}
## get the next knows statement.
$statement = $knows_enum->getNext;
}
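If you don’t have a foaf.rdf to hand, here’s a variation of the same script that parses a small made-up FOAF document from a string instead of a file. It assumes RDF::Core’s parser accepts SourceType => 'string', as its documentation describes. The people, nicknames and URLs in the document are hypothetical, and the results are collected into an array rather than printed straight away.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use RDF::Core::Model;
use RDF::Core::Storage::Memory;
use RDF::Core::Model::Parser;
use RDF::Core::Resource;

## A minimal, made-up FOAF document: one person who knows two others,
## only one of whom has a seeAlso link to a FOAF file.
my $foaf = <<'RDF';
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person rdf:about="http://example.org/me">
    <foaf:nick>Cal</foaf:nick>
    <foaf:knows>
      <foaf:Person>
        <foaf:nick>Alice</foaf:nick>
        <rdfs:seeAlso rdf:resource="http://example.org/alice/foaf.rdf"/>
      </foaf:Person>
    </foaf:knows>
    <foaf:knows>
      <foaf:Person>
        <foaf:nick>Bob</foaf:nick>
      </foaf:Person>
    </foaf:knows>
  </foaf:Person>
</rdf:RDF>
RDF

## Parse the string into an in-memory model.
my $model = RDF::Core::Model->new(Storage => RDF::Core::Storage::Memory->new);
RDF::Core::Model::Parser->new(
    Model      => $model,
    BaseURI    => 'http://xmlns.com/foaf/0.1/',
    Source     => $foaf,
    SourceType => 'string',
)->parse;

my $seealso = RDF::Core::Resource->new('http://www.w3.org/2000/01/rdf-schema#seeAlso');
my $knows   = RDF::Core::Resource->new('http://xmlns.com/foaf/0.1/knows');

## Same two-step lookup as the script above, collecting the results.
my @foaf_files;
my $knows_enum = $model->getStmts(undef, $knows, undef);
my $statement  = $knows_enum->getFirst;
while (defined $statement) {
    my $seealso_enum = $model->getStmts($statement->getObject, $seealso, undef);
    my $seealso_stmt = $seealso_enum->getFirst;
    push @foaf_files, $seealso_stmt->getObject->getLabel if defined $seealso_stmt;
    $statement = $knows_enum->getNext;
}
print "$_\n" for @foaf_files;
```

Bob has no #seeAlso, so only Alice’s FOAF file should be printed.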

I hope this has started to demystify RDF parsing with Perl.