Extracting data from an RDF file seems like an easy job for Perl.
I have discussed how to parse RDF in Perl using the RDF::Core module before. In that example I used the getStmts functionality. This is fine for simple things, but it’s a bit messy. Thankfully, RDF::Core provides a query language similar to RDQL to help us get the data we want.
Let’s take a quick look at my FOAF file for some example RDF to query. I know Cal Henderson, and this is represented in my FOAF file as…
<foaf:knows>
  <foaf:Person>
    <foaf:nick>Cal</foaf:nick>
    <foaf:name>Cal Henderson</foaf:name>
    <foaf:mbox_sha1sum>2971b1c2fd1d4f0e8f99c167cd85d522a614b07b</foaf:mbox_sha1sum>
    <rdfs:seeAlso rdf:resource="http://www.iamcal.com/foaf.xml"/>
  </foaf:Person>
</foaf:knows>
For a detailed explanation of the above RDF, have a look at my previous article – What Is An RDF Triple?
How could I find Cal’s FOAF file using RDF::Core’s query language? It’s really simple. We’d use the following query.
select ?x->rdfs:seeAlso
from ?x->foaf:nick{?y}
where ?y='Cal'
This selects the rdfs:seeAlso value of any resource ?x whose foaf:nick value, bound here to ?y, is 'Cal'.
Why stop there? Let’s get some more information on Cal.
select ?x->foaf:name, ?x->foaf:nick, ?x->rdfs:seeAlso, ?x->foaf:mbox_sha1sum
from ?x->foaf:nick{?y}
where ?y='Cal'
Now we have Cal’s name, his nickname, the address of his FOAF file and the SHA-1 hash of his email address.
Let’s see how to get this all in Perl now.
We need to set up some RDF::Core objects and parse the FOAF file before we can query it. We’ll use this code to do so.
## list of known namespaces we might need.
my $namespaces = {
    'foaf' => 'http://xmlns.com/foaf/0.1/',
    'rdfs' => 'http://www.w3.org/2000/01/rdf-schema#',
    'rdf'  => 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
    'dc'   => 'http://purl.org/dc/elements/1.1/',
};
## create our model and storage for triples.
my $store = RDF::Core::Storage::Memory->new;
my $model = RDF::Core::Model->new(Storage => $store);
## create node factory for query
my $factory = RDF::Core::NodeFactory->new;
## create our parser and parse the Source file.
my $parser = RDF::Core::Model::Parser->new(
    Model      => $model,
    BaseURI    => 'http://xmlns.com/foaf/0.1/',
    Source     => './foaf.rdf',
    SourceType => 'file'
);
$parser->parse;
For a full explanation of the above code, have a look at my previous article – How To Parse RDF In Perl.
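The $namespaces hash maps each short prefix to a full namespace URI, which is what lets a short name such as foaf:nick stand in for a full URI. As a rough illustration of that mapping, here is a minimal sketch. The expand_name helper is hypothetical and is not part of RDF::Core, nor is it necessarily how RDF::Core resolves prefixes internally.

```perl
use strict;
use warnings;

# The same kind of prefix-to-URI mapping as used in the article.
my $namespaces = {
    'foaf' => 'http://xmlns.com/foaf/0.1/',
    'rdfs' => 'http://www.w3.org/2000/01/rdf-schema#',
};

# Hypothetical helper, for illustration only: turn 'prefix:local'
# into a full URI, or return nothing if the prefix is unknown.
sub expand_name {
    my ($name, $ns) = @_;
    my ($prefix, $local) = split /:/, $name, 2;
    return unless defined $local && exists $ns->{$prefix};
    return $ns->{$prefix} . $local;
}

print expand_name('foaf:nick', $namespaces), "\n";
print expand_name('rdfs:seeAlso', $namespaces), "\n";
```

This prints http://xmlns.com/foaf/0.1/nick followed by http://www.w3.org/2000/01/rdf-schema#seeAlso, the full names of the two properties the queries above rely on.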
We now need to add the new code.
Firstly, we need to create an RDF::Core::Evaluator and an RDF::Core::Query object. These two objects handle the querying of the data held in our $model.
## create an evaluator based on our model, factory and namespaces.
my $evaluator = RDF::Core::Evaluator->new(
    Model      => $model,
    Factory    => $factory,
    Namespaces => $namespaces,
);
## create a query object based on the evaluator.
my $query = RDF::Core::Query->new(Evaluator => $evaluator);
Now that we have a query object, we can pass our query statement to its query method.
## run our query and save the results in $results.
my $results = $query->query("select ?x->foaf:name, ?x->foaf:nick, ?x->rdfs:seeAlso, ?x->foaf:mbox_sha1sum from ?x->foaf:nick{?y} where ?y='Cal'");
This executes the query and returns the results as a reference to an array in the $results variable. We need the first arrayref held in $results to get the data we want. This will be a list of RDF::Core::Literal or RDF::Core::Resource objects. As these both inherit from RDF::Core::Node, let’s just use that class’s getLabel method to return the string values of the data we need and print them out.
## go over the results and print the data out.
foreach my $result (@{$results->[0]}) {
    print $result->getLabel, "\n";
}
This prints the following data.
Cal Henderson
Cal
http://www.iamcal.com/foaf.xml
2971b1c2fd1d4f0e8f99c167cd85d522a614b07b
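The loop above only looks at the first arrayref in $results. If a query matches more than one person, it may be handy to walk every row. The following is a minimal sketch under two assumptions: that $results is a reference to an array of rows, each row being a reference to an array of node objects, and that a value which could not be matched may come back as undef. The rows_to_labels helper is hypothetical and not part of RDF::Core.

```perl
use strict;
use warnings;

# Hypothetical helper: convert query results into rows of plain strings.
# Assumes $results is a reference to an array of rows, and that each row
# is a reference to an array of objects providing a getLabel method.
sub rows_to_labels {
    my ($results) = @_;
    my @rows;
    foreach my $row (@{ $results || [] }) {
        # Assumption: an unmatched value may be undef, so guard the call.
        push @rows, [ map { defined $_ ? $_->getLabel : '' } @$row ];
    }
    return @rows;
}

# With the $results from the query above, usage might look like:
#   print join("\n", @$_), "\n\n" for rows_to_labels($results);
```

Returning plain strings keeps the printing code free of RDF::Core method calls, and the defined check avoids calling getLabel on a missing value.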
Here’s the final code in all its glory.
#!/usr/bin/perl -w
use strict;
use RDF::Core::Evaluator;
use RDF::Core::Model;
use RDF::Core::Model::Parser;
use RDF::Core::NodeFactory;
use RDF::Core::Query;
use RDF::Core::Resource;
use RDF::Core::Storage::Memory;
## list of known namespaces we might need.
my $namespaces = {
    'foaf' => 'http://xmlns.com/foaf/0.1/',
    'rdfs' => 'http://www.w3.org/2000/01/rdf-schema#',
    'rdf'  => 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
    'dc'   => 'http://purl.org/dc/elements/1.1/',
};
## create our model and storage for triples.
my $store = RDF::Core::Storage::Memory->new;
my $model = RDF::Core::Model->new(Storage => $store);
## create node factory for query
my $factory = RDF::Core::NodeFactory->new;
## create our parser and parse the Source file.
my $parser = RDF::Core::Model::Parser->new(
    Model      => $model,
    BaseURI    => 'http://xmlns.com/foaf/0.1/',
    Source     => './foaf.rdf',
    SourceType => 'file'
);
$parser->parse;
## create an evaluator based on our model, factory and namespaces.
my $evaluator = RDF::Core::Evaluator->new(
    Model      => $model,
    Factory    => $factory,
    Namespaces => $namespaces,
);
## create a query object based on the evaluator.
my $query = RDF::Core::Query->new(Evaluator => $evaluator);
## run our query and save the results in $results.
my $results = $query->query("select ?x->foaf:name, ?x->foaf:nick, ?x->rdfs:seeAlso, ?x->foaf:mbox_sha1sum from ?x->foaf:nick{?y} where ?y='Cal'");
## go over the results and print the data out.
foreach my $result (@{$results->[0]}) {
    print $result->getLabel, "\n";
}
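To look up someone other than Cal, the nickname in the query can be made a parameter. Here is a minimal sketch using a heredoc, which also keeps the query readable across several lines. The build_nick_query helper is hypothetical. It accepts only simple nicknames, because how the query language escapes quotes inside a literal is not covered in this article.

```perl
use strict;
use warnings;

# Hypothetical helper: build the article's query for any nickname.
# Only letters, digits, '_', '-' and spaces are accepted, so that the
# nickname can never break out of the quoted literal.
sub build_nick_query {
    my ($nick) = @_;
    die "Unsupported nickname\n"
        unless defined $nick && $nick =~ /\A[\w -]+\z/;

    return <<"END_QUERY";
select ?x->foaf:name, ?x->foaf:nick, ?x->rdfs:seeAlso, ?x->foaf:mbox_sha1sum
from ?x->foaf:nick{?y}
where ?y='$nick'
END_QUERY
}

print build_nick_query('Cal');
```

The returned string can then be handed to $query->query in place of the hard-coded statement.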
If you see warnings from RDF::Core (when running with -w or use warnings;) about use of an uninitialised value before the results are printed, don’t worry. This is just a small bug in RDF::Core and won’t affect the running of the code.
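If those warnings are distracting, one option is to filter them out just around the query call, using Perl’s standard $SIG{__WARN__} hook. This is only a sketch. The without_uninit_warnings helper is hypothetical, and it assumes the warnings contain Perl’s usual "uninitialized value" wording.

```perl
use strict;
use warnings;

# Hypothetical helper: run a code reference while discarding only the
# "uninitialized value" warnings. Any other warning is re-emitted.
sub without_uninit_warnings {
    my ($code) = @_;
    # local restores the previous handler when this sub returns.
    local $SIG{__WARN__} = sub {
        my ($message) = @_;
        return if $message =~ /uninitialized value/;
        warn $message;
    };
    return $code->();
}

# With the article's objects, usage might look like this
# ($statement is a hypothetical variable holding the query string):
#   my $results = without_uninit_warnings(sub { $query->query($statement) });
```

Because the handler is set with local, warnings raised anywhere else in the program are unaffected.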