New Train Departure And Arrival RSS Feeds
I've updated my train arrival and departure RSS feeds, as livedepartureboards.co.uk have changed their pages to strict XHTML.
The script relies on screen-scraping, so the markup change broke the original version. It's good to see them using strict XHTML. I should really rewrite the entire script to use XML parsing now, but I'm lazy, and all it needs to work again is a few changed regular expressions.
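To illustrate why a markup change breaks a scraper like this, here is a cut-down sketch of the regex approach. The sample markup, the href values and the three-column layout are made up for the example (the real board has more columns), and the pattern uses \s* between cells rather than a literal \r\n, which is a choice made for this sketch and not something the script does.

```perl
use strict;
use warnings;

## Made-up sample markup, loosely modelled on a departure board table.
my $page = <<'HTML';
<table>
<tr class="odd">
<td><a href="train.aspx?id=1">London Liverpool Street</a></td>
<td class="time">10:15</td>
<td class="time">On time</td>
</tr>
<tr class="even">
<td><a href="train.aspx?id=2">Ipswich</a></td>
<td class="time">10:22</td>
<td class="time">10:27</td>
</tr>
</table>
HTML

## Walk the page one table row at a time. The /x flag lets the
## pattern be split over several lines with comments.
my @trains;
while ($page =~ m{
        <tr.*?>\s*
        <td><a\ href="(.*?)">(.*?)</a></td>\s*   # link and destination
        <td.*?>(.*?)</td>\s*                      # timetabled departure
        <td.*?>(.*?)</td>\s*                      # expected departure
        </tr>
    }sgx) {
    push @trains, {
        link       => $1,
        to         => $2,
        timetabled => $3,
        expected   => $4,
    };
}

for my $t (@trains) {
    print "$t->{timetabled} to $t->{to} ($t->{expected})\n";
}
```

Every tag and attribute position is baked into the pattern, so even a small change to the page (an extra column, a link moved to a different cell) means the loop silently matches nothing.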
An example of the feed is shown below in my FeedReader.
Here is the full updated Perl code.
#!/usr/bin/perl -w
## Script to convert train departures from livedepartureboards.co.uk
## into RSS feeds.
## Takes the parameter "stationcode", as defined on the nationalrail.co.uk
## page - http://www.nationalrail.co.uk/frameset.asp?location=ldb
##
## Changed from the original to understand their lovely new XHTML site.
##
## Robert Price - www.robertprice.co.uk - 7 December 2004
use strict;
use CGI;
use HTTP::Request;
use LWP::UserAgent;
use XML::RSS;
## Take the stationcode parameter, else default to COL (Colchester)
my $CGI = new CGI;
my $stationcode = $CGI->param('stationcode') || 'COL';
## The URL we have to screen scrape for the information.
my $url = 'http://www.livedepartureboards.co.uk/ldb/summary.aspx?T=' . $stationcode;
## Create our new browser, and pretend to be IE.
my $ua = LWP::UserAgent->new;
$ua->agent("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)");
## Get the data from the livedepartureboards.co.uk website.
my $req = HTTP::Request->new(GET => $url);
$req->header('Accept' => 'text/html');
my $res = $ua->request($req);
if ($res->is_success) {
    my $page = $res->content;
    ## Get the station name and when the board was last updated.
    my ($updated) = ($page =~ m[Last updated: (.*?)</td>]sg);
    my ($station_name) = ($page =~ m[<h1>Train Times for (.*?)</h1>]sg);
    ## Create our RSS data, update every 10 minutes.
    my $rss = new XML::RSS (version => '1.0');
    $rss->channel(
        title => $station_name . ' Train Departures',
        link => 'http://www.nationalrail.co.uk/',
        description => 'Train times from ' . $station_name,
        syn => {
            updatePeriod => 'hourly',
            updateFrequency => '6'
        },
    );
    ## Monster regex to extract the data.
    while ($page =~ m[<tr.*?>\r\n<td><a href="(.*?)">(.*?)</a></td>\r\n<td.*?>(.*?)</td>\r\n<td.*?>(.*?)</td>\r\n<td.*?><a href=".*?">(.*?)</a></td>\r\n<td.*?>(.*?)</td>\r\n<td.*?>(.*?)</td>\r\n<td.*?><a href=".*?">(.*?)</a></td>\r\n</tr>]sg) {
        ## Build a hash to store the information in.
        my $train = {
            'link' => 'http://www.livedepartureboards.co.uk/ldb/' . $1,
            'from' => $2,
            'timetabled_arrival' => $3,
            'expected_arrival' => $4,
            'to' => $5,
            'timetabled_departure' => $6,
            'expected_departure' => $7,
            'operator' => $8
        };
        ## Add the train departure to the RSS feed.
        $rss->add_item(
            title => ($train->{'timetabled_departure'} ? $train->{'timetabled_departure'} . ' to ' : '') . $train->{'to'} . ($train->{'expected_departure'} ? ( ' (' . $train->{'expected_departure'} . ')' ) : ''),
            link => $train->{'link'},
            description =>
                "From: " . $train->{'from'} . ",\n" .
                "Timetabled Arrival: " . $train->{'timetabled_arrival'} . ",\n" .
                "Expected Arrival: " . $train->{'expected_arrival'} . ",\n" .
                "To: " . $train->{'to'} . ",\n" .
                "Timetabled Departure: " . $train->{'timetabled_departure'} . ",\n" .
                "Expected Departure: " . $train->{'expected_departure'} . ",\n" .
                "Operator: " . $train->{'operator'} . "\n",
        );
    }
    ## Return the RSS feed, ensuring we have the correct MIME type.
    print "Content-type: application/rdf+xml\n\n";
    print $rss->as_string;
}
Just install it as a CGI script on your webserver and point your RSS reader at it. You'll need to add your station code as the stationcode parameter; codes can be found on the National Rail website's station code page.
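As an example, if the script were installed at the hypothetical address http://www.example.com/cgi-bin/trains.pl, the feed for Colchester would be http://www.example.com/cgi-bin/trains.pl?stationcode=COL. Because the stationcode value is pasted straight into the URL the script fetches, it may be worth checking it first. The following is only a sketch: clean_stationcode is a made-up helper name, not part of the script above, and it assumes station codes are always three letters, as COL is.

```perl
use strict;
use warnings;

## Hypothetical helper: returns the upper-cased station code if the
## input is exactly three letters, otherwise the supplied default.
sub clean_stationcode {
    my ($input, $default) = @_;
    return $default unless defined $input;
    return $input =~ /\A([A-Za-z]{3})\z/ ? uc($1) : $default;
}

print clean_stationcode('col', 'COL'), "\n";          # valid, upper-cased
print clean_stationcode('../secret', 'COL'), "\n";    # rejected, default used
print clean_stationcode(undef, 'COL'), "\n";          # missing, default used
```

In the script, this could stand in for the line that reads the parameter, for instance by passing $CGI->param('stationcode') as the first argument and 'COL' as the second.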





