
Updated Heathrow Arrivals WAP Service

I had an email from Pete Gross asking about updating the old BAA Heathrow flight arrivals WAP service I wrote, so that it works with the current BAA Heathrow arrivals website.

I hadn't used the site for a while, so I thought it would be complex to get working again, but it only needed a quick change to the regular expressions used for data extraction.

For the benefit of others, I'll quickly go over the regular expression used in the main while loop and explain what's going on.

1  while ($page =~ m[
2      <tr .*?>                          # opening tr
3      \s+<td>(\d{2}:\d{2})</td>         # scheduled time
4      \s+<td>(.*?)</td>                 # flight number
5      \s+<td>\s*(.*?)\s*</td>           # from
6      \s+<td>\s*(.*?)\s*</td>           # status
7      \s+<td>(.*?)</td>                 # terminal
8      \s+</tr>                          # closing tr
9    ]xsg) {
  1. The main while loop: repeatedly match the following regular expression against $page, one table row per pass.
  2. Find a <tr followed by any characters (possibly none) up to the first closing >. This copes with the BAA's site alternating rows with an optional class of liveFlightGrey. Note that under the x modifier the space after <tr in the pattern is ignored, so it isn't actually required, and a bare <tr> matches too.
  3. Skip the whitespace before the <td>, then extract and memorise to $1 the cell contents, provided they are in the format dd:dd (for example 06:15).
  4. Skip the whitespace before the <td>, then extract and memorise to $2 the contents of the cell.
  5. Skip the whitespace before the <td>, ignore any leading or trailing space inside it, and extract the contents of the cell to $3.
  6. Skip the whitespace before the <td>, ignore any leading or trailing space inside it, and extract the contents of the cell to $4.
  7. Skip the whitespace before the <td>, then extract and memorise to $5 the contents of the cell.
  8. Skip the whitespace before the closing </tr> tag.
  9. We're using x so we can break the regular expression over several lines with comments, s so that . also matches newlines, and g so that each pass of the loop carries on from where the previous match ended, working through the whole content of the variable.
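The three modifiers in point 9, and the ignored space from point 2, are easy to check in isolation. Below is a small self-contained sketch that tries each one on invented snippets of markup (not real BAA data) and prints whether it matched.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# /x: whitespace in the pattern is ignored, so "<tr .*?>" behaves like
# "<tr.*?>" and matches a bare <tr> as well as <tr class="liveFlightGrey">.
my $bare_row  = "<tr>\n  <td>06:15</td>";
my $with_x    = ($bare_row =~ m[<tr .*?>]x) ? 1 : 0;
# Without /x the space is literal, so the bare <tr> is not matched.
my $without_x = ($bare_row =~ m[<tr .*?>])  ? 1 : 0;

# /s: "." only matches a newline when /s is on, which matters when
# .*? has to run across line breaks in the page source.
my $split_cell    = "<td>Landed\n06:02</td>";
my $dot_without_s = ($split_cell =~ m[<td>(.*?)</td>])  ? 1 : 0;
my $dot_with_s    = ($split_cell =~ m[<td>(.*?)</td>]s) ? 1 : 0;

# /g in scalar context: each test resumes from pos() where the last
# match ended, so the while loop visits every cell once and then stops.
my $times = "<td>06:15</td>\n<td>06:30</td>";
my @seen;
while ($times =~ m[<td>(\d{2}:\d{2})</td>]g) {
    push @seen, $1;
}

print "with /x: $with_x, without /x: $without_x\n";
print "without /s: $dot_without_s, with /s: $dot_with_s\n";
print "times: @seen\n";
```

Running it should print 1 and 0 for the /x pair, 0 and 1 for the /s pair, and both times for the /g loop.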

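The complete script follows; just copy it to your web server to use it. You'll need Perl with LWP::Simple installed to fetch the page live from the BAA's website. The search form at the bottom posts back to /cgi-bin/mobile/arrivals.pl, so either install the script at that path or change the href to match.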

Ideally we should all be using XHTML MP now instead of WML, but this still does the job.

#!/usr/bin/perl -w

## Script to screen scrape Heathrow arrivals from
## the BAA website, and show them on a WAP page.
## Updated    - Robert Price - 15/04/2006
## Originally - Robert Price - 26/01/2004

use strict;
use CGI;
use LWP::Simple;

## default flight number
use constant DEFAULT_FLIGHT => 'BA010';

## url of the heathrow arrivals, seems to be the
## same format for other airports.
my $url = 'http://www.heathrowairport.com/portal/site/default/menuitem.e1f47266574138bae8890127c02865a0/';

## get the flight we're interested in.
my $CGI = CGI->new;
my $flight = $CGI->param('flight') || DEFAULT_FLIGHT;

## uppercase the flight to make it easier to search.
$flight = uc($flight);

## keep only letters and digits, so user input can't inject
## markup or WML $variables into the page.
$flight =~ s/[^A-Z0-9]//g;

## send the header and start of the WML page.
## the blank line ends the HTTP headers.
print <<HEADER;
Content-type: text/vnd.wap.wml

<?xml version="1.0"?>
<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN" "http://www.wapforum.org/DTD/wml_1.1.xml">
<wml>
<card id="main" title="Flights">
<p>Arrivals at Heathrow</p>
HEADER

## if we're looking for a flight...
if ($flight) {

  ## get the arrivals page, falling back to an empty string
  ## if the request fails so the matches below don't warn.
  my $page = get($url) || '';

  ## holding hash for flights.
  my %arrivals = ();

  ## iterate over the page extracting information.
  while ($page =~ m[
      <tr .*?>                          # opening tr
      \s+<td>(\d{2}:\d{2})</td>         # scheduled time
      \s+<td>(.*?)</td>                 # flight number
      \s+<td>\s*(.*?)\s*</td>           # from
      \s+<td>\s*(.*?)\s*</td>           # status
      \s+<td>(.*?)</td>                 # terminal
      \s+</tr>                          # closing tr
    ]xsg) {

    ## skip if we have no flight number
    next unless ($2);

    ## store the information in the hash.
    $arrivals{$2} = {
      'scheduled_time' => $1,
      'flight_number'  => $2,
      'from'           => $3,
      'status'         => $4,
      'terminal'       => $5,
    };
  }

  ## get the page modification time (a single match, so no /g).
  my ($modified) = ($page =~ m[Current Update:</span> (.+)\r\n]);
  $modified = 'unknown' unless defined $modified;

  ## if we have found the flight, print
  ## out its information to the WML page.
  if (exists $arrivals{$flight}) {
    my $details = $arrivals{$flight};
    print "<p>\n";
    print 'Flight: '    . $details->{'flight_number'}  . "<br/>\n";
    print 'From: '      . $details->{'from'}           . "<br/>\n";
    print 'Scheduled: ' . $details->{'scheduled_time'} . "<br/>\n";
    print 'Status: '    . $details->{'status'}         . "<br/>\n";
    print 'Terminal: '  . $details->{'terminal'}       . "<br/>\n";
    print "</p>\n";
  } else {
    print "<p>Flight $flight doesn't seem to be listed</p>\n";
  }

  print "<p>Updated: $modified</p>\n";
}

## print the end of the page and the query
## form if we want to search again. the $flight in the
## postfield is escaped so that WML, not Perl, fills in
## the value typed into the input field.
print <<FOOTER;
<p>
Flight:<br/>
<input name="flight" emptyok="false"/>
<br/>
</p>
<p>
<anchor title="search">
<go href="/cgi-bin/mobile/arrivals.pl" method="post">
<postfield name="flight" value="\$flight"/>
</go>
Find Arrivals
</anchor>
</p>
</card>
</wml>
FOOTER
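Since the regular expression is the part most likely to break the next time the BAA change their site, it can help to check it against a saved copy of the page before putting the script live. A minimal sketch, assuming you pull the loop into a subroutine; parse_arrivals is a hypothetical name that isn't in the script above, and the sample row is invented. Pass the name of a saved copy of the arrivals page as the first argument to test against the real markup instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): the same pattern as
# the main while loop, returning a hash reference keyed by flight number.
sub parse_arrivals {
    my ($page) = @_;
    my %arrivals;
    while ($page =~ m[
        <tr .*?>                         # opening tr
        \s+<td>(\d{2}:\d{2})</td>        # scheduled time
        \s+<td>(.*?)</td>                # flight number
        \s+<td>\s*(.*?)\s*</td>          # from
        \s+<td>\s*(.*?)\s*</td>          # status
        \s+<td>(.*?)</td>                # terminal
        \s+</tr>                         # closing tr
      ]xsg) {
        next unless $2;                  # skip rows with no flight number
        $arrivals{$2} = {
            scheduled_time => $1,
            flight_number  => $2,
            from           => $3,
            status         => $4,
            terminal       => $5,
        };
    }
    return \%arrivals;
}

# Invented sample row, used when no file name is given.
my $html = <<'HTML';
<tr class="liveFlightGrey">
  <td>07:05</td>
  <td>BA016</td>
  <td> Singapore </td>
  <td> On time </td>
  <td>4</td>
</tr>
HTML

if (@ARGV) {
    local $/;                            # slurp mode: read the whole file
    $html = <>;
}

my $arrivals = parse_arrivals($html);
for my $flight (sort keys %$arrivals) {
    my $d = $arrivals->{$flight};
    print "$flight: $d->{from}, $d->{scheduled_time}, $d->{status}, terminal $d->{terminal}\n";
}
```

If the site's markup has changed, the loop simply finds nothing, so an empty listing against a freshly saved page is the sign that the pattern needs updating.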
Entered: 2006-04-15 16:14:31
Modified: 2006-04-18 15:30:35
