Regular Expressions Simple and Powerful.

Eoin | Delphi, IT, Internet, PHP, REST, Software, WebServices, mobile | Thursday, May 24th, 2007

Yes! Regular Expressions are simple once you learn the grammar, and that’s also the biggest problem with them: unless you learn the grammar, Regular Expressions look like the gibberish of some dark art. And unless you actually sit down and study, you’ll not make much progress with them.

There are some good books on Regular Expressions.
Regular Expression Pocket Reference

Mastering Regular Expressions

Once you have Regular Expressions in your toolbox you’ll quickly see many uses they can be put to, such as page scraping or data validation. I’ve even seen them used for updating Delphi code bases to the latest version of Delphi.

In the train timetable service I used three Regular Expressions to extract the information needed to output an optimized version of the timetable.

The first two,

/<input type="hidden" name="DepTime" value="[0-9][0-9]:[0-9][0-9]/

/<input type="hidden" name="ArrTime" value="[0-9][0-9]:[0-9][0-9]/

are used to pick out the hidden input elements which contain the departure and arrival times.
The strings which match the patterns are stored in two arrays, one for arrivals and one for departures.

Then, iterating through the two arrays, a third Regular Expression is used:

[0-9][0-9]:[0-9][0-9]

This Regular Expression returns the times from the strings contained in the two arrays and it is this information which is used to produce the timetables you see when using the service.
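
The two-pass approach described above might look something like this. It is only a sketch in Python (the service itself is written in PHP), and the sample markup, the variable names and the extract_times helper are made up for illustration; the real Irish Rail page will differ.

```python
import re

# Hypothetical sample of the markup being scraped; the real page will differ.
SAMPLE_HTML = '''
<input type="hidden" name="DepTime" value="06:15">
<input type="hidden" name="ArrTime" value="06:39">
<input type="hidden" name="DepTime" value="07:00">
<input type="hidden" name="ArrTime" value="07:24">
'''

# First two patterns: find the hidden inputs carrying the times.
DEP_PATTERN = re.compile(
    r'<input type="hidden" name="DepTime" value="[0-9][0-9]:[0-9][0-9]'
)
ARR_PATTERN = re.compile(
    r'<input type="hidden" name="ArrTime" value="[0-9][0-9]:[0-9][0-9]'
)
# Third pattern: pull the HH:MM time out of each matched string.
TIME_PATTERN = re.compile(r'[0-9][0-9]:[0-9][0-9]')


def extract_times(html):
    """Return (departures, arrivals) as two lists of HH:MM strings."""
    dep_matches = DEP_PATTERN.findall(html)  # array of departure strings
    arr_matches = ARR_PATTERN.findall(html)  # array of arrival strings
    departures = [TIME_PATTERN.search(m).group(0) for m in dep_matches]
    arrivals = [TIME_PATTERN.search(m).group(0) for m in arr_matches]
    return departures, arrivals


departures, arrivals = extract_times(SAMPLE_HTML)
for dep, arr in zip(departures, arrivals):
    print(dep, "->", arr)
```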

I’d be interested to hear if there is an even easier way to do this.
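
One possibly easier way, offered only as a sketch: a capturing group lets a single pattern both find the element and return just the time, so the third expression is not needed. This is Python rather than the PHP the service uses, and the sample markup is invented.

```python
import re

# Invented sample markup, for illustration only.
html = (
    '<input type="hidden" name="DepTime" value="06:15">'
    '<input type="hidden" name="ArrTime" value="06:39">'
)

# (Dep|Arr) captures which kind of time it is,
# ([0-9]{2}:[0-9]{2}) captures the time itself.
pattern = re.compile(
    r'<input type="hidden" name="(Dep|Arr)Time" value="([0-9]{2}:[0-9]{2})'
)

times = {"Dep": [], "Arr": []}
# With two groups, findall() yields (kind, time) tuples.
for kind, time in pattern.findall(html):
    times[kind].append(time)

print(times)
```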

There is an excellent tool available for working with Regular Expressions: RegexBuddy. It is a fantastic piece of software.

Most languages and platforms support Regular Expressions. For Delphi you can use the TRegex component, which is free; for Delphi .NET it’s not needed, as .NET supports Regular Expressions.

The RESTful URL

Eoin | IT, Internet, PHP, REST, Uncategorized, WebServices | Tuesday, May 22nd, 2007

When creating the train timetable service I decided to use a RESTful approach for the API. This meant the URL should conform to a number of RESTful principles.

1) The URL represents a resource and consists of NOUNs, not VERBs.
If you are using verbs then you are not using REST, you’re using RPC.
One of the defining principles of REST is the use of a limited set of verbs, POST, GET, PUT and DELETE, which correspond to Create, Read, Update and Delete (CRUD).
This allows the easy construction of URLs so that a consumer can go directly to the resource required.
For example,
http://www.eoinprout.com/trains/Cork/Cobh/
will return the Cork Cobh timetable resource.

2) The URL hides implementation details.
There are many advantages to hiding the implementation details, not the least of which is that it allows the background implementation to be changed without affecting the API.
In my case the service is implemented in PHP, but by using mod_rewrite it was possible to hide the implementation details of the service. One more advantage of this is that many search engines do not properly index dynamic pages which require the correct parameters to be set, so hiding those parameters aids the discovery of the service.

3) The service should expose other parts of the service via links allowing discovery and traversal.
The trick here is not to expose everything in one URL. In the case of my service the user starts at ../trains/, which presents a list of departure stations and links to a list of possible destinations; from there another link will return the URL for the particular timetable required. So all the user needs to remember is the ../trains/ URL.

4) GETs are used for obtaining a copy of the resource.
GET is the standard HTTP method for getting content: when you type a URL into the address bar of the browser, it is a GET which is executed. On IrishRail.com, POSTs are used to retrieve a copy of the timetable a user may be looking for.
There are serious disadvantages to using POSTs to retrieve a resource: the resource cannot be bookmarked, the URL for the resource cannot be sent to someone else, the URL cannot be constructed, and search engines cannot index the resource.
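
As a rough sketch of principles 1 to 3, here is one way a path such as /trains/Cork/Cobh/ could be mapped to a resource. It is Python, not the PHP and mod_rewrite setup the service actually uses, and the resolve function and the names it returns are hypothetical.

```python
def resolve(path):
    """Map a URL path to a (resource kind, details) pair.

    /trains/            -> list of departure stations
    /trains/Cork/       -> list of destinations from Cork
    /trains/Cork/Cobh/  -> the Cork to Cobh timetable
    """
    # Drop empty segments so leading and trailing slashes do not matter.
    parts = [p for p in path.split("/") if p]
    if not parts or parts[0] != "trains" or len(parts) > 3:
        return ("not_found", {})
    if len(parts) == 1:
        return ("departure_stations", {})
    if len(parts) == 2:
        return ("destinations", {"departure": parts[1]})
    return ("timetable", {"departure": parts[1], "destination": parts[2]})


print(resolve("/trains/"))
print(resolve("/trains/Cork/"))
print(resolve("/trains/Cork/Cobh/"))
```

Each level only needs to link to the next one down, which is what lets a user start at ../trains/ and discover the rest. The URL contains only nouns; the verb is always the HTTP GET.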

Irish Rail Train Timetables Service Optimised For Mobile Internet

Eoin | Internet, PHP, REST, Uncategorized, WebServices, mobile | Sunday, May 20th, 2007

In one of my previous posts I moaned about the high cost of the mobile internet in Ireland. The site I used as an example was Irishrail.com. I use this site a lot, checking the Cork/Cobh timetables. I wanted to be able to check the train times on my mobile, but I wouldn’t use the Irish Rail website when it costs so much to do so. I decided to create a REST style web service optimized for the mobile internet, i.e. one that uses as little bandwidth as possible.
It would also be a chance to try out some ideas related to REST.

IMHO REST is the way to construct web services.

The service is available here and you get your timetable by traversing the links.
OR
You can go directly to the timetable you want by constructing the URL based on the format
http://www.eoinprout.com/trains/DeparturePoint/Destination
For example,
http://www.eoinprout.com/trains/Cobh/Cork/
http://www.eoinprout.com/trains/Cork/Cobh/
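
Constructing the URL from the format above is simple enough to script. The following is a minimal Python sketch; the timetable_url helper is hypothetical, and whether the service accepts percent-encoded station names is an assumption on my part.

```python
from urllib.parse import quote

BASE_URL = "http://www.eoinprout.com/trains/"


def timetable_url(departure, destination):
    """Build a timetable URL in the DeparturePoint/Destination format."""
    # quote() percent-encodes characters that are not safe in a URL path;
    # safe="" means a "/" inside a station name is encoded as well.
    return (
        BASE_URL
        + quote(departure, safe="")
        + "/"
        + quote(destination, safe="")
        + "/"
    )


print(timetable_url("Cork", "Cobh"))
print(timetable_url("Cobh", "Cork"))
```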

The service will show the timetable for trains between your point of departure and your destination for the day. Bookmarking the timetable will allow you to quickly check “Today’s” timetable.
The service returns “Today’s” timetables. If I’m planning journeys far in advance I’ll use my PC; the phone is for when I need to check when the next train home is :). If anyone really wants it changed to support dates in the future, just ask.

Of course the first question you ask is, “How much bandwidth does it save?”
It saves a lot.

                     Irishrail.com    eoinprout.com/trains/
Traversing Links     167kb            9kb
Constructing Link    Not possible     1kb

It’s not possible to construct the URL for a particular timetable on Irishrail.com because it’s using a POST to send parameters rather than a GET. Interestingly, this means that the timetable cannot be bookmarked to allow quick access.
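
To put a number on “a lot”, the figures in the table above can be turned into percentages. This is just a quick Python sketch; the constant and function names are mine.

```python
# Figures from the table above, in KB.
IRISHRAIL_KB = 167
TRAVERSING_KB = 9
DIRECT_KB = 1


def saving_percent(before_kb, after_kb):
    """Percentage of bandwidth saved going from before_kb to after_kb."""
    return (before_kb - after_kb) / before_kb * 100


print(f"Traversing links: {saving_percent(IRISHRAIL_KB, TRAVERSING_KB):.1f}% saved")
print(f"Constructed URL:  {saving_percent(IRISHRAIL_KB, DIRECT_KB):.1f}% saved")
```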

The service was created using PHP and mod_rewrite, which I’ll talk more about in the future as I’m going to be using this service to illustrate REST.

Tag Clouds and Social Bookmarking

Eoin | IT, Internet, Software, del.icio.us, social | Thursday, May 17th, 2007

Tag Clouds are a useful way of visualizing information.

When I saw the tag cloud generated by del.icio.us using my bookmarks, I was surprised how clearly I could see myself in it.

It presents an interesting picture of my interests, although it is skewed towards computing related subjects. Being an internet based tool, that is only to be expected (or maybe it’s me who is skewed ;)).

It’s interesting or worrying, I haven’t decided yet which, how much can be deduced about someone just from their social bookmarks.

The High Price of Mobile Internet in Ireland

Eoin | IT, Internet, mobile | Thursday, May 17th, 2007

Most new mobile phones have internet capabilities, which makes it possible to browse the web while out and about.
The big problem with these services is the high cost of bandwidth on the mobile networks.
The basic price plans range from 1 to 2 cents per KB. This may not seem like a lot, but it very quickly adds up.

For example, checking today’s train timetable between Cork and Cobh on Irishrail.com requires approx 167kb of data. This can cost as much as €3.34, depending on the price plan you have chosen with your mobile operator. Expensive for a train timetable.
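
The arithmetic behind that figure, as a small Python sketch (the page_cost_euro helper is just a name I made up; the 167kb and the 1 to 2 cents per KB are the figures quoted above):

```python
def page_cost_euro(size_kb, cents_per_kb):
    """Cost in euro of downloading size_kb at a per-KB rate given in cents."""
    return size_kb * cents_per_kb / 100


# The basic price plans range from 1 to 2 cents per KB.
for rate in (1, 2):
    cost = page_cost_euro(167, rate)
    print(f"167kb at {rate} cent(s) per KB costs €{cost:.2f}")
```

At 1 cent per KB the same page comes to €1.67, and at 2 cents per KB it reaches the €3.34 mentioned above.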

3G networks have cheaper data rates than the GPRS ones but are still hugely expensive compared to standard broadband.

You can try reducing the amount of unnecessary data that downloads by turning off images, sounds and animations. But what is really needed are sites with content optimized for the mobile web.
In the old days, as a rule of thumb, we tried to keep a web page to 50kb or less.
We shouldn’t always assume a fast cheap Internet connection is being used.

Googlebot Strikes Back

Eoin | IT, Internet | Thursday, March 22nd, 2007

Last week, on a lunchtime radio phone-in show, a Google “scandal” broke.

A caller had, for no apparent reason, searched Google using her phone number and got back a single result. When she tried to view it, though, the classic “not found” message appeared.

Not to be defeated, she went back to Google and clicked on the “cached” link, and up popped her and her husband’s personal details.

Shocked at the apparent intrusion on her privacy she called the company to which the document belonged and demanded to know what was what.
She was told that the document contained the details of a number of customers (of which she was one) and had accidentally been copied to the wrong location on a server and exposed to the internet, but that this had been corrected as soon as it was spotted. Unfortunately Googlebot had got there first: it indexed and cached the page and went on its merry way. So even though the company had taken corrective action, the page was still available through Google’s cache.

The company is now in the process of contacting Google to get them to remove the page from their cache.
I didn’t have the heart to ring in and mention the Wayback Machine and other ways in which this page could still exist out in the wild.

I found it interesting that for the callers, Google is the internet.

Unfortunately they didn’t give out enough information for me to find the page on Google ;)
