It’s a horrible wet day so I’ve got to find something productive to keep me occupied. I’ve decided I’m going to try and create something which alerts me to trains approaching the level crossing near my flat. The level crossing is very near. I live just across the road from West Worthing station and at the end of the station is the crossing. If I don’t have music on I can easily hear the sirens when the barriers go down. In fact I’ve just heard them now.
To be honest, the level crossing is a real pain. I have to cross it every day on my way to and from work and, almost inevitably, the barriers are down on either the outward or the return journey. It’s also a busy road, so there are long queues, and most British drivers, being who they are, like to sit in the queue with their engines idling. The crossing is so hated that it has its own spoof Twitter account.
I’d like to create a web page which displays when the barriers are expected to be up or down in the near future. Ideally, I’d like it to have a level of precision such that I can predict how long it will be until the barriers go up if they are down.
To do this I need data. I already have a few pieces of data:
- The barriers always go up 5 seconds after a train passes through
- The barriers are always down when a train passes through (obviously)
- The barriers go up for at least enough time to get one cycle of traffic through them (the crossing is adjacent to a crossroad which has traffic lights)
- Most trains stop at the station (but not all do)
- Almost all the trains are ‘Southern’ passenger trains (I think there’s one freight train per day in the early hours of the morning)
The remaining data I need:
- The times of the trains passing through the level crossing
- How long before a train arrives the barriers are lowered
- The length of time for one cycle of traffic at the crossroad
I’ll come back to the times of the trains; that’s the most crucial piece of data.
How long before a train the barriers are lowered is not easy to find out. The crossing is controlled by an operator using CCTV, so the timings are probably quite variable. I need to collect some data to see if I can pick out any trends. Thankfully I can do that from the comfort of my living room with a timer. It’s a bit tedious but I collected the data. There was a wide variation: the shortest time was just 83 seconds whilst the longest was almost 5 minutes. The average was around 2.5 minutes.
The length of time for one cycle of traffic can also be found in the same way. I know that it varies depending on the traffic. From a short time collecting data, I found that the average traffic cycle was 65 seconds in length. The shortest cycle I measured was 40 seconds and the longest 110 seconds.
I’ll make the assumption that the barriers don’t open if they are likely to be open for less than a minute.
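To make the rules so far concrete, here is a rough sketch of how predicted closures could be worked out from a list of train passing times. It is written in Python purely for illustration, and the names (`predict_closures`, `Closure`, the constants) are made up for this example. It assumes the barriers come down the average 2.5 minutes before each train, go up 5 seconds after it, and stay down if the gap to the next closure would be under a minute. The real lead time varies a lot, so treat the output as a rough guide only.

```python
from dataclasses import dataclass

# Figures taken from the measurements above; all times are in seconds.
LOWER_BEFORE_S = 150  # average lead time (about 2.5 minutes); the real value varies widely
RAISE_AFTER_S = 5     # barriers go up 5 seconds after the train passes
MIN_OPEN_S = 60       # assumption: barriers stay down if they would be open for under a minute


@dataclass
class Closure:
    down_at: float  # time the barriers are expected to go down
    up_at: float    # time the barriers are expected to go up


def predict_closures(train_times):
    """Turn train passing times into a list of merged barrier closures."""
    closures = []
    for passing in sorted(train_times):
        down_at = passing - LOWER_BEFORE_S
        up_at = passing + RAISE_AFTER_S
        if closures and down_at - closures[-1].up_at < MIN_OPEN_S:
            # Gap too short to open the barriers: extend the previous closure.
            closures[-1].up_at = max(closures[-1].up_at, up_at)
        else:
            closures.append(Closure(down_at, up_at))
    return closures


if __name__ == "__main__":
    # Three hypothetical trains, as seconds from some reference point.
    for closure in predict_closures([1000, 1100, 2000]):
        print(closure)
```

The first two trains in the example are close enough together that their closures merge into one long closure, which matches what I see from my window when trains run in both directions.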
Finally, the most important piece of data is the train timings. The naive approach here would be to find a copy of the timetable and work out when trains should pass through this stretch of track. That would give rough times, but it wouldn’t be very accurate for a few reasons: trains get delayed, they get cancelled, extra trains get put on, and timetables change slightly every few months. I don’t want to spend time updating the website every few months.
In summary, I need real-time data about the trains in the near future. Good news for me: Network Rail provides that data. I signed up for a free account and within a few minutes I potentially had access to lots of lovely data.
Of course it wasn’t that simple. I needed to subscribe to the correct data feeds and set up a way of receiving those feeds and turning them into the data that I actually wanted. This process is not simple but there was a useful wiki to help me.
Network Rail provides several data feeds: a daily and weekly schedule of trains, short-term updates to the schedules, and a live feed of trains passing through timing points on the network. In addition, they provide reference data about the timing points and the estimated time between them for various types of train.
The real-time data from Network Rail is delivered in messages using the STOMP protocol. STOMP is relatively simple; however, it isn’t trivial to correctly get messages from the server. I’m used to working with PHP to build my websites so I decided to use PHP for this project. After a short search on the internet, I found this STOMP library.
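To give a feel for what STOMP looks like on the wire, here is a minimal sketch that builds and parses a frame: a command line, `key:value` header lines, a blank line, the body, and a terminating NUL byte. It is in Python for illustration only and is not the PHP library I’m using. The destination `/topic/EXAMPLE_FEED` is a made-up placeholder rather than a real feed name, and the sketch ignores header escaping and the `content-length` header, which a proper client would need to handle.

```python
NUL = "\x00"  # every STOMP frame ends with a NUL byte


def build_frame(command, headers, body=""):
    """Serialise a STOMP frame: command, headers, blank line, body, NUL."""
    lines = [command] + [f"{key}:{value}" for key, value in headers.items()]
    return "\n".join(lines) + "\n\n" + body + NUL


def parse_frame(raw):
    """Split a single complete STOMP frame into (command, headers, body)."""
    if not raw.endswith(NUL):
        raise ValueError("incomplete frame: missing NUL terminator")
    head, _, body = raw[:-1].partition("\n\n")
    command, *header_lines = head.split("\n")
    headers = {}
    for line in header_lines:
        key, _, value = line.partition(":")
        headers.setdefault(key, value)  # if a header repeats, keep the first
    return command, headers, body


if __name__ == "__main__":
    # Hypothetical subscription request; the destination is a placeholder.
    frame = build_frame(
        "SUBSCRIBE",
        {"id": "0", "destination": "/topic/EXAMPLE_FEED", "ack": "auto"},
    )
    print(repr(frame))
    print(parse_frame(frame))
```

The fiddly part in practice is that frames arrive over a socket in arbitrary chunks, so a real client has to buffer the incoming bytes until it sees the terminator, which is a good reason to lean on an existing library.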
With all this data I need to decide how to present it to the end user. I’ve decided that mapping out the barrier closures in the next hour probably provides the best balance of detail and accuracy. I’ll attempt to update the model in real time as new events come in on the feeds.
One of my favourite parts of a project is putting together the user interface. I always like to sketch out ideas on paper first, as it helps me put together web pages much quicker.
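As a sketch of what the page needs to work out each time it refreshes, the function below takes a list of predicted closures as `(down_at, up_at)` pairs in seconds, plus the current time, and returns whether the barriers are up or down, how long until that changes, and the closures that fall within the next hour. It is Python for illustration and the name `barrier_status` is invented for this example. It assumes the closures don’t overlap.

```python
def barrier_status(closures, now, horizon_s=3600):
    """Return (state, seconds_until_change, upcoming) for the given time.

    closures: non-overlapping (down_at, up_at) pairs, in seconds.
    upcoming: closures that have not finished and start within the horizon.
    """
    upcoming = [
        (down_at, up_at)
        for down_at, up_at in sorted(closures)
        if up_at > now and down_at < now + horizon_s
    ]
    if not upcoming:
        # Nothing predicted in the next hour.
        return ("up", None, upcoming)
    down_at, up_at = upcoming[0]
    if down_at <= now:
        # Inside a closure: count down to the barriers going up.
        return ("down", up_at - now, upcoming)
    # Barriers are up: count down to the next closure.
    return ("up", down_at - now, upcoming)


if __name__ == "__main__":
    # Hypothetical closures, as seconds from some reference point.
    example = [(850, 1105), (1850, 2005), (9000, 9100)]
    print(barrier_status(example, now=1000))
    print(barrier_status(example, now=1200))
```

Whenever a new event comes in on the feeds, the list of closures can be recalculated and passed through the same function, so the page only ever needs this one small summary.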
I then went and put together the actual web page. I’ve tried to use colour and scale to highlight the important information.
So the first half of my project is complete but I still need to work out how to properly populate it with data. This means building a suitable database and updating the web page using Ajax. Stay tuned for part 2.
As an aside, I’m thinking of getting a webcam which produces a stream of images I can analyse to see how well my predictions stack up against the real data. It would also give me much more accurate data about the train services to further improve my models.
As a second aside, the photo at the beginning of the post is not my level crossing. It’s another (nicer) level crossing that I happened to take a photo of in the summer.