Back online…

Well that was an effort.

We’re back online. It seems my port forwarding documentation is accurate, but our dynamic DNS provider is being “less than perfect” at the moment.

To cut a long story short, they were claiming we hadn’t changed our IP when we had (even when I tried to force an update), so they weren’t propagating the changes.

All sorted for now, but I’m going to have to find another way around this DDNS issue.

New router

We’ve got a new BT Broadband router, and the new one hasn’t been configured to work with the asterisk yet.

So at the moment, the phones we’ve got at home are all offline, and my remote access is also offline so I can’t fix it remotely.

The monitoring doesn’t make this state of play obvious, but I’ll sort that out once I’ve got access.

It’ll all come back to life when I next make it to site (hopefully Saturday 21st, weather permitting)

New monitoring

Since we went static IP for our broadband, I can no longer infer the state of the Norchard broadband from the number of times we change IP address per hour. (Previously, every time the broadband connection went down it would come up on a different IP address.)
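
For illustration, the old "count the IP changes" trick could be sketched roughly like this in Python. The function name, the sample format and the addresses are made up for the example (they are not taken from the real monitoring); it just counts how often the observed public IP differs from the previous reading, bucketed by hour.

```python
from collections import Counter
from datetime import datetime


def ip_changes_per_hour(samples):
    """Count public IP changes per hour.

    `samples` is an iterable of (datetime, ip_string) readings in time
    order. This input format is an assumption made for the sketch.
    """
    changes = Counter()
    previous_ip = None
    for when, ip in samples:
        # A change is any reading whose IP differs from the one before it.
        if previous_ip is not None and ip != previous_ip:
            hour = when.replace(minute=0, second=0, microsecond=0)
            changes[hour] += 1
        previous_ip = ip
    return dict(changes)


# Example readings using documentation-only addresses (203.0.113.0/24).
readings = [
    (datetime(2014, 1, 1, 10, 0), "203.0.113.5"),
    (datetime(2014, 1, 1, 10, 20), "203.0.113.5"),
    (datetime(2014, 1, 1, 10, 40), "203.0.113.9"),
    (datetime(2014, 1, 1, 11, 5), "203.0.113.12"),
    (datetime(2014, 1, 1, 11, 30), "203.0.113.12"),
]
changes = ip_changes_per_hour(readings)
```

With a static IP every hour simply comes out as zero changes, which is why this approach stops being useful.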

So I’ve put some better monitoring in place, and http://dfrvoip.org.uk/blog/status/ now tracks our connection to the outside world by pinging the Google DNS servers.
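
As a rough idea of what ping-based monitoring boils down to, here is a minimal Python sketch that summarises one round of pings into a loss percentage and a median round-trip time. It is not how smokeping is implemented; the function name and the input format (a list of round-trip times in milliseconds, with None for a lost ping) are assumptions for the example, and the sample values are invented.

```python
from statistics import median
from typing import Optional, Sequence


def summarise_round(rtts_ms: Sequence[Optional[float]]) -> dict:
    """Summarise one round of pings to a single target.

    Each entry is a round-trip time in milliseconds, or None if that
    ping got no reply.
    """
    replies = [r for r in rtts_ms if r is not None]
    sent = len(rtts_ms)
    loss_pct = 100.0 * (sent - len(replies)) / sent if sent else 0.0
    return {
        "sent": sent,
        "received": len(replies),
        "loss_pct": loss_pct,
        "median_ms": median(replies) if replies else None,
        # Treat the link as up if at least one ping came back.
        "link_up": bool(replies),
    }


# Invented sample: four pings, one lost.
good_round = summarise_round([20.0, None, 22.0, 24.0])
# Invented sample: nothing came back at all.
dead_round = summarise_round([None, None, None])
```

A round where every ping is lost is the interesting one: that is the "broadband is down" signal which the IP-change counting used to provide.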

Due to the way they’re hosted, the graphs on that page might not be visible on every internet connection. I’ve got a few ideas about how to change that, but they’ll have to wait for another day as life is somewhat hectic at the moment!

The technology I’m using is pretty basic network monitoring software called smokeping. It’s not my monitoring tool of choice, but it’s easy to install and get going – and for something as simple as monitoring a single internet link it’s pretty good.

The gratifying results of this are that our current broadband looks a lot more stable than the previous broadband!

Changes to dial in access

Some time ago, I put in place a test number on our Asterisk system to allow the Telecoms Team to access the internal telephone network from the public telephone network, so that we can test and verify facilities, and access our exchange test numbers from home.

It’s been brought to my attention that this number has “leaked” and is now being used by at least one person to access the internal network from their mobile phone during the running day.

Apparently my working on the Asterisk dialplan during the day on the 8th August caused an issue for this person, and while they didn’t complain to me directly, I did get to hear of the complaint.

I don’t know who this person is, but I can see that they have used the facility 27 times in the last 3 weeks and rumour has it they are operational staff – and that worries me.

The Asterisk system which hosts the test number is maintained by a single volunteer who has a full time day job in Bristol (me!), and I can only really work on it at weekends – during the running day. It has no resilience or redundancy designed into it, and the phone number is provided by a free provider (so may go away or change at short notice)

In the past, when the Asterisk has failed it’s taken me 3 weeks to get to site to resolve the issue (in one case, much longer). If that were to happen again and someone was relying on it for operational use, it could put the safety of the railway, or the continuity of the business, at risk!

Given these limitations I cannot in good conscience allow the facility to become relied upon for business, operational, safety or emergency use – so I need to nip this in the bud and clarify the position.

I have changed the recorded message on it this evening to something which makes the status of the facility clear, and if the message doesn’t appear to get through – I may be forced to change the PIN.

Sorry if that’s inconvenient for anyone, but I just can’t let an informal test facility sneak into operational service like this!

Two minor fixes

I’ve done two minor fixes this week:

  • Status Graphs: These were being updated, but not drawn for most visitors. I’ve fixed that now. If you still have no graph, try a forced refresh of the page (ctrl-f5).
  • Speaking Clock: Ian noticed that it was saying “thirty four” twice, skipping “thirty five” and jumping straight to “thirty six”. This was a fault I fixed on my own Asterisk server years ago, but I seemingly never applied the fix to the railway version. The root cause was that the “35.wav” file contained the words “thirty four”. I’ve updated to a newer version of the samples and everything is working again.

I’m going to re-think the status graphs, as we now have a static IP and so don’t need dynamic DNS any more (and don’t really need to monitor it either).

Perhaps I’ll finally write the peer monitoring stuff as well, so we can spot when sipgate goes away.

We’re definitely back now. Hopefully.

After much head scratching over inconsistent one-way audio problems (some routes through the Asterisk producing audio, some not), I noticed I’d set the port forwarding on the router for the RTP stream as “TCP” not “UDP”. A simple slip of the finger: I ticked the wrong box on the router interface.

It caused some really weird problems though!

Any audio path which resulted in the Asterisk setting up the RTP stream worked, but anything which relied on the VoIP phone initiating the RTP stream ended up with either no audio, or one-way audio. This wasn’t immediately obvious from the pattern of symptoms as reported!

Anyway, I have rectified the error, ticked the right box, and my testing seems to suggest that it’s working now.
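
RTP travels over UDP, so a forwarding rule set to TCP won’t pass it. A rough way to check that UDP datagrams actually reach a listener is sketched below in Python. The helper names are hypothetical and the echo listener is only a stand-in for a real endpoint; the usage runs over loopback, whereas a real check would run the listener behind the router and the probe from outside.

```python
import socket
import threading


def start_udp_echo(host="127.0.0.1"):
    """Start a one-shot UDP echo listener on a free port.

    Returns (port, thread). The listener answers one datagram, or gives
    up after 5 seconds, then closes its socket.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, 0))  # port 0 lets the OS pick a free port
    sock.settimeout(5.0)
    port = sock.getsockname()[1]

    def serve():
        try:
            data, addr = sock.recvfrom(2048)
            sock.sendto(data, addr)
        except OSError:
            pass  # timed out or socket error: nothing to echo
        finally:
            sock.close()

    thread = threading.Thread(target=serve, daemon=True)
    thread.start()
    return port, thread


def udp_reachable(host, port, timeout=2.0):
    """Send one UDP datagram and report whether it was echoed back."""
    payload = b"rtp-forward-test"
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(payload, (host, port))
            data, _ = sock.recvfrom(2048)
        except OSError:
            return False  # no reply within the timeout, or an error
    return data == payload


# Loopback demonstration only.
port, listener = start_udp_echo()
first_try = udp_reachable("127.0.0.1", port)
listener.join()
# The one-shot listener has now gone away, so a second probe gets no echo.
second_try = udp_reachable("127.0.0.1", port, timeout=0.3)
```

Note that a missing echo only says the datagram (or its reply) didn’t get through; unlike TCP, UDP gives no connection error to tell you why.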

I’ll try and fix the next fault a bit quicker, promise!

And we’re back. Hopefully.

The router was reset on March 15th to try and troubleshoot the persistent problems we’re having with it. Unfortunately, in the process all the port forwarding rules we require dropped off the router.

Due to holidays, Easter and various other commitments I wasn’t able to attend until April 7th. While I thought I’d fixed it then, I discovered when I got home that I hadn’t, and that one of the changes I thought I’d made hadn’t been saved.

So it’s now April 18th (over a month since James reset the router under advice from BT) and we should now be back online.

I think this is a perfect example of why the DFR VoIP system should never be considered a “production” or “business critical” service – anything which goes wrong and relies on a chap who lives a 40 minute drive away and has a full time job (so isn’t available in business hours) just isn’t going to have a quick turnaround for fixes!

Level 0 – Special Services

The new “Level 0” relay set currently only has one line connected to it, for test purposes. It was pointed out to me last night that we hadn’t added it to the dialplan, so Level 0 services weren’t available from SIP phones.

I’ve now rectified that, and we’re currently matching 0X and 0XX numbers as being destined for Norchard special services.

If we deem any of the special services “inappropriate to dial from SIP phones” (as we have with the fire alarm number) I’ll bar them individually, but for now everything on 0X and 0XX is available from the asterisk.
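
The matching and barring logic described above can be mirrored in a few lines of Python. This is only an illustration of the rule (a leading 0 followed by one or two more digits, with individual numbers barred), not the dialplan itself. The barred number used here is a made-up placeholder, not the real fire alarm number.

```python
import re

# "0X" and "0XX": a leading 0 followed by one or two further digits.
LEVEL0_PATTERN = re.compile(r"0[0-9]{1,2}")

# Hypothetical barred number, for illustration only.
BARRED_EXAMPLE = frozenset({"099"})


def classify_dialled(dialled, barred=BARRED_EXAMPLE):
    """Classify a dialled number against the Level 0 rule."""
    if not LEVEL0_PATTERN.fullmatch(dialled):
        return "not-level-0"
    if dialled in barred:
        return "barred"
    return "level-0"


results = {n: classify_dialled(n) for n in ["01", "012", "0", "0123", "491", "099"]}
```

Barring individual numbers rather than narrowing the pattern keeps the default open, which matches the "everything is available unless we decide otherwise" approach.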

I’ve also updated the directory listing over in the sidebar, to include the level 0 services.

Up… and Down…

On the 29th, I initiated an upgrade[1] to my broadband connection at home, and the engineer who was sent out botched the job and moved the wrong pair in the cabinet – so my number (491) has been offline for a week.

I’ve now fixed that, but it looks like the DFR has dropped off the internet!

Hopefully, it’s just that the power to the router has tripped out, and someone will fix it tomorrow when the shop opens.

If it’s not all working again by the 10th, I’ll be on-site anyway and will work out what’s going on. More details when I get them!

[1] I’m now all swanky and 40Mbit!

Update: Service was restored 2015-01-06 11:48 – No word yet what the problem was!

Severn Bridge Disaster Museum Display

A little over a week ago, we installed a phone in the museum as part of a display about the Severn Bridge Disaster of 25th October 1960. This allows visitors to pick up the handset, and by dialling a single digit they can listen to one of 10 pieces of audio taken from a BBC Radio Gloucestershire program about the disaster.

The phone was installed on the 22nd Oct, just in time for the 54th anniversary of the disaster. So far, it’s had 11 calls, with the most popular section being the one about “The night of the disaster”

Technical details:
The phone itself is just a cheap DTMF handset (easy to replace when the public break it!) connected to a Cisco SPA112 ATA. The ATA is configured to expect a single digit and then dial immediately (I’ve set the dialplan to “x”)

The ATA registers itself with the Asterisk with a restricted SIP account, which lands in its own context. That context is pretty simple, and the dialplan for it looks like this:

; Samples should be placed in /usr/share/asterisk/sounds/bridge - eg
; ln -s /etc/asterisk/museum_display/bridge /usr/share/asterisk/sounds/bridge

[MuseumDisplay-Bridge]
; Severn Bridge Disaster - 10 audio files, bridge/bridge_[0-9].mp3

; Single Digit Extensions play back selected audio
exten => _X,1,Answer();
exten => _X,n,Playback("bridge/bridge_${EXTEN}");
exten => _X,n,Hangup();

; Everything else gets "number not recognised"
exten => _.,1,Answer();
exten => _.,n,Playback("bridge/not_recognised");
exten => _.,n,Playback("bridge/not_recognised");
exten => _.,n,Playback("bridge/not_recognised");
exten => _.,n,Playback("bridge/not_recognised");
exten => _.,n,Playback("bridge/not_recognised");
exten => _.,n,Hangup();

It’s a pretty straightforward context. Any incoming call which matches a single digit (“exten => _X”) plays back one of the 10 audio files relating to the disaster.

Any other incoming call (“exten => _.”) plays a message which says “That number is not recognised, please hangup and try again”. It repeats the message 5 times, then hangs up.

The “number not recognised” block shouldn’t get hit in normal use, because the ATA is configured to only send one digit.

It’s mostly there for testing, in case we ever have to replace the ATA in a hurry with one that can’t be set to dial immediately, or in case we ever decide to allow access to this feature from elsewhere in the dialplan (eg from C*Net)