24 Feb 16:00:00
Details
24 Feb 16:01:15
Some work was done today (following issues at the weekend). This work was on part of a redundant pair of routers, and so should have had no impact. The normal redundancy aspects have worked, with changes to routing and VRRP, but for some unknown reason there are issues with BGP announcements internally which are causing blips on external connectivity for some Ethernet customers. The work has been completed but we are now investigating how it had an impact on services so that this can be avoided in future.
Started 24 Feb 15:45:00
Closed 24 Feb 16:00:00
Previously expected 24 Feb 16:00:00

02 Sep 2011 09:41:55
Details
02 Sep 2011 09:43:14

The usage graphs were disrupted yesterday due to various work on core routers, and will be reset today starting shortly. This is to fix a problem where the ping monitoring was not working correctly. Service should otherwise be unaffected by this change.

Started 02 Sep 2011
Closed 02 Sep 2011 09:41:55

30 Aug 2011 22:30:00
Details
30 Aug 2011 13:34:53

There appears to be a major issue with fibre links into our node in Maidenhead. This is also affecting our office, which is why there was a delay posting this.

Staff are on the way to site now to investigate.

Update
30 Aug 2011 13:43:01

Staff are on the way. As ever, backup equipment was not set up right in our offices, which has led to a delay updating customers on this. Incoming calls are being handled by very few staff at present, which is also causing delays taking calls. We expect staff to be back on it in a few minutes.

Update
30 Aug 2011 14:02:08

From what we can see most things in the data centre are working, e.g. phones. But the fact we are unable to call the data centre suggests they have something major going on. More to follow. Staff have still not managed to get on site.

Update
30 Aug 2011 14:10:31

Ok, still no proper news, but the data centre is apparently swamped with Openreach, suggesting something major, but also suggesting that they are on the case.

Update
30 Aug 2011 14:16:50

We can confirm that all the fibre links to our rack in Maidenhead are showing alarms at present, but that power and everything else is fine. BT vans are on site and they have a big hole in the ground, but the BT engineers claim nothing is wrong and that they did not touch anything.

We are reporting multiple fibre failures to BT now.

Update
30 Aug 2011 14:18:05

Oh, and no clue why the data centre were not answering the phone. We'll worry about that later.

Update
30 Aug 2011 14:57:13

We have confirmation that other customers in the data centre have also lost fibre links. BT are on the case but no ETA yet.

Update
30 Aug 2011 15:40:21

Ok. Clarifying rumours here. BT vans on site, apparently on unrelated things. Hole in ground is near site and is some sort of construction work. May be unrelated. Maybe not. We are waiting on updates from BT now.

Update
30 Aug 2011 16:25:02

It appears that BT had internal system problems meaning they have not actually logged the fault internally until just before 4pm!!

Update
30 Aug 2011 16:30:10

Whilst delays in BT are frustrating, the fact this appears to be a fibre break with other people affected means they should already be working on it. We hope. We are still awaiting an update.

Update
30 Aug 2011 16:59:50

Further checks confirm that our link from our office to Maidenhead is only half broken, which is surprising. I.e. one of the two fibres is apparently working. This suggests a damaged rather than cut fibre.

Update
30 Aug 2011 17:53:00

So far, all we have is that BT are running diagnostics. No more information yet.

Update
30 Aug 2011 18:05:53

Now waiting for a call from the Openreach engineer who is running diagnostics...

Update
30 Aug 2011 18:37:00

There are two Openreach engineers on site investigating. They have said that they're investigating someone else's fault on site, so it sounds like they might have found a common problem.

Update
30 Aug 2011 18:53:47

BT engineer turned up on site to find two other engineers already working on it. So it is in hand. No ETA.

Update
30 Aug 2011 19:55:10

BT found what looks to be a physical fibre fault (so a break somewhere - notes added at 19:24) and confirmed at 19:37 that they are still working on it.

Update
30 Aug 2011 21:03:42

BT are still working on the break. No ETA yet.

Update
30 Aug 2011 21:51:05

Still no news. BT should call back, hopefully with more information, soon.

Update
30 Aug 2011 21:52:25

Whilst we expect BT to sort this overnight, we may not have status updates until early morning. All systems are set up to come back on line as soon as BT fix the fibre.

Update
30 Aug 2011 22:18:03

BT are trying to put in a temporary fix to the problem. No details as to the root cause, and no time scale for a fix yet, but they've said that if the temporary fix fails they'll have to replace the broken fibre. If that happens it is unlikely to be fixed tonight.

Update
30 Aug 2011 22:42:26

Most affected customers seem to have come back online in the last few minutes.

Update
30 Aug 2011 22:54:46

BT have confirmed that this is a "permanent fix", by which they mean that any further disruption should be a part of planned engineering works.

Update
31 Aug 2011 08:08:58

Our links for Ethernet services have been restored and our office is back on line. We're waiting for a full explanation from BT.

Update
31 Aug 2011 13:25:09

More details: http://aa.net.uk/news-2011-fibre-break.html

Resolution

BT fixed broken fibre.

Started 30 Aug 2011 13:14:00
Closed 30 Aug 2011 22:30:00

27 Aug 2010 17:44:00
Details
27 Aug 2010 17:54:41

At just before 17:00 today a minor routing announcement change was made. This was a very minor change and was not run as a proper planned work because it was so minor, or so we thought. Even so the change was not done in the middle of the day.

At 17:36 we were advised by MSO text that customers on Ethernet links were seeing problems. By 17:44 we had identified the cause of the problem and resolved it.

Technical details: The problem is in fact related to exhaustion of IPv4 space. We are involved in arrangements to swap IP blocks around between ISPs to free up some additional space, and this meant fragmenting one of our larger IP blocks. In principle this is simple: we announce the smaller blocks and drop the larger block announcement later. Unfortunately one of the smaller blocks is routed to Ethernet connected customers and was included in the list announced at Telehouse in London.
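
For readers unfamiliar with why one extra announcement matters, here is a minimal sketch of fragmenting a block and of longest-prefix matching. It is a general illustration only, not a reconstruction of the actual configuration: the prefixes are documentation ranges, and the route descriptions and the best_route helper are hypothetical.

```python
import ipaddress

# Hypothetical block from a documentation range; the real prefixes are not given here.
large_block = ipaddress.ip_network("198.51.100.0/24")

# Fragment the larger block into four smaller blocks that can be announced separately.
small_blocks = list(large_block.subnets(new_prefix=26))


def best_route(address, routes):
    """Return the most specific (longest-prefix) route covering the address, or None."""
    addr = ipaddress.ip_address(address)
    matches = [(net, via) for net, via in routes if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda route: route[0].prefixlen)


# Hypothetical view from another network: the large block still points the right way,
# but one of the smaller blocks is also announced from the wrong place.
routes = [
    (large_block, "correct path to Ethernet customers"),
    (small_blocks[1], "Telehouse announcement (wrong for this block)"),
]

inside = best_route("198.51.100.70", routes)   # address within the misannounced /26
outside = best_route("198.51.100.10", routes)  # address covered only by the /24
```

In this sketch the address inside the smaller block follows the more specific announcement, while the rest of the larger block is unaffected. That is consistent with a failure that only some addresses, and only some networks, would see.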

The impact was that some of the Internet was not accessible to Ethernet customers, depending on routing. This included all of our ADSL lines and many services peering on LINX in London, which could not see Ethernet lines. Unfortunately, as a partial failure it was not picked up by any of our automated monitoring, which could still see Ethernet customers.
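
A partial failure like this is only visible when reachability is checked from more than one place. The following is a minimal sketch of that idea, assuming probe results have already been collected elsewhere; the vantage point names and the classify_reachability function are hypothetical and do not describe the monitoring actually in use.

```python
def classify_reachability(results):
    """Classify a target from per-vantage-point results.

    results maps a vantage point name to True (reachable) or False (not reachable).
    """
    if not results:
        return "unknown"
    reachable = sum(1 for ok in results.values() if ok)
    if reachable == len(results):
        return "ok"
    if reachable == 0:
        return "down"
    # Some vantage points can reach the target and some cannot.
    return "partial"


# Hypothetical probe results for one Ethernet customer during an incident like this one.
status = classify_reachability({
    "internal monitor": True,
    "ADSL line": False,
    "LINX peer": False,
})
```

A monitor with a single internal vantage point would report "ok" here; with several, the same data classifies as "partial".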

This was human error, and resolved within 8 minutes of being reported on a bank holiday weekend out of hours.

We will consider ways to manage this better - it is a trade off, as we could have put more in automated control which would have stopped this specific error but could lead to more radical errors being possible with a lot less effort. We're reviewing procedures for checking changes like this beforehand in future.
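
One kind of check that could be run before a change like this is a comparison of the planned announcement list against prefixes known to be routed elsewhere. The sketch below is illustrative only, not a description of any existing tooling; the prefixes are documentation ranges and the conflicting_announcements function is hypothetical.

```python
import ipaddress


def conflicting_announcements(to_announce, routed_elsewhere):
    """Return (planned, reserved) pairs where a planned prefix overlaps one routed elsewhere."""
    conflicts = []
    for candidate in to_announce:
        candidate_net = ipaddress.ip_network(candidate)
        for reserved in routed_elsewhere:
            if candidate_net.overlaps(ipaddress.ip_network(reserved)):
                conflicts.append((candidate, reserved))
    return conflicts


# Hypothetical inputs: prefixes planned for announcement at one site, and a prefix
# that is routed to Ethernet customers and so should not be announced there.
planned = ["203.0.113.0/26", "203.0.113.64/26", "203.0.113.128/26"]
ethernet_routed = ["203.0.113.64/26"]

problems = conflicting_announcements(planned, ethernet_routed)
```

An empty result would mean no overlap was found. A check like this is only as good as the list of prefixes it is given, which is part of the trade off described above.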

Sorry for the inconvenience.

Started 27 Aug 2010 17:00:00
Closed 27 Aug 2010 17:44:00