Currently Open Posts

Per IP Stats - Open 3 Jan 09:59:26
Details
3 Jan 09:59:26

The 'per IP stats' usage pages are not working very well at the moment. Some customers are not seeing any usage, some are seeing very little usage, and others are seeing correct amounts of usage!

This is being looked in to.

This does not affect the 'Line' usage or any billing based usage records. 

Update
11 Apr 12:25:10

We are sorry for how long this has been ongoing.
We are still trying to resolve the issue; however, it is taking a lot longer than we expected.

We do not currently have an ETA, but we will update this status once we have further news.

Started 3 Jan 09:56:05

All Broadband Minor Outage Posts

BE Outages at Shepherds Bush, Kensington Gardens and Acton Exchanges - Closed 17 May 09:22:49
Details
16 May 15:25:06

O2 have informed us that there is currently an outage at the Shepherds Bush (LWSHE), Kensington Gardens (WRKGDN) and Acton (LWACT) exchanges.

Engineers are currently investigating and O2 expect to have an update within the next 2 hours.

Update
17 May 09:23:23

This is now resolved, we have not been given any information as to the cause or fix for this outage.

Started 16 May 15:23:05
Closed 17 May 09:22:49

1/3rd of lines PPP restarted - Closed 15 May 14:01:02
Details
15 May 13:48:38

Due to human error, we have managed to clear PPP on around 1/3 of lines (B.gormless) today.

We are looking in to how procedures can be tightened up to avoid this in future.

Update
15 May 13:55:50

This was a controlled restart of the LNS not a crash, and as such the PPP restarts were even quicker as devices did not have to wait for LCP timeout to restart.

And yes, we are seriously looking at the best way to avoid this sort of error happening again.

And yes, the human concerned was me - RevK - so I owe a few people a pint...

Started 15 May 13:45:00
Closed
15 May 14:01:02

The answer is CSS - the control pages of the various machines currently all look identical apart from the host name. Using some CSS we can make all "live" boxes stand out very clearly.


Lines dropped: 100% 21CN-REGION-21CN-BRAS-RED10-L-STE and 100% 21CN-REGION-21CN-BRAS-RED13-L-STE and 99% 21CN-REGION-L-STE - Closed 14 May 09:33:53
Details
14 May 02:21:02
Lines: 100% 21CN-REGION-21CN-BRAS-RED10-L-STE and 100% 21CN-REGION-21CN-BRAS-RED13-L-STE and 99% 21CN-REGION-L-STE dropped at 2013-05-14 02:19:57
We have advised BT
This is likely to have affected multiple internet providers using BT
Update
14 May 09:35:21

After chasing BT for the cause of this outage, it would appear that it was caused by planned works at London Stepney Green.

The aim of these planned works was to increase capacity. 

Broadband Users Affected 3%
Started 14 May 02:19:57 by AAISP automated checking
Closed 14 May 09:33:53
Cause BT

Issues with c.gormless - Closed 2 May 02:14:25
Details
30 Apr 19:18:22

We are trying to get one of our LNSs working on latest code, and this is proving a struggle as there are some major upgrades. We hope to finish this as per the planned work during this evening. This may mean loss of graphs and some PPP restarts for some customers.

Update
30 Apr 21:32:21

This is proving to be an issue, and we expect to be working on it over night.

Update
1 May 00:46:20

We think working over night on this may have paid off - we're monitoring now, but we hope we have cracked the issue at last.

Update
1 May 01:03:25

c.gormless is now running at normal load (1/3 of customers) and showing no leaked RADIUS slots or stuck sessions or processes. This is good news.

Update
1 May 06:30:12
Monitoring is showing the LNS appears to be stable, but we have seen some issues with stats being recorded on our core RADIUS still. This is somewhat easier to address, and being looked at still.
Update
1 May 17:17:34

It looks like the work last night has paid off. This evening we have the "D" LNS working on the newest code, and are letting lines trickle over to it when they reconnect for any reason. We plan to switch over more lines over night.

But so far it is looking like it is working as planned. We are checking the performance of the new code as well to make sure this was all worth it :-)

Thank you all for your patience on this.

Update
2 May 02:09:03

We have switched traffic to d.gormless over night as planned.

The LNSs are looking stable - we are still seeing a few sessions from the previous few days which have not had proper accounting (so free usage) which are being cleared up.

We are also seeing the odd BT issue from time to time, and it is nice to say that a blip is not, in fact, my fault - more "situation normal".

We're still looking in to why people are getting the odd up/down emails and tweets as it is all related. Now that the LNSs are stable we can investigate this somewhat more easily.

Update
3 May 04:57:05

The scheduled LNS switch overs to move on to the new code base seem to be working as planned. Tonight's has gone through without any apparent issues. We will be rolling out the change over the next couple of nights to cover all LNSs.

Once again, thank you all for your patience and understanding.

Started 30 Apr 19:17:18
Closed 2 May 02:14:25

Packet loss on BE lines - Closed 7 May 21:00:00
Details
7 May 13:46:00

There is packet loss on BE lines throughout the country. We have reported this to BE and they are going to investigate.

Update
7 May 14:54:25

We've had reports that BE Retail customers are also seeing a similar problem, so this seems like a general BE network/backhaul problem.

We have reported this to BE.

Update
7 May 15:58:26

BE are aware of a possible core network problem and they are investigating.

Update
7 May 16:12:51

BE have said that this is due to a technical fault on their network which they are investigating.

Update
8 May 08:38:02

This was resolved just before 9pm - repairs to fibres were made within the BE network.

Started 7 May 12:18:00 by AAISP Staff
Closed 7 May 21:00:00

Pink status and odd emails - Closed 2 May 02:14:51
Details
18 Apr 18:35:10

It looks like we have a blip in RADIUS around 5pm which has resulted in lines showing "pink" or "salmon" on the management pages and some people getting odd status update emails.

Services are working - this is an accounting issue.

We're not sure what happened exactly, but this is a scenario that the systems are designed to cope with relatively sensibly. Some usage may not be metered for the next few hours and some lines will be PPP restarted over night.

We are also doing an LNS switch over tonight, but there are issues with that which means it is likely to all happen later than usual, i.e. around 7am.

As it happens we are working on some updates to RADIUS which we hope to phase in over the next few weeks. Incidents like this, whilst minor, and not service affecting, are a nuisance, and they are being taken in to account in the new design to make such issues less likely.

Update
18 Apr 18:58:48

We're clearing the pink lines now, ppp restart.

Update
28 Apr 21:12:03

Some customers on C or D gormless will have gaps in their graphs this afternoon and this evening. The lines have been connected through these gaps but have had some PPP restarts.

Started 18 Apr 17:00:00
Previously expected 19 Apr 07:00:00
Closed 2 May 02:14:51

Session restarts - Closed 29 Apr 08:31:03
Details
29 Apr 08:31:52

Some customers will have had PPP restarts around 08:30 and possibly moving from the D to the C LNS.

Sorry for any inconvenience.

Started 29 Apr 08:20:00
Closed 29 Apr 08:31:03

RADIUS issues - Closed 28 Apr 18:09:25
Details
28 Apr 12:30:40

We seem to have an issue with RADIUS accounting this morning. We have restarted things and we are investigating the cause.

There have been some odd tweets and emails about the issue, but it lists the previous "AdminReset" from a previous LNS switch over from earlier in the week. This seems to be an odd side effect of the problem which we also need to investigate.

Update
28 Apr 12:33:22

P.S. As usual, this also means some free usage, and will mean some PPP restarts later today and over night.

Update
28 Apr 12:33:58

It also means the lines show "salmon pink" on the control pages. But the lines are working, of course.

Update
28 Apr 16:34:57

We are manually clearing some of the remaining sessions for which there is no accounting. This means a brief PPP restart for some customers.

Update
28 Apr 17:26:29

There is definitely something up with RADIUS which we are still chasing.

Two thirds of lines are all sorted and accounting cleanly now.

One third is still being worked on.

Started 28 Apr 08:00:00
Closed
28 Apr 18:09:25

We think it is all sorted for now


BE Congestion - Closed 17 Apr 11:18:24
Details
5 Apr 12:38:03

We have noticed that for the last few days there has been increased latency on quite a few of our BE lines.
Initially the latency was occurring between 18:00 - 00:00, however today we have seen the latency start at 09:00.  

BE have been informed and this has been passed to O2 to investigate possible network issues.

 

Update
5 Apr 20:51:53

Looks like the increased latency suddenly stopped around 1:30.
Neither BE nor O2 could see a cause for this.

We will monitor over the weekend and early next week to make sure the issue is actually resolved.

Update
11 Apr 11:05:47

The increased latency has not returned as yet; however, we are still chasing BE for an explanation of what the issue was.

Update
17 Apr 11:19:12

We finally have an update as to what the cause of this issue was:

"I can confirm that the underlying cause had been narrowed down to a network card in our provider's core network. The card was found to be just looping and generating high amounts of traffic. As a countermeasure, an engineer physically rebooted the device and thereby resolved the issue. The card itself has remained stable since and no further concerns have been registered."

Started 5 Apr 11:58:06 by AAISP Pro Active Monitoring Systems
Closed
17 Apr 11:18:24

Card Reboot 

Cause BE

Lines dropped: 100% 21CN-BRAS-RED5-LS-BAS-HUDDERSFIELD - Closed 11 Apr 11:03:36
Details
5 Apr 02:06:02
Lines: 100% 21CN-BRAS-RED5-LS-BAS-HUDDERSFIELD dropped at 2013-04-05 02:05:39
We have advised BT
This is likely to have affected multiple internet providers using BT
Update
5 Apr 03:05:03
Lines: 100% 21CN-BRAS-RED5-LS-BAS-HUDDERSFIELD dropped again at 2013-04-05 03:04:35.
Update
11 Apr 11:03:52

This appears to have been caused by a BT PEW.

Started 5 Apr 02:05:39 by AAISP automated checking
Closed 11 Apr 11:03:36
Cause BT

LNS issue - loss of graphs - Closed 4 Apr 19:45:25
Details
4 Apr 15:40:06

One of our LNSs had an issue today, lines blipped and reconnected automatically. Graphs are lost from before the incident

This is being investigated.

This only affected one of the three live LNSs.

Broadband Users Affected 33%
Started 4 Apr 15:03:00
Closed
4 Apr 19:45:25

This happened again, and we are investigating the cause. Thankfully the fallback systems are very quick and efficient these days, but this should not happen!


Packet loss is back on lines on the MANOR PARK exchange  - Closed 1 Apr 10:35:28
Details
5 Mar 10:58:09

Packet loss is back on lines on the MANOR PARK exchange.  We have reported this to BT and BTO are going to investigate.

 

Update
22 Mar 11:18:27

BT's IP department are currently investigating and monitoring this issue.

Started 2 Feb 12:57:22 by AAISP Staff
Closed 1 Apr 10:35:28

Lines dropped: 100% 21CN-REGION-21CN-BRAS-RED10-L-NWS - Closed 1 Apr 10:34:08
Details
27 Mar 16:27:02

Lines: 100% 21CN-REGION-21CN-BRAS-RED10-L-NWS dropped at 2013-03-27 02:24:36
We have advised BT
This is likely to have affected multiple internet providers using BT

Started 27 Mar 02:24:36 by AAISP automated checking
Closed 1 Apr 10:34:08
Cause BT

LINX peering loss - Closed 1 Apr 10:33:41
Details
23 Mar 19:02:10

An issue was reported this evening with packet loss at LINX. We have suspended LINX peering for the time being until this is resolved.

We'd like to thank customers for alerting us to this with correct use of our MSO text system.

Started 23 Mar 18:37:00
Previously expected 23 Mar 18:42:00
Closed 1 Apr 10:33:41

Home::1 restrictions - Closed 1 Apr 09:57:09
Details
1 Apr 09:47:59

Not quite an April Fool's joke, sadly, but some Home::1 users have seen their lines restricted this morning. This appears to be due to a delay between the bill for this month being issued and the Direct Debit being scheduled. This is silly! It has been fixed for next month.

Quite separately, something is not right with RADIUS accounting which means a lot of users are not seeing any metered usage, i.e. download is free for many customers this morning, including Home::1 users. This too is being fixed now.

Started 1 Apr 06:00:00
Closed
1 Apr 09:57:09

All affected lines have had the restriction lifted.


LNS issue, loss of graphs - Closed 29 Mar 20:39:42
Details
29 Mar 20:42:11

One of our LNSs had an issue today, lines blipped and reconnected automatically. Graphs are lost from before the incident, and usage for the hour up to the incident will not have been metered/billed.

We have some clear ideas what caused this and it is being investigated.

This only affected one of the three live LNSs.

Update
29 Mar 21:03:53

Note we cleared some of the affected lines back to the original LNS at 9pm causing a PPP reconnect.

Broadband Users Affected 33%
Started 29 Mar 18:42:00
Closed 29 Mar 20:39:42

Evening Packet loss on BRADFORD-ON-AVON - Closed 5 Mar 16:11:44
Details
14 Jan 15:57:15

For the last few nights there appears to have been packet loss on all 21CN BRADFORD-ON-AVON lines.
The packet loss seems to last between 17:00 - 00:00.

We are currently reporting this to BT. 

Update
14 Jan 16:43:31

BT have raised a works request.

Update
15 Jan 10:39:27

BT Operate are still investigating; we should have an update tomorrow morning.

Update
5 Mar 16:11:59

This looks to be all fixed.

Started 14 Jan 15:55:26 by AAISP Staff
Closed 5 Mar 16:11:44
Cause BT

Packet loss on lines on the MANOR PARK exchange - Closed 28 Feb 09:03:42
Details
27 Feb 13:17:30

There is  packet loss on lines on the MANOR PARK exchange. We have reported this to BT and BTO are going to investigate.

Started 2 Feb 12:57:22 by AAISP Staff
Closed
28 Feb 09:03:42

Service was fully restored this morning after loop testing BT performed cleared alarms.

Cause BT

Lines Down on BT BRAS 21CN-BRAS-RED10-SL and 21CN-BRAS-RED9-SL - Closed 27 Feb 03:03:03
Details
27 Feb 02:31:51

Since 02:06 today, all lines connecting through the BT BRASs '21CN-BRAS-RED10-SL' and '21CN-BRAS-RED9-SL' have been down.

This is due to BT planned work on the Slough BRAS.

This will be affecting other ISPs too, and is affecting about 0.6% of our customers. 

Broadband Users Affected 0.30%
Started 27 Feb 02:06:45
Closed
27 Feb 03:03:03

Lines came back online at around 3am.


At risk - wholesale routing issues - Closed 14 Feb 20:12:23
Details
26 Jan 11:51:26

Following loss of one external link at 4:31 this morning we can see that there are some areas which are not quite falling back as planned. This means some of the wholesale customer links have needed some manual reconfiguration. We believe this is all resolved now, but as it is running on only one link the wholesale interconnects are at-risk until the link is fixed. We are chasing this with the suppliers.

This should have no impact on our broadband customers generally, though you may find you switch LNS if you reconnect at all today as part of the work around for this has been a change of active LNSs. The impact of this is loss of graphs during the day, but should be correct historically from tomorrow.

In the longer term we will be contacting wholesale customers to review the fallback arrangements in their network and ours to ensure such a link failure in future would fall-back seamlessly.

Started 26 Jan 04:31:00
Closed 14 Feb 20:12:23

Packet loss on FTTP line on BRADWELL ABBEY exchange - Closed 11 Feb 14:15:43
Details
21 Jan 16:24:43

There is packet loss on FTTP lines on the BRADWELL ABBEY exchange, we have reported this to BT.

Update
22 Jan 09:26:50

The packet loss appears to have disappeared at around 22:00 last night. We're awaiting further updates from BT later on today.

Update
1 Feb 23:50:25

This is still a problem, and we're seeing low levels of packet loss which will affect speed.

The latest from BT is that they have planned work on 6th Feb to increase capacity.

We will try to get more details and a clarification as to whether this upgrade will fix the problems that our monitoring and customers are seeing.

Started 16 Jan 14:20:02 by AAISP Staff
Closed
11 Feb 14:15:43

BT's planned capacity increase on 6th Feb appears to have resolved the issue.


Network issue affecting some DSL and Wholesale customers - Closed 30 Jan 16:59:47
Details
30 Jan 13:19:23

Similar to the problem at the weekend, one of our ports to Datahop is flapping - we're disabling the port and moving the traffic over to our other port.

More details to follow

Update
30 Jan 16:59:47

Datacentre staff are investigating the faulty cable, but service was restored in about 15 minutes of the fault happening.

Started 30 Jan 13:10:18
Closed
30 Jan 16:59:47

Service restored. The cable has since been connected to a different port at the far end.


Home::1/SIM usage graphic - Closed 26 Jan 18:41:58
Details
26 Jan 17:20:26

The main www.aa.net.uk web page normally has a graphic on the home page for Home::1 and quota limited SIM users.

This is not working at present for most customers as a result of a link failure.

Once we have sorted the link failure we are going to be making a few changes so this does not happen again.

It also means the top-up screen for users on Home::1 is missing, please text support if you need topup over the weekend.

Started 26 Jan 04:31:00
Closed
26 Jan 18:41:58

Resolved by a reconfiguration for now


Network blip - Closed 26 Jan 04:46:10
Details
26 Jan 04:45:54

Nagios has gone mental telling me about a port blip at 04:31 in London, and I can see that a handful of customers had an issue at the same time. I'm investigating what is going on, but overall things look fine, traffic is flowing, and not a peep out of customers on irc.

Update
30 Jan 13:20:29

This port started flapping again on 30th Jan 13:10 more info here: http://status.aa.nu/apost.cgi?incident=1726

Started 26 Jan 04:31:00
Closed
26 Jan 04:46:10

The issue relates to a specific link, and the lines that blipped were mobile data SIMs. The link has been shut down (thanks Paul). We're investigating this further, but panic over for now.


LNS restart - Closed 18 Jan 12:03:24
Details
18 Jan 12:05:17

The "B" LNS restarted unexpectedly - lines reconnected to other LNSs as expected, but graphs are lost.

Started 18 Jan 12:03:13
Closed 18 Jan 12:03:24

HAMPTON exchange - Closed 16 Jan 11:15:23
Details
16 Jan 09:13:58

Lines on the Hampton exchange have been down since around 18:04. This is linked to the Southampton 7750 router/Multi-Service Interconnect Link failing; an engineer is currently working to replace the on-board Flash card.

Started 15 Jan 18:04:14 by AAISP Staff
Closed
16 Jan 11:15:23

Service was restored at 11:15:55


GISBURN exchange drop - Closed 14 Jan 18:21:00
Details
14 Jan 16:33:45

Some lines on the GISBURN exchange dropped at 15:56:52

Update
14 Jan 16:47:04

A card is failing on MSIP equipment at Preston. A remote reset has been attempted but did not restore service. A BT  engineer is being tasked to perform a reseat or change hardware.

Started 14 Jan by AAISP Staff
Closed
14 Jan 18:21:00

 

BT Incident Reference:2357. Service was restored at 18:21 following the change of a failed card at Preston.

 


BE LAC restart? - Closed 6 Jan 02:00:00
Details
6 Jan 01:59:53

One of Be's LACs appears to have restarted, affecting a handful of customers. Lines reconnected.

Started 6 Jan 01:57:00
Closed 6 Jan 02:00:00

Unexpected LNS restart - Closed 6 Jan 01:45:00
Details
6 Jan 01:56:58

The "D" LNS restarted, affecting 1/3 of customers - it is the same cause code as before and we are looking to see if the extra diagnostics have provided more clues and how we can add even more to track this next time. This is very infrequent and we have failed to reproduce it in the lab, so it is taking time to track it down and fix it. Lines reconnected immediately as they are supposed to, and graphs from before the restart are lost.

Started 6 Jan 01:41:00
Closed 6 Jan 01:45:00

Home::1 and Direct Debit - Closed 1 Jan 09:22:24
Details
1 Jan 09:17:47

Again we have run in to a snag with Direct Debits and Home::1 which has meant there is a race condition where several Home::1 accounts show as not being paid before the Direct Debit notices are sent out. We believe we have found the underlying cause of this at last, and we are getting lines back on line now. Sorry for the inconvenience.

Started 1 Jan 09:00:00
Closed
1 Jan 09:22:24

The start of month bill run took longer than expected which created the race condition, we have adjusted the logic to try and avoid this issue in future.


Network blip - Closed 31 Dec 2012 02:25:00
Details
31 Dec 2012 02:32:57

There was some sort of blip that caused problems with 3 of our 4 external routers. It is not clear what exactly at this stage. They all recovered automatically.

Started 31 Dec 2012 02:20:00
Closed 31 Dec 2012 02:25:00

Be Lines Blip - Closed 17 Dec 2012 15:49:43
Details
14 Dec 2012 14:52:45

Some Be lines just blipped, they have come back though. More info to follow.

Update
14 Dec 2012 14:54:11

This looks like an LAC within the Be network. We're contacting Be for more details; however, lines came back online within a couple of minutes.

Update
14 Dec 2012 15:03:56

Another blip happened at 15:02, we are chasing Be.

Update
14 Dec 2012 15:14:23

Be are aware of the fault and suspect a router on the Be network - they are investigating the cause.

Update
17 Dec 2012 15:49:43

The fault has passed now and we've asked Be for an update as to what they have done.

Started 14 Dec 2012 14:43:00
Closed
17 Dec 2012 15:49:43

This was caused by a router restart within Be.


LNS Blip - Closed 18 Dec 2012 11:38:31
Details
18 Dec 2012 11:37:50

We've just had an LNS reboot which has meant about a third of our lines dropped. 

They will reconnect in the next minute or so.

Update
18 Dec 2012 11:38:31

This is similar to the problem yesterday which affected a different LNS.

We are investigating.

Update
18 Dec 2012 13:52:33

Lines came back online within a minute or two, we are investigating the cause of these blips.

Closed 18 Dec 2012 11:38:31

Congestion issue on Redruth Exchange - Closed 20 Dec 2012 18:55:28
Details
19 Dec 2012 13:37:44

The Redruth Exchange is experiencing packet loss in the evenings from around 7pm to 11pm. It has been reported to BT

Update
20 Dec 2012 18:57:02

BT say they have resolved this problem and that their links last night were showing good levels of traffic without problems.

We'll continue to monitor, but hopefully the latency problem has now been fixed.

Started 19 Dec 2012 13:34:53
Closed
20 Dec 2012 18:55:28

BT say they have resolved this problem and that their links last night were showing good levels of traffic without problems.

We'll continue to monitor, but hopefully the latency problem has now been fixed.


BT Lines on Stepney node up and down all night - Closed 20 Dec 2012 07:12:11
Details
20 Dec 2012 05:27:30

Sorry for lack of post earlier, it should have been automatic.

BT clearly have some major issues with their Stepney node which has meant lines have been up and down all night.

Broadband Users Affected 5%
Started 20 Dec 2012 01:35:26
Closed
20 Dec 2012 07:12:11

BT say this is fixed, and lines all look stable


BT Fibre Blip - Closed 18 Dec 2012 22:28:00
Details
18 Dec 2012 22:26:30

One of our links to BT 'blipped' causing around a third of our customers to disconnect. Lines are reconnecting now though.

Update
18 Dec 2012 22:30:09

Most lines are back online now.

Started 18 Dec 2012 22:18:00
Closed
18 Dec 2012 22:28:00

The issue was indeed the link to BT, and we are awaiting an explanation from BT as to what happened. It is possible they will not know either.


LNS Blip - Closed 17 Dec 2012 17:05:00
Details
17 Dec 2012 17:01:36

One of our LNSs just rebooted, causing DSL lines connected to it to blip.

We are looking in to the cause, but lines will reconnect within a minute or two.

Update
18 Dec 2012 05:59:13

Lines did reconnect immediately - if nothing else this is a good test of the fall-back and redundant systems. Only a third of customers were affected and the main problem is loss of the monitoring graphs. We do apologise for any inconvenience.

Started 17 Dec 2012 16:59:30
Closed
17 Dec 2012 17:05:00

This is indeed the same issue we saw last time, and is still being investigated.


Graphs today - Closed 15 Dec 2012 23:59:59
Details
12 Dec 2012 07:23:44

As part of the preparation for tomorrow's planned work, we are running on two LNSs today rather than three. This allows the switch change to go ahead tomorrow.

Normally an LNS change means some people will see their graph start in the early hours, but graph history is correct and shows the graphs join up the next day.

Today is slightly special and some people will change LNS if they re-connect at all during the day. The result is a graph that starts from that first reconnect. Again, over night, the historical graphs will glue it all together. It just means some people with line issues will not see the start of the day on their graphs today.

Anyone with multiple lines bonded that reconnects may change LNS too, in which case the other lines will be moved within a couple of minutes to ensure bonding continues normally.

Update
13 Dec 2012 16:44:05

We are switching back to three LNSs tonight - but the issue with graphs changing on a reconnect will still apply until we have cleared lines to new LNSs over the next few nights.

Started 12 Dec 2012
Previously expected 15 Dec 2012 23:59:59
Closed 15 Dec 2012 23:59:59

LNS restart - Closed 16 Dec 2012 18:13:00
Details
16 Dec 2012 18:10:28

One of our LNSs restarted unexpectedly (c.gormless) and we are looking in to it now. Lines reconnecting automatically as expected.

Update
16 Dec 2012 18:12:31

The diagnostics are not that clear and it will take a bit longer to get to the bottom of this. The immediate issue is over with an automatic restart, and all four LNSs are in operation.

Started 16 Dec 2012 18:07:00
Closed
16 Dec 2012 18:13:00

We are adding more diagnostics and will be including this in LNS updates over coming weeks. This should help track what happened if this ever happens again.


BT issues in Manchester - Closed 16 Dec 2012 16:18:00
Details
16 Dec 2012 18:38:29

There appear to have been a couple of blips with BT 20CN lines in Manchester. No word from BT.

Started 16 Dec 2012 15:51:00
Closed 16 Dec 2012 16:18:00

Router upgrade - Closed 13 Dec 2012 19:40:00
Details
12 Dec 2012 19:28:41

We have identified a reason some upgrades have caused a few seconds of down time when not expected to cause any. We have just done upgrades to rectify this before tomorrow's switch change over.

Some customers may have seen a few seconds of issues but hopefully we have managed to sort this once and for all now.

Sorry for any inconvenience.

Started 12 Dec 2012 19:20:00
Closed 13 Dec 2012 19:40:00

Latency on some BT 21CN lines on Guildford BRASs - Closed 12 Dec 2012 21:00:00
Details
11 Dec 2012 19:31:28

Since 7pm there has been notably high latency on many BT 21CN lines on Guildford BRASs. We are investigating with BT.

Update
11 Dec 2012 20:02:38

Our analysis shows this to be a faulty LAG in BT's core network somewhere near the Guildford BRASs. We are chasing this with BT.

Update
11 Dec 2012 20:46:21

We have managed to manually juggle IP addresses of LNS L2TP endpoints to work around this on most affected lines now.

Update
12 Dec 2012 04:06:19

Looks to be fixed at 21:00

Update
12 Dec 2012 10:17:42

Here is some more information about this particular issue (originally posted to Facebook):

I do think we should write up more about some of this stuff. At 7pm last night we had over 100 customers' Internet connections start being unusable with massive latency. See graph below, shows it.

Within 20 minutes we had identified which lines were affected, and were on to BT (as they were all BT lines). Customers were updated by status pages and on irc.

We managed to work out that there was a pattern in the affected lines.

Not only were they all BT lines, on a specific area metro node (Guildford), and all 21CN lines, there was a pattern where sets of lines on combinations of LNS at our end and BRASs at BT's end had issues.

We tested changing a line to a different IP on the same LNS our end, and as we expected, the problem went away. We were also able to confirm with affected customers on irc that they saw the fix work as well.

We were able to tell BT that the problem must be in a link aggregation group (LAG) somewhere in the back-haul from Guildford. Basically, this means multiple links are used to handle the total capacity. The equipment picks the link based on a hash from things like IP addresses, so the IP address of the BRAS and LNS together decide the link. One of the links was ill and so affected a set of lines.

We then proceeded to manually change affected lines to a different IP for the LNS (not a change to the customer IP, this is internal). This caused their service to go back to normal.
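As a rough illustration of why changing the LNS IP moved a line off the faulty link, here is a minimal sketch in Python. It assumes the LAG picks a member link purely from a hash of the BRAS and LNS addresses. The hash function, the link count and the addresses are all made up for the example (the addresses are documentation ranges); real equipment uses its own vendor-specific hash, which we do not know.

```python
import hashlib
import ipaddress


def pick_link(bras_ip: str, lns_ip: str, n_links: int) -> int:
    """Choose a LAG member link from a hash of the two L2TP endpoint addresses.

    Hypothetical hash for illustration only; not what BT's kit actually does.
    """
    key = ipaddress.ip_address(bras_ip).packed + ipaddress.ip_address(lns_ip).packed
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_links


def lns_ips_avoiding(bras_ip: str, lns_ips: list[str], n_links: int, bad_link: int) -> list[str]:
    """Return the candidate LNS addresses whose traffic would not use bad_link."""
    return [ip for ip in lns_ips if pick_link(bras_ip, ip, n_links) != bad_link]


if __name__ == "__main__":
    bras = "192.0.2.10"  # example BRAS address (assumed)
    candidates = [f"198.51.100.{n}" for n in range(1, 9)]  # example LNS addresses (assumed)
    # Pretend the link used by the first BRAS/LNS pair is the ill one.
    bad = pick_link(bras, candidates[0], 4)
    print("faulty link:", bad)
    print("LNS addresses that avoid it:", lns_ips_avoiding(bras, candidates, 4, bad))
```

The same BRAS and LNS pair always hashes to the same link, which is why only particular combinations of LNS and BRAS were affected, and why picking a different LNS address for a line can route it around the ill link without touching the customer's own IP.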

BT did fix the issue (still waiting for a detailed report from them) at 9pm.

From experience, this is the sort of issue BT have no way to detect themselves. Maybe they have better monitoring now, but when we reported it they did not know about it, and other ISPs had not spotted it.

It is worth bearing in mind, this fault will have affected every ISP using BT 21CN lines, not just us. The fact we got BT on to it so quickly helped all of the affected lines on all affected ISPs. You're welcome!

It is only because we have this detailed monitoring and graphing of every line that we can identify and diagnose issues like this at all, let alone so quickly.

Started 11 Dec 2012 19:00:00
Closed 12 Dec 2012 21:00:00

Line Drop and Packet Loss on SHOREDITCH Exchange 21CN Lines - Closed 11 Dec 2012 16:36:20
Details
11 Dec 2012 11:58:37

Temporary loss of service followed by packet loss on the Shoreditch Exchange. BT currently investigating the issue.

Update
11 Dec 2012 16:36:38

IPMB card switchover carried out under ALS 500428 at 12:53, BT fixed fault.

Started 11 Dec 2012 11:56:17
Closed
11 Dec 2012 16:36:20

IPMB card switchover carried out under ALS 500428 at 12:53


Some 21CN Guildford RASs down - Closed 11 Dec 2012 16:32:11
Details
11 Dec 2012 15:21:07

At roughly 3pm a few Guildford RASs went down.

We are reporting this to BT at the moment.

Update
11 Dec 2012 15:23:19

The RASs that are affected are:

21CN-BRAS-RED9-GI-B

21CN-BRAS-RED10-GI-B

21CN-BRAS-RED12-GI-B

Update
11 Dec 2012 15:28:46

BT have raised an incident.

Trying to get more information. 

Update
11 Dec 2012 15:35:03

BT say diagnostics are still ongoing; an update should be around 16:00.

Update
11 Dec 2012 16:06:28

Most lines are back up now.

Update
11 Dec 2012 16:32:43

Fault fixed: missing VLAN configuration was reapplied.

Started 11 Dec 2012 15:00:00 by AAISP Pro Active Monitoring Systems
Closed
11 Dec 2012 16:32:11

Missing vlan configuration was reapplied

Cause BT

Status pages, etc, updating - Closed 06 Dec 2012 09:39:28
Details
19 Nov 2012 16:50:31

We carried out some major "behind the scenes" updates to the main control systems over the weekend, and whilst most of these have gone seamlessly, a few have not quite gone to plan.

As such, some of the controls and buttons and text in the information packs are not quite right.

Feel free to point out any issues on irc and we'll catch them all.

Sorry for any inconvenience.

Started 18 Nov 2012 12:00:00
Closed 06 Dec 2012 09:39:28

Home::1 users offline briefly - Closed 06 Dec 2012 09:39:13
Details
05 Dec 2012 11:18:12

Home::1 users were offline for about 5 minutes from 11:00 this morning.

It's now fixed.

We've given all Home::1 users a 1 gigabyte extra quota this month by way of apology.

Closed 06 Dec 2012 09:39:13

21CN Outage at the SOUTHAMPTON Exchange - Closed 03 Dec 2012 15:02:19
Details
03 Dec 2012 09:52:01

At around 8:30 all 21CN lines at the SOUTHAMPTON exchange went down.

We are reporting this to BT. 

Update
03 Dec 2012 10:09:36

BT have raised a task to investigate this.

Update
03 Dec 2012 11:42:37

Please see http://status.aa.net.uk/apost.cgi?incident=1655

Update
03 Dec 2012 15:02:45

Fixed

Started 03 Dec 2012 09:51:08
Closed
03 Dec 2012 15:02:19

http://status.aa.net.uk/apost.cgi?incident=1655

Cause BT

21CN outage at WANDSWORTH exchange for lines on the 21CN-BRAS-RED3-SL RAS - Closed 03 Dec 2012 15:01:55
Details
03 Dec 2012 10:05:52

All 21CN lines at WANDSWORTH exchange and the 21CN-BRAS-RED3-SL RAS went down at roughly 9:30.

We are reporting this to BT. 

Update
03 Dec 2012 10:35:55

BT are investigating

Update
03 Dec 2012 11:42:33

Please see http://status.aa.net.uk/apost.cgi?incident=1655

Update
03 Dec 2012 15:02:15

Fixed

Started 03 Dec 2012 10:01:53 by AAISP Staff
Closed
03 Dec 2012 15:01:55

http://status.aa.net.uk/apost.cgi?incident=1655

Cause BT

21CN outage at UPPER HOLLOWAY 21CN-BRAS-RED12-L-WAT - Closed 03 Dec 2012 15:01:41
Details
03 Dec 2012 11:03:35

At around 09:19:57, 21CN lines at UPPER HOLLOWAY on the 21CN-BRAS-RED12-L-WAT lost PPP. We are reporting this to BT.

 

Update
03 Dec 2012 11:42:29

Please see http://status.aa.net.uk/apost.cgi?incident=1655

Update
03 Dec 2012 15:01:49

Fixed

Started 03 Dec 2012 09:19:57 by AAISP Staff
Closed
03 Dec 2012 15:01:41

http://status.aa.net.uk/apost.cgi?incident=1655

Cause BT