Currently Open Posts

Email and Web Disk Storage Maintenance - Open 5 Mar 21:00:00
Details
5 Mar 13:49:51

Please see http://status.aa.net.uk/1750 for information regarding planned work that will affect web hosting on the evening of March 7th.

Started 5 Mar 21:00:00

Maidenhead Core Switch Upgrade Sunday 10th Feb - Open 10 Feb 13:00:00
Details
15 Jan 14:19:37

We will be performing a swap of the core network switches in our Maidenhead datacentre on the afternoon of Sunday 10th February.

The work is expected to take up to half an hour, during which there will be periods of time when service is affected.

The affected services will be:

Email, both incoming and outgoing
VoIP calls
Ethernet services terminated in Maidenhead
Web page hosting
Control panel and Billing system access
Customer hosted servers

General internet connectivity over DSL and Ethernet terminated in London will be unaffected by this work.

This status page will be updated during the work.

Update
10 Feb 12:25:39

This work will be done a little earlier than initially scheduled. The work will take place between 12:30pm and 2pm, but actual outage is only expected to be a couple of minutes.

Started 10 Feb 13:00:00

At Risk - Carrier Transit Work - Open 15 Jun 2012 00:01:00
Details
14 Jun 2012 17:54:03

One of the carriers that provides our connectivity in Maidenhead has planned (short notice) maintenance on the parts of its network that serve us.

This will happen between midnight and 2pm.

We are not expecting this to impact customers, but it should be considered 'at-risk'.

Started 15 Jun 2012 00:01:00 by Datacentre

Database Server Maintenance - Open 01 Jun 2012 17:00:00
Details
01 Jun 2012 13:29:22

We'll be rebooting one of our database servers shortly after 5pm today. This is to replace a faulty disk drive. 

There will be a period of a few minutes where some usage/logs won't be viewable from our control pages.

Update
01 Jun 2012 17:09:52

This is happening now, it should take about 10 minutes.

Update
01 Jun 2012 17:12:52

The server has booted up ok.

Started 01 Jun 2012 17:00:00

IPv6 changes - Open 21 Apr 2012 12:00:00
Details
21 Apr 2012 13:55:09

Apologies for some minor disruption with IPv6 today in relation to services we run in the Maidenhead Data Centre. There have been a few minutes during today where routing has not been quite right for IPv6 for some of the services.

We have renumbered our office on to a separate /48 to improve native IPv6 routing. (A /48 is visible to more of the Internet).

Over the next few days or weeks we will be renumbering services in the data centre from 2001:8b0::/48 to 2001:8b0:1::/48 addresses. Thanks to a customer (Lawrence) for allowing us to change his allocation to free that up.

If you want to trust an IPv6 address as being from A&A infrastructure you should be able to trust 2001:8b0::/47, as this includes the /48 we have in London and the /48 we have in Maidenhead and our offices.
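
As a rough illustration of the /47 rule above, the following minimal sketch uses Python's standard ipaddress module to check that 2001:8b0::/47 covers both 2001:8b0::/48 and 2001:8b0:1::/48, and to test whether a given address falls inside it. The function name is_aa_infrastructure and the sample addresses are hypothetical, chosen only for this example; this is not how A&A performs any such check.

```python
import ipaddress

# The /47 quoted in the post, and the two /48 blocks mentioned with it.
TRUSTED = ipaddress.ip_network("2001:8b0::/47")
BLOCKS = [
    ipaddress.ip_network("2001:8b0::/48"),
    ipaddress.ip_network("2001:8b0:1::/48"),
]

# A /47 splits into exactly two /48s; both listed blocks should be inside it.
both_covered = all(block.subnet_of(TRUSTED) for block in BLOCKS)
halves = list(TRUSTED.subnets(new_prefix=48))


def is_aa_infrastructure(address: str) -> bool:
    """Hypothetical helper: True if the address lies within 2001:8b0::/47."""
    return ipaddress.ip_address(address) in TRUSTED


if __name__ == "__main__":
    print(both_covered)
    print(is_aa_infrastructure("2001:8b0:1::1"))  # sample address, inside the /47
    print(is_aa_infrastructure("2001:8b0:2::1"))  # sample address, outside the /47
```

Note that, as the post goes on to say, some machines inside this space may not be A&A infrastructure during the renumbering, so a prefix match alone is only a first check; reverse DNS under aa.net.uk is the other signal mentioned.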

At present there may be a handful of machines which are not technically A&A infrastructure within this space that will be renumbered this week as well. A&A infrastructure should have reverse DNS under aa.net.uk as well.

We aim to parallel run the old and new IPv6 during DNS changes so that there are no issues with any services. If anyone has any issues please contact support.

We are not planning any IPv4 changes.

Note: customers who have IPv6 on fibre from Maidenhead can now request a /48 if needed, routed via their existing /64 link.

Started 21 Apr 2012 12:00:00
Previously expected 01 May 2012

IPv6 routing improvements - Info 18 Apr 2012 14:51:28
Details
18 Apr 2012 14:51:28

We are pleased to confirm the hosting in the Maidenhead Data Centre now has full 1500 byte MTU native IPv6 transit in operation. This improves the MTU to our various servers, to customer servers, and to Ethernet transit customers. If you have any issues, please do contact support, but this should be completely seamless.

We may also be making some changes to the IPv6 addresses for A&A servers in Maidenhead in due course. We'll post more details when that is all decided. This should not have any impact as we will, of course, overlap assignments and make necessary changes to DNS records.

Started 18 Apr 2012 14:00:00

All Maidenhead Data Centre Posts

Web and email outage - Closed 5 Apr 09:14:46
Details
4 Apr 15:47:25

This is ongoing. We're investigating.

Update
4 Apr 16:07:41

This should now be fixed. Please let support know if you see any problems, or have any questions.

Update
5 Apr 09:15:04

This was resolved yesterday afternoon.

Started 4 Apr 15:46:11
Closed 5 Apr 09:14:46

At risk - switch upgrade - Closed 10 Feb 14:00:00
Details
7 Feb 08:57:38

We have a switch upgrade planned for Sunday - but it seems that somehow it knows!

There was some sort of issue yesterday at 9am and again at 11:20pm. Monitoring suggests something not quite right since 9am yesterday and seems to point to the switch which is going to be upgraded on Sunday.

We are monitoring carefully and if there are any issues during the day we will have staff carry out the upgrade as an emergency during the day.

This could affect several services, including VoIP. Hopefully this will not be an issue, and the upgrade will go ahead as planned on Sunday.

Update
10 Feb 12:57:20

Work is progressing, some servers are already up and working on the new switch.

Update
10 Feb 13:21:42

Everything seems to be working ok - Customers may have seen an outage of around 2 minutes.

Update
10 Feb 13:52:56

Some customer hosted servers are offline at the moment; we are checking their configuration.

Started 7 Feb 09:00:00
Previously expected 10 Feb 12:00:00
Closed
10 Feb 14:00:00

The swap over has been successful. Most services had a couple of minutes of outage, and some hosted servers were down for longer due to a configuration issue.


Connectivity problems in Maidenhead - Closed 6 Feb 12:49:46
Details
6 Feb 09:15:19

There's a connectivity problem in Maidenhead. We're investigating.

Update
6 Feb 09:34:51

Connectivity looks normal again now. We're still investigating the cause of the problem.

Closed
6 Feb 12:49:46

The problem was resolved by power cycling some hardware. Although we're still not sure exactly what caused this, we are suspicious of one of our switches. We already have planned work to replace the switch on Sunday.


Disk server issues - Closed 22 Aug 2012 21:18:59
Details
22 Aug 2012 21:18:04

Disk server playing up, affecting web pages and email.

Update
22 Aug 2012 21:18:49

Staff are working on this.

Started 22 Aug 2012 21:00:00
Closed
22 Aug 2012 21:18:59

All sorted


Email and Web Server Disk Storage Problem - Closed 21 Aug 2012 14:50:03
Details
21 Aug 2012 14:47:26

We are just rebooting one of our Disk storage servers due to a problem affecting email and web page hosting services.

We expect services to be restored in a few minutes.

Update
21 Aug 2012 14:49:24

Service restored.

Started 21 Aug 2012 14:44:03
Closed
21 Aug 2012 14:50:03

Service restored within a few minutes. Apologies for this outage.


IPv6 packet loss affecting AAISP offices - at risk period - Closed 25 Jul 2012 20:45:22
Details
25 Jul 2012 19:29:39

We're seeing packet loss affecting our offices at the moment.

We are investigating and have an engineer on their way to site, and so this should be considered an at-risk period for connectivity in Maidenhead until we isolate the problem.

Update
25 Jul 2012 20:45:34

This has now been resolved.

Started 25 Jul 2012 19:24:31
Closed 25 Jul 2012 20:45:22

IPv6 Routing Problem in Maidenhead Datacentre - Closed 27 Jun 2012 13:27:00
Details
27 Jun 2012 09:30:41

There is currently a problem with IPv6 routing in our Maidenhead datacentre. This will be affecting Ethernet customers as well as access to some of our servers over IPv6.

Update
27 Jun 2012 09:32:01

IPv6 is now routing again.

Update
27 Jun 2012 09:35:47

This looks like one of our transit links was announcing IPv6, but not routing it. We've taken the link down for the time being whilst we investigate further.

Update
27 Jun 2012 09:42:25

The transit provider has confirmed that they are investigating.

Started 27 Jun 2012 09:24:00
Closed
27 Jun 2012 13:27:00

There was a problem with a link to a transit provider. They have now resolved this and are looking into how this problem can be prevented in the future.


Email & Web Server Problem - Closed 20 Jun 2012 12:00:09
Details
20 Jun 2012 11:54:42

One of our storage servers used for email and web pages needs to be rebooted. This is happening now.

Web pages and email access will be affected for a couple of minutes.

Update
20 Jun 2012 11:57:32

The Disk server is booting back up now, service will be restored very shortly.

Update
20 Jun 2012 12:00:03

Web and email servers are now back online.

Started 20 Jun 2012 11:54:02
Closed
20 Jun 2012 12:00:09

This problem was caused by some kind of file system problem which we will investigate.


Power outage in Maidenhead data centre - Closed 31 May 2012 10:20:57
Details
29 May 2012 22:23:38

It looks like a few servers rebooted, which suggests a power problem. Trying to get some more information now.

It would affect web and email hosted services as well as VOIP.

Update
30 May 2012 00:09:53

It has been confirmed that this was a problem with power. It is being worked on.

There are a few minor problems, like one of our spam checking servers is offline (the other servers should cope), one of our customer facing SMTP relays died (no longer in DNS so not customer affecting) and customised voicemail messages are not working on our VoIP service.

The accounts system web interface is down too (although the accounts system itself is fine).

These are not major problems, and we'll continue to work on them in the morning.

Update
30 May 2012 00:54:11

It looks like there are some ongoing connectivity problems with Pulsant's transit feed. They are manifesting themselves as brief outages, but it doesn't look too serious.

We'll post details when we know more.

Update
30 May 2012 08:58:27

Power went off at 22:07:47 and was restored at 22:10:59

Engineers are on site this morning to resolve remaining issues with some servers.

Transit seems to be stable now.

Update
30 May 2012 09:55:53

Most services are working correctly - some issues with voicemail/announcement messages at present, but this is being worked on. One of the outgoing mail servers has not come back to life and will be worked on this morning so that the mail queue can be sent ASAP.

Update
30 May 2012 18:30:13

All services are working again now. The data centre have completed their investigations and have determined that the outage was caused by a faulty UPS component which failed during planned maintenance. The power feed to the data centre is currently running without UPS backup and should be seen as at-risk in the event of a mains power outage. The data centre staff expect to have the UPS back in-line by 22:00 this evening.

Started 29 May 2012 22:07:47 by AAISP automated checking
Closed 31 May 2012 10:20:57

Maidenhead transit planned work - Completed 31 May 2012 10:19:05
Details
25 May 2012 08:59:04

Pulsant are carrying out work on transit in Maidenhead.

This work will affect all services we host in Maidenhead, such as email, web and VOIP.

The work is scheduled for between:

23:00 30/05/2012 and 04:00 31/05/2012

A 10 minute disruption to service is expected.

Started 30 May 2012 23:00:00 by Datacentre
Previously expected 31 May 2012 04:00:00
Closed 31 May 2012 10:19:05

Power blip - Closed 29 May 2012 22:11:08
Details
29 May 2012 22:25:56

Not sure what happened, but both core routers reported a power outage.

We'll try and find what happened.

Started 29 May 2012 22:07:47
Closed 29 May 2012 22:11:08

Intermittent Packet Loss affecting Maidenhead Hosted Services - Closed 16 May 2012 12:00:00
Details
16 May 2012 10:59:36

 

We are seeing intermittent packet loss affecting services in Maidenhead - this will be most noticeable on VoIP calls, where there may be short periods of silence during a call.

The loss is sporadic: it lasts a few seconds, and there are long periods of time where the loss is not present. As yet there is no clear pattern to pinpoint the cause - a tricky one to track down!

We are investigating and working with the Datacentre to fix this.

 

Started 16 May 2012 09:00:00
Closed
16 May 2012 12:00:00

We have put in a temporary fix whilst we get to the bottom of the problem.


Web & email storage problem - Closed 04 May 2012 21:15:57
Details
04 May 2012 19:00:58

There's a problem with web and email disk storage in Maidenhead.

We're investigating.

Update
04 May 2012 19:16:08

Now back. We're investigating the cause of this.

Closed 04 May 2012 21:15:57

Routing issue - Closed 21 Apr 2012 12:05:00
Details
21 Apr 2012 12:06:35

There was a rather odd issue with routing in Maidenhead affecting several of our servers there. We think this may be an issue with a switch, but we have managed to get things working again. There may be some planned maintenance as a result later.

Started 21 Apr 2012 12:00:00
Closed 21 Apr 2012 12:05:00

Email and Web Server Disk Storage Change - Completed 29 Mar 2012 20:12:03
Details
22 Mar 2012 16:35:18

Email and web servers will have about 30 minutes of down time as we move the file server that they use.

During this time access to email and web pages will not be available.

Date: Thursday 29th
Starting time: 8pm
Reason: Moving to secondary file server to allow for software updates 

Update
29 Mar 2012 19:52:27

This work will be starting shortly.

Update
29 Mar 2012 20:00:33

Web and incoming email (pop3/imap) servers will be offline for about 30 minutes from now.

Update
29 Mar 2012 20:08:51

Our webserver is now back online.

Update
29 Mar 2012 20:11:29

Email services are back up now.

Started 29 Mar 2012 20:00:00 by AAISP Staff
Previously expected 29 Mar 2012 21:00:00
Closed
29 Mar 2012 20:12:03

The move has been completed as planned, services were restored within about 11 minutes.


Maidenhead Datacentre Packetloss - Affecting VoIP/Email/Web Server - Closed 27 Mar 2012 15:10:25
Details
27 Mar 2012 14:54:33

There is a network problem in our Maidenhead datacentre at the moment. This will be affecting some of our services, mainly VoIP, email (sending and receiving), our web servers and hosted servers.

VoIP customers will be most affected and may have some calls with audio problems (breaking up).

 

Update
27 Mar 2012 14:55:10

We are tracking down the source of the problem

Update
27 Mar 2012 15:04:17

We have tracked the source and are in the process of shutting down the compromised customer machine.

Update
27 Mar 2012 15:12:16

The compromised machines have been shut down.

Started 27 Mar 2012 14:22:47
Closed
27 Mar 2012 15:10:25

Compromised customer servers have been shut down.


Email and Web Server Problems - Closed 22 Mar 2012 16:30:00
Details
22 Mar 2012 16:13:47

Our main disk storage server is having problems at the moment, this is affecting email and web server storage.

Engineers are investigating.

Update
22 Mar 2012 16:19:44

One of our disk servers is being rebooted.

Update
22 Mar 2012 16:20:29

POP/IMAP services back

WWW Service back

Update
22 Mar 2012 16:24:10

We believe we know what caused this. We will do planned work to apply an update.

Update
27 Mar 2012 10:18:24

Please see http://status.aa.net.uk/apost.cgi?incident=1450 for planned works

Started 22 Mar 2012 16:10:00
Closed 22 Mar 2012 16:30:00

UPS Work (No user impact) - Completed 16 Jan 2012 23:59:00
Details
09 Jan 2012 12:32:21

We have been informed by the Maidenhead datacentre that they have some planned work on their UPS systems.

 

They will be carrying out a replacement exercise on end-of-life capacitors in the UPS.

The unit will be taken off-line for the duration of the work; however, due to the configuration of the installation, there will be no impact on the electrical supply or resilience to equipment.

 

The work will be carried out on Monday 16th January, starting at 21:00 and should last 5 hours.

This work is factory advised and forms part of their robust Planned and Preventative maintenance regime. 

Started 16 Jan 2012 21:00:00
Closed 16 Jan 2012 23:59:00

Possible Routing Problem - Closed 09 Feb 2012 14:57:09
Details
09 Feb 2012 14:49:23

We're investigating reports of strange routing problems involving access to/from our Maidenhead data centre.

Started 09 Feb 2012 14:35:00
Closed
09 Feb 2012 14:57:09
Post moved over to http://status.aa.nu/apost.cgi?incident=1394

Problem in Maidenhead - affecting VoIP/ethernet/email/etc - Closed 26 Jan 2012 13:10:00
Details
24 Jan 2012 21:16:14

We are seeing a major issue in Maidenhead - high levels of transit packet loss that will be affecting VoIP, email, and Ethernet customers (as well as our offices).

Update
24 Jan 2012 21:47:11

Looking like some sort of denial of service attack.

Update
24 Jan 2012 22:34:04

Sorry for the delay posting more details. This affects our links directly, which makes it difficult. The problem appears to be a huge denial of service attack involving tens of thousands of sessions and filling gigabit links.

We have identified the target and disconnected it, black holed the target and even tried to divert traffic but to no avail as yet.

We are still working on this.

Update
24 Jan 2012 23:03:38

Just an update to say that this is still being worked on...

Update
24 Jan 2012 23:23:30

Still working on this

Update
24 Jan 2012 23:28:57

The problem is now only affecting Ethernet customers on the same block as the address being DDOS'd. We are still working on the issue. Most other services will be working fine now.

Update
24 Jan 2012 23:57:55

Some side effects on other services from Maidenhead, but we are still working on narrowing down the issue.

Update
25 Jan 2012 00:01:48

DOS attacks are, thankfully, rare. This has to be the biggest we have seen.

We will, of course, be talking to the customer who is being DOSed to find what could have provoked such a major attack. There is usually a reason.

Update
25 Jan 2012 13:20:48

The blackhole for the target machine was removed at 1pm today, however the traffic was still being sent and affected VoIP, email and Ethernet services.

The block is now in place again, and we'll continue to investigate.

Started 24 Jan 2012 21:00:00
Closed 26 Jan 2012 13:10:00

Maidenhead datacentre problems again - Closed 25 Jan 2012 13:14:06
Details
25 Jan 2012 13:12:18

Similar to last night, access to servers and services in Maidenhead has high packet loss.

This will affect email, VoIP and Ethernet services.

Update to follow shortly.

Update
25 Jan 2012 13:13:19

Datacentre staff are working to blackhole the IP address that is the target of this attack.

Update
25 Jan 2012 13:14:43

The target IP address has been blackhole'd and service has been restored.

Started 25 Jan 2012 13:02:00
Closed
25 Jan 2012 13:14:06

We'll update the initial post from yesterday with further updates to this. http://status.aa.net.uk/apost.cgi?incident=1364


Server Moving - Not Customer Affecting - Completed 19 Jan 2012 13:00:00
Details
18 Jan 2012 14:48:38

We'll be moving some servers around between racks in the Maidenhead datacentre on 19th January. These servers are mainly related to email services, but we are not expecting customers to notice anything whilst the moves take place.

The servers moving will be:

tertiary-mx.co.uk (the backup email relay)
A couple of the Spam checking servers (there are many, so one being down at a time won't cause a problem)
One of the outgoing SMTP servers (which is offline at the moment)

Started 19 Jan 2012 11:00:00
Closed
19 Jan 2012 13:00:00

This work has been completed.


Web Services - Closed 28 Nov 2011 13:46:02
Details
28 Nov 2011 13:04:06

We have problems with Web and Email services at the moment, please see this post:

http://status.aa.nu/apost.cgi?incident=1298

Closed 28 Nov 2011 13:46:02

Email and Web Server Work - Completed 20 Nov 2011 11:29:06
Details
17 Nov 2011 16:09:45

On Sunday morning we'll be moving over the storage server that the email and web services use. The work should take less than 30 minutes, but as the work is carried out access to email and web sites will be unavailable.

We aim to make this move as quickly as possible.

Update
20 Nov 2011 11:01:03

FTP & RSYNC access to our web server has been stopped. 

The work of moving over the storage servers will begin shortly. We anticipate this to take less than 30 minutes.

Update
20 Nov 2011 11:04:20

The work is starting now; access to websites we host and email will be unavailable whilst this is carried out.

 

Update
20 Nov 2011 11:23:16

Web pages are now being served again.

Update
20 Nov 2011 11:24:28

Email is back up

Update
20 Nov 2011 11:28:02

FTP and rsync access to our web server is now running.

Update
20 Nov 2011 11:30:38

This work is almost complete; customers should have access to email again and their web pages should be being served.

 

Started 20 Nov 2011 11:00:00
Previously expected 20 Nov 2011 12:00:00
Closed 20 Nov 2011 11:29:06

Network glitch affecting voice and ethernet - Closed 02 Nov 2011 11:25:49
Details
02 Nov 2011 11:30:52

There appears to have been a severe network glitch affecting both diverse routes out of the Maidenhead data centre. Routing is recovering now, but this would have affected Ethernet customers and VoIP customers the most. Some authentication of DSL lines may have been delayed. Access to our email and web servers and other hosted services would also have been affected.

The incident appears to have lasted a few minutes. We are trying to get more details.

Update
02 Nov 2011 11:49:03

The carriers have confirmed they had an outage and should send an explanation shortly.

Update
02 Nov 2011 21:49:20

Carriers explain the fault as:

The cause of this incident was traced to events on the network which caused high CPU load on the transit routers. This then resulted in router protocol instability which affected transit services.

We have since stabilised the network and are developing solutions to be implemented which should reduce the impact of such events in the future.

Loss of connectivity was detected at 11:24 with service restored by 11:27.

Please accept our apology for any inconvenience caused.

Started 02 Nov 2011 11:24:02
Closed 02 Nov 2011 11:25:49

Moving Servers - Legacy Email, Primary DNS, Wiki - Completed 23 Sep 2011 15:50:12
Details
07 Jun 2011 09:54:26

We'll be moving a couple of servers between datacentres. Their IP addresses will not be changing, it's just a physical move.

The servers are:

A.Hopeless - One of our legacy email servers
- Receiving email will not be possible during this time.

Primary-dns.co.uk -  One of our internet facing DNS resolvers
We don't anticipate that service will be affected as the secondary DNS server will be available.

Customer Wiki
Access to wiki.aaisp.org.uk will be unavailable during this time

A couple of other internal use servers will be moved at the same time.

Start Time: 17:30
Duration: 1 hour 

Update
10 Jun 2011 18:15:40

This work is nearly complete - just waiting for various email services to start up.

Started 10 Jun 2011 17:30:00 by AAISP Staff
Closed 23 Sep 2011 15:50:12

Minor blip over night - Closed 08 Sep 2011 03:08:00
Details
08 Sep 2011 03:46:51

Our routers in the Maidenhead data centre had some issues over night. These may have caused a few seconds outage in some services and Ethernet access. Broadband services were not affected.

Started 08 Sep 2011 03:02:00
Closed 08 Sep 2011 03:08:00

At risk - router upgrade - Completed 19 Jul 2011 17:38:58
Details
19 Jul 2011 17:25:55

We are again upgrading a router this evening - it should have little or no impact as the system should fall back to the secondary router. There is, as always, a risk.

These upgrades, when complete on both routers (not both done together!) will mean faster fallback in the event of a failure in future as we are upgrading to support VRRP3 with sub second timing.

Update
19 Jul 2011 17:34:25

Arrrg, why is this never simple.

Update
19 Jul 2011 17:39:06

Abandoned for now - maybe later.

Update
19 Jul 2011 18:17:34

All sorted

Started 19 Jul 2011 17:30:00
Previously expected 19 Jul 2011 17:35:00
Closed 19 Jul 2011 17:38:58

Router upgrade (short notice) - Completed 17 Jul 2011 19:08:56
Details
17 Jul 2011 18:45:47

We will be upgrading one of the main routers this evening to the latest release. Sorry for short notice. As usual this should have little or no impact on services.

Update
17 Jul 2011 19:05:24

Not playing the game quite - so may have a few seconds disruption...

Update
17 Jul 2011 19:09:04

Completed

Started 17 Jul 2011 19:00:00
Previously expected 17 Jul 2011 19:05:00
Closed 17 Jul 2011 19:08:56

Scheduled Power Systems Maintenance 10 March Evening - Completed 10 Mar 2011 23:00:00
Details
09 Mar 2011 11:02:56

Please be aware that testing of the redundant power ATS (Automatic Transfer Switch) equipment at the Maidenhead datacentre will be carried out on Thursday 10th March at 19:00 and conclude by 23:00 on the same day.

Whilst the work taking place is non-intrusive and power redundancy will still be available at all times, customers should treat this as an at risk period.

Started 10 Mar 2011 19:00:00 by Datacentre
Closed 10 Mar 2011 23:00:00

Incident in Maidenhead - Closed 18 Mar 2011 11:54:30
Details
17 Mar 2011 10:22:00

We have lost comms with Maidenhead and we have an engineer going to site now. We are not sure what the issue is, but it may be power related.

Email, VOIP and some other services will be affected.

This is also affecting Ethernet customers and hosted servers in Maidenhead.

There appears to have been a fire alarm that has gone off, and the data centre has been evacuated. There is no evidence of a fire, but power is down.

Update
17 Mar 2011 10:25:11

Staff are just approaching the data centre now.

Update
17 Mar 2011 10:37:59

Power is being restored now

Update
17 Mar 2011 10:49:15

Our engineers are on site and power has been restored, servers of ours are coming back on line, further updates will be posted when we get them

Update
17 Mar 2011 10:56:15

Not all power has been restored yet. Some services (control pages, VOIP, web) are still down. They should be restored shortly.

Update
17 Mar 2011 11:13:10

VoIP and control pages are back. Email and web should be back soon.

Update
17 Mar 2011 11:22:00

The A voip server is still down.

Update
17 Mar 2011 11:56:02

Email servers are mostly back, and web services are back. We've still got some voip problems and are working on it.

Update
17 Mar 2011 11:58:56

The A voip server has a database problem, and won't let customers register.

Update
17 Mar 2011 12:02:12

There is now a database problem on C SIP server too. Investigating.

Update
17 Mar 2011 12:08:07

Database fixed on C.

Update
17 Mar 2011 12:21:07

Database problems fixed on A and C servers.

Update
17 Mar 2011 15:44:43

Most services are back up now. We have had a number of hardware failures as part of the power outage incident.

Currently the main problem is our email ticketing server - this is affecting emails to support/sales/accounts etc - and so is causing a delay in email replies.

There are also problems with:

The online ordering system
ADSL usage reporting
ADSL line status on Clueless

Other servers still have problems which we are working through, but other servers are managing with the load (many services have multiple servers).

Update
17 Mar 2011 17:15:23

The odd effect with lines not showing as on-line properly on clueless is fixed, and lines will clear properly over night as a result. PPP restarts of lines are needed but this is done automatically in stages to minimise disruption.

Update
17 Mar 2011 17:15:37

On-line ordering restored a little while ago.

Update
17 Mar 2011 17:18:23

I would just like to say that I am very pleased with how my staff have handled this today - tackling the issues in a sensible priority and updating status pages. This is a major issue with not just a power outage, but issues with access to the building, and possibly even power surges as several pieces of equipment have failed totally. The backup arrangements for critical systems have worked as expected as has the maintenance of broadband internet access, DNS, and RADIUS authentication. Well done everyone. We'll try and get a more detailed explanation from the data centre in due course. Staff are working on the last of the issues now.

Update
17 Mar 2011 18:38:16

thankless (ticketing) still down and being rebuilt now.

Update
18 Mar 2011 00:46:47

We have now got our email ticketing system back online - we do apologise for the time this has taken, and the delay this has caused to email to support, sales and accounts.

Update
18 Mar 2011 11:55:08

We'll close this incident for now - but will add the official response from BlueSquare when they have let us know.

Update
21 Mar 2011 11:50:27

This is the official report from BlueSquare (Our racks are in the building called BS2)

 

This is a Reason for Outage Report with details regarding the power supply in BS2/3 with BlueSquare Data Services Ltd.

 

At 10:06 on Thursday 17th March one of the six UPS modules located in BlueSquare 2/3 suffered a critical component failure which resulted in a dead short on the output side (critical load side) of the UPS. This failure also caused an amount of smoke to be released by the failed UPS system which resulted in the fire alarm activating and the fire service attending. Once the fire service was happy with the situation we were able to restore power to the site via the generators with the UPS system bypassed whilst we investigated the fault further.

 

Due to the short circuit occurring on the output side of the UPS this meant the other UPS’s immediately went into an overload condition which then switched all modules into bypass mode, as per the design of the system. This overload then transferred to the raw mains and tripped the main incomer to the site. This caused the overload condition to cease and power was lost to the site. The UPS manufacturers then worked to check all the remaining UPS modules to ensure the same component was within specification, and to fully test each UPS system, replacing some components where necessary. No further faults were found on the remaining UPS modules, and load was then switched back to full UPS protection at approx 02:15 and building load was transferred back from the generators to utility mains at approx 02:25.

 

Due to the size of the failure we have commissioned an independent organisation to forensically examine the failed UPS module. This work is scheduled to be completed next week and we will provide further details once we receive their report. This was an extremely unusual type of failure and the manufacturers have not experienced such a problem before, despite over 3,000 similar UPS units being deployed. This suggests there isn’t an inherent design problem in the units but we will not reach any conclusions until the forensic examination is complete.

 

The failed UPS module will be replaced within the next 4 weeks and until that time we will remain on ‘N’ redundancy level at BlueSquare 2 & 3. Further updates will be provided before this replacement work takes place.

 

A number of customers have asked as to why this failure could occur when we operate an N+1 UPS architecture. The reason for this is that all of the six UPS modules in BlueSquare 2/3 are paralleled together as one large UPS system. BlueSquare 2/3 only requires 5 modules to hold the critical load to the site, however we have an additional unit which provides the redundancy in the event of a UPS module failure. However, as this failure was on the common critical load side of the UPS (the same output that feeds the distribution boards which then in turn feed the racks) and all the UPS systems are paralleled together, this had the effect of causing all UPS modules to go down.

 

As an example, in an N+N configuration, such as at our Tier IV Milton Keynes site, a failure of this nature would not be possible, as two independent banks of UPS systems provide true A&B feeds to each rack.
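
The difference between the two architectures can be sketched in a few lines of Python. This is a deliberately simplified model, not a description of the real electrical design: the class and function names are hypothetical, bypass and breaker behaviour are ignored, and the module counts used for the N+N case are illustrative assumptions (only the 6-installed / 5-required figures for BlueSquare 2/3 come from the explanation above).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UpsBank:
    """A group of UPS modules paralleled onto one shared output bus (hypothetical model)."""
    modules: int                    # modules installed in the bank
    required: int                   # modules needed to carry the critical load
    failed_modules: int = 0         # modules that have individually failed
    output_bus_fault: bool = False  # short circuit on the shared output side

    def holds_load(self) -> bool:
        # A fault on the common output affects every paralleled module,
        # so spare module capacity cannot help.
        if self.output_bus_fault:
            return False
        return self.modules - self.failed_modules >= self.required


def site_powered(banks: List[UpsBank]) -> bool:
    """Assumes every rack is fed from all banks and any one bank can carry the full load."""
    return any(bank.holds_load() for bank in banks)


# N+1: one bank of six modules, five required (figures from the text above).
print(site_powered([UpsBank(modules=6, required=5, failed_modules=1)]))
print(site_powered([UpsBank(modules=6, required=5, output_bus_fault=True)]))

# N+N: two independent banks; module counts here are assumed for illustration.
print(site_powered([
    UpsBank(modules=5, required=5, output_bus_fault=True),
    UpsBank(modules=5, required=5),
]))
```

In this model, N+1 survives the loss of a single module but not a fault on the shared output, whereas N+N survives either because the second bank does not share that output.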

Started 17 Mar 2011 10:00:20
Previously expected 17 Mar 2011 11:20:20
Closed 18 Mar 2011 11:54:30

Urgent router maintenance - Completed 11 Mar 2011 17:04:54
Details
11 Mar 2011 16:31:37

We are expecting to do some work on routers in Maidenhead. This should have minimal impact as we can work on one at a time.

Sorry for the very short notice.

Update
11 Mar 2011 17:28:54

We are actually doing another restart now.

Started 11 Mar 2011 17:00:00
Closed 11 Mar 2011 17:04:54

Web Server Disk Storage Work - Completed 30 Jan 2011 22:54:32
Details
26 Jan 2011 11:08:57

On Sunday we will be doing some work on the storage servers used by our web server. FTP and rsync access will be disabled for a couple of hours during this work.

Update
30 Jan 2011 21:26:13

This work is now starting. FTP and rsync access to our webserver will be unavailable for the next couple of hours.

Previously expected 30 Jan 2011 23:00:00 (Last Estimated Resolution Time from AAISP)
Closed
30 Jan 2011 22:54:32

This work has been completed. The webserver is now using new storage servers.

Please report any problems to support.


VoIP and Email Problems due to Datacentre Connectivity - Closed 29 Dec 2010 12:43:00
Details
29 Dec 2010 12:35:23

We currently have routing problems to our datacentre in Maidenhead. This is affecting access to:

  • Email - incoming and outgoing
  • VoIP
  • Hosted server
  • Control Pages (Clueless)

We have engineers looking into this at the moment, and will post another update shortly.

Update
29 Dec 2010 12:47:47

This is now working. It seems to have been a routing/peering problem outside of our and BlueSquare's networks. If we get any more details we'll post an update.

Started 29 Dec 2010 12:20:00
Closed 29 Dec 2010 12:43:00

Router restart - at risk - Completed 11 Oct 2010 17:11:51
Details
11 Oct 2010 11:52:45

We are restarting one of the routers in Maidenhead this evening. It should be seamless in terms of routing, though we have seen blips on IPv6 when doing this in the past, so there is a risk of disruption for a few seconds.

Maidenhead handles Ethernet customers, our offices, and access to many of our servers, including VoIP for call set up and recorded calls.

Closed 11 Oct 2010 17:11:51

Switch Reboot - Completed 02 Mar 2010 19:10:00
Details
02 Mar 2010 19:05:36

We are just about to reboot a switch in our Maidenhead datacentre. This will affect a few hosted customer servers and some AAISP servers. It's expected to only mean a few minutes of downtime. Sorry for the short notice.

Started 02 Mar 2010 19:06:07
Previously expected 02 Mar 2010 19:10:00
Closed 02 Mar 2010 19:10:00

Switch problem in Maidenhead, affecting some Hosted customers - Closed 02 Mar 2010 21:21:52
Details
02 Mar 2010 20:56:44

Another switch is currently being rebooted; this is not the switch that we rebooted earlier this evening. This reboot was not planned and is not related to the work we've been doing this evening with email.

This will affect some hosted customers for a few minutes. We do apologise.

Update
02 Mar 2010 21:02:49

This will also affect outgoing mail (sending mail via smtp.aaisp.net.uk).

It's expected to be resolved in a few minutes, though.

Closed
02 Mar 2010 21:21:52

This took rather longer than expected, but has now been done and servers are now visible again.


Loss of interconnect routing - Closed 15 Feb 2010 20:15:55
Details
15 Feb 2010 19:50:00

Some minor router reconfiguration work this evening to add additional routing resilience resulted in an unexpected side effect that took a few minutes to rectify.

This specifically affected the routing between London and Maidenhead, meaning broadband customers lost access to email and VoIP, and Ethernet customers lost access to A&A servers in London and to broadband lines.

Started 15 Feb 2010 19:50:00 by AAISP Staff
Closed
15 Feb 2010 20:15:55
Routing configuration has been corrected.