Exec reveals Telecom's 'remedial action'; 38 of 54 affected sites still down
In an email to corporate clients around 11.30pm last night, Gen-i boss Chris Quin went into an unprecedented degree of detail about fixes carried out so far - including attempts to manage "registration storms" as base stations are restarted.
Mr Quin said 38 cellsites were still down, including locations in Wellington, Dunedin and Queenstown (see full list at end of article) but that technicians would work through the night as the XT outage crisis enters its third day.
Of Telecom’s 986 XT cellsites, up to 54 were down at the peak of the outage that started Wednesday morning.
It seems that in terms of population coverage, Telecom has made a lot of progress - but it is still less than halfway through restoring actual cell sites.
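For readers who want to check the numbers, here is a rough sketch of the arithmetic, using only figures reported in this article and in Mr Quin's email (986 sites nationwide, 453 on the Christchurch switch, 54 down at the peak, 38 still down). The variable and function names are illustrative only, and the percentages are simple site counts; they say nothing about how many customers each site serves.

```python
# Figures as reported in the article and in Mr Quin's email.
TOTAL_SITES = 986          # all XT cell sites nationwide
CHRISTCHURCH_SITES = 453   # sites linked to the Christchurch RNC switch
PEAK_DOWN = 54             # sites out of service at the peak
STILL_DOWN = 38            # sites still out at about 11.30pm


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


restored = PEAK_DOWN - STILL_DOWN
summary = {
    "restored_sites": restored,
    "restored_pct_of_affected": percent(restored, PEAK_DOWN),
    "peak_down_pct_of_network": percent(PEAK_DOWN, TOTAL_SITES),
    "peak_down_pct_of_christchurch": percent(PEAK_DOWN, CHRISTCHURCH_SITES),
    "unaffected_pct_of_network": percent(TOTAL_SITES - PEAK_DOWN, TOTAL_SITES),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

On these counts, 16 of the 54 affected sites (about 30%) are back in service, and the sites that stayed up throughout make up about 94.5% of the national total, in line with the 94% figure in the email.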
Mr Quin confirms that a fault with Telecom’s Mobile XT RNC switch in Christchurch (which controls all network traffic from Taupo south) was the cause of the outage.
A second RNC switch controls all XT traffic north of Taupo.
But the Gen-i boss disputes the theory, spread by Vodafone’s allies, that Vodafone is better positioned with its six RNC switches:
“We have seen a large number of clients move to XT in a relatively short space of time. However, more RNCs alone do not deliver greater redundancy – each RNC is equipped with its own hardware redundancy but more RNC’s will increase resiliency as we spread the load and that’s our plan.”
Mr Quin also reveals that staff at Alcatel-Lucent - the telco infrastructure company that designed and built XT - have been involved at the highest levels, including the Franco-American company’s global chief executive.
The Gen-i boss also says he will host a webcast explaining the issues at 12.30pm today. The webcast will be invitation-only for the large XT customers (thought to number 300 or so) receiving his email updates.
Mr Quin's latest update, in full:
Latest XT Mobile Network Situation Update
This is a further email to update you on the restoration of services for XT mobile users between Taupo and the bottom of the South Island in the last two days, and an update on the actions we are taking.
I am conscious that to date we have been giving you updates on the status of the outage, and I now want to start explaining the issues and to set out some of our commitments going forward.
To give an overview, we have a total of 986 cell sites nationwide, of which 453 are linked to our Christchurch XT Mobile RNC switch.
Since 10.30am yesterday (27 January), at times up to 54 cell sites have been out of service, and XT users will not have had access to voice, SMS or mobile broadband services. At the time of writing this update (approx 11.30pm, 28 January) we had successfully reduced the number of sites out of service to 38 and further work continues through the night which we will update in the morning.
The full list of impacted cell sites is included below. They are a small set of cell sites in the area from Taupo south, the majority in the lower South Island, including Dunedin, Invercargill, Timaru and Queenstown, as well as some parts of Wellington central, Taranaki and Ruapehu.
While the remaining 94% of our cell sites nationally have not been affected, traffic loading on the network may mean that a percentage of calls in the Taupo south area may not get through.
Our average statistics for call accessibility in the southern region and retention in the area have been above 90% in the last while.
Our absolute focus right now remains on restoring service in all these locations, and ensuring that restoration is sustained and that we deliver a stable and world class XT network as we know we can.
The intense level of focus and monitoring from the last few days will continue. We will continue our optimisation of XT to ensure the coverage, speed and capability is world class.
The root cause is now under extensive investigation, but is suspected to be within the physical and logical paths in the transport layer between the cell site and the Christchurch switch. This is a different issue to the one that caused a disruption of services in December. The technical teams are working through a range of possible causes for the failure.
The remedial action over the last two days has included:
- Rewrite of the MIB which requires an RNC reset to install.
- A reset of the UTRAN router to clear a path.
- Reset an interface.
- Lockout of the base stations connected to suspect links.
- Managing the registration “storms” resulting from mobile re-starts from these base stations.
Further work that will occur overnight includes analysis of traces on the suspect links to clear them so base stations can be unlocked and returned to service.
We cannot currently accurately forecast a time of full and final restore.
I wanted to comment on the questions I have received about the redundancy of the network and issues like the number of “RNC’s”.
Design of resiliency and redundancy of a mobile network is not only to do with RNC’s. In fact within the UTRAN there are several factors including the number of RNC’s, routers, TMU’s, links as well as the number of users and type of devices in use.
We have always expected we would increase the capacity of the XT network as we add more clients on the XT network, and we have seen a large number of clients move to XT in a relatively short space of time. However, more RNCs alone do not deliver greater redundancy – each RNC is equipped with its own hardware redundancy but more RNC’s will increase resiliency as we spread the load and that’s our plan.
The Telecom Tier 2 and 3 teams are currently heavily engaged with Alcatel-Lucent, including “24/7” support capability from China, Singapore, Germany, Netherlands, USA, Canada, and France. In addition, Alcatel-Lucent management at the highest level from around the world are engaged on this issue locally and in France, including the CEO of Alcatel-Lucent.
Our Gen-i client teams are continuing to work directly with affected clients today to support their business needs. Our immediate focus is on service restoration and we will look closely at claims for compensation once the service is restored.
We will also continue the open and honest communication on the event and our actions as we go forward. Tomorrow (Friday 29th January) at 12.30pm I will be leading a webcast for all our clients, and invitations to join me will be sent to you in the morning. I hope to see you there, and answer any questions you may have, and provide the latest update personally.
I am continuing to work with my Telecom Exec colleagues and our clients to resolve the issues and deliver the actions going forward. Once again, we apologise and deeply regret any inconvenience or problems you are experiencing due to the interruption of the XT service.
Alongside a design review by the R & D teams, Paul Reynolds has commissioned an urgent and independent review to ensure we have taken all steps to assure our XT client experience. Any feedback you have provided to me this week will be collated and added to that review.
In addition, we will be seeking your input to understand what further requirements will best meet your needs on the XT network. We will openly share the recommendations of the review with you.
Finally, I have been keen to communicate to all our clients and all New Zealanders on this issue, including those that are not using the XT mobile service. While the impact is not direct for other network users, a key part of why we update executive level clients whenever we have issues like this is to ensure that people trying to contact affected XT clients are aware of the issue. And it’s important to us that this issue is communicated directly by us.
We will continue to equip our client teams with information on the event and future actions. If you want to discuss this further with me, or any of the Gen-i team, please let us know.
Affected sites, as of 11.30pm last night:
Clyde Cell Site
Invercargill State Ins.