A short power outage triggered prolonged network downtime today when the system meant to ensure a smooth transition to backup power failed.
E-mail, campus telephones, wi-fi, and Bowdoin websites were all offline for several hours, from around 10:30 a.m. until as late as 3:30 p.m. in the case of wi-fi.
The uninterruptible power supply (UPS) system in the Hubbard Hall data center had been known for about a week to need new batteries. Parts were ordered and shipped, and the replacement was scheduled for this morning. Workers were onsite to replace the batteries when, around 10:30 a.m., a fallen tree branch took out a power line on Maine Street.
"If that power blip had happened one hour later, the batteries would've been replaced and there wouldn't have been a problem," said Jason Lavoie, director of networking and telecommunications, calling it a "really bad coincidence."
"It really identified a single point of failure for the college infrastructure," said Lavoie.
Since power was restored, services have come back piecemeal. "Part of the reason the outage is longer is we don't really ever have an opportunity to test everything failing and everything coming back up" all at once, said Lavoie.
The guest wi-fi network returned faster than the standard authenticated wi-fi. Phone service returned around noon.
The fallen tree limb cut power to 2,075 Central Maine Power (CMP) customers, including the South Loop of campus, the Times Record reported. Ordinarily, the data center in the basement of Hubbard Hall would switch over smoothly to the North Loop; if that failed, it would disconnect from the CMP grid and run on a backup generator.
"You want to increase availability by eliminating all single points of failure," said Lavoie. "We've got very redundant power," which allows network systems to continue working throughout most power outages. But, "the problem is that power distribution and power delivery is connected through a single UPS."
Lavoie said redundant UPS systems are feasible and have been considered in the past.
Base site will alleviate some problems
On a more "macro" scale, Lavoie noted another single point of failure: the data center itself. That weakness is being addressed by establishing a second data center on the site of the former Naval Air Station.
"To a certain degree, the base data center will help address that problem," said Lavoie.
Financial auditors have referred to the Hubbard Hall data center as "the best possible design in the worst possible location," said Steve Blanc, director of client services and IT security officer, who is managing the base data center project. The Hubbard basement is susceptible to flooding and other physical interruptions.
The site on the base, in contrast, is considerably more hardened, as befits a facility formerly run by the Navy.
"It's quite a bunker," said Lavoie.
Emergency notification system at partial strength
Around 11:15 a.m., the emergency notification system alerted students to the outage by text message and automated recording. The alert system, called Blackboard Connect, kept working because it is run and hosted by Blackboard, not Bowdoin.
"It's a hosted system for this reason," said Tina Finneran, director of academic technology and consulting.
However, because the campus VoIP phones depend on the network, the emergency public address system was unusable during the outage. The green-striped phones, which run on the regular plain old telephone service (POTS) network, continued to work.