The Bowdoin wireless network was down from last Thursday evening until early the following morning. The outage was the second of the week and lasted much longer than the first, on Monday.
Chief Information Officer Mitch Davis said the root of the problem was the failure of a server in the basement of Hubbard Hall. He explained that the server failure exposed several other errors, which together took the wireless system offline.
Davis and Director of Networking and Telecommunications Jason Lavoie explained that they are in the process of making a number of changes to the network to address the failures.
Bigger projects include upgrading the network and server hardware on campus, as well as moving infrastructure out of the basement of 111-year-old Hubbard Hall into a new commercial data center.
This new $6 million data center, owned by Oxford Networks, opened in mid-September and is located in an old communications building at the former Naval Air Station at Brunswick.
Bowdoin currently operates 10 server racks at Oxford Networks’ facility and five server racks in Hubbard. The College has established a direct 100-gigabit fiber-optic connection to the facility.
Because the servers are currently operating on both old and new hardware, there is a greater potential for network issues.
Last Thursday, one of two Dynamic Host Configuration Protocol (DHCP) servers running on old hardware failed completely. DHCP servers are essential to Internet access because they provide each device with an IP address.
“When your computer boots up, it doesn’t know what its address is on the network,” said Lavoie. “So it will associate to a wireless network and the first thing it does is send out a broadcast saying, ‘Can I have an IP address please?’ It can’t do anything until that process happens.”
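The exchange Lavoie describes can be sketched in a few lines of Python. This is a simplified illustration of the standard DHCP handshake (discover, offer, request, acknowledge), not Bowdoin's actual configuration; the class name `ToyDhcpServer`, the function `join_network`, the hardware addresses and the address pool are all hypothetical.

```python
class ToyDhcpServer:
    """Toy model of a DHCP server handing out addresses from a fixed pool.

    Hypothetical illustration only; a real server also tracks lease
    times, renewals and many configuration options.
    """

    def __init__(self, pool):
        self.free = list(pool)   # addresses not yet offered to anyone
        self.offers = {}         # hardware address -> address on offer
        self.leases = {}         # hardware address -> confirmed address

    def handle_discover(self, mac):
        """Answer a client's broadcast with an offer, or None if the pool is empty."""
        if mac in self.leases:
            return self.leases[mac]
        if mac in self.offers:
            return self.offers[mac]
        if not self.free:
            return None
        self.offers[mac] = self.free.pop(0)
        return self.offers[mac]

    def handle_request(self, mac, address):
        """Confirm the lease if the client asks for the address it was offered."""
        if self.offers.get(mac) == address:
            self.leases[mac] = self.offers.pop(mac)
            return True
        return self.leases.get(mac) == address


def join_network(server, mac):
    """Run the four-step handshake; return the leased address or None."""
    offer = server.handle_discover(mac)      # discover -> offer
    if offer is None:
        return None                          # no address: the device stays offline
    if server.handle_request(mac, offer):    # request -> acknowledge
        return offer
    return None


if __name__ == "__main__":
    server = ToyDhcpServer(["10.0.0.10", "10.0.0.11"])
    print(join_network(server, "aa:aa:aa:aa:aa:aa"))
```

As in Lavoie's description, a device that gets no answer to its first request ends up with no address and cannot do anything else on the network.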
Normally, because Bowdoin operates two DHCP servers, one server failing does not cause service to go down—the other server simply takes over.
In this case, the second server did take over but, according to Lavoie, “there was a problem with the path between [the second server] and the wireless controller that prevented all of the requests from getting back to the clients. That problem was being masked by having two servers.”
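A rough model shows how a second server can hide a fault of this kind. The sketch below assumes a client that broadcasts to every server and accepts any reply that reaches it; the names `Server`, `replies_reaching_client` and `client_gets_address` are hypothetical, and the real topology between the servers and the wireless controller is not described in this article.

```python
from dataclasses import dataclass


@dataclass
class Server:
    """Hypothetical DHCP server with a possibly broken return path."""
    name: str
    up: bool = True
    return_path_ok: bool = True


def replies_reaching_client(servers):
    """Names of servers whose replies actually make it back to the client."""
    return [s.name for s in servers if s.up and s.return_path_ok]


def client_gets_address(servers):
    """The client is online as long as at least one reply arrives."""
    return len(replies_reaching_client(servers)) > 0


if __name__ == "__main__":
    first = Server("first")
    second = Server("second", return_path_ok=False)  # latent fault

    # Both servers up: the first server answers, so the fault goes unnoticed.
    print(client_gets_address([first, second]))

    # The first server fails: the second takes over, but its replies are lost.
    first.up = False
    print(client_gets_address([first, second]))
```

In this model the broken path makes no visible difference until the healthy server goes down, which is the sense in which the problem was "masked by having two servers."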
Lavoie explained that these systems are always designed redundantly to account for such failures.
“With most system failures, it’s never one small thing that fails, it’s always a cascading failure. It’s usually five to six things before you actually have a problem,” he said.
Last Thursday’s outage revealed a configuration problem introduced during recent hardware upgrades.
The failure did, however, create an opportunity to fix an error that may have been causing login delays, and it pushed the network team to install new hardware earlier than it had intended.
“In some sense the failure allowed us to solve a lot of problems. It created a disruption that we would have never caused ourselves, so we could see it and fix it,” said Davis. “Everything that was old is gone.”
Davis and Lavoie both stressed the difficulties of upgrading a network that they cannot turn off. It’s like “changing the tires on a car that’s going down the highway at 65 miles per hour,” said Lavoie.
“If we would have had the time, we would have been able to shut the system down and we wouldn’t have had that problem,” added Davis. “But that isn’t the nature of the game.”
Many students were irritated while the outage persisted but were satisfied once service resumed.
“I think they handled it as well as they could have,” said Logan Simon ’18. “It’s not the end of the world. It’s inconvenient, but it all got fixed eventually.”
Davis said he understands students’ frustration and aims to provide reliable service.
“We built this so that it can be dependable. We’ve had some problems. We’ve been busting our ass to try to get it right. We’ve created a complexity that made it very difficult for us to determine what was wrong,” he said. “I believe we have it right now.”
Information Technology will be further upgrading network infrastructure in January while the student body is off campus.