Students experience record internet outage
September 10, 2015
Several students may have experienced a state of panic as they quickly discovered their Blackboard quiz was no longer responsive. Students and faculty encountered issues Tuesday night as a network outage affected WKU’s campus.
Chief Information Technology Officer Gordon Johnson sent an email to faculty and staff Wednesday morning to explain what had happened.
According to the email, the network outage began at approximately 9:30 p.m. Tuesday and lasted until 2 a.m. the following day.
“Our primary and secondary Domain Name Servers (DNS) experienced an unusual condition simultaneously that brought them down,” Johnson said in the email.
Johnson went on to explain that while the rest of the network infrastructure was still operational, no network traffic could be routed without DNS to translate domain names into the IP addresses of servers and other components.
“So the entire network was essentially down and unreachable both from on campus and off campus,” he said.
He said the condition that brought the DNS down was so obscure it needed to be escalated to the top engineering level at Infoblox, the vendor that supplies WKU’s DNS appliances. WKU brought in the Infoblox appliances more than a year ago for their industrial strength and built-in redundancies.
“They can handle lots of traffic without any degradation in performance,” Johnson said.
The problem that occurred, Johnson stressed, was statistically rare. The redundancies of the two systems meant that if one suffered a physical attack and went offline, the second would take over without incident.
This particular issue was a combination of an internal software problem and a burst of outside traffic that caused the systems to behave erratically, and both DNS servers went down at the same time. IT technicians are still investigating exactly what caused the breakdown.
Infoblox and WKU’s own IT department worked together to resolve the incident.
“Basically, the way we fixed it was with the help of Infoblox. They identified configuration settings whose results were creating problems, and based on their recommendation, we changed those, and this seemed to pretty largely fix the problem,” Johnson said.
Johnson said this incident was one of the largest in over a decade.
“I’ve been here 26-plus years, and I do not remember an outage that long over the past 10 years,” he said.
He added that technicians are working to ensure such an event doesn’t happen again.
“We’re spending whatever amount of time it takes to mitigate this and prevent it from happening again,” Johnson said.