User Story: Graz University of Technology
The verification of peak loads is important for the long-term quality assurance of the network
With more than one hundred institutes and organisational units, Graz University of Technology has an extensive and heterogeneous network. More than 15,000 students, more than 3,000 employees and guests from all over the world expect a stable network and fast Internet connections. Round-the-clock, worldwide information exchange via the Internet is essential, especially for research. Responsibility for keeping the network running smoothly lies with the team of Philipp Rammer, Service Owner Network at the Central Informatics Service (ZID) of Graz University of Technology. He explains how his team handles this task and why it uses the Allegro Network Multimeter for continuous quality assurance and fast troubleshooting.
The computer centre of a university is comparable to the nervous system of the human body, because all networks and data converge there. Smooth operation is therefore essential: a failure in the data centre itself would paralyse large parts of the university. Almost every conceivable application is in use, from a wide variety of file and web services to video conferencing, VoIP, licence servers and large backups.
Until quite recently, the network team relied on several overlapping tools to keep the network running, which required a great deal of maintenance as well as additional applications, servers and databases. The management team therefore looked for an alternative and tested several products, one of which was an Allegro Network Multimeter 1200.
"Even with this device, which is quite small for a university data centre, it quickly became clear that the tool met our requirements exactly," Philipp Rammer explained, recalling the first test phase.
"It allows us to keep track of and identify unexpected traffic. Peak loads can be examined in a granular manner and errors detected before they become a problem. If an issue does occur, the hosts involved can be identified very quickly and easily, and their connections analysed via a packet capture. The test run with the Allegro 1200 convinced us to use the appliance to monitor the entire data centre."
Its bigger brother, the Allegro 3500, has recently been put into continuous operation in the data centre. It is optimised for analysing and monitoring Gigabit connections in the data centre. The Allegro system is designed for high recording, analysis and storage rates and has a throughput of up to 100 Gbit/s. IT staff use it for troubleshooting when problems arise and for long-term network optimisation. Data is stored only for a short time and on demand, usually only the header data. When the storage is full, existing data is overwritten.
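The storage behaviour described above (keep mostly headers, overwrite when full) is essentially a ring buffer with a snap length. The following is a minimal sketch under stated assumptions, not the Allegro's actual implementation: the class `HeaderOnlyRingBuffer`, the 128-byte `snaplen` and the choice to evict the oldest packets first are all hypothetical.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass
class StoredPacket:
    timestamp: float
    wire_length: int  # original length of the frame on the wire
    data: bytes       # first `snaplen` bytes only (mostly headers)


class HeaderOnlyRingBuffer:
    """Hypothetical fixed-size capture store: truncates each frame to
    `snaplen` bytes and evicts the oldest packets once it is full."""

    def __init__(self, capacity_bytes: int, snaplen: int = 128) -> None:
        self.capacity_bytes = capacity_bytes
        self.snaplen = snaplen
        self.used_bytes = 0
        self.packets: deque[StoredPacket] = deque()

    def add(self, timestamp: float, frame: bytes) -> None:
        # Keep only the leading bytes, where the protocol headers live.
        stored = StoredPacket(timestamp, len(frame), frame[: self.snaplen])
        size = len(stored.data)
        if size > self.capacity_bytes:
            return  # can never fit, so drop it
        # Overwrite: evict the oldest packets until the new one fits.
        while self.used_bytes + size > self.capacity_bytes:
            evicted = self.packets.popleft()
            self.used_bytes -= len(evicted.data)
        self.packets.append(stored)
        self.used_bytes += size


buf = HeaderOnlyRingBuffer(capacity_bytes=256, snaplen=128)
for t in range(3):
    buf.add(float(t), bytes(1500))  # three full-size 1500-byte frames
print([p.timestamp for p in buf.packets], buf.used_bytes)  # [1.0, 2.0] 256
```

Only 128 of each 1,500-byte frame is kept, so two packets fill the 256-byte buffer and the third overwrites the first.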
According to Philipp Rammer, installing the Allegro 3500 in the computer centre of Graz University of Technology was simple and trouble-free. "We put the Allegro 3500 into operation in just a few minutes. Using the device's WLAN access point, the initial basic configuration can be carried out directly, without any extra cabling. Alternatively, an IP address is assigned automatically over a connected LAN cable, so the basic configuration is just as easy. Part of the operating concept is that only a few settings have to be made, and the measuring instrument can be used straight away with its default settings."
The web interface dashboard displays the most important parameters at a glance, including the most active IP and MAC addresses and the most bandwidth-intensive connections and protocols. The menu is structured along the OSI layer model, so the individual analysis modules are easy to find. The Allegro Network Multimeter provides statistics and selective packet filtering across Layers 2 to 7, both in real-time and in history mode.
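The "most active" and "most bandwidth-intensive" rankings on such a dashboard boil down to summing bytes per key and sorting. As an illustration only, the sketch below assumes traffic has already been summarised into hypothetical `Flow` records; it is not the Allegro's data model or API.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, NamedTuple


class Flow(NamedTuple):
    src_ip: str
    dst_ip: str
    protocol: str
    num_bytes: int


def dashboard_summary(flows: Iterable[Flow], top_n: int = 3) -> dict[str, list]:
    """Rank IPs, connections and protocols by bytes (illustrative only)."""
    per_ip: Counter = Counter()
    per_conn: Counter = Counter()
    per_proto: Counter = Counter()
    for f in flows:
        # An IP counts as active whether it sends or receives the bytes.
        per_ip[f.src_ip] += f.num_bytes
        per_ip[f.dst_ip] += f.num_bytes
        # Count A->B and B->A as one connection.
        per_conn[tuple(sorted((f.src_ip, f.dst_ip)))] += f.num_bytes
        per_proto[f.protocol] += f.num_bytes
    return {
        "top_ips": per_ip.most_common(top_n),
        "top_connections": per_conn.most_common(top_n),
        "top_protocols": per_proto.most_common(top_n),
    }


flows = [
    Flow("10.0.0.1", "10.0.0.2", "HTTPS", 1000),
    Flow("10.0.0.2", "10.0.0.1", "HTTPS", 500),
    Flow("10.0.0.3", "10.0.0.2", "DNS", 100),
]
summary = dashboard_summary(flows, top_n=2)
print(summary["top_ips"])  # [('10.0.0.2', 1600), ('10.0.0.1', 1500)]
```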
Fast Problem Diagnosis
Philipp Rammer described how he found the first anomalies within minutes: "After just ten minutes, the tool's real-time network statistics showed us the first faulty clients, which had been affecting our DHCP server. This was immediately apparent at a glance from the protocol statistics. At the start of commissioning, for example, we browsed through the various Quality of Service classes out of interest and discovered a 10 Mbit/s connection, which is very high for network-critical DHCP traffic."
The first weak spot was thus quickly identified. A client sending 10 Mbit/s of DHCP traffic through the network is not necessarily a problem, provided the server can cope with the load. But it is unusual, and over time it could become a real problem: if the DHCP server fails, IP addresses can no longer be assigned to end devices. Modern analysis tools can detect such a vulnerability and avert it before a problem arises, and early detection prevents possible downtime. In this case, the problem was resolved by identifying the client and configuring it correctly.
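Spotting a client like this amounts to measuring DHCP traffic per client over a time window and comparing it with a threshold. The sketch below is an assumption-laden illustration, not how the Allegro computes it: it uses hypothetical `UdpPacket` records, counts only client-to-server messages (UDP port 68 to 67, the standard DHCP ports), ignores relay agents, and the 1 Mbit/s threshold is an arbitrary example value.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple

DHCP_SERVER_PORT = 67  # UDP port of the DHCP server
DHCP_CLIENT_PORT = 68  # UDP port of the DHCP client


class UdpPacket(NamedTuple):
    src_mac: str
    src_port: int
    dst_port: int
    length: int  # bytes on the wire


def noisy_dhcp_clients(
    packets: Iterable[UdpPacket], window_s: float, threshold_bps: float
) -> Dict[str, float]:
    """Return clients whose DHCP request traffic exceeds threshold_bps
    within a measurement window of window_s seconds."""
    bytes_per_client: Dict[str, int] = defaultdict(int)
    for p in packets:
        # Only client -> server DHCP messages (UDP 68 -> 67) are counted.
        if p.src_port == DHCP_CLIENT_PORT and p.dst_port == DHCP_SERVER_PORT:
            bytes_per_client[p.src_mac] += p.length
    rates = {mac: n * 8 / window_s for mac, n in bytes_per_client.items()}
    return {mac: bps for mac, bps in rates.items() if bps > threshold_bps}


# One misbehaving client sending 1,000 x 1,250-byte DHCP packets in one
# second (= 10 Mbit/s) and one normal client sending two 342-byte requests.
capture = [UdpPacket("aa:aa:aa:aa:aa:aa", 68, 67, 1250)] * 1000
capture += [UdpPacket("bb:bb:bb:bb:bb:bb", 68, 67, 342)] * 2
print(noisy_dhcp_clients(capture, window_s=1.0, threshold_bps=1e6))
# {'aa:aa:aa:aa:aa:aa': 10000000.0}
```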
Early Detection of Possible Problems
Another use case concerns a conspicuous amount of data noticed during daily monitoring: an unusually large 1.3 Gbit/s connection to a single computer on the Internet downlink. In the Allegro's graphical statistics display, it immediately stood out as a large outlier from the normal network load. Such incidents can indicate security problems or misconfigurations.
"With one click we navigated from the dashboard to the peers and immediately saw which two systems the traffic was flowing between," Philipp Rammer explained. "A phone call to our colleagues was enough to confirm that the traffic was planned. So there was no problem, but there could have been. Knowing about and analysing such load peaks is valuable for the long-term quality assurance of our network."
A global overview of network utilisation alone is not enough to identify such problems. The Allegro Network Multimeter can break the load down to individual network participants in real-time and, together with other tools, can in future provide valuable information about traffic anomalies.
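Breaking the load down per participant makes it possible to compare each host against its own normal behaviour. One common way to do this (not necessarily what the Allegro uses) is a per-host baseline of mean plus k standard deviations. In the sketch below, the host names, sample rates, `k = 3` and the 1 Mbit/s floor are all hypothetical.

```python
import statistics
from typing import Dict, List


def outlier_hosts(
    history: Dict[str, List[float]],
    current: Dict[str, float],
    k: float = 3.0,
    min_rate_bps: float = 1e6,
) -> Dict[str, float]:
    """Flag hosts whose current rate exceeds mean + k * stdev of their own history."""
    flagged = {}
    for host, rate in current.items():
        past = history.get(host, [])
        if len(past) < 2:
            continue  # not enough baseline data for this host yet
        limit = statistics.mean(past) + k * statistics.pstdev(past)
        # Ignore tiny absolute rates even if they are relatively unusual.
        if rate > max(limit, min_rate_bps):
            flagged[host] = rate
    return flagged


history = {
    "pc-17": [50e6, 60e6, 55e6],  # usually around 55 Mbit/s
    "srv-02": [100e6, 110e6, 105e6],
}
current = {"pc-17": 1.3e9, "srv-02": 108e6}
print(outlier_hosts(history, current))  # {'pc-17': 1300000000.0}
```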
The Allegro 3500 was installed in the data centre between the data centre router and the Application Delivery Controller (ADC) or firewall, so that in addition to client-server connections, traffic between servers can also be measured and debugged. This is where the third use case arose. A service provided via a HAProxy reverse proxy on the ADC showed inexplicable behaviour between several clients and the server. For the analysis, 16 packets were filtered out of 10 TB of data recorded at short notice, and these revealed the cause of the malfunction: the TCP port on the ADC to which the data was routed was incorrectly configured.
"Once we had identified the problem, the solution was very simple," said Philipp Rammer of the quick troubleshooting with the Allegro Network Multimeter. "Thanks to the high granularity of the measurements, errors can be diagnosed extremely quickly; in this case it took only two minutes."
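Narrowing terabytes down to a handful of packets is a matter of filtering on the server address, port and time window. The sketch below is a hypothetical illustration on already-parsed `PacketRecord` tuples, not the Allegro's filter syntax. It also shows one possible symptom of a wrong port, a server answering SYNs with RST; the source does not say which symptom TU Graz actually saw, and all addresses and ports are made up.

```python
from typing import Iterable, List, NamedTuple


class PacketRecord(NamedTuple):
    timestamp: float
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    tcp_flags: str  # e.g. "S" (SYN), "SA" (SYN-ACK), "RA" (RST-ACK)


def filter_conversation(
    packets: Iterable[PacketRecord], server_ip: str, server_port: int,
    start: float, end: float,
) -> List[PacketRecord]:
    """Keep only packets to or from server_ip:server_port in [start, end]."""
    return [
        p for p in packets
        if start <= p.timestamp <= end
        and ((p.dst_ip, p.dst_port) == (server_ip, server_port)
             or (p.src_ip, p.src_port) == (server_ip, server_port))
    ]


def server_resets(conversation: List[PacketRecord], server_ip: str) -> List[PacketRecord]:
    """RSTs sent by the server, e.g. because nothing listens on that port."""
    return [p for p in conversation if p.src_ip == server_ip and "R" in p.tcp_flags]


capture = [
    PacketRecord(1.0000, "10.0.1.5", "10.0.0.10", 50000, 8080, "S"),
    PacketRecord(1.0001, "10.0.0.10", "10.0.1.5", 8080, 50000, "RA"),
    PacketRecord(1.5000, "10.0.1.6", "10.0.0.20", 50001, 443, "S"),
    PacketRecord(9.0000, "10.0.1.5", "10.0.0.10", 50002, 8080, "S"),
]
conv = filter_conversation(capture, "10.0.0.10", 8080, start=0.0, end=5.0)
print(len(conv), len(server_resets(conv, "10.0.0.10")))  # 2 1
```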
Adapting Data Volumes to Requirements
TU Graz has a large volume of traffic. Although the Allegro 3500 can record large amounts of data for live analysis and later troubleshooting, there are several reasons why recording all traffic does not make sense. In a first step, Philipp Rammer and his team used filters to reduce the amount of data recorded to what really matters. The Allegro Network Multimeter can be configured easily and in fine detail. For example, they truncated packets to a shorter length, excluded some VLANs and adjusted the RAM cache for short load peaks, until they arrived at a configuration that stored considerably less data but still provided all relevant information.
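How much packet truncation and VLAN exclusion save depends heavily on the traffic mix. The back-of-the-envelope sketch below assumes a hypothetical mix (mostly 1,500-byte frames), a 128-byte snap length and that a quarter of the traffic sits in excluded VLANs with the same packet-size mix; none of these figures come from TU Graz.

```python
from typing import Iterable


def stored_bytes(
    packet_sizes: Iterable[int], snaplen: int, excluded_vlan_share: float = 0.0
) -> float:
    """Estimate bytes written after truncating every packet to `snaplen`
    and dropping a given share of traffic (excluded VLANs).

    Assumes excluded VLANs have the same packet-size mix as the rest.
    """
    truncated = sum(min(size, snaplen) for size in packet_sizes)
    return truncated * (1.0 - excluded_vlan_share)


sizes = [1500] * 9 + [64]  # hypothetical traffic mix: 9 large frames, 1 small
full = sum(sizes)  # 13,564 bytes without any reduction
reduced = stored_bytes(sizes, snaplen=128, excluded_vlan_share=0.25)
print(full, reduced, f"{reduced / full:.1%}")  # 13564 912.0 6.7%
```

With large frames dominating, truncation alone does most of the work; small packets such as the 64-byte frame are stored unchanged.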
In a second step, they fitted the Allegro 3500, which is available with or without hard disks, with additional disks. Standard commercially available hard disks, which many organisations already have in stock, are suitable for this purpose. The Allegro 3500 supports up to 36 hard disks.
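Disk count and data reduction together determine how far back the recording reaches before it is overwritten. As a rough sketch, assuming a constant recorded rate, the retention time is capacity divided by write rate. The 10 TB disk size and the 10 Gbit/s write rate below are hypothetical; only the 36-disk maximum comes from the text, and `usable_fraction` is an assumed knob for RAID or filesystem overhead.

```python
def retention_hours(
    num_disks: int, disk_tb: float, recorded_gbps: float, usable_fraction: float = 1.0
) -> float:
    """Hours until a ring buffer on num_disks disks wraps around and starts
    overwriting, at a constant recorded rate of recorded_gbps."""
    capacity_bytes = num_disks * disk_tb * 1e12 * usable_fraction
    bytes_per_second = recorded_gbps * 1e9 / 8
    return capacity_bytes / bytes_per_second / 3600


# Hypothetical: 36 x 10 TB disks, 10 Gbit/s actually written after filtering.
print(retention_hours(36, 10, 10))  # 80.0
```

Halving the written data rate through filtering doubles the retention time, which is why the two steps complement each other.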
Advantages for TU Graz at a glance
- Proactive error detection
- Continuous overview of the large data centre
- Long-term quality assurance of the network
- Individual configuration
It's Not Always the Network's Fault
Philipp Rammer is extremely satisfied with the Allegro 3500. So far there have been no serious network problems, yet it has already become an important tool for the administrators. With its help, network problems can be found, but also ruled out where necessary, Philipp Rammer stated. "We find the response time charts a great help: they show exactly whether a problem really lies in our own network or perhaps somewhere external. If the TCP statistics for the last three hours show a handshake time of 20 milliseconds or less, that is a helpful indication that the network is not at fault and that something may be going wrong during data processing at the operating system or application level. This makes it much easier for both the network and the application operations teams to locate the error, because the fault domain can be isolated quickly."
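The handshake check described in this quote can be expressed as a simple rule over the last three hours of TCP handshakes. The sketch below assumes hypothetical `Handshake` records with timestamps as seen at the probe, defines handshake time as SYN to final ACK, and uses the maximum as a deliberately strict criterion; the Allegro's own definition and aggregation may differ.

```python
from typing import Iterable, NamedTuple, Tuple


class Handshake(NamedTuple):
    syn_ts: float     # client SYN seen at the probe (seconds)
    synack_ts: float  # server SYN-ACK seen at the probe
    ack_ts: float     # client's final ACK seen at the probe


def handshake_ms(h: Handshake) -> float:
    # Total time from SYN to the final ACK, as seen at the measurement point.
    return (h.ack_ts - h.syn_ts) * 1000.0


def split_ms(h: Handshake) -> Tuple[float, float]:
    # Server side (SYN -> SYN-ACK) and client side (SYN-ACK -> ACK) in ms.
    return (h.synack_ts - h.syn_ts) * 1000.0, (h.ack_ts - h.synack_ts) * 1000.0


def network_looks_healthy(
    handshakes: Iterable[Handshake], now: float,
    window_s: float = 3 * 3600, threshold_ms: float = 20.0,
) -> bool:
    """True if every handshake in the last window_s seconds took <= threshold_ms."""
    recent = [handshake_ms(h) for h in handshakes if now - window_s <= h.syn_ts <= now]
    return bool(recent) and max(recent) <= threshold_ms


hs = [
    Handshake(100.0, 100.004, 100.009),  # about 9 ms
    Handshake(200.0, 200.010, 200.018),  # about 18 ms
]
print(network_looks_healthy(hs, now=1000.0))  # True
hs.append(Handshake(500.0, 500.020, 500.045))  # about 45 ms
print(network_looks_healthy(hs, now=1000.0))  # False
```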
For cases in which the endpoint rather than the network is at fault, the Allegro Network Multimeter offers the TCP Zero Window analysis module. Suppose a client sends a large quantity of data and the server receives it and acknowledges receipt, but the application above it cannot process the data fast enough. The server's TCP receive buffer fills up, and the server then advertises a TCP zero window. The network is working properly; it is the endpoint that cannot keep up with the data.
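A zero window is visible directly in the TCP header: the 16-bit window field is 0. The sketch below parses the fixed 20-byte TCP header with Python's standard `struct` module and flags zero-window segments. It is an illustration of the header check, not the Allegro's module; the helper `make_segment` and its port numbers are hypothetical test scaffolding. Window scaling does not matter here, because a raw value of 0 is zero at any scale factor.

```python
import struct

# Fixed 20-byte TCP header: ports, seq, ack, offset+flags, window, checksum, urgent.
TCP_HEADER = struct.Struct("!HHIIHHHH")
FLAG_ACK = 0x10
FLAG_RST = 0x04


def is_zero_window(segment: bytes) -> bool:
    """True if a TCP segment advertises a receive window of zero.

    RST segments are ignored: they often carry window 0 without meaning
    that the receiver's buffer is full.
    """
    (_src, _dst, _seq, _ack, offset_flags, window, _csum, _urg) = TCP_HEADER.unpack_from(segment)
    flags = offset_flags & 0x01FF  # low 9 bits hold the TCP flags
    return window == 0 and not (flags & FLAG_RST)


def make_segment(window: int, flags: int) -> bytes:
    # Data offset 5 (20-byte header, no options) in the top 4 bits.
    return TCP_HEADER.pack(445, 50000, 1, 1, (5 << 12) | flags, window, 0, 0)


print(is_zero_window(make_segment(0, FLAG_ACK)))             # True
print(is_zero_window(make_segment(65535, FLAG_ACK)))         # False
print(is_zero_window(make_segment(0, FLAG_RST | FLAG_ACK)))  # False
```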
A backup is a good example: the client sends the backup at 1 Gbit/s, but the server cannot accept it that quickly. Its TCP receive buffer eventually fills to capacity, and the server's TCP stack then reports that it has no more room to receive data because the service above it is too slow. The Allegro Network Multimeter detects such a scenario, and the measurements show that it is not a network problem but that the endpoint has reached its performance limit.
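The backup scenario is a simple rate imbalance: the buffer fills at the difference between the arrival rate and the rate at which the application reads. The sketch below assumes constant rates and a fixed buffer size, ignoring receive-buffer autotuning and how the sender reacts as the window shrinks; the 4 MiB buffer and the 100 MB/s write speed are hypothetical values.

```python
def seconds_until_zero_window(
    buffer_bytes: int, sender_gbps: float, app_mb_per_s: float
) -> float:
    """Time until the receive buffer is full if data arrives faster than the
    application reads it (simplified: constant rates, fixed buffer size)."""
    inflow = sender_gbps * 1e9 / 8  # bytes/s arriving from the network
    outflow = app_mb_per_s * 1e6    # bytes/s consumed by the application
    if outflow >= inflow:
        return float("inf")  # the application keeps up: no zero window
    return buffer_bytes / (inflow - outflow)


# Hypothetical: 4 MiB receive buffer, 1 Gbit/s backup stream, server writes 100 MB/s.
print(f"{seconds_until_zero_window(4 * 1024**2, 1.0, 100.0):.3f} s")  # 0.168 s
```

Even a modest shortfall fills a typical buffer in a fraction of a second, which is why zero-window events appear quickly once an endpoint is at its limit.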
All in all, the Allegro Network Multimeter makes life much easier for the IT managers at Graz University of Technology, as Philipp Rammer confirmed: "The great thing is that it is so uncomplicated despite its powerful performance. We like the fact that it uses a web interface without additional local applications, servers, databases and other things that need to be maintained. That was an essential selling point for us, because it is something new. Before we used the Allegro Network Multimeter, network analysis was very tedious."
(Editor's note: Our thanks go to Philipp Rammer for his commitment in creating this user story and to Schoeller network control Datenverarbeitung GmbH, who initiated the cooperation.)