Technical article in LANLine September 2019

Quality Fluctuations in Networks

A small error can quickly lead to performance fluctuations in a network and affect application delivery. Accurate measurement and analysis tools help to detect the causes of such quality fluctuations, enabling network administrators to prevent them.

A juddering video at home can spoil the evening, a frozen input screen at the health insurance service counter annoys customer and consultant alike, and a sluggish robot arm causes productivity losses. Whether at home, in the office or in the factory, a trouble-free network is the holy grail of every network administrator. The goal is to manage a network in such a way that isolated errors do not create major issues. This requires an understanding of the causes of quality fluctuations as well as deploying suitable measurement equipment.

Fluctuating network problems can be diverse. At the packet level, there are three main issues: packets are lost, delayed, or arrive at irregular intervals, known as jitter. This is especially noticeable in a Voice over IP (VoIP) connection: because VoIP relies on the UDP/IP protocol, there is no guarantee when packets will arrive.

Two network protocols are fundamental: TCP/IP and UDP/IP. TCP/IP is used to guarantee error-free data delivery. Packet loss due to overload or poor connections will not result in data loss, but the user will experience noticeable retransmission delays.

UDP/IP, on the other hand, is a 'best effort' protocol used when the loss of a packet or two is not critical. In audio, VoIP, online gaming, and streaming content such as YouTube, such packet losses do not trigger retransmission. Instead, service quality drops, resulting in poor voice calls, frozen pixels and lost video frames. Network problems can therefore lead to different user service quality depending on the protocol used.
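To make the difference concrete, the following minimal Python sketch (a hypothetical sequence-numbered UDP receiver, not something described in the article) accepts datagrams, detects gaps in the sequence numbers and simply counts them as lost instead of requesting a retransmission; this is essentially how a VoIP or streaming endpoint behaves:

    import socket
    import struct

    # 'Best effort' receiver: each datagram carries a 4-byte sequence number.
    # Gaps are counted as lost packets; nothing is ever retransmitted.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", 5004))                 # hypothetical media port

    expected, lost = None, 0
    while True:
        data, _ = sock.recvfrom(2048)
        seq = struct.unpack("!I", data[:4])[0]   # sequence number in the first 4 bytes
        if expected is not None and seq > expected:
            lost += seq - expected               # gap: packets missing, quality degrades
        expected = seq + 1
        # No retransmission request is sent; the codec has to conceal the loss.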

Problems due to variable bandwidths

Variable bandwidths can cause quality fluctuations. The bandwidth required by end devices is now so high that it is easy for individual users to max out their Internet access. For example, one user may fully utilise a 1 GBit/s Internet downlink from inside the internal network. A second user in the same network may then receive no more than 50 percent of the Internet bandwidth due to poor resource management.

With Wi-Fi 5, a wireless LAN (WLAN) can achieve data rates similar to a LAN cable. Mobile devices can generate high bandwidth demand at short notice, which internal routers and services must handle.

From the user's point of view, multiple transmission paths (cable and wireless) offer a clear bandwidth advantage. However, WLAN reception can suddenly collapse, and the user notices delays because the connection becomes unexpectedly slow. If a download is running at 10 MBit/s and suddenly drops to only 1 MBit/s, this slowdown can be frustrating. The reason for such fluctuating bandwidth can be poor wireless reception, a high internal network load or a combination of both. The bandwidth may also fluctuate seemingly at random, for example when video delivery dynamically adapts the display resolution to match the available bandwidth. High available bandwidth leads to high utilisation, even if this is not always good for the user.

Until SSDs became widespread as data storage devices, networks could often transfer data faster than a single device could deliver or receive it. Current SSDs are efficient enough to make full use of line speed. The latest NVMe SSDs can deliver data faster than a 10 Gigabit line can sustain, so congestion can occur. Consequently, even a single data transfer can push a network to its limits.
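As a rough back-of-the-envelope check (assuming a sequential read rate of about 3.5 GByte/s for a current NVMe SSD, an illustrative figure rather than one from the article): 3.5 GByte/s × 8 = 28 GBit/s, which is well above the 10 GBit/s the line can carry, so a single bulk transfer is enough to congest it.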

In addition, routers and firewalls can negatively influence network quality. This is because routers and firewalls are more than just hardware; they have evolved into small computers handling a multitude of tasks. They scan for viruses, forward packets, manage the WAN links and so on. These parallel processes result in varying workloads and, potentially, fluctuating quality. If, for example, a router is writing a large log file to a hard disk, packet processing can be delayed for a short time. While other performance curves can be measured relatively easily, fluctuations caused by routers or firewalls can be difficult to capture.

Network quality fluctuations can be caused by numerous issues, and errors tend to increase along with the number of network services. Network administrators often waste a great deal of time searching for such errors.

Individual quality parameters such as WLAN speed can be measured using free open-source tools. However, such measurements have the disadvantage of generating only a snapshot. Measurement tools that capture and display both real-time and historical data are therefore more valuable for analysis and troubleshooting.

Such tools may combine multiple analysis modules and can speed up troubleshooting. With such a device, an overview of the most active protocols, the largest connections or the top IPs often reveals outliers at a glance. From there, administrators can navigate to potentially suspicious traffic and make granular measurements.

TCP protocol monitoring is useful for identifying packet losses or high retransmission rates that indicate overloaded network nodes. In such a case it is possible to determine which applications or protocols consumed how much bandwidth and whether this occurs frequently.
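As a simplified illustration of this kind of check (a sketch using the Python library scapy and a hypothetical capture file, not the appliance-based measurement described here), the following counts suspected retransmissions per flow by looking for repeated sequence numbers carrying payload:

    from collections import defaultdict
    from scapy.all import rdpcap, IP, TCP

    packets = rdpcap("trace.pcap")                    # hypothetical capture file
    seen = set()
    retransmissions = defaultdict(int)

    for pkt in packets:
        if not (pkt.haslayer(IP) and pkt.haslayer(TCP)):
            continue
        payload_len = len(pkt[TCP].payload)
        if payload_len == 0:
            continue                                  # ignore pure ACKs
        flow = (pkt[IP].src, pkt[TCP].sport, pkt[IP].dst, pkt[TCP].dport)
        key = (flow, pkt[TCP].seq, payload_len)
        if key in seen:
            retransmissions[flow] += 1                # same segment seen again: likely retransmission
        else:
            seen.add(key)

    for flow, count in sorted(retransmissions.items(), key=lambda item: -item[1]):
        print(flow, count)

A consistently high count for one flow points to an overloaded link or node on that path rather than to a random error.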

The TCP Zero Window statistics show zero window packets sent during data reception. (Picture: Allegro Packets)

 

Fully Utilised End Devices

Another TCP parameter is the TCP Zero Window (see figure). This typically occurs when end points are busy. A TCP window of zero indicates that an application cannot process the data it receives. In this case the network can be excluded as a source of error; the destination server or cache is simply unable to respond. This does not pose a problem per se, but rather indicates a processing limit. A more powerful server would not necessarily improve the situation; an investment in more powerful end devices may be needed.
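A minimal sketch of the same check outside a dedicated appliance (again using the Python library scapy with a hypothetical capture file) simply flags packets that advertise a window size of zero:

    from scapy.all import rdpcap, IP, TCP

    for pkt in rdpcap("trace.pcap"):                  # hypothetical capture file
        if pkt.haslayer(IP) and pkt.haslayer(TCP) and pkt[TCP].window == 0:
            # The receiver advertises a zero window: its application cannot
            # drain the socket buffer fast enough. (RST packets, which also
            # carry a zero window, would need to be filtered out in practice.)
            print(pkt[IP].src, "->", pkt[IP].dst, "zero window at", float(pkt.time))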

In addition to average network load analysis, troubleshooting tools can provide detailed information about peak loads. Short bursts in particular may be statistically invisible in an average-load display, yet have a considerable influence on quality for a short time, for example when VoIP bandwidth becomes insufficient due to traffic bursts. Analysis tools can show which application and which subscriber caused the bursts, whether they occur regularly and how long they last. Smart troubleshooting appliances are vital to improve network performance.
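To see why averages hide bursts, traffic can be binned into very small time windows instead of one-second averages. The sketch below (an illustrative Python function operating on timestamp/size pairs, not the vendor's implementation) compares the busiest 10 ms window with the long-term average:

    from collections import defaultdict

    def peak_vs_average(packets, bin_size=0.010):
        """packets: list of (timestamp in seconds, size in bytes)."""
        bins = defaultdict(int)
        for ts, size in packets:
            bins[int(ts / bin_size)] += size
        duration = max(ts for ts, _ in packets) - min(ts for ts, _ in packets) or 1.0
        average_bps = sum(size for _, size in packets) * 8 / duration   # long-term average
        peak_bps = max(bins.values()) * 8 / bin_size                    # busiest 10 ms window
        return average_bps, peak_bps

A burst of 1.25 MByte within 10 ms corresponds to 1 GBit/s for that instant, even if the one-second average stays far below it.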

For VoIP protocols such as SIP, monitoring jitter and packet loss is important, as both have a negative effect on voice quality. In a phone call it can sound as if the other party is stuttering. Microbursts are one possible cause; jitter can also be caused by variable WLAN reception quality, where erratic data buffering leads to transient voice and video interruptions.
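For RTP-based voice streams, jitter is commonly estimated with the running formula from RFC 3550, in which the transit-time difference between consecutive packets is smoothed with a gain of 1/16. A minimal Python sketch of that estimator (timestamps assumed to be in seconds; the function name is illustrative):

    def rtp_jitter(samples):
        """samples: list of (send_timestamp, receive_timestamp) pairs in seconds."""
        jitter = 0.0
        prev_send = prev_recv = None
        for send_ts, recv_ts in samples:
            if prev_send is not None:
                # D = difference of the transit times of two consecutive packets
                d = (recv_ts - prev_recv) - (send_ts - prev_send)
                jitter += (abs(d) - jitter) / 16.0    # RFC 3550 smoothing
            prev_send, prev_recv = send_ts, recv_ts
        return jitter

A sudden rise in this value during a call is a strong hint that packets are being buffered erratically somewhere on the path, for example in a congested WLAN.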

Fast and efficient troubleshooting is feasible. State-of-the-art tools, which measure and correlate large quantities of data, allow errors to be recognised and isolated quickly. A lengthy analysis of GByte-sized capture files may no longer be necessary; instead, suspicious packets can be filtered in transit and subjected to detailed analysis. This represents an enormous time saving. Once a problem is detected, adjustments can often be made quickly to restore normal network service.

QoS rules

In many cases, bursts can be minimised in advance with the help of quality classes. A network administrator can set up Quality of Service (QoS) rules. These rules determine how much bandwidth is allocated to which services. For example, a rule could restrict Android updates from a specific IP address range or give UDP traffic precedence over TCP traffic.
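QoS enforcement itself happens on switches, routers and firewalls, but applications can support it by marking their traffic. As a minimal illustration (assuming a Linux host; address and port are placeholders), the following Python sketch marks a VoIP-style UDP socket with the DSCP class 'Expedited Forwarding' so that QoS rules along the path can prioritise it:

    import socket

    EF = 46 << 2                                              # DSCP 'Expedited Forwarding' in the TOS byte (0xB8)

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, EF)     # mark all outgoing packets
    sock.sendto(b"voice payload", ("198.51.100.10", 5004))    # placeholder destination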

Setting QoS rules requires precise knowledge of network traffic. Once it is clear which services need to be prioritised and where burst traffic causes problems, rules can be set. The effect of such rules can then be analysed and the rules adjusted and re-ordered as required.

If a network administrator only knows the average load, each rule change may lead to a lengthy trial and error process. However, if they have the information at their fingertips thanks to smart network analysis tools, QoS rules can be tweaked accordingly.

QoS rules thus defined in a network should be monitored by the IT manager using analysis tools. Being able to retrospectively view the effects of QoS prioritisation helps a network administrator provide the best service to the organisation.

 

The original article can be found on the LAN-Line website.

 
