Measuring handshake times using TCP analytics

How to determine handshake times with TCP analysis from Allegro Network Multimeter

TCP analysis with Allegro Network Multimeter

2022-02-10

How much time do your handshakes take?

Figure 1 shows the client handshake times of the last 10 minutes in the TCP statistics of the Allegro Network Multimeter. You can see at a glance that response times were elevated during this period. But why? Is the server on the Internet too far away? Or is the WLAN signal too weak? With the Allegro Network Multimeter you can quickly find out where the response times are too long and what is causing them.

Reasons for longer handshake times

Figure 2 shows the same data as a table. You can sort it in ascending or descending order by various parameters and quickly see which server or client has the longest average handshake time.

This is possible because the Allegro Network Multimeter continuously records and analyzes the handshake times. The advantage? You can see at a glance whether there are latency issues or perhaps even quality issues with a virtual machine. Virtual machines are especially prone to this because they operate on the principle of "best effort."

Best effort means that a virtual machine receives only as much computing power as the host currently has available.

This may be fine for one service or another, such as backup, because it does not matter how large or small the time slices are here. For services like an ERP system, on the other hand, things look different. The ERP system sends many small requests, for which it needs computing power immediately.

This is another case where a quick look at the handshake times pays off. We once had a case in which the handshake times regularly rose sharply at certain times. From this we could tell very quickly that something was wrong with the virtual machine. We found that the host was not allocating enough processing time to it. As a result, there were long pauses in which the machine virtually stood still and did not answer any requests.
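The measurement and ranking described in this section can be sketched in a few lines of Python. This is only an illustration, not how the Allegro Network Multimeter works internally. It assumes that the server handshake time is the gap between SYN and SYN/ACK and the client handshake time is the gap between SYN/ACK and ACK, both as seen at the measurement point. The function names and the sample timestamps are invented for the example.

```python
from collections import defaultdict
from statistics import mean


def handshake_times(syn_ms, synack_ms, ack_ms):
    """Split one three-way handshake into its two halves (milliseconds).

    Assumption: server time = SYN -> SYN/ACK, client time = SYN/ACK -> ACK.
    """
    server_time = synack_ms - syn_ms
    client_time = ack_ms - synack_ms
    return server_time, client_time


def rank_servers(handshakes):
    """Return (server_ip, average server handshake time), slowest first."""
    per_server = defaultdict(list)
    for server_ip, syn_ms, synack_ms, ack_ms in handshakes:
        server_time, _ = handshake_times(syn_ms, synack_ms, ack_ms)
        per_server[server_ip].append(server_time)
    averages = {ip: mean(times) for ip, times in per_server.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Hypothetical capture: (server IP, SYN, SYN/ACK, ACK) timestamps in ms.
sample = [
    ("10.0.0.5", 0, 2, 3),
    ("10.0.0.5", 1000, 1004, 1005),
    ("10.0.0.9", 2000, 2250, 2251),
]
ranking = rank_servers(sample)
for ip, avg_ms in ranking:
    print(ip, avg_ms)
```

Sorting by the average puts the server with the slowest handshakes first, which mirrors sorting the table in Figure 2 in descending order.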

Measuring response times of a specific IP address

Another useful feature of the TCP statistics is that you can search for any IP address in the table and see its average as well as minimum and maximum response time.

By response time we mean the time it takes for the other side to acknowledge receipt of the TCP data we send out. It is not uncommon for the other side to delay its acknowledgement slightly in order to save packets. The Linux operating system does this, for example: by default, it waits 40 milliseconds before acknowledging. It does so in the hope that response data will be ready within that time, so that the acknowledgement can be sent together with the data instead of in a separate packet.

Figure 2: Table for sorting important parameters

What if the handshake times go well beyond 40 milliseconds?

This is where you should pay attention. In such a case, it usually means that the data packets have reached the server, but the server is either under very high load or the connection is too slow. The same is true in the client direction. If the client is slow to acknowledge the data it receives, it may be that the client or the link is overloaded.
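To make the comparison with the 40 millisecond mark concrete, here is a small Python sketch. It summarizes the response times measured for one IP address as minimum, average and maximum, and flags the address when the average is well beyond the delayed-acknowledgement value. The factor of 2 used for "well beyond" is our own assumption for the example, not a threshold used by the product.

```python
DELAYED_ACK_MS = 40  # Linux default delay mentioned in the text


def summarize(samples_ms):
    """Return minimum, average and maximum response time in milliseconds."""
    return {
        "min": min(samples_ms),
        "avg": sum(samples_ms) / len(samples_ms),
        "max": max(samples_ms),
    }


def needs_attention(samples_ms, factor=2.0):
    """True if the average is well beyond the delayed-ACK value.

    `factor` is a hypothetical tuning knob, chosen only for illustration.
    """
    return summarize(samples_ms)["avg"] > factor * DELAYED_ACK_MS


# Invented response times for two peers.
normal_peer = [10, 40, 70]
slow_peer = [200, 300]
print(summarize(normal_peer), needs_attention(normal_peer))
print(summarize(slow_peer), needs_attention(slow_peer))
```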

TCP retransmissions

The Allegro Network Multimeter lets you look into the TCP statistics at any time. This allows you to narrow down which side the problem is on. The same is possible for TCP retransmissions. As shown in Figure 3, the menu item TCP Retransmission lists all data packets and the retransmitted data of a connection. You can immediately see what percentage of the data was sent twice and how much data was transmitted in total.

When does data appear twice?

If the same data appears twice at the measurement point, it means that the remote station did not receive it the first time. In this case, there was an overload between the measuring device and the receiving system, so the data was lost after it had passed the Allegro Network Multimeter.
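The percentage of duplicated data can be illustrated with a short Python sketch. It walks through the data segments of one direction of a connection in capture order and counts bytes whose sequence range has already been seen. This is a simplified assumption of ours: it ignores sequence number wraparound and would also count late, out-of-order segments as retransmissions. The function name and the sample segments are invented.

```python
def retransmission_share(segments):
    """Return (total_bytes, retransmitted_bytes, percent).

    segments: (sequence_number, payload_length) tuples for one direction
    of one connection, in the order they were captured.
    """
    highest_end = None  # end of the highest byte range seen so far
    total = 0
    retransmitted = 0
    for seq, length in segments:
        if length == 0:
            continue  # pure ACKs carry no data
        end = seq + length
        total += length
        if highest_end is None:
            highest_end = end
        elif end <= highest_end:
            retransmitted += length  # whole segment was seen before
        else:
            if seq < highest_end:
                retransmitted += highest_end - seq  # partial overlap
            highest_end = end
    percent = 100.0 * retransmitted / total if total else 0.0
    return total, retransmitted, percent


# Invented example: the segment starting at 1500 is sent twice.
sample = [(1000, 500), (1500, 500), (1500, 500), (2000, 500)]
print(retransmission_share(sample))
```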

Typical use case from practice:

There is a complaint that the network is too slow. Using the Allegro Network Multimeter, you can measure directly at the server to see its current response time. Conversely, you can also see how quickly the data is sent out. If no retransmissions are shown here, you can assume that there is no network bandwidth problem. In addition, you can look at the response times. If these are low, you can rule out the network as the cause of the problem altogether. The problem then lies at one of the endpoints, either in the server or in the client, which takes a long time to process the data.

How to find invalid connections?

Figure 4 shows the "TCP servers with invalid connections" tab of the TCP statistics, where you can identify invalid connections at a glance. This way you always know immediately which IP address is sending the invalid requests and can take action if necessary.

An invalid connection is one where a TCP connection request is sent but no data is transferred. One cause of this could be an attack from outside. It could also be that a client opens connections without ever transmitting anything, a sign that something is disturbed in the client-server communication.

In the table you may also see that some connections contain the status "not valid". This may be the case if only a few bytes were transferred and the handshake is there, but the connections have been open for 20 hours and never closed cleanly. Again, be alert as this could be an attack. Please note that this is not a security feature, but rather a kind of early warning system.
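The two patterns described above can be expressed as a simple check. The following Python sketch is only our reading of the description, not the product's actual criteria: a connection counts as invalid if no payload was transferred, or if only a few bytes were transferred while the connection stayed open for a long time without being closed cleanly. The `Connection` record, the 64-byte limit and the sample data are assumptions made for the example; the 20 hours come from the text.

```python
from collections import Counter
from dataclasses import dataclass

TWENTY_HOURS_S = 20 * 3600


@dataclass
class Connection:
    client_ip: str
    payload_bytes: int
    open_seconds: float
    closed_cleanly: bool


def looks_invalid(conn, few_bytes=64, long_open_s=TWENTY_HOURS_S):
    """Heuristic only; `few_bytes` is a hypothetical threshold."""
    if conn.payload_bytes == 0:
        return True  # connection request without any data
    lingering = conn.open_seconds >= long_open_s and not conn.closed_cleanly
    return conn.payload_bytes < few_bytes and lingering


def invalid_by_client(connections):
    """Count invalid connections per client IP, most frequent first."""
    counts = Counter(c.client_ip for c in connections if looks_invalid(c))
    return counts.most_common()


# Invented sample connections.
connections = [
    Connection("198.51.100.7", 0, 1.0, False),
    Connection("198.51.100.7", 12, 80000.0, False),
    Connection("192.0.2.4", 5000, 30.0, True),
]
print(invalid_by_client(connections))
```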

Functions of the TCP flag evaluation

The TCP flag evaluation lets you see quickly how many TCP flags of each type were used at what time.

This can be an indicator that something is wrong in the network, for example if the reset rate suddenly rises sharply. In such a case, you can sort the table by the number of resets sent or received per IP address to find the culprit.
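As an illustration of what such an evaluation involves, the following Python sketch decodes the flag bits of the TCP header and counts resets per sending IP address. The bit values are the standard ones from the TCP specification; the function names and the sample packets are invented for the example.

```python
from collections import Counter

# Standard bit positions of the classic TCP flags.
TCP_FLAGS = {
    "FIN": 0x01,
    "SYN": 0x02,
    "RST": 0x04,
    "PSH": 0x08,
    "ACK": 0x10,
    "URG": 0x20,
}


def flag_names(flags_byte):
    """Return the names of all flags set in the TCP flags byte."""
    return [name for name, bit in TCP_FLAGS.items() if flags_byte & bit]


def top_reset_senders(packets):
    """Count RST packets per source IP, most frequent sender first.

    packets: (source_ip, flags_byte) tuples.
    """
    resets = Counter(src for src, flags in packets if flags & TCP_FLAGS["RST"])
    return resets.most_common()


# Invented sample packets.
packets = [
    ("10.0.0.7", 0x14),  # RST + ACK
    ("10.0.0.7", 0x04),  # RST
    ("10.0.0.8", 0x14),  # RST + ACK
    ("10.0.0.8", 0x10),  # ACK only
]
print(flag_names(0x12))
print(top_reset_senders(packets))
```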

When does a zero window occur?

A zero window occurs when the data has arrived at the server, but the application does not fetch it fast enough. This is related to the receive buffer in the kernel of the operating system. Whenever data arrives faster than the application reads it, the free space in the buffer shrinks. As soon as the buffer is full, TCP reports a window size of 0 to the sender: a zero window. This has the advantage that the network can be excluded as the problem. The network between the two devices is fast enough; the server just cannot keep up.

There are two possible reasons for this:

The window within which data may be sent is too small.
Or the application is too slow to accept the data.

Under the menu item "TCP Zero Window" shown in Figure 5, you can view at any time which zero windows are present and also track how many have been sent and received. At the same time, you can see how much data the operating system can buffer, the so-called window size. Its upper limit is negotiated at the beginning of the TCP connection via the window scaling factor. The window scaling factor determines the maximum size and cannot be changed while a connection is running.
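How the window size and the window scaling factor fit together can be shown with a little arithmetic. The TCP header carries a 16-bit window field, and the scaling factor negotiated during the handshake is a shift count of at most 14 that is applied to this field. The Python sketch below follows these standard TCP rules; the function names are our own.

```python
MAX_WINDOW_FIELD = 0xFFFF  # the window field in the TCP header has 16 bits
MAX_SHIFT = 14             # largest shift count allowed for window scaling


def effective_window(window_field, shift):
    """Return the advertised receive window in bytes."""
    if not 0 <= window_field <= MAX_WINDOW_FIELD:
        raise ValueError("window field must fit into 16 bits")
    if not 0 <= shift <= MAX_SHIFT:
        raise ValueError("shift count must be between 0 and 14")
    return window_field << shift


def is_zero_window(window_field):
    """A zero window is advertised when the window field is 0."""
    return window_field == 0


# With a shift of 7, the largest possible window is 65535 * 128 bytes.
print(effective_window(65535, 7))
print(is_zero_window(0))
```

Because the shift count is fixed for the lifetime of the connection, a receiver that negotiated a small factor can never advertise a larger window later on.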

In general, these are all good indicators that the physical cabling, switches, routers, firewalls, etc. are not the problem. Here, the problem is clearly with the end device and its performance. So, as you can see, TCP analysis helps you quickly rule out possible causes and get closer to the real problem. The big advantage of TCP analysis is that it works with a large number of protocols, even with fully encrypted traffic like SSL, because TCP is used there as well.

An application example from the Leipzig office:

With the Allegro Network Multimeter you can easily sort by the application that sends the most TCP zero windows. In our case, many came from the backup system. Looking more closely at the times, we could see that 500 zero window packets were sent per second. At the same time, the response time was very long. What was the reason for this?

Under the "Peers" item we saw that there was a large transfer of 66 GB from our disk station. The reason was that once a night a backup is made from our central NAS to our old NAS. The new NAS is faster than the old one and can send the data faster than the old NAS can accept it, so the old NAS keeps reporting zero windows.

Exclude traffic using filters

Often there are installations where you either get a large mirror port or have a lot of data from a packet broker. So that you can analyze this, we have built in a network filter. With this you can easily ignore certain traffic that you do not want to record or analyze.

The filters can additionally be defined as a blacklist or a whitelist. Maybe you have certain IP or MAC filters that are relevant for your measurements. Or maybe the opposite is true and you want to exclude certain computers that should not be analyzed under any circumstances. Again, this is not a problem with the Allegro Network Multimeter. Please note that even if individual packets have been excluded, they can still be seen in the interface statistics. This is because the packets were present and registered. However, before they are processed, they are filtered out and discarded internally. To help you keep track of this, we have added the "filtered traffic" section to the dashboard.

As can be seen in Figure 6, filters are available for the following areas: IP addresses, subnets, IP pairs, MAC addresses, VLANs, ports and network interfaces.

The filters are linked with OR, which means that each filter is applied on its own, independently of the others. For example, if a MAC filter and an IP filter are active at the same time, traffic is filtered out as soon as it matches either one of them. The same applies if you have added many IP addresses: they are OR-linked, and the filter is applied as soon as any one of the IP addresses is hit.
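The OR linking can be illustrated with a small Python sketch. It is not the filter implementation of the Allegro Network Multimeter; the `build_filter` function, the packet dictionary and its keys are assumptions made for this example. A packet is filtered as soon as any single criterion matches.

```python
import ipaddress


def build_filter(ips=(), subnets=(), macs=(), ports=()):
    """Return a function that is True if a packet hits ANY filter entry."""
    ip_set = {ipaddress.ip_address(ip) for ip in ips}
    nets = [ipaddress.ip_network(net) for net in subnets]
    mac_set = {mac.lower() for mac in macs}
    port_set = set(ports)

    def matches(packet):
        addrs = [
            ipaddress.ip_address(packet["src_ip"]),
            ipaddress.ip_address(packet["dst_ip"]),
        ]
        # Every line below is one filter; they are linked with OR.
        return (
            any(addr in ip_set for addr in addrs)
            or any(addr in net for addr in addrs for net in nets)
            or packet["src_mac"].lower() in mac_set
            or packet["dst_mac"].lower() in mac_set
            or packet["src_port"] in port_set
            or packet["dst_port"] in port_set
        )

    return matches


# Hypothetical filter: one IP address and one MAC address.
is_filtered = build_filter(ips=["192.0.2.10"], macs=["AA:BB:CC:00:11:22"])

packet = {
    "src_ip": "192.0.2.99", "dst_ip": "198.51.100.1",
    "src_mac": "aa:bb:cc:00:11:22", "dst_mac": "00:00:5e:00:53:01",
    "src_port": 50000, "dst_port": 443,
}
print(is_filtered(packet))  # the MAC filter alone is enough to match
```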

Conclusion

The TCP analysis and handshake time measurement in the Allegro Network Multimeter allow you to quickly analyze errors and detect possible attacks. On this topic, we recommend the video by our managing director Klaus Degner. The video was recorded in German. You can also find instructions in the product wiki.
