Revision as of 06:54, 23 April 2020
Path measurement
The path measurement module passively measures packet loss and latency between two Allegro Network Multimeter installations. For example, when there are problems with remote offices, the network connection between the main office and the remote office can be analyzed by installing one Multimeter at the main office and another at the remote office. Only connections that pass through both Multimeters are analyzed; their packet loss and two-way latency are measured and shown in graphs.
Overview
The main (or master) device captures packet metadata from the remote (or client) device, which amounts to only a fraction of the total traffic. Approximately 5% additional bandwidth is required for this capture connection. For a fully loaded 100 Mbit/s connection to the remote location, an additional load of ~5 Mbit/s is therefore required to get packet information to the master device. The measurement connection can be a separate line or can run over the line that is measured; the capture connection is automatically ignored for the measurement. The measurement module must be configured with a maximum packet delay. This delay describes the amount of time the master device waits for packet information to arrive from the remote device. The delay must be large enough to cover the actual latency of the connection and the delay of the capture connection. Typical values are between 2 and 5 seconds. Larger values require more memory to buffer packet metadata, so very large values might only be selectable on larger Multimeter devices (Allegro 1000 or greater).
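The 5% rule of thumb can be turned into a quick estimate. The following is a minimal sketch: the function name and the fixed ratio are assumptions for illustration, and the real overhead depends on the traffic mix.

```python
def capture_overhead_mbit(traffic_mbit: float, overhead_ratio: float = 0.05) -> float:
    """Estimate the bandwidth (in Mbit/s) needed for the capture connection.

    overhead_ratio is the approximate 5% rule of thumb, not an exact value.
    """
    if traffic_mbit < 0:
        raise ValueError("traffic_mbit must not be negative")
    return traffic_mbit * overhead_ratio


# A fully loaded 100 Mbit/s line needs roughly 5 Mbit/s for the capture connection.
print(round(capture_overhead_mbit(100), 2))
```

If the measured line itself carries the capture connection, this amount should be available in addition to the regular traffic.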
Configuration
Web interface
1- Settings
- Enable analysis: This enables or disables the path measurement feature. When disabled, no additional memory is used. When enabled, memory is reserved for the packet buffer and cannot be used by other analysis modules, which reduces how far back in time the device can go.
- Description of this master device: This field is only used for informational purposes to identify the master device. It can be freely chosen, for example the location where the device is installed. This field can also be left empty to use the default name master.
The following Remote device configuration section configures the access to the remote device:
- Device IP or name: This is the IP address or host name under which the master device can contact the remote device. The remote device must be reachable from the location of the master device.
- Device description: Similar to the description of the master device, this field is for informational purposes only and has no other effect than helping to identify the remote device in the statistics. Usually the location of the remote device is entered.
- Port: This is the TCP port under which the web management is accessible. The Multimeter uses port 443 for its SSL encrypted web site but if a firewall with port-forwarding is used, a different port might be necessary.
- Username/Password: These are the login credentials as configured on the remote device.
Measurement settings:
- Maximum packet delay: This field describes the maximum number of seconds to wait for packet information from the remote device. The settings must be saved, but a restart of the packet processing is necessary for them to actually take effect. Whether this step is required is indicated at the bottom of the page under Required actions.
- Parameters currently in use: This section shows the current state of the measurement engine. The engine might be inactive even if the feature is enabled. Usually a restart is required to actually make it active. If active, the current packet delay is shown. It might differ from the value selected in the configuration above; in that case, a note appears that a restart is required.
- Required actions: An info box appears if a restart of the packet processing is required. The shown link leads to the page Settings → Administration where the restart can be triggered. The device itself does not need to be rebooted; only the packet processing must be restarted, which usually takes only a few seconds.
- Custom remote device SSL certificate: If a custom SSL certificate is installed on the remote device, you have to upload the public certificate to the master device as well. Otherwise you will get an SSL error when connecting to the remote device. Select the PEM certificate and click on Install certificate to upload it to the device. You can also remove an already installed certificate by clicking on the Remove certificate button.
Measurement statistics
Web interface
The measurement tab shows the real-time results of the ongoing measurement. At the top, the current state of the measurement engine and the remote connection is shown. The measurement status can be not running if it is disabled, warming up if the engine is waiting for synchronization with the remote device, and running if it is actually measuring data. The remote client status indicates whether the connection to the remote device is established. Since the packet information is gathered via regular capturing from the remote device, the capture connection is visible in the capture section of the remote device and might be stopped there. If the measurement connection is stopped or has stopped working for other reasons (remote device unavailable, etc.), the status box turns red and a button appears to reconnect to the remote device. If the reconnect fails, an error message appears with detailed information about what went wrong.
Typical errors are:
- remote device inaccessible (are the IP and port settings correct?)
- authentication error (invalid credentials?)
When both boxes are green, the measurement is running and the four graphs show the real-time results.
1- Two-Way-Latency
The first graph shows the latency measured from the master device to the remote device and back. Because the two devices use asynchronous local time sources, it cannot measure the one-way latency of a single packet, only the combined duration of packets going in both directions. Example: Assume a packet A is seen going from master to client, and another packet B is seen going from client to master. The time difference between packet A being seen on master and on client, plus the time difference between packet B being seen on client and on master, is used to determine the two-way latency. Packet A and packet B do not need to be related in any way. If traffic is going in only one direction, the measurement will not show any time result (even though packet loss is still visible). For each second, the average, minimum, and maximum two-way latency is recorded and shown in the graph. To the left of the graph, the statistics for the visible time range are shown; changing the zoom level or time interval updates the values accordingly.
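The packet A / packet B example can be sketched in a few lines to show why unsynchronized clocks do not matter for the two-way value. This is only an illustration: the function name and the use of integer microsecond timestamps are assumptions, not the device's internal representation.

```python
def two_way_latency_us(a_master: int, a_client: int, b_client: int, b_master: int) -> int:
    """Return the two-way latency in microseconds.

    a_master / a_client: when packet A (master -> client) was seen on each device.
    b_client / b_master: when packet B (client -> master) was seen on each device.
    Master timestamps use the master clock, client timestamps the client clock.
    """
    forward = a_client - a_master   # one-way delay plus clock offset
    backward = b_master - b_client  # one-way delay minus clock offset
    return forward + backward       # the unknown clock offset cancels out


# Hypothetical example: the client clock is one hour ahead of the master clock,
# the true delays are 20 ms towards the client and 30 ms back.
offset = 3_600_000_000
latency = two_way_latency_us(
    a_master=1_000_000,
    a_client=1_000_000 + 20_000 + offset,
    b_client=5_000_000 + offset,
    b_master=5_000_000 + 30_000,
)
print(latency)  # 50000
```

Each individual difference is off by the clock offset, which is why a one-way latency cannot be derived from it; only the sum is meaningful.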
 2- Lost packets 
The second and third graphs show the number of lost packets in each direction. Lost packets are only counted for connections that have been seen on both devices. Depending on the installation point and routing setup, connections might intentionally not be routed to the second device. These connections are not counted as loss on the other device. The second graph counts all packets that have been seen on the remote device but are missing on the master device, meaning those packets were lost on their way to the master device. Accordingly, the third graph counts all packets that have been seen on the master device but are missing on the remote device. The graph also contains a line for packets that have been dropped by the client due to overload. If this value is not zero, those packets are counted as packet loss even though they might not be actual losses. For correct measurements, make sure the graph for remote packet drops always stays at zero. These drops may happen for several reasons:
1. System capture overload: If multiple captures are running in parallel, the CPU might be overloaded. Check the All tab in the Capture page to see how many captures are running. In the best case, the only capture is the connection to the master device.
2. The capturing connection is encrypted with SSL. The small Allegro 200 has a limited encryption capacity, so for large traffic volumes this can be a bottleneck. The only solution is to use a more powerful Multimeter.
3. Capture drops can also occur if the network connection cannot transfer the data fast enough. As a rule of thumb, approximately 5% of the total traffic is used for the measurement connection. For example, if the traffic is 500 Mbit/s, the measurement requires ~25 Mbit/s of bandwidth on the management port.
The fourth graph shows all packets that are monitored for the path measurement. This will cover all connections that have been seen on both devices.
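The loss accounting per direction can be illustrated as a comparison of the packets seen on each side. The sketch below is a simplification under stated assumptions: it represents every packet by a checksum value, ignores the maximum packet delay window, and uses a hypothetical function name.

```python
from collections import Counter
from typing import Iterable


def lost_packets(sending_side: Iterable[int], receiving_side: Iterable[int]) -> int:
    """Count packets seen on the sending side that never showed up on the other side.

    Both arguments are per-packet checksums for one direction of flows
    that were seen on both devices.
    """
    # Counter subtraction keeps only positive counts, so packets that exist
    # only on the receiving side do not produce negative loss.
    missing = Counter(sending_side) - Counter(receiving_side)
    return sum(missing.values())


# Hypothetical checksums for the direction master -> client.
seen_on_master = [0xA1, 0xB2, 0xC3, 0xD4]
seen_on_client = [0xA1, 0xC3]
print(lost_packets(seen_on_master, seen_on_client))  # 2
```

Swapping the two arguments gives the loss for the opposite direction.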
 3- IP statistics 
The second tab shows packet loss information for each pair of IP addresses. This statistic covers all IP connections that have been seen on both measurement sides. The table shows the number of packets that have been counted for each communication pair. Additionally, the number of packets seen on the master device and the corresponding packet loss are shown. The same statistics are shown for the client device. You can click on an IP address to go to the detailed statistics of the IP module and check what kind of traffic occurred for that IP. Two graphs are shown for each IP pair: the first shows the packet loss for both directions, and the second shows the total packets. There is also a capture button to capture traffic for the IP pair. The captured traffic is only the traffic seen on the master device; it will not contain any packets from the client device, as the master device does not have the client's packet data available. To capture traffic from the client device, go to the web interface of the client device and start a capture there.
 4- Switching graph modes 
The toggle buttons above the graphs switch the graph modes. The first option switches from absolute values to relative values, showing the lost packets in relation to the total (monitored) traffic. The second option shows the throughput in Mbit/s instead of the packet rate.
Limitations
The path measurement has some limitations:
1. Due to technical reasons, large clock adjustments cannot be filtered out, so if one happens, a very large two-way latency is measured. The devices do not need to be time synchronized with each other, but large adjustments must not happen. That means that time synchronization (for example via NTP) should be either enabled on both devices or disabled on both devices for best results. However, such mismeasurements caused by clock adjustments are one-time events and will not lead to false values for the following packets.
2. The maximum supported packet size for the path measurement is currently 2048 bytes. Larger packets are truncated for the measurement.
3. NAT setups and different VLAN combinations on master and client are not supported at the moment. Such flows will be counted as unmonitored flows in the debug view.
4. WAN optimizers and similar devices which rewrite some of the traffic are not supported either. If packet data is changed (for example by modifying the TCP header or adding TCP options), the flow will show packet loss on both sides, as the original packets are not seen on the other side. If the device in between also modifies the IP addresses or ports, the flows will be counted as unmonitored.
5. The global setting for the packet length accounting should be set to the same value on both devices. Otherwise identical packets might be considered different because of their different length, and the bandwidth information will be inconsistent.
Typical use cases
See Investigating network problems on remote sites to get a detailed overview of use cases and device setup.
Debug information
The debug information tab shows additional statistics which are usually only relevant for identifying problems in the path measurement, either program errors or test setup errors.
1. Monitored flows seen on both devices: The monitored flows are all IPv4 and IPv6 connections that have been seen on both devices and are used for calculating the latency and packet loss. Only this traffic can be considered for the actual measurement. In a working setup, the value must be non-zero.
2. Flows seen on both devices without matching packets: If a flow is seen on both devices but not a single packet matches on both sides, it indicates a potential network setup problem. This probably means the packets are modified by a device between both measurement points. This setup is not supported. Usually this value should be zero. Small non-zero values can be acceptable if the number of monitored flows (counter 1) is much larger.
3. Unmonitored flows seen only on master: This counter shows the number of IP connections that are only visible at the master device. It means that for those connections no matching client packet has been received. If the master device also sees network traffic that is not routed to the client device, this value can be non-zero.
4. Unmonitored flows seen only on client: This is the same counter as for the master device, but counting the connections on the client device that have not been seen on the master. Again, if the client device sees traffic that is not routed to the master, it is fine to see non-zero values here.
Possible problematic scenarios:
- There is a device between master and client that modifies the traffic (like a WAN optimizer): You will notice a large value for counter 2 (flows without matching packets) and an almost zero value for counter 1 (flows seen on both devices).
- There is a device between master and client that changes ports and IP addresses (a NAT): You will notice almost zero values for counters 1 and 2, but high values for counters 3 and 4.
Neither scenario is supported by the path measurement. Please adjust the test setup to disable any device that modifies the traffic as described above.
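The two scenarios can be expressed as a rough check over the four counters. This is a hedged sketch rather than the logic used by the device: the function name, the return labels, and the factor of 10 used to decide what counts as "much larger" are all assumptions.

```python
def diagnose(monitored: int, unmatched: int, only_master: int, only_client: int) -> str:
    """Map debug counters 1 to 4 to the problematic scenarios.

    monitored:   counter 1, flows seen on both devices
    unmatched:   counter 2, flows on both devices without matching packets
    only_master: counter 3, flows seen only on master
    only_client: counter 4, flows seen only on client
    """
    factor = 10  # arbitrary threshold for "much larger", chosen for this sketch
    if monitored == unmatched == only_master == only_client == 0:
        return "no_data"
    if unmatched > factor * max(monitored, 1):
        return "traffic_modified"  # for example a WAN optimizer in between
    if min(only_master, only_client) > factor * max(monitored + unmatched, 1):
        return "addresses_rewritten"  # for example a NAT in between
    return "ok"


print(diagnose(monitored=5000, unmatched=3, only_master=40, only_client=10))
print(diagnose(monitored=2, unmatched=900, only_master=5, only_client=5))
print(diagnose(monitored=0, unmatched=0, only_master=800, only_client=750))
```

With the hypothetical values above, the three calls report a healthy setup, modified traffic, and rewritten addresses, in that order.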
The table below shows the following counters for the master and remote device:
1. Packets seen on all devices: This counter measures the total number of packets monitored and considered for the analysis.
2. Packets seen only on one device: This counter indicates how many packets were lost on the way to the other device.
3. Duplicated packets: This counter includes packets that are duplicated or have the same checksum. It is valid to see non-zero values here. Some traffic, such as broadcasts, does not differ in the payload, so the packet checksums will be identical. If such packets appear within the packet delay time window, they are counted as duplicates of the previous one.
4. Failed to process on master device: This counter indicates that packets from the client have been discarded due to overload of the master. The master device was not fast enough to process client packets. This usually means the local packet rate (at the master device) is too high.
5. Ignored on master device: These packets are ignored because the flow is unknown to the master device. This happens when the packet checksum is received from the client but no connection information for that packet is known by the master. This value should always be zero. Otherwise it means that the number of active flows is too high.
6. Packets processed too early: This counter covers packets that could not be stored long enough to reach the configured packet delay limit. This happens when the packet rate is higher than the supported packet rate of the master device.
Below the table, two graphs showing time drift information are visible. 
The first graph shows the packet delay. It is the time between a matching packet from the master and the client. 
This value describes for how long the master device needed to wait to get a matching packet from the client. 
This value should always be much lower than the maximum packet delay configured in the path measurement configuration. 
The value cannot be larger than the maximum as then packets can no longer be matched. If the value keeps reaching the maximum, two problems are possible:
1. The delay between master and client is large due to generic network delay. For example, if a high-latency connection is used for path measurement, it can take a few seconds for packet information to arrive. Configure a larger maximum packet delay.
2. The bandwidth of the connection from the client (the client’s upload speed) is too small to satisfy the requirement for the checksum connection. This problem can be identified if even increasing the maximum packet delay does not help. If the bandwidth is too small, packets will hit the maximum delay for any configured value; it will just take a little longer. In this case, try to use an alternative network connection to connect to the client device.
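Why a larger maximum packet delay only postpones the problem in the second case can be seen with a simple fluid model. The sketch below is an assumption-laden illustration, not a description of the device: it ignores network latency and assumes the packet information is produced and transferred at constant rates.

```python
import math


def seconds_until_max_delay(required_mbit: float, available_mbit: float, max_delay_s: float) -> float:
    """Time until queued packet information is delayed by max_delay_s.

    Packet information is produced at required_mbit and drained at
    available_mbit. Returns math.inf if the link can keep up.
    """
    if available_mbit <= 0:
        raise ValueError("available_mbit must be positive")
    if required_mbit <= available_mbit:
        return math.inf
    # The backlog grows by (required - available) per second and the resulting
    # delay is backlog / available, so it reaches max_delay_s after this time.
    return max_delay_s * available_mbit / (required_mbit - available_mbit)


# Hypothetical link: 25 Mbit/s needed, only 20 Mbit/s of upload available.
for max_delay in (2, 5, 10):
    print(max_delay, seconds_until_max_delay(25, 20, max_delay))
```

In this model every configured maximum is eventually reached as long as the required rate exceeds the available rate; a larger value only moves that point further out.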
The second graph shows the time drift between the master and client device. Usually there will always be a drift between the clocks of both devices (unless they are synchronized by some means). Even large drifts (hours, days, etc.) are typically not a problem, as the drift cancels out in the two-way latency. But if the drift keeps increasing dramatically (by multiple seconds) over a long period of time, it usually indicates a bandwidth overload, just like in the first graph.
FAQ
1. What does the note Network setup problem detected: Packet modification or complete loss mean?
This message box appears if flows have been identified for which not a single packet could be seen on both sides. Usually this means that there is some device between both measurement points that modifies the packets. This can be a WAN optimizer which rewrites TCP connections for improved network performance. Such a setup is not supported.
2. What kind of packet information is used to determine latency and packet loss?
Both measurement devices calculate checksums starting from the layer 3 packet data to compare packet information on both sides. This means that for IPv4 and IPv6 traffic, the Ethernet header including possible VLAN tags is ignored. For non-IP traffic, the complete layer 2 packet is used, so this traffic can only be analyzed in switched networks.
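The layer 3 rule from the second answer can be sketched as follows. The checksum algorithm used by the devices is not stated here, so CRC32 is only a stand-in, and the function name is hypothetical. The sketch also does not account for header fields that routers change in transit (such as the IPv4 TTL).

```python
import struct
import zlib

ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_IPV6 = 0x86DD
VLAN_ETHERTYPES = {0x8100, 0x88A8}  # 802.1Q and 802.1ad tags


def packet_checksum(frame: bytes) -> int:
    """Checksum a frame from layer 3 onward for IP traffic, else the whole frame."""
    if len(frame) < 14:
        return zlib.crc32(frame)  # too short for an Ethernet header
    offset = 12  # skip destination and source MAC addresses
    (ethertype,) = struct.unpack_from("!H", frame, offset)
    offset += 2
    # Skip any stacked VLAN tags: 2 bytes of tag data plus the next EtherType.
    while ethertype in VLAN_ETHERTYPES and len(frame) >= offset + 4:
        (ethertype,) = struct.unpack_from("!H", frame, offset + 2)
        offset += 4
    if ethertype in (ETHERTYPE_IPV4, ETHERTYPE_IPV6):
        return zlib.crc32(frame[offset:])  # Ethernet header and VLAN tags ignored
    return zlib.crc32(frame)  # non-IP traffic: complete layer 2 packet


# Placeholder bytes standing in for an IP packet (not a valid IPv4 header).
payload = bytes.fromhex("4500001c") + bytes(24)
untagged = bytes(12) + struct.pack("!H", ETHERTYPE_IPV4) + payload
tagged = bytes(12) + struct.pack("!HHH", 0x8100, 100, ETHERTYPE_IPV4) + payload
print(packet_checksum(untagged) == packet_checksum(tagged))  # True
```

Because non-IP frames are hashed including their MAC addresses, any hop that rewrites the Ethernet header changes the value, which matches the restriction to switched networks.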

