Revision as of 14:36, 7 May 2020
Path measurement
The path measurement module passively measures the packet loss and latency between two Allegro Network Multimeter installations. For example, when there are problems with remote offices, the network connection between the main office and the remote office can be analyzed by installing one Multimeter at the main office and another Multimeter at the remote office. Only connections that pass through both Multimeters are analyzed; their packet loss and two-way latency are measured and shown in graphs.
Overview
The main (or master) device captures packet meta data from the remote (or client) device, which takes only a fraction of the total traffic. Approximately 5% additional bandwidth is required for this capture connection. So for a fully loaded 100 Mbit/s connection to the remote location, an additional load of ~5 Mbit/s is required to get packet information to the master device. The measurement connection can be a separate line or can run over the line that is measured; the capture connection is automatically ignored for the measurement. The measurement module must be configured with a maximum packet delay. This delay describes the amount of time the master device waits for packet information to arrive from the remote device. The delay must be large enough to cover the actual latency of the connection and the delay of the capture connection. Typical values are between 2 and 5 seconds. Larger values require more memory to buffer packet meta data, so very large values might only be selectable on larger Multimeter devices (Allegro 1000 or greater).
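The 5% rule of thumb above can be turned into a quick sizing check. The following is a minimal sketch, not part of the product; the function name and the constant are made up for illustration, and the 5% figure is only the approximation given in the text.

```python
# Rough sizing helper for the capture connection.
# CAPTURE_OVERHEAD is the ~5% rule of thumb from the text, not an exact value.
CAPTURE_OVERHEAD = 0.05


def capture_bandwidth_mbit(monitored_mbit: float,
                           overhead: float = CAPTURE_OVERHEAD) -> float:
    """Estimate the bandwidth (Mbit/s) needed to transfer packet meta data
    from the remote device to the master device."""
    if monitored_mbit < 0:
        raise ValueError("monitored traffic must not be negative")
    return monitored_mbit * overhead


if __name__ == "__main__":
    for load in (100, 500):
        needed = capture_bandwidth_mbit(load)
        print(f"{load} Mbit/s monitored -> about {needed:.0f} Mbit/s for the capture connection")
```

If the estimate exceeds what the link to the remote device can carry, a separate line for the measurement connection is the safer choice.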
Configuration
Settings
- Enable analysis: This enables or disables the path measurement feature. When disabled, no additional memory is used. When enabled, the packet buffer occupies memory that is then unavailable to other analysis modules, which reduces how far back in time the device can go.
- Description of this master device: This field is only used for informational purposes to identify the master device. It can be freely chosen, for example the location where the device is installed. This field can also be left empty to use the default name master.
The following Remote device configuration section configures the access to the remote device:
- Device to use: To use a remote device for path measurement, you first need to add that device as a remote device to the list in the Multi-device settings. It does not matter if the device is active or not.
- You can select the device from the list of known multi-devices.
- Device description: Similar to the description of the master device, this field is for informational purposes only and has no other effect than helping to identify the remote device in the statistics. Usually the location of the remote device is entered.
Measurement settings:
- Maximum packet delay: This field describes the maximum number of seconds to wait for packet information from the remote device.
- It basically means that the master device waits for this number of seconds before deciding whether a packet has been lost or not. If the data from the remote device arrives within this number of seconds, the path measurement can account the packet loss, if any, and the two-way latency. This value must be at least as large as the worst-case latency between both measurement sites.
- Usually 3 seconds are more than enough but when the network in between can have a very long delay, you can increase the value. This will, however, use more main memory for the packet buffer.
The settings must be saved, but a restart of the packet processing is necessary for them to actually take effect. Whether this step is required is indicated at the bottom of the page under Required actions.
Parameters currently in use
This section shows the current state of the measurement engine. The engine might be inactive even if the feature is enabled. Usually a restart is required to actually make it active. If active, the current packet delay is shown. It might be different from the selected value in the configuration above, but if so a note appears that a restart is required.
Required actions
An info box appears if a restart of the packet processing is required. The shown link leads to the page Settings → Administration where the restart can be triggered. The device itself does not need to be rebooted; only the packet processing must be restarted, which usually takes only a few seconds.
Measurement statistics
The measurement tab shows the real-time results of the ongoing measurement. At the top, the current state of the measurement engine and the remote connection is shown. The measurement status can be not running if it is disabled, warming up if the engine waits for synchronization with the remote device, and running if it actually measures data. The remote client status indicates whether the connection to the remote device is established. Since the packet information is gathered through regular capturing from the remote device, the capture connection is visible in the capture section of the remote device and might be stopped there. If the measurement connection is stopped or has stopped working for other reasons (remote device unavailable, etc.), the status box turns red and a button appears to reconnect to the remote device. If the reconnect fails, an error message appears with detailed information about what went wrong.
 Typical errors are:
- remote device inaccessible (are the IP and port settings correct?)
- authentication error (invalid credentials?)
When both boxes are green, the measurement is running and the four graphs show the real-time results.
Two-Way-Latency
The first graph shows the latency measured from the master device to the remote device and back. Because the local time sources are not synchronized, it cannot measure the one-way latency of a single packet but only the duration of packets going in both directions. Example: Assume a packet A is seen from master to client, and another packet B is seen from client to master. The time difference between packet A being seen on master and on client plus the time difference between packet B being seen on client and on master is taken into account to determine the two-way latency. Packet A and packet B do not need to be related in any way. If traffic is going only in one direction, the measurement will not show any time result (even though packet loss is still visible). For each second, the average, minimum, and maximum two-way latency is accounted and shown in the graph. To the left of the graph, the statistics for the visible time range are shown; changing the zoom level or time interval updates the values accordingly.
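The packet A / packet B example can be worked through numerically to show why an unknown clock offset between the two devices does not matter. This is an illustrative sketch only; the function name and the timestamps are invented, and it does not describe the internal implementation of the Multimeter.

```python
def two_way_latency(a_master: float, a_client: float,
                    b_client: float, b_master: float) -> float:
    """Two-way latency in seconds from four timestamps.

    Every timestamp is taken from the local clock of the device that saw
    the packet, so the two clocks may differ by an arbitrary offset.
    """
    forward = a_client - a_master    # packet A, master -> client (contains +offset)
    backward = b_master - b_client   # packet B, client -> master (contains -offset)
    return forward + backward        # the offset cancels out


if __name__ == "__main__":
    offset = 3600.0                  # assumed: client clock runs one hour ahead
    # Packet A leaves the master at t=10.0 s and needs 20 ms to reach the client.
    a_master, a_client = 10.0, 10.0 + 0.020 + offset
    # Packet B leaves the client at t=12.0 s (master time) and needs 30 ms.
    b_client, b_master = 12.0 + offset, 12.0 + 0.030
    latency = two_way_latency(a_master, a_client, b_client, b_master)
    print(f"two-way latency: {latency * 1000:.1f} ms")
```

Each single difference is off by the full clock offset, which is why a one-way latency cannot be derived from it. Only the sum is meaningful.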
Lost packets
The second and third graphs show the number of lost packets in each direction. Lost packets are only accounted for connections that have been seen on both devices. Depending on the installation point and routing setup, connections might not be routed to the second device on purpose. These connections are not accounted as loss on the other device. The second graph accounts all packets that have been seen on the remote device but are missing on the master device. That means that those packets got lost on their way to the master device. Accordingly, the third graph accounts all packets that have been seen on the master device but are missing on the remote device. The graph also contains a line for packets that have been dropped by the client due to overload. If this value is not zero, those packets are accounted as packet loss even though they might not be actual losses. For correct measurements, make sure the graph for remote packet drops always stays at zero. These drops may happen for several reasons:
- System capture overload: If multiple captures are running in parallel, the CPU might be overloaded. Check the All tab in the Capture page to see how many captures are running. In the best case there is only the one capturing connection to the master device.
- The capturing connection is encrypted with SSL. The small Allegro 200 has a limited encryption capacity, so for high traffic volumes this can be a bottleneck. The only solution is to use a more powerful Multimeter.
- Capture drops can also occur if the network connection is not capable of transferring the data fast enough. As a rule of thumb, approximately 5% of the total traffic is used for the measurement connection. For example, if the traffic is 500 Mbit/s, the measurement requires ~25 Mbit/s of bandwidth on the management port.
The fourth graph shows all packets that are monitored for the path measurement. This will cover all connections that have been seen on both devices.
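The per-direction accounting described in this section can be sketched as a set comparison. This is a simplified illustration under stated assumptions: the flow key type, the function name, and the representation of packets as integer checksums are hypothetical, and the sketch ignores the maximum packet delay window and duplicate packets.

```python
from typing import Dict, Set, Tuple

# Hypothetical flow key: a pair of IP addresses.
Flow = Tuple[str, str]


def count_lost_packets(master: Dict[Flow, Set[int]],
                       client: Dict[Flow, Set[int]]) -> Dict[str, int]:
    """Count lost packets per direction from per-flow packet checksums.

    Only flows seen on both devices are considered, so traffic that is
    never routed past the second device is not counted as loss.
    """
    shared_flows = master.keys() & client.keys()
    # Seen on the remote device but missing on the master (second graph).
    lost_to_master = sum(len(client[f] - master[f]) for f in shared_flows)
    # Seen on the master but missing on the remote device (third graph).
    lost_to_remote = sum(len(master[f] - client[f]) for f in shared_flows)
    return {"lost_to_master": lost_to_master, "lost_to_remote": lost_to_remote}


if __name__ == "__main__":
    master = {("10.0.0.1", "10.0.0.2"): {1, 2, 3},
              ("10.0.0.1", "10.0.0.9"): {7, 8}}      # flow never reaches the client
    client = {("10.0.0.1", "10.0.0.2"): {2, 3, 4, 5}}
    print(count_lost_packets(master, client))
```

The flow to 10.0.0.9 is visible on the master only and therefore contributes to neither counter.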
IP statistics
The second tab shows packet loss information for each pair of IP addresses. This statistic covers all IP connections that have been seen on both measurement sides. The table shows the number of packets that have been counted for each communication pair. Additionally, the number of packets seen on the master device and the corresponding packet loss are shown. The same statistics are shown for the client device too. You can click on the IP address to go to the detailed statistics of the IP module to check which kind of traffic was happening for that IP. Two graphs are shown for each IP pair: one shows the packet loss for both directions and the other shows the total packets. There is also a capture button to capture traffic for the IP pair. The captured traffic is only the traffic seen on the master device; it will not contain any packets from the client device, as the master device does not have the packet data available. To capture traffic from the client device, you have to go to the web interface of the client device and start a capture on that device.
Switching graph modes
The toggle buttons above the graphs switch the graph modes from absolute values to relative values. This setting shows the lost packets in relation to the total (monitored) traffic. The second option shows the throughput in Mbit/s instead of the packet rate.
Limitations
The path measurement has some limitations:
- Due to technical reasons, large clock adjustments cannot be filtered out, so if one happens, a very large two-way latency is measured. The two devices do not need to be time synchronized, but large adjustments should not happen. That means that time synchronization (for example via NTP) should be enabled on both devices or disabled on both devices for best results. However, such mismeasurements caused by clock adjustments are one-time events and will not lead to false values for the following packets.
- The maximum supported packet size for the path measurement is currently 2048 bytes. Larger packets are truncated for the measurement.
- NAT setups and different VLAN combinations on master and client are not supported at the moment. Such flows will be accounted as unmonitored flows in the debug view.
- Different VLAN combinations on master and client are not supported by default. Connections which are visible on both sites with different VLAN tags are accounted as unmonitored flows in the debug view. However, the feature "ignore VLAN tags in flow keys" in the expert settings can be enabled so that the VLAN is completely ignored (for all other measurement modules as well).
- WAN optimizers and similar devices which rewrite some of the traffic are not supported either. If packet data is changed (like modifying the TCP header, adding TCP options, etc.), the flow will account packet loss on both sides, as the original packets are not seen on the other side. If the device in between also modifies the IP addresses or ports, the flows will be accounted as unmonitored.
- The global setting for the packet length accounting should be set to the same value on both devices. Otherwise identical packets might be considered different because of their different length, and the bandwidth information will be inconsistent.
Typical use cases
See Investigating network problems on remote sites to get a detailed overview of use cases and device setup.
Debug information
The debug information tab shows additional statistics which are usually only relevant for identifying problems in the path measurement, either program errors or test setup errors.
- Monitored flows seen on both devices: The monitored flows are all IPv4 and IPv6 connections that have been seen on both devices and are used for calculating the latency and packet loss. Only this traffic can be considered for the actual measurement. In a working setup, the value must be non-zero.
- Flows seen on both devices without matching packets: If a flow is seen on both devices but not a single packet matches on both sides, it indicates a potential network setup problem. This probably means the packets are somehow modified by a device in between both measurement points. This setup is not supported. Usually this value should be zero. Small non-zero values can be ok if the number of monitored flows (the first counter) is much larger.
- Unmonitored flows seen only on master: This counter shows the number of IP connections that are only visible at the master device. It means that for those connections no matching client packet has been received. If the master device also sees network traffic that is not routed to the client device, this value can be non-zero.
- Unmonitored flows seen only on client: This is the same counter as for the master device, but counting the connections on the client device that have not been seen on the master. Again, if the client device sees traffic that is not routed to the master, it is fine to see non-zero values here.
Possible problematic scenarios:
- There is a device between master and client that modifies the traffic (like a WAN optimizer): You will notice a larger value for counter 2 (flows without matching packets) and an almost zero value for counter 1 (flows seen on both devices).
- There is a device between master and client that changes ports and IP addresses (a NAT): You will notice almost zero values for counters 1 and 2, but high values for counters 3 and 4.
Both scenarios are not supported by the path measurement.  
Please adjust the test setup to disable any device modifying the network as described above.
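The two problematic scenarios can be expressed as a small check over the four flow counters. This is only a sketch of the rule of thumb given above: the function name and the `near_zero` threshold are assumptions chosen for illustration, since the text does not define what "almost zero" or "high" mean numerically.

```python
def diagnose_setup(monitored: int, unmatched: int,
                   only_master: int, only_client: int,
                   near_zero: int = 5) -> str:
    """Map the four debug flow counters to one of the known problem patterns.

    monitored   -- counter 1: monitored flows seen on both devices
    unmatched   -- counter 2: flows seen on both devices without matching packets
    only_master -- counter 3: unmonitored flows seen only on master
    only_client -- counter 4: unmonitored flows seen only on client
    near_zero   -- hypothetical threshold for "almost zero"
    """
    if monitored <= near_zero and unmatched > near_zero:
        return "traffic is modified in between (e.g. WAN optimizer)"
    if (monitored <= near_zero and unmatched <= near_zero
            and only_master > near_zero and only_client > near_zero):
        return "addresses or ports are rewritten in between (e.g. NAT)"
    return "no known problem pattern"


if __name__ == "__main__":
    print(diagnose_setup(monitored=2, unmatched=800, only_master=10, only_client=12))
    print(diagnose_setup(monitored=0, unmatched=1, only_master=950, only_client=940))
    print(diagnose_setup(monitored=1200, unmatched=3, only_master=40, only_client=0))
```

Non-zero values for counters 3 and 4 alone are not a problem if one device also sees traffic that is never routed past the other one.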
The table below shows the following counters for the master and remote device:
- The counter for packets seen on all devices measures the total number of packets monitored and considered for the analysis.
- The counter for packets seen only on one device indicates how many packets are lost on the other device.
- Duplicated packets: This counter includes packets that are duplicated or have the same checksum. It is valid to see non-zero values here. Some traffic, such as broadcasts, does not differ in the payload, so the packet checksums are identical. If such packets appear within the packet delay time window, each one is accounted as a duplicate of the previous one.
- Failed to process on master device: This counter indicates that packets from the client have been discarded due to overload of the master. The master device was not fast enough to process client packets. This usually means the local packet rate (at the master device) is too high.
- Ignored on master device: These packets are ignored because the flow is unknown to the master device. This happens when the packet checksum is received from the client but no connection information for that packet is known by the master. This value should always be zero. Otherwise it means that the number of active flows is too high.
- Packets processed too early: This counter covers packets that could not be stored long enough to reach the configured packet delay limit. This happens when the packet rate is higher than the supported packet rate of the master device.
Below the table, two graphs showing time drift information are visible. 
The first graph shows the packet delay. It is the time between a matching packet from the master and the client. 
This value describes for how long the master device needed to wait to get a matching packet from the client. 
This value should always be much lower than the maximum packet delay configured in the path measurement configuration. 
The value cannot be larger than the maximum as then packets can no longer be matched. If the value keeps reaching the maximum, two problems are possible:
- The delay between master and client is large due to generic network delay. For example, if a high-latency connection is used for path measurement, it can even take a few seconds for a packet info to arrive. Configure a larger maximum packet delay.
- The bandwidth of the connection from the client (the client’s upload speed) is too small to satisfy the requirement for the checksum connection. This problem can be identified if even increasing the maximum packet delay does not help. If the bandwidth is too small, the packet will hit the maximum delay for any value configured, it will just take a little longer.
In this case try to use an alternative network connection to connect to the client device.
The second graph shows the time drift between the master and client device. Usually there will always be a drift between the clocks of both devices (if they are not synchronized by some means). Even large drifts (hours, days, etc.) are typically not a problem, as the two-way latency cancels out the drift. But if the drift increases dramatically (for example by multiple seconds) and constantly over a long period of time, it usually indicates a bandwidth overload, just like in the first graph.
FAQ
- What does the note Network setup problem detected: Packet modification or complete loss mean?
- This message box appears if flows have been identified for which not a single packet could be seen on both sides. Usually this means that there is some device in between both measurement points that modifies the packets. This can be a WAN optimizer which rewrites TCP connections for improved network performance. Such a setup is not supported.
- It can also mean that some other packet field is modified at some point in the network. One field that is known for modification is the IP identification field in the IPv4 header. For this case an additional option can be enabled to ignore this field.
 
- What kind of packet information is used to determine latency and packet loss?
- Both measurement devices calculate checksums starting from the layer 3 packet data to compare packet information on both sides. This means for IPv4 and IPv6 traffic, the Ethernet header including possible VLAN tags is ignored. For non-IP traffic, the complete layer 2 packet is used so this traffic can only be analyzed in switched networks.
 
- What can I do if I think the packet loss is wrong?
- You can contact our support and, if possible, provide a small capture from both network sides that covers the same traffic. This will help us to find the reason for the shown packet loss.
 
- How can I distinguish between real packet loss and reported packet loss due to packet modifications?
- Packet loss is reported when a packet checksum does not match any other checksum reported by the other side within the configured maximum packet delay timeout. Possible reasons for such an event are:
 - Actual loss
- The packet has been seen on one side but not the other. In this case the loss graph usually differs between the master and remote device.
 
- Packet modification
- Some component modifies the packet in some way. Since such a modification is usually done in both directions (sender and receiver), the packet loss is visible on both sides. Therefore the loss graph is symmetrical.
 
- Overloaded management connection
- The management connection is used to get packet checksums from the remote device. If the connection is not fast enough to transmit the checksums within the configured maximum packet delay time, the packets are reported as lost.
- This can be verified by looking at the debug graph Packet delay between local and remote packets. In this case, the delay stays at the upper limit, which is the maximum packet delay time.
 
- Temporary network failure
- In this case the packets are really lost in both directions, so the loss graphs may also look similar. However, since no packet can be transmitted to the other side, you will also see no entries in the two-way latency graph.
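The four reasons above can be combined into a rough decision aid. This is a heuristic sketch only, not a feature of the Multimeter: the function name, the order of the checks, and both tolerance values are assumptions made for illustration, and real measurements may show a mix of causes.

```python
def classify_loss(lost_to_master: int, lost_to_remote: int,
                  packet_delay: float, max_packet_delay: float,
                  latency_samples: int,
                  symmetry_tolerance: float = 0.1,
                  delay_limit_ratio: float = 0.95) -> str:
    """Guess the most likely reason for reported packet loss in one interval.

    packet_delay    -- value from the debug graph "Packet delay between
                       local and remote packets", in seconds
    latency_samples -- number of entries in the two-way latency graph
    The two tolerance parameters are hypothetical defaults.
    """
    if lost_to_master == 0 and lost_to_remote == 0:
        return "no loss"
    # Delay pinned at the configured maximum: checksums arrive too late.
    if packet_delay >= delay_limit_ratio * max_packet_delay:
        return "overloaded management connection"
    both_directions = lost_to_master > 0 and lost_to_remote > 0
    # Loss in both directions and no latency entries at all.
    if both_directions and latency_samples == 0:
        return "temporary network failure"
    larger = max(lost_to_master, lost_to_remote)
    if both_directions and abs(lost_to_master - lost_to_remote) <= symmetry_tolerance * larger:
        return "packet modification"
    return "actual loss"


if __name__ == "__main__":
    print(classify_loss(50, 0, packet_delay=0.2, max_packet_delay=3.0, latency_samples=100))
    print(classify_loss(50, 49, packet_delay=0.2, max_packet_delay=3.0, latency_samples=100))
```

When the result is ambiguous, comparing small captures from both network sides remains the more reliable way to find the cause.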
 
 

