Performance Optimization Guide

From Allegro Packets Product Wiki

About

This guide describes performance optimization of the Allegro Network Multimeter for specific use cases. By default, the device runs in a configuration that suits the majority of users, so you do not need to change any configuration parameter. Depending on the actual network traffic and measurement setup, however, adjusting some performance-related parameters can improve overall performance.

High level Allegro system layout

The Allegro Network Multimeter has various components that process traffic. These components are:

  • I/O threads: responsible for all I/O operations between the network interface cards and the CPUs.
  • Analyzer threads: responsible for decoding network traffic and most of the database operations for the statistical values.
  • DB threads: optional threads that offload memory-intensive database operations, see DB mode.

The Allegro Network Multimeter uses queues to buffer packets and messages between the hardware components (interface chips, central processing unit, storage) and threads. Each thread measures its own load, which can be monitored at Info -> System info -> Load.

The utilization of the following queues can be monitored to see if and where changes to queue settings are helpful:

Interface hardware queues

The interface hardware queue is between the network interfaces and the I/O threads of the central processing unit. Whenever the I/O threads are too slow to consume all packets from the built-in or extension network interfaces, the hardware miss counter of the Interface statistics will increase over time.

The load of the I/O threads can be checked at Info -> System info -> Load.

TODO: add overloaded Interface graph.

If the load is near 100%, packet loss can occur and the following countermeasures can be attempted:

  1. Bridge mode puts approximately 10% to 30% more load on the I/O threads than Sink mode, because the I/O threads have to send the incoming traffic to the corresponding outgoing network interface for forwarding. If packet forwarding is not necessary (for example, when the device is deployed at a mirror port), switching to Sink mode will improve the performance of the device. For configuration, please see Global settings.
  2. The number of queues can be increased at the cost of analyzer threads. Each queue uses a corresponding CPU thread, so more queues mean fewer CPU threads available for other components. This option is available on the Allegro 3000 and above due to the large number of internal CPU cores. Allegro Packets recommends increasing the number of queues only if necessary.
    This setting can be changed at Settings -> Global settings -> Expert settings.
    Rx sockets io thread.png
    Allegro Packets recommends testing with HT (Hyper-Threading) enabled and 2 or 4 queues for I/O. If you see a high load on the analyzers, you can also test with 4 queues (I/O threads) without HT for maximum performance.

Analyzer queues

The Allegro Network Multimeter has a packet queue between the I/O threads and the analyzer threads that distributes the incoming packets to the actual processing threads. If the analyzer threads cannot process incoming packets quickly enough, the corresponding queue will eventually overflow and packets must be skipped. Two statistics describe the load of the queues and analyzer threads:

  1. Graphs for skipped packets: The Interface stats show, per interface, whether the I/O threads were able to push all packets to the analyzer queues. The corresponding counter is Not processed due to overload.
    Interface analyzer packet drop.png
    Note that high counters for a few seconds at the initial startup of the device are normal when it is started under high network load scenarios.
  2. Utilization of individual analyzer threads: The load of the analyzer threads can be checked at Info -> System info -> Load. Depending on the Allegro model, there can be two (Allegro 200, Allegro 500) or up to 120 (Allegro 5300 or 5500) analyzer threads. The load graph indicates the overall utilization of each thread, but the important counter is Not processed due to overload, since it records the events in which one or more packets ultimately could not be processed.

There are three scenarios in which a queue overload can occur; they are described in the following sections:

Skipped packets at high analyzer load

The Allegro Network Multimeter has reached its processing limit for current traffic when the load of one or multiple analyzers reaches 100%.

There are some options to reduce the analyzer load, but they come at the cost of no longer seeing the entire measurement data. You can disable some features or add a NIC filter to process only part of the traffic.

  1. You can reduce the level of analysis at Settings -> Global settings.
    Detail of traffic analysis.png
    Every level reduction reduces the amount of analyzed data and the number of database operations, see Global_settings#Limit_module_processing for more details of this option. It is possible to adjust the setting so that live traffic is stored as fast as possible to the ring buffer without further analysis, and to re-analyze parts of the recorded traffic with full analysis by using Parallel packet processing.
  2. The NIC filter can be used to reduce the amount of monitored traffic. It excludes traffic from the analyzers at the cost of not seeing all traffic of the link. See the interface filter for more details.

If none of these options are applicable, you need to upgrade the Allegro Network Multimeter to a larger model with more performance (Allegro 1000 to 3000 or 3000 to 5000).

Skipped packets at low analyzer load

The Allegro Network Multimeter conserves energy in very low traffic situations. A large packet burst can suddenly create a high traffic situation, and the analyzer threads cannot keep up with the incoming packets during the period of power adjustment. After that short period, the analyzer threads are again fast enough to process the traffic.

This situation can be identified when packets are skipped even though the system load is not very high at the same time.

If this occurs, the option Analyzer queue overcommit at Settings -> Global settings -> Expert settings can be enabled. It increases the queue size so that network bursts can be buffered for a short time period. The disadvantage is that the queues are overcommitted, so the network card may not have enough buffers available for incoming packets and some packets may not be forwarded in Bridge mode. Therefore, this option is only recommended in Sink mode.

Queue overcommit.png
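
The effect of a larger queue on a short burst can be illustrated with a small simulation. This is only a sketch under stated assumptions: the tick-based model, the packet rates, and the queue capacities are hypothetical numbers chosen for illustration and do not reflect the actual queue sizes or timing of the device.

```python
def simulate_queue(arrivals, service_per_tick, capacity):
    """Return the number of packets dropped by a bounded queue.

    arrivals:         packets arriving in each tick
    service_per_tick: packets the consumer can process in each tick
    capacity:         maximum number of packets the queue can hold
    All values are hypothetical; they only model the principle.
    """
    queued = 0
    dropped = 0
    for arriving, service in zip(arrivals, service_per_tick):
        # Enqueue as many arriving packets as fit, drop the rest.
        accepted = min(arriving, capacity - queued)
        dropped += arriving - accepted
        queued += accepted
        # The consumer then drains up to its current service rate.
        queued -= min(queued, service)
    return dropped


# A burst of 5 ticks followed by moderate traffic.
arrivals = [1000] * 5 + [100] * 5
# The consumer is slow for the first 3 ticks (ramp-up), then fast.
service = [200] * 3 + [1500] * 7

drops_small_queue = simulate_queue(arrivals, service, capacity=1024)
drops_large_queue = simulate_queue(arrivals, service, capacity=4096)
print("dropped with small queue:", drops_small_queue)
print("dropped with large queue:", drops_large_queue)
```

In this model the small queue overflows while the consumer is still ramping up, whereas the larger queue absorbs the whole burst. The model does not capture the downside described above, namely that overcommitted queues can leave the network card short of buffers.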

Skipped packets due to analyzer load imbalance

By default, the Allegro load balances the traffic between the analyzers based on the IP addresses of the client and server. This provides good balancing in most situations.

Network packets cannot be distributed equally among all analyzer threads if there are many connections between only a few IP addresses. An example is 2 SIP trunks with many RTP connections.

The load statistics will then show some analyzers with a constantly high load and others with a significantly lower load.
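
The following sketch illustrates why balancing by IP address pair breaks down in the SIP trunk example, and how including the ports in the balancing key spreads the same traffic more widely. The hash function (CRC32), the key layout, the thread count, and the addresses are assumptions made for illustration; they are not the device's actual implementation.

```python
import zlib
from collections import Counter

NUM_ANALYZERS = 4  # hypothetical number of analyzer threads


def ip_pair_key(flow):
    """Balancing key built from the two IP addresses only."""
    src_ip, _src_port, dst_ip, _dst_port = flow
    # Sorting makes the key identical for both directions.
    return "|".join(sorted([src_ip, dst_ip]))


def flow_key(flow):
    """Balancing key built from IP addresses and ports."""
    src_ip, src_port, dst_ip, dst_port = flow
    return "|".join(sorted([f"{src_ip}:{src_port}", f"{dst_ip}:{dst_port}"]))


def distribute(flows, key_func, num_threads=NUM_ANALYZERS):
    """Count how many flows each analyzer thread would receive."""
    counts = Counter()
    for flow in flows:
        digest = zlib.crc32(key_func(flow).encode())
        counts[digest % num_threads] += 1
    return counts


# Two SIP trunks, each carrying 500 RTP connections between one IP pair.
flows = [("10.0.0.1", 20000 + i, "10.0.0.2", 30000 + i) for i in range(500)]
flows += [("10.0.1.1", 20000 + i, "10.0.1.2", 30000 + i) for i in range(500)]

by_ip = distribute(flows, ip_pair_key)
by_flow = distribute(flows, flow_key)
print("IP-based:  ", dict(by_ip))
print("flow-based:", dict(by_flow))
```

With only two distinct IP pairs, the IP-based key can occupy at most two analyzer threads regardless of how many threads exist, which matches the imbalanced load pattern described above.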

The load balancing behavior of the Allegro Network Multimeter can be changed to flow-based load balancing mode at Settings -> Global settings -> Expert settings.

Flow load balancing.png

This mode improves the performance only for imbalanced traffic. Use the option only if required since it has a negative performance impact on balanced traffic.
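
The three scenarios described above can be told apart roughly by looking at the skipped packet counter together with the per-thread analyzer load. The sketch below is a hypothetical triage helper: the function name, the thresholds, and the input format are assumptions, and the values would have to be read manually from the load statistics.

```python
def classify_overload(analyzer_loads, skipped_packets, high=0.95, spread=0.4):
    """Roughly map load statistics to one of the three scenarios.

    analyzer_loads:  per-thread load, each in the range 0.0 to 1.0
    skipped_packets: increase of the skipped packet counter in the interval
    high, spread:    hypothetical thresholds, not values from the device
    """
    if skipped_packets == 0:
        return "no overload"
    busiest = max(analyzer_loads)
    idlest = min(analyzer_loads)
    # Some threads saturated while others are mostly idle.
    if busiest >= high and busiest - idlest >= spread:
        return "load imbalance"
    # All threads are similarly busy and at least one is saturated.
    if busiest >= high:
        return "high analyzer load"
    # Packets were skipped although no thread is saturated.
    return "low analyzer load (burst)"


print(classify_overload([0.99, 0.98, 0.97, 0.99], skipped_packets=1200))
print(classify_overload([0.20, 0.15, 0.25, 0.20], skipped_packets=300))
print(classify_overload([0.99, 0.97, 0.30, 0.25], skipped_packets=800))
```

Each result points to the corresponding section: reducing the analysis level or filtering traffic, enabling Analyzer queue overcommit, or switching to flow-based load balancing.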

Database queues

The database mode is an extension for large Allegro models with multiple CPU sockets and is disabled by default. This mode is only recommended for Allegro 3500 rev1 and Allegro 5500 rev1. The database mode helps to improve the performance under very high database load, which can occur with millions of open and new connections in combination with NUMA bottlenecks. See DB mode for more details.

If enabled, you can check if there are message drops between the analyzer threads and the DB threads in the load statistics.

The ratio of DB threads vs analyzer threads can be adjusted so that ideally all threads have similar load.

The advantage of the DB mode is that additional message queues are used which can buffer much more information and therefore reduce the load on the analyzer threads. This will reduce the likelihood of skipped packets.

Disk I/O queues

The analyzer threads have to use additional queues for capturing packets to each disk or each disk cluster. Storage devices like HDDs and SSDs do not offer a constant write rate and can slow down suddenly. See the performance section of the ring buffer guide (Ring_Buffer_Configuration_Guide#Performance) for how to adjust the options for high capturing performance.

The two generic solutions are to increase the buffer and to use filter rules. Both will reduce the number of bytes that are written to the disk.
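
As a rough illustration of why a larger buffer helps with sudden write slowdowns, the buffer has to hold whatever the disk cannot write while it is slow. The function name and all rates in the sketch below are hypothetical; real values depend on the storage devices and the captured traffic.

```python
def required_buffer_mb(capture_mb_s, degraded_write_mb_s, slowdown_s):
    """Buffer size in MB needed to ride out a temporary write slowdown.

    capture_mb_s:        rate at which captured packets arrive (MB/s)
    degraded_write_mb_s: write rate of the disk while it is slow (MB/s)
    slowdown_s:          duration of the slowdown in seconds
    All inputs are hypothetical example values.
    """
    backlog_rate = max(0, capture_mb_s - degraded_write_mb_s)
    return backlog_rate * slowdown_s


# Capturing 1000 MB/s while the disk drops to 200 MB/s for 3 seconds.
print(required_buffer_mb(1000, 200, 3), "MB")
```

Filter rules act on the other side of the same calculation: they lower the capture rate, so less backlog builds up during a slowdown.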