Network Burst Analysis: Difference between revisions

== Problem ==
How can you use the Allegro Network Multimeter to quickly and easily detect network bursts and find the related sender and receiver? <br>In our example, users are complaining about sudden slow application load times at certain points of the day.<br>To close in on the problem, we can use the Allegro Packets Network Multimeter and measure the TCP handshake time.


== Burst detection ==
== Incident rules ==
The Allegro Network Multimeter provides several options to detect bursts.
Since network bursts appear at random hours of the day, the Allegro Network Multimeter lets you create rules that automatically capture the incidents and notify the user.


* You can use the total throughput graph in the Dashboard. This graph aggregates the incoming traffic on all interfaces. The data is displayed with a resolution of one second. A burst that leads to a significant increase of traffic with a long enough duration can be easily seen as a spike.
To create a new rule, first navigate to ‘Incidents’ under the ‘Generic’ tab.
* You can view the traffic graphs on the "Interface stats" page. These graphs display traffic per interface and use the same time resolution of one second.
* For an automatic notification when the total bandwidth throughput is too high, you can use bandwidth incidents. They are configured under 'Settings' -> 'Incident settings' -> 'Global incidents'. Just define a lower or upper threshold of bandwidth or packet rate and a severity. The time resolution is one second.
* For a higher resolution of up to 1 ms you can use the "Interface throughput" incidents. They are per interface incidents and will be generated when a threshold is exceeded.


== Interface throughput incidents ==
In this example we will use the "Interface throughput" incidents to detect bursts and find who sent the packets.

{|
| [[File:Burst_analysis_dash.png|1000px|thumb|right]]
|}
 
Now continue to the ‘Incident rules’ tab.
 
{|
| [[File:Burst_analysis_incident_rules.png|1000px|thumb|right]]
|}
 
Here you will find all the rules you have already created, with the option to edit or delete them.
 
To create a new rule, simply press the green ‘Add rule’ button.
 
{|
| [[File:Burst_analysis_add_rule.png|600px|thumb|right]]
|}
 
'''Meaning of each setting:'''
 
'''Rule name:''' Name of the rule to be created.
 
'''Severity:''' Type of severity your new rule is looking out for.
 
'''Trigger:''' List of possible triggers your rule can monitor.
 
'''Attributes:''' Attributes of the new rule. Possible attributes depend on the selected trigger.
 
'''Virtual link group:''' if enabled, can detect end-to-end failure between two physical Ethernet interfaces. It allows the switch to detect unidirectional or bi-directional link failures irrespective of intermediary devices and enables link recovery.
 
'''Time Profile:''' Select when the rule is active and monitoring.
 
'''Report channel:''' When the rule is triggered, the user will be notified on the selected channel.
 
'''Aggregation of recurring incidents:''' Groups incidents with the same rule trigger if enabled.
 
'''Rule description:''' Every rule can get its own description.


'''Traffic capturing:''' Select which kind of traffic the rule captures.

For back-in-time data capture you can use the packet ring buffer feature.


Under 'Settings' -> 'Modules settings' -> 'Interface' we enable the measurement module and
set the duration of the measurement interval to 5 ms. You can set it
from several seconds to as low as 1 ms. Under 'Settings' -> 'Incident settings' -> 'Interface
throughput' the incident must be enabled by setting the severity to "Low" and a
threshold of 700 Mbit/s.
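
To make the idea of a short measurement interval concrete, here is a minimal Python sketch of a per-interval throughput check. It is only an illustration and not how the Allegro Network Multimeter is implemented: the function name <code>find_bursts</code>, the input format (integer timestamps in microseconds with a packet size in bytes) and the sample values are assumptions. Only the 5 ms interval and the 700 Mbit/s threshold come from the settings above.

```python
from collections import defaultdict


def find_bursts(packets, interval_ms=5, threshold_mbit_s=700.0):
    """Return (window_start_us, throughput_mbit_s) for every window above the threshold.

    packets: iterable of (timestamp_us, size_bytes) tuples (hypothetical input format).
    """
    interval_us = interval_ms * 1000
    bytes_per_window = defaultdict(int)
    for timestamp_us, size_bytes in packets:
        # Integer division assigns each packet to its measurement window.
        bytes_per_window[timestamp_us // interval_us] += size_bytes

    bursts = []
    for window in sorted(bytes_per_window):
        # Bits per microsecond is numerically the same as Mbit/s.
        mbit_s = bytes_per_window[window] * 8 / interval_us
        if mbit_s > threshold_mbit_s:
            bursts.append((window * interval_us, mbit_s))
    return bursts


# Illustrative data: 500,000 bytes in the first 5 ms, 100,000 bytes in the next.
sample = [(0, 250_000), (4_999, 250_000), (5_000, 100_000)]
print(find_bursts(sample))
```

With a one-second interval the same data would average out far below the threshold, which is why the millisecond resolution matters for short bursts.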


After several minutes we get a notification and go to the overview under 'Generic' -> 'Incidents'. When clicking on the incident we see details about the burst.

For our example, we will create a rule to check if the TCP handshake time stays below 0.2 seconds.
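
The 0.2 second handshake check can be pictured with the following Python sketch. It assumes that the handshake time is the interval between the client's SYN and the ACK that completes the handshake. The device may define the metric differently, and the function name <code>slow_handshakes</code> and the input format are hypothetical.

```python
def slow_handshakes(handshakes, limit_seconds=0.2):
    """Return (connection_id, duration_seconds) for handshakes slower than the limit.

    handshakes: iterable of (connection_id, syn_time, ack_time) with times in
    seconds (hypothetical input format).
    """
    slow = []
    for connection_id, syn_time, ack_time in handshakes:
        duration = ack_time - syn_time
        if duration > limit_seconds:
            slow.append((connection_id, duration))
    return slow


# Illustrative values only: connection "b" takes 0.5 s and would trigger the rule.
observed = [("a", 10.0, 10.05), ("b", 20.0, 20.5)]
print(slow_handshakes(observed))
```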


{|
| [[File:Ap-mm-burst-analysis-incident.png|600px|thumb|right]]
|}

{|
| [[File:Burst_analysis_handshake_rule.png|600px|thumb|right]]
|}


The burst started at 14:42:26.695 and lasted around 5 measurement cycles (25 ms). A pcap link is available and will offer a capture of the time around the burst for a deep per-packet analysis. Let's download the pcap and use it later.

'''Note:''' For our example, it would not be wise to enable a notification, because this rule will trigger many times a day and fill up the inbox.


The "Use as global time range" button allows for setting the global data range
around the time of the burst. By using it, all modules in the Allegro Network
Multimeter will display statistics and provide captures for this time range. Since
we want to analyze the burst we click on it.


After creating a rule with ‘Traffic capturing’ enabled, you can use the ‘Capture settings’ to customize how the capture is saved.

== What was responsible for the burst? ==
Let's take a look at the Dashboard.


{|
| [[File:Burst_analysis_capture_settings.png|1000px|thumb|right]]
|}

{|
| 
[[File:Ap-mm-burst-analysis-dashboard.png|600px|thumb|right]]
|}


The total throughput graph time resolution is too low to display the same values as in the incident graph. But we get a good overview of the IPs with the most traffic during this time interval. AFP and SSL were the most used protocols. The traffic value of an IP is bi-directional, so a sender and receiver pair would have around the same traffic and can be seen quite easily.

'''Meaning of each setting:'''

'''Capture cooldown period:''' The cooldown prevents additional captures for the given time, applied to each rule separately.

'''Storage Device:''' Select the storage device the capture is saved to.
 
'''Storage directory:''' Insert the directory of the storage device here.
 
'''Select packet ring buffer to capture from:''' List of all possible packet ring buffers to capture from.
 
'''Capture profile:''' List of all created capture profiles.
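
The per-rule cooldown described above can be sketched in a few lines of Python. This only shows the general idea, under the assumption that a capture is skipped whenever the same rule last captured less than the cooldown period ago. The class name <code>CaptureCooldown</code> and its method are hypothetical and are not part of the product.

```python
class CaptureCooldown:
    """Track the last capture time per rule and suppress captures during the cooldown."""

    def __init__(self, cooldown_seconds):
        self.cooldown_seconds = cooldown_seconds
        self._last_capture = {}  # rule name -> time of the last capture in seconds

    def should_capture(self, rule_name, now):
        """Return True and record the capture if the rule is not cooling down."""
        last = self._last_capture.get(rule_name)
        if last is not None and now - last < self.cooldown_seconds:
            return False
        self._last_capture[rule_name] = now
        return True


cooldown = CaptureCooldown(cooldown_seconds=60)
print(cooldown.should_capture("handshake", now=0))   # first trigger is captured
print(cooldown.should_capture("handshake", now=30))  # same rule, still cooling down
print(cooldown.should_capture("bandwidth", now=30))  # other rules are tracked separately
```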
 
 
== Reporting Channels ==
 
If you want to be notified when a rule triggers, a notification channel needs to be created first.
 
To create more reporting channels, visit the ‘Notification channels’ tab.


{|
| [[File:Burst_analysis_reporting_channels.png|1000px|thumb|right]]
|}

We could assume that any of the top four IP addresses is either the sender or receiver of the burst packets. Though the fifth IP address has a relatively high packet rate compared to the others, the byte count is significantly lower and it is not likely involved in the burst.


First, you will see a list of all created channels, with the options to send a test to the given recipient, edit the channel, or delete it.

You can zoom in and out in all graphs by pressing the Shift key and using the mouse wheel. This will set the global time range and update the displayed graphs and values. After zooming out, you still see the same traffic distribution on the Dashboard.


With the green ‘Add channel’ button, a new channel can be created.

Let's check the more detailed IP list under 'IP' -> 'IP' statistics to get a clearer picture. We want to find out whether the top IP addresses were communicating with each other. Perhaps we can find some pattern in the traffic related to the burst?


{|
| [[File:Burst_analysis_add_channel.png|600px|thumb|right]]
|}

{|
| 
[[File:Ap-mm-burst-analysis-ips.png|600px|thumb|right]]
|}


We can immediately see a spike in both IP addresses 10.54.0.108 and 10.54.0.225 around the time of the incident.

'''Meaning of each field:'''
 
'''Name:''' Name of the channel
 
'''Type:''' Type of message that should be sent in case of an incident.
 
'''Severity threshold:''' Minimum incident severity required for the channel to report the incident.
 
'''Handle incidents for:''' Kind of traffic incidents the channel handles.
 
'''Email recipient address:''' Email address the incident should be reported to.
 
== Incident statistics ==


If you want to see a live view of all the data your rules are checking, the ‘Incident statistics’ tab shows brief statistics for all your created rules.

Now let's analyze the IP address 10.54.0.108 by clicking on it and opening the tab "Peers":


{|
| [[File:Burst_analysis_incident_statistics.png|1000px|thumb|right]]
|}

{|
| 
[[File:Ap-mm-burst-analysis-ip-peer.png|600px|thumb|right]]
|}


Both IP addresses communicated with each other. 10.54.0.225 suddenly started
sending an unusually high number of packets to 10.54.0.108.
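
The reasoning used here (traffic per IP is bi-directional, so sender and receiver show similar totals) can be reproduced offline on exported packet records. The Python sketch below sums bytes per unordered IP pair. The function name <code>top_pairs</code>, the record format and the byte counts are illustrative assumptions. Only the two IP addresses are taken from the example above.

```python
from collections import Counter


def top_pairs(packets, n=3):
    """Return the n IP pairs with the most bytes, counting both directions together.

    packets: iterable of (src_ip, dst_ip, size_bytes) tuples (hypothetical format).
    """
    totals = Counter()
    for src_ip, dst_ip, size_bytes in packets:
        # Sorting the two addresses makes A->B and B->A fall into the same bucket.
        pair = tuple(sorted((src_ip, dst_ip)))
        totals[pair] += size_bytes
    return totals.most_common(n)


# Illustrative records: three large packets one way, one small packet back.
records = [("10.54.0.225", "10.54.0.108", 1500)] * 3
records += [("10.54.0.108", "10.54.0.225", 60), ("10.0.0.1", "10.0.0.2", 100)]
print(top_pairs(records, n=1))
```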


We can now check for more details in the pcap provided by the throughput incident.
 
== Occurred incidents ==
 
Finally, under the ‘Occurred incidents’ tab you will see a list of all past incidents.
 
{|
| [[File:Burst_analysis_occured_incidents.png|1000px|thumb|right]]
|}
 
Clicking on the subject of an incident will open a new window with more information for the specific occurrence.


{|
| [[File:Burst_analysis_rule_trigger.png|800px|thumb|right]]
|}

{|
|[[File:Ap-mm-burst-analysis-wireshark.png|600px|thumb|right]]
|}


Before the time of the incident, the traffic was significantly lower. At 14:42:26.69497 IP address 10.54.0.108 sent a packet to 10.54.0.225 which triggered the traffic burst.

Should a further investigation of the IP statistics be necessary, click on the given link in the subject. It will open an analysis of the IP address for the timeframe when the incident occurred.<br>These statistics can also be downloaded as a PCAP for further analysis.
{|
| [[File:Burst_analysis_ip_statistics.png|1000px|thumb|right]]
|}
