Network Burst Analysis: Difference between revisions

Undo revision 5063 by Markus.Geissler (talk)
No edit summary
(Undo revision 5063 by Markus.Geissler (talk))
Tag: Undo
 
Line 1: Line 1:
== Problem ==
== Problem ==
How can you use the Allegro Network Multimeter to quickly and easily detect network bursts and find the related sender and receiver? <br>In our example, users are complaining about sudden slow application load times at certain points of the day.<br>To close in on the problem, we can use the Allegro Packets Network Multimeter and measure the TCP handshake time.
How can you use the Allegro Network Multimeter to quickly and easily detect
network bursts and find the related sender and receiver?


== Incident rules ==
== Burst detection ==
Since network bursts appear at random hours of the day, the Allegro Network Multimeter lets you create rules that automatically capture the incidents and notify the user.
The Allegro Network Multimeter provides several options to detect bursts.


To create a new rule, first navigate to ‘Incidents’ under the ‘Generic’ tab.
* You can use the total throughput graph in the Dashboard. This graph aggregates the incoming traffic on all interfaces. The data is displayed with a resolution of one second. A burst that leads to a significant increase of traffic with a long enough duration can be easily seen as a spike.
* You can view the traffic graphs on the "Interface stats" page. These graphs display traffic per interface and use the same time resolution of one second.
* For an automatic notification when the total bandwidth throughput is too high, you can use bandwidth incidents. They are configured under 'Settings' -> 'Incident settings' -> 'Global incidents'. Just define a lower or upper threshold of bandwidth or packet rate and a severity. The time resolution is one second.
* For a higher resolution of up to 1 ms you can use the "Interface throughput" incidents. They are per interface incidents and will be generated when a threshold is exceeded.


{|
| [[File:Burst_analysis_dash.png|1000px|thumb|right]]
|}
== Interface throughput incidents ==
In this example we will use the "Interface throughput" incidents to detect bursts and find
who sent the packets.
 
Now continue to the ‘Incident rules’ Tab.
 
{|
| [[File:Burst_analysis_incident_rules.png|1000px|thumb|right]]
|}
 
Here you will find all the rules you have already created, with the option of editing or deleting them.
 
To create a new rule, simply press the green ‘Add rule’ button.
 
{|
| [[File:Burst_analysis_add_rule.png|600px|thumb|right]]
|}
 
'''Meaning of each setting:'''
 
'''Rule name:''' Name of the rule to be created.
 
'''Severity:''' Type of severity your new rule is looking out for.
 
'''Trigger:''' List of possible triggers the rule can monitor.
 
'''Attributes:''' Attributes of the new rule. Possible attributes depend on the selected trigger.
 
'''Virtual link group:''' If enabled, it can detect end-to-end failure between two physical Ethernet interfaces. It allows the switch to detect unidirectional or bi-directional link failures irrespective of intermediary devices and enables link recovery.
 
'''Time Profile:''' Select when the rule is active and monitoring.
 
'''Report channel:''' When the rule is triggered the user will be notified on the selected channel.
 
'''Aggregation of recurring incidents:''' Groups incidents with the same rule trigger if enabled.
 
'''Rule description:''' Every rule can get its own description.


'''Traffic capturing:''' Select which kind of traffic the rule is capturing from.
For back-in-time data capture you can use the packet ring buffer
feature.


Under 'Settings' -> 'Modules settings' -> 'Interface' we enable the measurement module and
set the duration of the measurement interval to 5 ms. You can set it
from several seconds to as low as 1 ms. Under 'Settings' -> 'Incident settings' -> 'Interface
throughput' the incident must be enabled by setting the severity to "Low" and the
threshold to 700 Mbit/s.
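
To get a feeling for what these two settings mean together, the following minimal Python sketch converts the threshold into a byte budget per measurement interval and looks for runs of intervals above it. The function names and the sample byte counts are hypothetical and only serve as an illustration; this is not how the Allegro Network Multimeter implements the incident internally.

```python
def bytes_per_interval(threshold_mbit_s, interval_ms):
    """Bytes that fit into one interval at exactly the threshold rate.

    1 Mbit/s equals 1000 bits per millisecond, so the product below is in bits.
    """
    return threshold_mbit_s * 1000 * interval_ms / 8


def find_bursts(byte_counts, threshold_mbit_s=700, interval_ms=5):
    """Return (start_index, length) for each run of intervals above the threshold."""
    limit = bytes_per_interval(threshold_mbit_s, interval_ms)
    bursts = []
    start = None
    for index, count in enumerate(byte_counts):
        if count > limit:
            if start is None:
                start = index  # first interval of a new burst
        elif start is not None:
            bursts.append((start, index - start))
            start = None
    if start is not None:
        # The burst was still running at the end of the data.
        bursts.append((start, len(byte_counts) - start))
    return bursts


# Hypothetical per-interval byte counts (one value per 5 ms interval).
sample = [100_000, 500_000, 600_000, 450_000, 500_000, 440_000, 50_000]
print(bytes_per_interval(700, 5))
print(find_bursts(sample))
```

With a 5 ms interval, 700 Mbit/s corresponds to 437,500 bytes per interval, so any interval carrying more than that would count as exceeding the threshold in this sketch.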


For our example, we will create a rule to check if the TCP handshake time stays below 0.2 seconds.
After several minutes we get a notification and go to the overview under
'Generic' -> 'Incidents'. When clicking on the incident we see details about the burst.


{|
| [[File:Burst_analysis_handshake_rule.png|600px|thumb|right]]
|}
{|
| [[File:Ap-mm-burst-analysis-incident.png|600px|thumb|right]]
|}


'''Note:''' For our example it would not be wise to enable a notification, because this rule will trigger a lot on a daily basis and fill up the inbox.
The burst started at 14:42:26.695 and lasted around 5 measurement cycles (25 ms).
 
A pcap link is available and will offer a capture of the time around the burst
for a deep per-packet analysis. Let's download the pcap and use it later.
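
If you prefer to inspect the downloaded capture with a script before opening it in a packet analyzer, a sketch along the following lines can sum the packet bytes per 5 ms bin. It assumes the download is in the classic pcap format (not pcapng) and uses only the Python standard library; the function name `bytes_per_bin` is hypothetical.

```python
import struct
from collections import Counter

# First four bytes of a classic pcap file: (byte order, timestamp ticks per second).
_MAGICS = {
    b"\xd4\xc3\xb2\xa1": ("<", 1_000_000),      # little endian, microseconds
    b"\xa1\xb2\xc3\xd4": (">", 1_000_000),      # big endian, microseconds
    b"\x4d\x3c\xb2\xa1": ("<", 1_000_000_000),  # little endian, nanoseconds
    b"\xa1\xb2\x3c\x4d": (">", 1_000_000_000),  # big endian, nanoseconds
}


def bytes_per_bin(pcap_bytes, bin_ms=5):
    """Map the start of each time bin (in ms since the epoch) to the bytes seen in it."""
    try:
        endian, ticks_per_second = _MAGICS[pcap_bytes[:4]]
    except KeyError:
        raise ValueError("not a classic pcap file") from None

    totals = Counter()
    offset = 24  # size of the pcap global header
    while offset + 16 <= len(pcap_bytes):
        # Per-packet header: seconds, fraction, captured length, original length.
        ts_sec, ts_frac, incl_len, orig_len = struct.unpack_from(
            endian + "IIII", pcap_bytes, offset
        )
        ts_ms = ts_sec * 1000 + ts_frac * 1000 // ticks_per_second
        totals[ts_ms // bin_ms * bin_ms] += orig_len
        offset += 16 + incl_len  # skip the header and the captured packet data
    return totals
```

Reading the file with `open(path, "rb").read()` and passing the result to `bytes_per_bin` yields one byte total per 5 ms bin, which can be compared with the values shown in the incident graph.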


The "Use as global time range" button allows for setting the global data range
around the time of the burst. By using it, all modules in the Allegro Network
Multimeter will display statistics and provide captures for this time range. Since
we want to analyze the burst we click on it.


After creating a rule with ‘Traffic capturing’ enabled, you can customize how the capture is saved under ‘Capture settings’.
== What was responsible for the burst? ==
Let's take a look at the Dashboard.


{|
| [[File:Burst_analysis_capture_settings.png|1000px|thumb|right]]
|}
{|
| 
[[File:Ap-mm-burst-analysis-dashboard.png|600px|thumb|right]]
|}


The total throughput graph time resolution is too low to display the same
values as in the incident graph. But we get a good overview of the IPs with the
most traffic during this time interval. AFP and SSL were the most used protocols. The
traffic value of an IP is bi-directional, so a sender and receiver pair would
have around the same traffic and can be seen quite easily.

'''Meaning of each setting:'''
 
'''Capture cooldown period:''' Prevents additional captures for the given time, for each rule separately.
 
'''Storage Device:''' Select the storage device the capture is saved to.
 
'''Storage directory:''' Enter the directory on the storage device here.
 
'''Select packet ring buffer to capture from:''' List of all packet ring buffers available to capture from.
 
'''Capture profile:''' List of all created capture profiles.
 
 
== Reporting Channels ==
 
If you want to be notified about a rule triggering, first a notification channel needs to be created.
 
To create more reporting channels, visit the ‘Notification channels’ tab.


{|
| [[File:Burst_analysis_reporting_channels.png|1000px|thumb|right]]
|}
We could assume that any of the top four IP addresses is either the sender or
receiver of the burst packets. Though the fifth IP address has a relatively
high packet rate compared to the others, the byte count is significantly
lower and it is not likely involved in the burst.


Firstly, you will see a list of all created channels with the options of sending a test to the given recipient, editing the channel or the option to delete it.
You can zoom in and out in all graphs by holding the Shift key and using the
mouse wheel. This will set the global time range and update the displayed graphs
and values. After zooming out, you still see the same traffic distribution on the
Dashboard.


With the green ‘Add channel’ button a new channel can be created.
Let's check the more detailed IP list under 'IP' -> 'IP' statistics to get a clearer
picture. We want to find out whether the top IP addresses were communicating
with each other. Perhaps we can find some pattern in the traffic related to the
burst?


{|
| [[File:Burst_analysis_add_channel.png|600px|thumb|right]]
|}
{|
| 
[[File:Ap-mm-burst-analysis-ips.png|600px|thumb|right]]
|}


We can immediately see a spike in both IP addresses 10.54.0.108 and 10.54.0.225
around the time of the incident.

'''Meaning of each field:'''
 
'''Name:''' Name of the channel
 
'''Type:''' Type of message that should be sent in case of an incident.
 
'''Severity threshold:''' The incident severity required for the channel to report the incident.
 
'''Handle incidents for:''' The kind of traffic incidents the channel handles.
 
'''Email recipient address:''' Email address the incident should be reported to.
 
== Incident statistics ==


If you want to see a live view of all the data your rules are checking, the ‘Incident statistics’ tab provides short statistics for all your created rules.
Now let's analyze the IP address 10.54.0.108 by clicking on it and opening the
tab "Peers":


{|
| [[File:Burst_analysis_incident_statistics.png|1000px|thumb|right]]
|}
{|
| 
[[File:Ap-mm-burst-analysis-ip-peer.png|600px|thumb|right]]
|}


Both IP addresses communicated with each other. 10.54.0.225 suddenly started
sending an unusually high number of packets to 10.54.0.108.
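
The same question (who sent how much to whom?) can also be answered offline once per-packet records are available, for example extracted from the pcap. The sketch below is a hypothetical helper that sums bytes per directed sender and receiver pair; the record layout `(source, destination, length)` and the packet sizes are assumptions made for this illustration, and the addresses 10.54.0.7 and 10.54.0.9 are invented background traffic.

```python
from collections import Counter


def top_flows(packets, n=3):
    """Sum bytes per directed (source, destination) pair and return the n largest."""
    totals = Counter()
    for source, destination, length in packets:
        totals[(source, destination)] += length
    return totals.most_common(n)


# Illustrative records only: four large packets in one direction,
# one small packet in the other, plus unrelated background traffic.
packets = (
    [("10.54.0.225", "10.54.0.108", 1500)] * 4
    + [("10.54.0.108", "10.54.0.225", 64)]
    + [("10.54.0.7", "10.54.0.9", 200)]
)
print(top_flows(packets, 1))
```

Keeping the pairs directed, rather than bi-directional as in the IP overview, makes it visible which side of the conversation produced the bulk of the traffic.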


 
We can now check for more details in the pcap provided by the throughput incident.
== Occurred incidents ==
 
Finally, under the ‘Occurred incidents’ tab you will see a list of all past incidents.
 
{|
| [[File:Burst_analysis_occured_incidents.png|1000px|thumb|right]]
|}
 
Clicking on the subject of an incident will open a new window with more information for the specific occurrence.


{|
| [[File:Burst_analysis_rule_trigger.png|800px|thumb|right]]
|}
{|
|[[File:Ap-mm-burst-analysis-wireshark.png|600px|thumb|right]]
|}


Before the time of the incident, the traffic was significantly lower. At
14:42:26.69497 IP address 10.54.0.108 sent a packet to 10.54.0.225 which triggered
the traffic burst.

Should a further investigation of the IP statistics be necessary, click on the given link in the subject. It will open an analysis of the IP address of the timeframe when the incident occurred.<br>These statistics can also be downloaded as a PCAP for further analysis.
 
{|
| [[File:Burst_analysis_ip_statistics.png|1000px|thumb|right]]
|}