Generic troubleshooting processes
Latest revision as of 07:21, 24 July 2023
Allegro Network Multimeter troubleshooting workflows
Every now and then we get asked what a (generic) troubleshooting approach or workflow with an Allegro Network Multimeter would look like.
And rightfully so, because the sheer number of possibilities an Allegro Network Multimeter offers might be overwhelming for some.
In this tutorial, we’ll go into several topics that might be of interest to you, the user, while working with an Allegro Network Multimeter.
The basics
It all starts with a basic understanding of what is actually presented on your screen.
When it comes to elementary yet essential and actionable troubleshooting insights, Allegro Packets has got you covered with the “Top users” and “Quality” dashboards.
Both can be found at the top of the control menu, on the left-hand side of the web interface.
Above: access to “Top users” and “Quality” screens highlighted in the green box
TOP users
The “Top users” screen is a great place to start your generic troubleshooting workflow.
It provides you with high-level information about what is going on in your network.
On this page you will find trending graphs and tables showing total packets and bytes for the top 5 IPs, top 5 MACs and top 5 protocols that traversed your network during the selected time interval.
To toggle between tables and graphs, simply click the respective icon next to the widget’s caption.
 
When troubleshooting or trying to better understand network behavior, it usually makes sense to “take a step back” and look at the bigger picture or larger trend.
To do this with your Allegro Network Multimeter, switch the viewable timeframe in the top right-hand corner of the web interface.
 
Let us, for example, change the viewable timeframe from “1 minute LIVE” to “1 day LIVE” or “Last day”.
Now we have a clear overview of the TOP talkers over a 1-day timeframe.
For identical time frames, e.g. “1 day LIVE” and “Last day”, both view modes will display exactly the same graphs.
However, in table view (depicted below) the practical difference between the two becomes very clear.
As the above illustration shows, LIVE view displays TOP talker information for the selected LIVE timeframe (in this case 10 minutes), accompanied by live traffic indicators in packets per second and bits per second.
When you select the “Last 10 minutes” view mode, the TOP talkers are instead accompanied by total traffic in packets and bytes during the selected time frame.
This can be of great help in identifying communication relations more quickly and easily.
The download buttons that you find throughout the Allegro Network Multimeter web interface give you quick and easy access to pre-filtered pcap files.
With these buttons, pcap files can be extracted retroactively (back in time) from the Allegro Network Multimeter ring buffer.
The download buttons can also be used to initiate pre-filtered live captures.
E.g. clicking the download button next to IP 192.168.178.101 will initiate a capture that is already pre-filtered to contain only traffic involving that IP during the selected time interval.
Again, such a time interval may lie in the past, as the Allegro Network Multimeter is able to extract the requested packets from its packet ring buffer (provided the traffic for that time frame was recorded).
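Once a pcap has been downloaded, you may want to narrow it down further in your analysis tool of choice. As a minimal sketch, assuming you open the file in Wireshark or tshark (this is not part of the Allegro web interface, and the helper name below is purely illustrative), a display filter for a single IP can be built like this:

```python
import ipaddress


def display_filter_for_ip(ip: str) -> str:
    """Build a Wireshark display filter matching traffic to or from `ip`.

    Hypothetical helper for illustration only.
    """
    addr = ipaddress.ip_address(ip)  # raises ValueError on malformed input
    field = "ip.addr" if addr.version == 4 else "ipv6.addr"
    return f"{field} == {addr}"


print(display_filter_for_ip("192.168.178.101"))
```

Validating the address first avoids pasting a typo into the filter field and silently getting an empty result.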
IP details page
If you want very detailed information about a certain IP, go to the IP details page of that specific IP.
This is easily done by clicking on an IP anywhere in the Allegro Network Multimeter web interface, which will bring you to the IP details page of that specific IP address.
The IP details page gives you 1-click access to all sorts of network performance information for the selected time frame.
The different tabs that you can go through on the IP details page are highlighted in green in the image below.
Click on the image below for an enlarged view of a full IP details page.
As you can see, it is very easy to investigate the (mis)use of QoS and protocols on a per-IP basis.
From the IP details page, you can also quickly and easily look into communication relations on connection/flow level, and even take a deep dive into the TCP statistics for that IP.
Quality dashboard
For quality and performance assessment, the Allegro Network Multimeter quality dashboard is a great place to start.
All of the most important graphs related to high-level quality and performance monitoring and troubleshooting are gathered on this page.
Burst Analysis
The first graph on the Allegro Network Multimeter’s predefined quality dashboard represents “Burst Analysis”.
Because the Allegro Network Multimeter supports measurement intervals (sampling rates) as fine as 1 ms, you can identify instances where a link is 100% saturated for very short fractions of time.
Such micro bursts could potentially be (part of) the root cause of network performance issues.
Unlike Allegro Packets, most monitoring and troubleshooting solutions are unable to pick this up because of “low resolution” data sampling (i.e. intervals of 5 minutes or more).
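To illustrate why the sampling interval matters, here is a minimal sketch that computes link utilization for the same traffic at two resolutions. The 1 Gbit/s link speed, the packet sizes and the function name are assumptions made for this example; it does not reflect how the Allegro Network Multimeter computes its graphs internally.

```python
from collections import defaultdict

LINK_BPS = 1_000_000_000  # assumed link speed: 1 Gbit/s


def utilization(packets, bucket_us):
    """Return {bucket index: utilization as a fraction of link capacity}.

    packets: iterable of (timestamp in microseconds, size in bytes).
    bucket_us: width of the measurement interval in microseconds.
    """
    bits = defaultdict(int)
    for ts_us, size_bytes in packets:
        bits[ts_us // bucket_us] += size_bytes * 8
    capacity_bits = LINK_BPS * bucket_us / 1_000_000
    return {bucket: total / capacity_bits for bucket, total in bits.items()}


# Hypothetical traffic: 100 packets of 1250 bytes within a single millisecond,
# i.e. 1,000,000 bits, which is exactly what a 1 Gbit/s link carries in 1 ms.
burst = [(500_000 + i * 10, 1250) for i in range(100)]

peak_1ms = max(utilization(burst, 1_000).values())
peak_1s = max(utilization(burst, 1_000_000).values())
print(f"peak utilization at 1 ms resolution: {peak_1ms:.1%}")
print(f"peak utilization at 1 s resolution:  {peak_1s:.1%}")
```

At 1 ms resolution the burst shows up as a fully saturated link, while the same traffic averaged over one second amounts to only a tenth of a percent. With 5-minute averages it would be even less visible.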
Response times
The second graph provides you with trending information about global response times for TCP as well as HTTP, SSL, DNS and DHCP.
Clicking on “Application” will bring you to the response time overview page, where trending response time graphs for HTTP, SSL, DNS and DHCP are presented individually.
From here, it is very easy to identify, and zoom in on, timing-related issues that happened on the network.
In the 1-day time frame shown above, HTTP and DHCP clearly show instances where the response time deviated massively from the overall median line.
You can select such a spike in the graph by pressing and holding the left mouse button, dragging across the spike and then releasing the button.
When zoomed in to your liking, click on the graph’s title (e.g. DHCP), which will bring you to that specific details page.
Because you already zoomed in to a specific time frame on the graph, this page will now only show you the client / DHCP-server relations that occurred during the time frame you selected in the graph.
Also on this page, you’ll find a download button for simple (retroactive) extraction of a pcap that is pre-filtered to contain only DHCP and BOOTP packets.
UDP Jitter & packet loss
The next two graphs provide trending and actionable insights for the UDP-based protocols RTP and Profinet. First up is the graph depicting jitter over time.
Bad jitter can have a very negative impact on business-critical production services and on VoIP / Unified Communication services.
From these graphs, it is very easy to quickly identify quality issues, such as instances where jitter is above 20 ms in networks where VoIP is being used.
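For readers who want a feel for what a jitter value represents, the sketch below implements the interarrival jitter estimator defined for RTP in RFC 3550. Whether the Allegro Network Multimeter uses exactly this estimator is an assumption on our part; the function name and the sample timestamps are made up for illustration.

```python
VOIP_JITTER_LIMIT_MS = 20.0  # threshold mentioned in the text above


def interarrival_jitter(packets):
    """Running jitter estimate as defined in RFC 3550, section 6.4.1.

    packets: list of (send time, receive time) pairs in milliseconds.
    Returns the jitter estimate after the last packet, in milliseconds.
    """
    jitter = 0.0
    prev_transit = None
    for sent_ms, received_ms in packets:
        transit = received_ms - sent_ms
        if prev_transit is not None:
            # D = difference in transit time between consecutive packets
            d = abs(transit - prev_transit)
            # Exponential smoothing with gain 1/16, as specified in the RFC
            jitter += (d - jitter) / 16
        prev_transit = transit
    return jitter


# Hypothetical stream: packets sent every 20 ms, the third one is delayed.
stream = [(0, 10), (20, 30), (40, 90)]
estimate = interarrival_jitter(stream)
print(f"jitter estimate: {estimate} ms, above VoIP limit: {estimate > VOIP_JITTER_LIMIT_MS}")
```

Note that the estimator is smoothed, so a single delayed packet moves the value only slightly; sustained variation in delay is what pushes it towards the 20 ms mark.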
TCP retransmissions/packet loss
The next two graphs provide trending visibility and information about TCP packet loss in your network.
TCP retransmissions are seen in all networks; it is the number of retransmissions, and better yet the retransmission ratio in percent, that indicates whether things are problematic in your network.
This is why graphs are presented for TCP retransmissions both in absolute numbers and as a ratio.
As a reference:
For wired infrastructures, a retransmission ratio of up to 2% is generally accepted to still be okay.
In wireless infrastructures, however, retransmission ratios of up to 10% are very common and still considered consistent with a well-functioning wireless network.
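The reference values above can be turned into a quick sanity check. This is a minimal sketch that assumes the ratio is calculated as retransmitted TCP packets divided by total TCP packets; the exact definition used by the Allegro Network Multimeter may differ, and the function names are illustrative only.

```python
WIRED_MAX_PCT = 2.0      # reference value for wired networks (from the text)
WIRELESS_MAX_PCT = 10.0  # reference value for wireless networks (from the text)


def retransmission_ratio_pct(retransmitted_packets, total_tcp_packets):
    """Retransmission ratio in percent (assumed definition: retrans / total)."""
    if total_tcp_packets == 0:
        return 0.0
    return 100.0 * retransmitted_packets / total_tcp_packets


def within_reference(ratio_pct, wireless=False):
    """True if the ratio is at or below the reference value for the medium."""
    limit = WIRELESS_MAX_PCT if wireless else WIRED_MAX_PCT
    return ratio_pct <= limit


ratio = retransmission_ratio_pct(30, 1000)
print(f"ratio: {ratio}%")
print("acceptable on wired:", within_reference(ratio))
print("acceptable on wireless:", within_reference(ratio, wireless=True))
```

The same absolute number of retransmissions can therefore be harmless or alarming, depending on the total traffic volume and the medium.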
TCP Zero window
For identifying application performance bottlenecks and/or server capacity issues, the “TCP Zero Window” graph is a very powerful instrument.
Here’s why: TCP zero window packets are sent out by a server (or client) whenever it can no longer keep up with the incoming traffic.
Basically, the server’s (or client’s) receive buffer is full, so it notifies the sending party to pause sending by means of TCP zero window packets.
A couple of reasons for high (continuous) counts of TCP zero window packets may be:
- Too much incoming traffic relative to the NIC link speed
- Applications that are too slow or problematic and are therefore unable to keep up
- Storage that is too slow or problematic and is therefore unable to keep up
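The mechanism behind a zero window can be sketched with a very simplified model of a receive buffer. This is a toy simulation, not real TCP (it ignores window scaling, ACK timing and similar details), and all names and numbers are assumptions chosen for illustration.

```python
def simulate_window(buffer_size, arrival_per_tick, read_per_tick, ticks):
    """Return the advertised receive window (in bytes) after each tick.

    Per tick: the application reads from the buffer, then new data arrives
    (limited by the free space), then the remaining free space is advertised.
    """
    buffered = 0
    windows = []
    for _ in range(ticks):
        buffered -= min(read_per_tick, buffered)   # application drains data
        free = buffer_size - buffered
        buffered += min(arrival_per_tick, free)    # sender cannot exceed the window
        windows.append(buffer_size - buffered)     # window advertised to the sender
    return windows


# Slow application: reads 500 bytes per tick while 1500 bytes arrive per tick.
print(simulate_window(4000, 1500, 500, 5))
# Application that keeps up: reads as fast as data arrives.
print(simulate_window(4000, 1500, 1500, 5))
```

In the first case the advertised window shrinks every tick until it reaches zero, which is the situation the “TCP Zero Window” graph makes visible. In the second case the window stays open.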
IP statistics (all IPs)
If you are looking for network information based on multiple or all IP addresses, or want to start your troubleshooting journey from an IP perspective, the L3 IP Statistics page is the right place for you.
IPs (of interest) can quickly be found by entering (part of) an IP or resolved name.
When searching for a single IP or an IP range, add a subnet mask to the IP for optimal results.
E.g. searching for 192.168.178.1 will give you a filtered list with all IPs matching 192.168.178.1xx. To avoid this, add a subnet mask, like so: 192.168.178.1/32.
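The difference between the two search styles can be reproduced outside the web interface. The sketch below approximates the partial-text search with a plain substring match and the subnet search with Python’s ipaddress module; the address list and function names are made up, and the actual matching logic of the search bar is assumed, not documented here.

```python
import ipaddress

# Hypothetical list of IPs seen on the network
seen = ["192.168.178.1", "192.168.178.10", "192.168.178.101", "192.168.178.2"]


def text_match(term, ips):
    """Approximation of a partial-text search: keep IPs containing `term`."""
    return [ip for ip in ips if term in ip]


def subnet_match(cidr, ips):
    """Keep only IPs that fall inside the given network (e.g. a /32 or /24)."""
    network = ipaddress.ip_network(cidr, strict=False)
    return [ip for ip in ips if ipaddress.ip_address(ip) in network]


print(text_match("192.168.178.1", seen))
print(subnet_match("192.168.178.1/32", seen))
print(subnet_match("192.168.178.0/24", seen))
```

The text search returns every address that merely starts with the same digits, whereas the /32 mask pins the result to exactly one host and a shorter mask such as /24 selects a whole range.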
On the IP Statistics page, it is also possible to present only those IPs that match certain (quality) metrics.
To start filtering with a “complex filter”, begin your entry in the search bar with a "(". The next possible inputs are then shown to you as a form of help.
By using a “complex filter” in the search bar, you can narrow down the number of displayed IPs based on the following parameters:
"name", "ip", "packets", "bytes", "pps", "bps", "firsttime", "lasttime", "tcppackets", "udppackets", "tcppayload", "tcpRetrans", "tcpRetransRx", "tcpRetransTx", "category", "vlan", "mpls", "outermpls", "innermpls", "interface", "validconnections", "invalidconnections", "tcpZeroWindowRx", "tcpZeroWindowTx", "ipgroup", "mtu", "mtuRx", "mtuTx", "tcpMissedData", "("
When typing complex filters, the and / or / must contain / exact match operators are allowed in the form of: AND, &&, OR, ||, ==, ===
This page might be extended over time.













