Incidents: Difference between revisions

1,135 bytes added ,  15 March 2023

[[File:Incidents_list.png|alt=|none|thumb|800x800px|Incident page]]
Incidents are used to alert the user when configured network events occur, usually for traffic-based rules, but also for system-specific events. These notifications can be viewed in the web GUI and may also be delivered via various notification channels. Repeating incidents are counted, and the times of the first and last occurrence of an incident are recorded. This aggregation can be disabled for some incidents. What makes an incident unique depends on the type of incident.
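The counting of repeating incidents described above can be illustrated with a minimal Python sketch. It uses a hypothetical uniqueness key made of a rule name and an IP address. The real key depends on the incident type and is not specified here. The names `AggregatedIncident` and `record` are illustrative only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class AggregatedIncident:
    key: tuple          # hypothetical uniqueness key, e.g. (rule name, IP address)
    count: int
    first_seen: datetime
    last_seen: datetime


def record(incidents: dict, key: tuple, timestamp: datetime) -> AggregatedIncident:
    """Count a repeated occurrence instead of storing a new incident."""
    entry = incidents.get(key)
    if entry is None:
        # First occurrence: remember it with a count of 1.
        entry = AggregatedIncident(key, 1, timestamp, timestamp)
        incidents[key] = entry
    else:
        # Repeated occurrence: increase the counter and update the last timestamp.
        entry.count += 1
        entry.last_seen = max(entry.last_seen, timestamp)
    return entry


incidents = {}
t0 = datetime(2023, 3, 15, 12, 0, 0)
key = ("High traffic", "10.0.0.5")
for offset in (0, 30, 60):
    latest = record(incidents, key, t0 + timedelta(seconds=offset))
print(latest.count, latest.first_seen.time(), latest.last_seen.time())  # 3 12:00:00 12:01:00
```

With aggregation disabled, each call would instead append a separate entry to a list.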


The incident feature allows defining rules that are checked at the configured trigger point. Examples are the end of a connection, the end of a SIP call, or periodic checks of ongoing traffic. When such a trigger fires, the configured traffic attributes are checked, and if all attributes of a rule match, an incident is created.
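The rule evaluation can be sketched in Python as follows. This is a minimal illustration, not the device's implementation. The comparator symbols, the `Rule` and `Condition` classes and the `evaluate` function are hypothetical. The trigger name `sip_call_end` and the attributes `mos` and `duration` come from the trigger and attribute lists below.

```python
import operator
from dataclasses import dataclass, field

# Hypothetical symbols for the three comparisons: lower, equal, greater.
COMPARATORS = {
    "<": operator.lt,
    "=": operator.eq,
    ">": operator.gt,
}


@dataclass
class Condition:
    attribute: str
    comparator: str
    value: float


@dataclass
class Rule:
    name: str
    trigger: str
    conditions: list = field(default_factory=list)


def evaluate(rule: Rule, trigger: str, observed: dict) -> bool:
    """Return True if this trigger event should create an incident for the rule."""
    if trigger != rule.trigger:
        return False
    # All conditions must match at the same time (logical AND).
    # A rule without conditions matches every event of its trigger.
    return all(
        c.attribute in observed
        and COMPARATORS[c.comparator](observed[c.attribute], c.value)
        for c in rule.conditions
    )


rule = Rule("Poor call quality", "sip_call_end",
            [Condition("mos", "<", 3.5), Condition("duration", ">", 10)])
print(evaluate(rule, "sip_call_end", {"mos": 3.1, "duration": 42}))  # True
print(evaluate(rule, "sip_call_end", {"mos": 3.1, "duration": 5}))   # False
```

Using `all()` means a rule for a trigger without attributes, such as an SMB1 negotiation, fires on every event of that trigger.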


Incidents that have occurred can be viewed in the web interface. Additionally, they can be reported via the following notification channels:
* email
* Apache Kafka
* SNMP trap
* syslog


The first occurrence of a medium or high severity incident will also trigger a status notification which is visible at the top right of the web GUI.

Up to 1000 incidents are stored by the system; if this limit is exceeded, the oldest incidents are discarded.
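A bounded store that discards the oldest entries once the limit is reached can be sketched with Python's `collections.deque`. When a `maxlen` is set, appending a new item drops one from the opposite end. The sketch assumes incidents are discarded strictly in creation order. How the device orders aggregated incidents is not specified here, and `remember` is a hypothetical helper.

```python
from collections import deque

MAX_INCIDENTS = 1000  # limit stated in the documentation

incident_store = deque(maxlen=MAX_INCIDENTS)


def remember(incident) -> None:
    # Once the deque holds MAX_INCIDENTS items, append() drops the oldest one.
    incident_store.append(incident)


for i in range(1005):
    remember(f"incident-{i}")

print(len(incident_store), incident_store[0])  # 1000 incident-5
```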


== Configuration of incident rules ==
[[File:Incidents rules.png|thumb|600x600px|Rule configuration]]
Incident rules can be defined in the "Configuration of incident rules" tab in the menu "Generic -> Incidents". All changes to the rule configuration will only take effect after saving the current configuration by clicking on the save button at the bottom of the page.


The page shows a table containing the existing rules and their configuration.

A rule is defined by the following settings:


* Rule name: This is an arbitrary text describing the purpose of the rule. This text is shown in the incident list and in the email, Kafka, SNMP trap, and syslog output.
* Severity: Three severity levels, "low", "medium", and "high", can be used to separate more important from less important incidents. Reporting channels can be configured to report only incidents at or above a minimum severity level. A rule can also be disabled by choosing the severity level "disabled"; a disabled rule is not evaluated and can be re-enabled later at any time.
* Trigger: The trigger defines when a rule is evaluated. Each available trigger has a description next to it with more details. Some triggers are evaluated when a specific event occurs, such as the end of a VoIP call. Others are evaluated periodically, such as throughput triggers for IP traffic, whose check interval is configurable. See the list below for a detailed description of the available triggers.
* Attributes: Attributes compare the actual value of a traffic property with an expected value.
** Each trigger has a different set of attributes which can be checked, and some triggers don't need any attribute at all. See the list below for a detailed description of the available attributes.
** If multiple attributes are configured, all of them must match at the same time for the rule to create an incident.
** Each attribute is compared to a defined value, checking whether the actual value is lower than, equal to, or greater than it.
** Some attributes have an additional parameter, like a time span which defines how the attribute value is calculated.
* Virtual link group: The rule can be limited to a selected [[Virtual Link Group functionality|virtual link group]] or applied to any group. Some triggers cannot be limited to a virtual link group; for these, the setting is hidden.
* IP filter: Depending on the selected trigger, the rule can be limited to a specific IP address. In firmware version >= 4.0, the IP filter can also be an IP subnet in the format IP/mask-length (example: 10.0.0.0/8).
* IP group: Depending on the selected trigger, the rule can be applied to an IP group instead of an individual IP address.
* The virtual link group, IP, and IP filter settings can also be inverted by using the != comparator.
* Report channel: Incidents are always visible in the web interface, but can also be reported via multiple channels. These channels are configured separately in the tab "Configuration of notification channels". Up to ten channels can be selected, and the incident for this rule is reported on each of them. Alternatively, no channel can be selected, in which case the incident is only accessible in the web interface.
* Aggregation of recurring incidents: Incidents are aggregated by default. This means the table only shows the number of incidents of the type and the timestamps of the first and the last incident. This can be disabled for most incidents, so that every individual occurrence of the incident type is listed.
* Traffic capturing [since version >= 4.0]: If supported by the trigger, the rule can be configured to capture the network traffic triggering the rule, including some extra time before and after the incident.
** Possible options:
*** Disabled: capturing is disabled for this rule
*** Always: capturing happens in all traffic processing types.
** Extra capture time: the number of seconds to capture before the start and after the end of the incident.
*** If a time span parameter is used for attributes, the capture time includes this time duration as well.
** The traffic is automatically filtered to only contain the traffic that actually triggered the rule, i.e., an IP address or an IP group for IP rules.
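The IP filter semantics can be illustrated with Python's standard `ipaddress` module. A filter is either a single address or an IP/mask-length subnet, and it can be inverted with !=. The function name `ip_filter_matches` and the comparator strings are hypothetical. The sketch only shows the matching logic, not how the firmware implements it.

```python
import ipaddress


def ip_filter_matches(ip: str, ip_filter: str, comparator: str = "=") -> bool:
    """Check an address against a single IP or an IP/mask-length subnet.

    comparator "=" selects matching traffic, "!=" inverts the filter.
    """
    if comparator not in ("=", "!="):
        raise ValueError(f"unsupported comparator: {comparator}")
    # A plain address such as "10.0.0.5" becomes a /32 (or /128) network.
    network = ipaddress.ip_network(ip_filter, strict=False)
    # Membership is False when IPv4 and IPv6 are mixed.
    inside = ipaddress.ip_address(ip) in network
    return inside if comparator == "=" else not inside


print(ip_filter_matches("10.20.30.40", "10.0.0.0/8"))        # True
print(ip_filter_matches("192.168.1.1", "10.0.0.0/8"))        # False
print(ip_filter_matches("192.168.1.1", "10.0.0.0/8", "!="))  # True
```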


!Trigger name
!Description
!Attributes
!Attribute usage
|-
|ARP: MAC change for an IP<br>
(arp_ip_mac_changed)
|This trigger is checked on an ARP response when the MAC address for the requested IP has changed.
|time_since_last_mac
|optional
|-
|DNS: Server is not responding<br>
(dns_server_not_responding)
|This trigger is checked when a DNS server has not responded for some time. A server is considered unresponsive when more than 3 requests to it have gone unanswered for more than 5 seconds. Such a server must have answered at least two requests previously.
|time_since_first_unanswered_request
|optional
|-
|DNS: Server response error<br>
(dns_server_response_error)
|This trigger is checked when a DNS server responds with a configured error type: format error, server failure, or non-existent domain.
|error_type
|mandatory
|-
|Global: Connection start<br>
(global_new_connection)
|This trigger is checked at the start of every connection. It can be used to report new connections with a certain layer 4 protocol and a given port range.
|l4_protocol, port_range, since_start_time
|mandatory
|-
|Global: GPS synchronization status change<br>
(global_gps_sync_status_change)
|This trigger is checked when the GPS clock synchronization status changes.
|gps_sync_status
|optional
|-
|Global: Number of connections<br>
(global_connections)
|This trigger continuously checks whether the number of newly created connections exceeds a threshold. The update interval is defined by the time span parameter of the attributes.
|new_connections
|mandatory
|-
|Global: Regular expressions<br>
(global_regex_match)
|This trigger allows configuring a list of regular expressions and is checked for each packet whose L7 data matches one of them. Since no attributes are associated with this trigger, any packet that matches one of the regular expressions results in an incident. The incident also contains information about the connection the packet belongs to and which regular expression matched the packet.
|
|no attributes are available for this trigger
|-
|Global: Ring buffer<br>
(global_ring_buffer)
|This trigger is checked continuously to report changes in the ring buffer.
|used_size, bytes_captured, bytes_dropped
|mandatory
|-
|Global: Speed change of an interface<br>
(global_interface_speed_change)
|This trigger is checked when the speed of an interface changes.
|interface_speed
|optional
|-
|Global: Speed mismatch for an interface pair<br>
(global_interface_speed_mismatch)
|This trigger is checked when the status or speed of an interface changes and its speed no longer matches that of the corresponding interface of the link.
|link_speed_difference
|optional
|-
|Global: Status change of an interface<br>
(global_interface_status_change)
|This trigger is checked when the status of an interface changes.
|interface_status
|optional
|-
|Global: Traffic<br>
(global_traffic)
|This trigger is checked continuously for the total traffic of the device. The update interval is defined by the time span parameter of the attributes.
|throughput, throughput_increase, packet_rate, packet_rate_increase
|mandatory
|-
|IP: Connection end<br>
(ip_flow_end)
|This trigger checks the attributes whenever an IP flow ends.
|total_packets, total_bytes, tcp_handshake_time, percent_retransmissions, zero_window_packets, duration
|mandatory
|-
|IP: Connection start<br>
(ip_flow_start)
|This trigger checks the attributes whenever an IP flow starts.
|new_connections
|mandatory
|-
|IP: Local IP with multiple MAC addresses<br>
(ip_local_ip_multiple_macs)
|This trigger is checked on each new flow of a local IP address when more than one MAC address uses this IP.
|mac_count
|optional
|-
|IP: New local IP address<br>
(ip_new_local_ip)
|This trigger is checked once for each new IP address belonging to a private network address range.
|since_start_time
|optional
|-
|IP: New local L7 protocol<br>
(ip_new_local_l7_protocol)
|This trigger is checked once for each new L7 protocol used by a local IP address.
|since_start_time
|optional
|-
|IP: TCP handshake<br>
(ip_tcp_handshake)
|This trigger is checked after a successful TCP handshake.
|handshake_time, server_handshake_time, client_handshake_time
|mandatory
|-
|IP: Traffic on IP addresses<br>
(ip_traffic)
|This trigger is checked continuously for each active IP or IP group. The update interval is defined by the time span parameter of the attributes.
|throughput, throughput_increase, packet_rate, packet_rate_increase, total_packets, total_bytes, retransmission_ratio, zero_window_packets, tcp_syn_packets, tcp_fin_packets, tcp_rst_packets
|mandatory
|-
|LACP: Status change of a channel<br>
(lacp_channel_status_change)
|This trigger is checked when the status of an LACP port channel changes.
|channel_status
|optional
|-
|MAC: New L7 protocol<br>
(mac_new_l7_protocol)
|This trigger is checked when a unicast MAC address uses an L7 protocol for the first time.
|since_start_time
|optional
|-
|MAC: New MAC address<br>
(mac_new_address)
|This trigger is checked once, when a unicast MAC address appears for the first time.
|since_start_time
|optional
|-
|MAC: Traffic on MAC addresses<br>
(mac_traffic)
|This trigger is checked continuously for each active MAC address. The update interval is defined by the time span parameter of the attributes.
|broadcast_packet_rate
|mandatory
|-
|PPPoE: PPPoE Discovery traffic<br>
(pppoe_discovery_traffic)
|This trigger is checked continuously for PPPoE discovery traffic. The update interval is defined by the time span parameter of the attributes.
|pppoe_discovery_packets
|mandatory
|-
|PTP: Timestamp packet<br>
(ptp_timestamp_packet)
|This trigger is checked when a PTP packet containing a valid timestamp is seen.
|time_offset
|mandatory
|-
|QOS: Traffic on QoS classes<br>
(qos_traffic)
|This trigger is checked continuously for each active QoS class. The update interval is defined by the time span parameter of the attributes.
|throughput
|mandatory
|-
|RTP: Traffic for RTP connections<br>
(rtp_traffic)
|This trigger is checked continuously for the traffic of each RTP connection. The update interval is defined by the time span parameter of the attributes.
|jitter, percent_loss
|mandatory
|-
|SIP: Call end<br>
(sip_call_end)
|This trigger is checked when a SIP call ends.
|duration, status, mos, percent_loss, jitter, total_packets, total_bytes, total_caller_packets, total_callee_packets, total_caller_bytes, total_callee_bytes
|mandatory
|-
|SMB: SMB1 negotiation<br>
(smb_v1_negotiation)
|This trigger is executed at the beginning of each SMB connection and checks whether insecure SMB1 has been negotiated.
|
|none
|-
|SSL: Handshake<br>
(ssl_handshake)
|This trigger is checked during the handshake of each SSL connection.
|certificate_expires
|mandatory
|}
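The unresponsiveness criterion of the "DNS: Server is not responding" trigger can be sketched in Python. The sketch reads the description as "more than 3 requests, each unanswered for more than 5 seconds, sent to a server that has answered at least two requests before". That reading, the `DnsServerState` class and `is_unresponsive` are assumptions for illustration. The helper `time_since_first_unanswered_request` mirrors the attribute of the same name.

```python
from dataclasses import dataclass, field


@dataclass
class DnsServerState:
    answered: int = 0
    # Send timestamps (in seconds) of requests that are still unanswered.
    unanswered: list = field(default_factory=list)


def is_unresponsive(state: DnsServerState, now: float) -> bool:
    if state.answered < 2:
        # Only servers that answered at least two requests before are considered.
        return False
    overdue = [sent for sent in state.unanswered if now - sent > 5.0]
    return len(overdue) > 3


def time_since_first_unanswered_request(state: DnsServerState, now: float) -> float:
    # Time between the check and the oldest request still waiting for an answer.
    return now - min(state.unanswered) if state.unanswered else 0.0


server = DnsServerState(answered=2, unanswered=[0.0, 0.5, 1.0, 1.5])
print(is_unresponsive(server, now=6.0))  # False: only 2 requests are older than 5 s
print(is_unresponsive(server, now=7.0))  # True: all 4 requests are older than 5 s
print(time_since_first_unanswered_request(server, now=7.0))  # 7.0
```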


==== Special trigger properties ====
Some triggers are checked continuously, once per configured time span, so their incidents are generated differently than for event-specific triggers such as a call end.


# Repeating incidents:  The following triggers will be evaluated every configured time span and will be re-issued whenever the configured attributes match.
## ip_traffic
## mac_traffic
## qos_traffic
## rtp_traffic
## global_traffic


So for repeating incidents, you will get a rule hit for the same attribute in every time span in which it matches. For example, if an IP address has traffic of 100 Mbit/s for 2 minutes and a rule checks for more than 50 Mbit/s over 30 seconds, the rule will hit 4 times. These hits are combined into one incident, which contains the exact number of repetitions for reference.
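The arithmetic of this example can be reproduced with a short Python sketch. It assumes per-second throughput samples and non-overlapping evaluation windows aligned with the start of the traffic. On a real device the windows follow its own evaluation schedule, so the count can differ at the edges. `count_hits` is a hypothetical helper.

```python
def count_hits(samples_mbit: list, window_s: int, threshold_mbit: float) -> int:
    """Count complete, non-overlapping windows whose average exceeds the threshold.

    samples_mbit holds one throughput sample per second, in Mbit/s.
    """
    hits = 0
    for start in range(0, len(samples_mbit) - window_s + 1, window_s):
        window = samples_mbit[start:start + window_s]
        if sum(window) / window_s > threshold_mbit:
            hits += 1
    return hits


# 2 minutes at a constant 100 Mbit/s, rule: more than 50 Mbit/s over 30 seconds
print(count_hits([100] * 120, window_s=30, threshold_mbit=50))  # 4
```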


For start/stop incidents, you will only see two rule hits and the incident description will state the start and stop time.
=== Available attributes ===


* '''broadcast_packet_rate''': The average number of MAC broadcast packets per second over the configured time span.
* '''certificate_expires''': The number of days until the certificate expires. If the certificate has already expired, the value is <= 0.
* '''channel_status''': 0 means that the LACP port channel is not synchronized, 1 means that it is synchronized.
* '''duration''':
** ''IP: Connection end'': The time between the first and last packet of the flow.
** ''SIP: Call end'': The call duration.
* '''error_type''': equal or not equal to:
** Format Error: The DNS server responds with a format error.
** Non-existent Domain: The DNS server could not find the queried domain name.
** Server Failure: The DNS server responds with a server failure.
* '''gps_sync_status''': 0 means that the GPS clock is not synchronized, 1 means that the GPS clock is synchronized.
* '''handshake_time''': The TCP handshake time between the client's first SYN packet and the client's ACK packet acknowledging the server's SYN/ACK packet.
** '''client_handshake_time''': The TCP handshake time between the server's SYN/ACK packet and the client's ACK packet.
** '''server_handshake_time''': The TCP handshake time between the client's first SYN packet and the server's SYN/ACK packet.
* '''interface_speed''': The current speed of the interface in Mbit/s.
* '''interface_status''': 0 means the interface is down, 1 means the interface is up.
* '''jitter''':
** ''SIP: Call end'': The average jitter of the call, using the maximum value of both call sides.
** ''RTP: Traffic for RTP connections'': The average jitter of the RTP connection for the given time span, using the maximum value of both directions.
* '''l4_protocol''': The layer 4 protocol: TCP, UDP, or other.
* '''link_speed_difference''': The absolute difference in Mbit/s between the speeds of both interfaces of a link.
* '''mac_count''': The number of different MAC addresses for the corresponding IP address.
* '''mos''': The average MOS (mean opinion score) of the call, using the minimum value of both call sides.
* '''new_connections''': The number of newly created connections (TCP and UDP) in the given time span.
* '''packet_rate''': The average packet rate in packets per second during the configured time span.
* '''packet_rate_increase''': The packet rate increase in % during the configured time span compared to the average packet rate of the given baseline time span. A noise value can be configured to tolerate deviations that should not trigger the incident.
* '''percent_loss''':
** ''SIP: Call end'': The percentage of RTP packet loss for the call, counting packets from both directions.
** ''RTP: Traffic for RTP connections'': The percentage of RTP packet loss in the given time span, counting packets from both directions of the RTP connection.
* '''percent_transmissions''': The amount of TCP retransmissions as a percentage of the total bytes.
* '''port_range''': The TCP or UDP port. It can also be a list of ports and port ranges, e.g. 80,443,8443-8445.
* '''pppoe_discovery_packets''': The number of PPPoE discovery packets seen during the configured time span.
* '''retransmission_ratio''': The TCP retransmission ratio seen in the configured time span.
* '''since_start_time''':
** ''MAC: New L7 protocol'': The number of seconds after the start of packet processing at which a new Layer-7 protocol appeared for the MAC address.
** ''MAC: New MAC address'': The number of seconds after the start of packet processing at which the MAC address appeared. This is useful to only report new MAC addresses after some learning time.
** ''IP: New local IP address'': The number of seconds after the start of packet processing at which the IP address appeared. This is useful to only report new IP addresses after some learning time.
** ''IP: New local L7 protocol'': The number of seconds after the start of packet processing at which the Layer-7 protocol appeared for the IP address.
** ''Global: Connection start'': The number of seconds after the start of packet processing at which the connection was started. This is useful to only report new connections after some learning time.
* '''status''': The call status code (a three-digit number, e.g. 200 for success).
* '''tcp_handshake_time''': The TCP handshake time.
* '''tcp_fin_packets''': The number of TCP FIN packets (RX + TX) seen in the configured time span.
* '''tcp_rst_packets''': The number of TCP RST packets (RX + TX) seen in the configured time span.
* '''tcp_syn_packets''': The number of TCP SYN packets (RX + TX) seen in the configured time span.
* '''throughput''': The average throughput bandwidth in bit/s during the configured time span.
* '''throughput_increase''': The throughput bandwidth increase in % during the configured time span compared to the average throughput of the given baseline time span. A noise value can be configured to tolerate deviations that should not trigger the incident.
* '''time_offset''': The time offset between the local time and the timestamp seen in the PTP packet.
* '''time_since_first_unanswered_request''': The time span between the moment the trigger is checked and the first DNS request that has not been answered by the DNS server.
* '''time_since_last_mac''': The number of seconds between MAC address changes. If, for example, dynamic IP assignment is used, changing MAC addresses are normal, so the check can be limited to changes within a certain amount of time.
*** Server Failure: DNS responds server failure.
* '''total_bytes''':
* sip_call_end
** ''IP: Connection end'': The total number of bytes seen for both directions of the flow.
** duration: The call duration.
** ''IP: Traffic on IP addresses'', ''QOS: Traffic on QoS classes'', ''SIP: Call end'': The number of bytes seen in the configured timespan.
** status: The call status code (a three digit number, like 200 for Success)
** '''total_callee_bytes''': The number of bytes seen for the callee of the call.
** mos: The average MOS quality value of the call, using the minimum of both call sides.
** '''total_caller_bytes''': The number of bytes seen for the caller of the call.
** percent_loss: The percentage of RTP packet loss for the call, accounting packets from both directions.
* '''total_packets''':
** jitter: The average jitter of the call, using the maximum value of both call sides.
** ''IP: Connection end'': The total number of packets seen for both directions of the flow.
** total_packets: The number of packets seen in the configured timespan.
** ''IP: Traffic on IP addresses'', ''QOS: Traffic on QoS classes'', ''SIP: Call end'': The number of packets seen in the configured timespan.
** total_caller_packets: The number of packets seen for the caller of the call.
** '''total_callee_packets''': The number of packets seen for the callee of the call.
** total_callee_packets: The number of packets seen for the callee of the call.
** '''total_caller_packets''': The number of packets seen for the caller of the call.
** total_bytes: The number of bytes seen in the configured timespan.
* '''type''': The type of PPPoE discovery packet (PADI, PADO, PADR, PADS, PADT or any).
** total_caller_bytes: The number of bytes seen for the caller of the call.
* '''used_tls_version''': The TLS version (SSl 3.0, TLS 1.0, TLS 1.1, TLS 1.2 or TLS 1.3)
** total_callee_bytes: The number of bytes seen for the callee of the call.
* '''zero_window_packets''': The number of packets with a TCP window of 0 for both directions of the flow.
* global_interface_status_change
* '''zero_window_packets''': The number of zero window packets seen in the configured timespan.
** interface_status: 0 means interface is down, 1 means interface is up.
* global_interface_speed_change
** interface_speed: The current speed of the interface in Mbit/s.
* global_interface_speed_mismatch
** link_speed_difference: The absolute difference between the speeds of both interfaces of a link in Mbit/s.
* global_gps_sync_status_change
** gps_sync_status: 0 means that the GPS clock is not synchronized, 1 means that the GPS clock is synchronized.
* global_traffic
** throughput: The average throughput in bit/s during the configured timespan.
** throughput_increase: The throughput increase in % during the configured timespan compared to the average throughput of the given baseline timespan. A noise value can be configured to tolerate deviations that should not trigger the incident.
** packet_rate: The average packet rate in packets/s during the configured timespan.
** packet_rate_increase: The packet rate increase in % during the configured timespan compared to the average packet rate of the given baseline timespan. A noise value can be configured to tolerate deviations that should not trigger the incident.
* rtp_traffic
** jitter: The average jitter of the RTP connection for the given timespan, using the maximum value of both directions.
** percent_loss: The percentage of RTP packet loss for the given timespan, counting packets from both directions of the RTP connection.
* ptp_timestamp_packet
** time_offset: The time offset between the local time and the timestamp seen in the PTP packet.
* global_connections
** new_connections: The number of newly created connections (TCP and UDP) for the given timespan.
* global_new_connection
** l4_protocol: The layer 4 protocol: TCP, UDP or other.
** port_range: The TCP or UDP port. Can also be a list or range, e.g. 80,443,8443-8445
** since_start_time: The number of seconds after packet processing start when the connection was started. This is useful to only report new connections after some learning time.
* ssl_handshake
** certificate_expires: The number of days until the certificate expires. If the certificate has already expired, the value is <= 0.
* tls_version
** used_tls_version: The TLS version (SSL 3.0, TLS 1.0, TLS 1.1, TLS 1.2 or TLS 1.3).
* pppoe_discovery_traffic
** pppoe_discovery_packets: The number of PPPoE discovery packets seen during the configured timespan.
** type: The type of PPPoE discovery packet (PADI, PADO, PADR, PADS, PADT or any).
* lacp_channel_status_change
** channel_status: 0 means that the LACP port channel is not synchronized, 1 means that the LACP port channel is synchronized.
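
The port_range attribute accepts single ports, comma-separated lists and inclusive ranges. The following Python sketch shows one way such a specification could be expanded and matched against a port. The function names parse_port_range and port_matches are hypothetical, and ignoring whitespace around entries is an assumption, not documented behaviour of the Allegro Network Multimeter.

```python
from typing import Set


def parse_port_range(spec: str) -> Set[int]:
    """Expand a specification like "80,443,8443-8445" into a set of port numbers."""
    ports: Set[int] = set()
    for entry in spec.split(","):
        entry = entry.strip()  # assumption: whitespace around entries is ignored
        if not entry:
            continue
        if "-" in entry:
            low_text, high_text = entry.split("-", 1)
            low, high = int(low_text), int(high_text)
        else:
            low = high = int(entry)
        if not 0 <= low <= high <= 65535:
            raise ValueError(f"invalid port entry: {entry!r}")
        ports.update(range(low, high + 1))  # ranges include both end points
    return ports


def port_matches(port: int, spec: str) -> bool:
    """Return True if the TCP/UDP port is covered by the port_range specification."""
    return port in parse_port_range(spec)


if __name__ == "__main__":
    spec = "80,443,8443-8445"
    print(sorted(parse_port_range(spec)))  # [80, 443, 8443, 8444, 8445]
    print(port_matches(8444, spec), port_matches(8080, spec))  # True False
```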

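The three attributes of the ip_tcp_handshake trigger are related: the server part (SYN to SYN/ACK) and the client part (SYN/ACK to ACK) add up to the total handshake time. A minimal Python sketch, assuming the three packet timestamps are available as integer microseconds; the HandshakeTimestamps type and the handshake_times function are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class HandshakeTimestamps:
    syn_us: int      # client's first SYN packet
    syn_ack_us: int  # server's SYN/ACK packet
    ack_us: int      # client's ACK answering the SYN/ACK


def handshake_times(ts: HandshakeTimestamps) -> Dict[str, int]:
    """Derive the three handshake attributes (in microseconds) from packet timestamps."""
    server = ts.syn_ack_us - ts.syn_us  # server_handshake_time: SYN -> SYN/ACK
    client = ts.ack_us - ts.syn_ack_us  # client_handshake_time: SYN/ACK -> ACK
    return {
        "server_handshake_time": server,
        "client_handshake_time": client,
        "handshake_time": server + client,  # equals ack_us - syn_us
    }


if __name__ == "__main__":
    print(handshake_times(HandshakeTimestamps(syn_us=1_000, syn_ack_us=13_000, ack_us=16_000)))
    # {'server_handshake_time': 12000, 'client_handshake_time': 3000, 'handshake_time': 15000}
```

Seen from the measurement point, the server part covers the round trip towards the server and the client part the round trip towards the client, which is why the split helps to localize latency.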

=== Capture settings ===
Since firmware version 4.0, traffic can be captured automatically when an incident occurs. These global settings control where capture files are stored; capturing itself is enabled for each rule separately.


The incident capture feature requires an active packet ring buffer since the packets are extracted from the buffer at the end of the incident period.
Available settings:


* Capture cooldown period: For each rule, a cooldown period prevents multiple captures in fast succession. By default, a new capture can start at the earliest 5 seconds after the previous one, but any other value of at least 1 second can be configured. The cooldown is applied to each rule separately, and within a rule it does not matter whether the same or a different entity triggers the incident. The incident is still reported during the cooldown period, but no additional capture is started.
* Storage device: Select the storage device on which the captures should be stored.
* Storage directory: Enter the directory where capture files should be stored or leave empty to use the top level directory.
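
The per-rule capture cooldown described above can be modelled in a few lines of Python. This is a sketch under the assumption that the cooldown is measured from the start of the previous capture of the same rule; the class CaptureCooldown and its method name are hypothetical. The rule, not the entity that triggered the incident, is the key.

```python
from typing import Dict, Hashable


class CaptureCooldown:
    """Hypothetical model of the per-rule capture cooldown."""

    def __init__(self, cooldown_s: float = 5.0) -> None:
        if cooldown_s < 1.0:
            raise ValueError("the cooldown must be at least 1 second")
        self.cooldown_s = cooldown_s
        self._last_capture_s: Dict[Hashable, float] = {}

    def on_incident(self, rule_id: Hashable, now_s: float) -> bool:
        """Called for every incident of a rule; returns True if a capture should start.

        The incident itself is reported either way; only the capture is suppressed.
        The triggering entity is deliberately ignored: the cooldown is keyed by rule only.
        """
        last = self._last_capture_s.get(rule_id)
        if last is not None and now_s - last < self.cooldown_s:
            return False
        self._last_capture_s[rule_id] = now_s
        return True


if __name__ == "__main__":
    cooldown = CaptureCooldown()                  # default: 5 seconds
    print(cooldown.on_incident("rule-a", 0.0))   # True: first capture for rule-a
    print(cooldown.on_incident("rule-a", 3.0))   # False: within 5 s of the last capture
    print(cooldown.on_incident("rule-b", 3.0))   # True: separate rule, separate cooldown
    print(cooldown.on_incident("rule-a", 5.0))   # True: 5 s have passed
```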


* email: Incidents will be sent to the email address configured in the [[Global settings]].
* syslog: Incidents will be sent to the configured syslog server via TCP or UDP on any port (firmware >= 3.3; older firmware versions use TCP on port 514).
* kafka (firmware >= 4.0): Incidents are sent to a topic on the configured Apache Kafka server, using the same message format as syslog.
** Kafka Server configuration:
*** Bootstrap Server: hostname/IP:port of a Kafka broker, or multiple brokers separated by commas
*** Protocol: Plaintext (no authentication, no encryption), SASL Plaintext (PLAIN authentication, no encryption), SASL SSL (PLAIN authentication, TLS/SSL encryption)
*** Topic: The name of the topic into which the Incidents are sent.
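
A client that reads the incidents from the same topic needs matching connection settings. The three Protocol options correspond to the standard Kafka security.protocol values, and "Plain authentication" to the SASL mechanism PLAIN. The Python sketch below builds librdkafka-style properties (bootstrap.servers, security.protocol, sasl.mechanism, sasl.username, sasl.password); the helper name consumer_properties and the assumption that a SASL username and password are available to the consumer are illustrative only.

```python
from typing import Dict

# Assumed mapping from the Protocol options above to Kafka client values.
SECURITY_PROTOCOLS = {
    "Plaintext": "PLAINTEXT",            # no authentication, no encryption
    "SASL Plaintext": "SASL_PLAINTEXT",  # PLAIN authentication, no encryption
    "SASL SSL": "SASL_SSL",              # PLAIN authentication, TLS/SSL encryption
}


def consumer_properties(bootstrap: str, protocol: str,
                        username: str = "", password: str = "") -> Dict[str, str]:
    """Build librdkafka-style client properties for reading the incident topic (hypothetical helper)."""
    brokers = [b.strip() for b in bootstrap.split(",") if b.strip()]
    if not brokers:
        raise ValueError("at least one broker (host:port) is required")
    props = {
        "bootstrap.servers": ",".join(brokers),
        "security.protocol": SECURITY_PROTOCOLS[protocol],
    }
    if protocol != "Plaintext":
        props["sasl.mechanism"] = "PLAIN"
        props["sasl.username"] = username
        props["sasl.password"] = password
    return props


if __name__ == "__main__":
    print(consumer_properties("kafka1:9092, kafka2:9092", "SASL SSL", "user", "secret"))
```

Before passing such a dictionary to a consumer like confluent_kafka.Consumer, a group.id must be added; the consumer then subscribes to the topic configured on the device.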
[[File:Incidents add channel.png|thumb|alt=|none|Adding a new channel]]
Each channel also has a minimum severity setting, so that only incidents of at least that severity are reported on it.


Each channel can be configured to only handle incidents from live traffic or from replayed traffic.
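
Both per-channel filters combine into a single delivery decision. A minimal Python sketch, assuming severities are compared on a numeric scale where a higher value means more severe; the Channel fields and the should_deliver function are hypothetical and do not reflect the actual configuration names.

```python
from dataclasses import dataclass


@dataclass
class Channel:
    name: str
    min_severity: int           # hypothetical numeric scale: higher = more severe
    handle_live: bool = True    # deliver incidents from live traffic
    handle_replay: bool = True  # deliver incidents from replayed traffic


def should_deliver(channel: Channel, severity: int, from_replay: bool) -> bool:
    """Return True if an incident passes the channel's severity and traffic-source filters."""
    if severity < channel.min_severity:
        return False
    return channel.handle_replay if from_replay else channel.handle_live


if __name__ == "__main__":
    syslog = Channel("syslog", min_severity=2, handle_replay=False)
    print(should_deliver(syslog, severity=3, from_replay=False))  # True
    print(should_deliver(syslog, severity=3, from_replay=True))   # False: replay disabled
```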
Some incidents cannot be configured via rules. These incidents can also be sent by email by enabling the corresponding settings in the lower part of the settings page.


== Interface burst incident ==
[[File:Incidents others.png|thumb|600x600px|Other incidents]]
Burst incidents with millisecond resolution can be generated when the interface throughput exceeds a configurable threshold. The incident contains a graph of the traffic on that interface, with some data points before and after the threshold was exceeded, depending on the measurement interval. A PCAP link for capturing from the packet ring buffer is shown. For further investigation of the incident, the button "Use as global time range" sets the global time range to the start and end of the incident graph (at least 5 seconds), so that all modules of the Allegro Network Multimeter show that time span. The incident generation can be configured as follows:
* '''Report "throughput threshold exceeded" with severity''': report an incident with the selected severity level if the throughput of any network interface exceeds the threshold.
* '''Throughput threshold (Mbit/s)''': The threshold is configured in Mbit/s.
* '''How long throughput must be above threshold to generate incident (in milliseconds)''': The throughput must exceed the threshold for this duration in order to generate the incident. If set to zero (default), the incident is generated as soon as the threshold is exceeded.
* '''Throughput cool-down period between two incidents in milliseconds''': Defines the time after an incident during which no new incident is generated, even if the threshold is exceeded. Once this period has passed, throughput incidents can be generated again.
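
The interplay of threshold, minimum duration and cool-down can be sketched over a series of per-millisecond throughput samples. This Python sketch is hypothetical (BurstConfig and detect_bursts are illustrative names); it assumes that "exceeds" means strictly greater than the threshold, that the duration is measured from the first sample above the threshold, and that an ongoing burst may produce another incident once the cool-down has passed.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class BurstConfig:
    threshold_mbps: float     # "Throughput threshold (Mbit/s)"
    min_duration_ms: int = 0  # how long throughput must stay above the threshold
    cooldown_ms: int = 0      # cool-down period between two incidents


def detect_bursts(samples: Iterable[Tuple[int, float]], cfg: BurstConfig) -> List[int]:
    """Return the timestamps (ms) at which a burst incident would be generated.

    samples: (timestamp_ms, throughput_mbps) pairs in ascending time order.
    """
    incidents: List[int] = []
    run_start: Optional[int] = None       # first sample of the current run above threshold
    cooldown_until: Optional[int] = None  # None until the first incident
    for t_ms, mbps in samples:
        if mbps <= cfg.threshold_mbps:
            run_start = None              # run interrupted: the duration starts over
            continue
        if run_start is None:
            run_start = t_ms
        long_enough = t_ms - run_start >= cfg.min_duration_ms
        cooled_down = cooldown_until is None or t_ms >= cooldown_until
        if long_enough and cooled_down:
            incidents.append(t_ms)
            cooldown_until = t_ms + cfg.cooldown_ms
    return incidents


if __name__ == "__main__":
    mbps = [100, 900, 950, 980, 200, 990, 995, 999, 999, 999]  # one sample per ms
    cfg = BurstConfig(threshold_mbps=800, min_duration_ms=2, cooldown_ms=5)
    print(detect_bursts(enumerate(mbps), cfg))  # [3, 8]
```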


== Occurred incidents ==
This page shows up to the last 1000 incidents that occurred on the system. The table can be filtered for specific severity levels, as well as for specific trigger sources by selecting the trigger from the drop-down menu.
[[File:Incidents list filter.png|thumb|600x600px|Filter incidents by severity or trigger]]
The list can also be filtered for the subject of the incident.
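
Keeping only the newest 1000 incidents and filtering them by severity, trigger and subject maps naturally onto a bounded deque. The Python sketch below is hypothetical: the class and field names are assumptions, and case-insensitive substring matching for the subject filter is a guess at the behaviour.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Iterable, List, Optional


@dataclass
class Incident:
    severity: str
    trigger: str
    subject: str


class IncidentLog:
    """Hypothetical model of the incident list: keeps only the newest `capacity` entries."""

    def __init__(self, capacity: int = 1000) -> None:
        # deque(maxlen=...) silently drops the oldest entry once the capacity is reached
        self._items: Deque[Incident] = deque(maxlen=capacity)

    def add(self, incident: Incident) -> None:
        self._items.append(incident)

    def query(self, severities: Optional[Iterable[str]] = None,
              trigger: Optional[str] = None,
              subject_contains: Optional[str] = None) -> List[Incident]:
        """Return matching incidents, newest first."""
        wanted = set(severities) if severities is not None else None
        needle = subject_contains.lower() if subject_contains else None
        result: List[Incident] = []
        for inc in reversed(self._items):
            if wanted is not None and inc.severity not in wanted:
                continue
            if trigger is not None and inc.trigger != trigger:
                continue
            if needle is not None and needle not in inc.subject.lower():
                continue
            result.append(inc)
        return result

    def delete_all(self) -> None:
        self._items.clear()

    def __len__(self) -> int:
        return len(self._items)
```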


Individual incidents can be viewed in detail by clicking on the subject. The details page shows detailed information including links to the relevant measurement page.


Incidents can be deleted individually by clicking on the delete button next to the incident, or all incidents can be deleted by clicking on the button on the top right of the page.


== Statistics about incident rules ==
[[File:Incidents stats.png|thumb|600x600px|Statistics about rules]]
This page shows graphs of how often each rule has been hit, both in absolute numbers and relative to how often the rule has been checked.
Some technical limitations apply:


* Continuously checked triggers like "IP traffic" are only evaluated if there was at least one packet in the corresponding time interval. Therefore, rules that check for a zero packet count or zero throughput will never match.
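
The limitation can be illustrated with a small model of a continuously checked trigger that also keeps the per-rule numbers shown on the statistics page (hits in absolute terms and relative to the number of checks). In this hypothetical Python sketch, a rule is simply a predicate on the packet count of an interval; intervals without packets are skipped entirely, so a rule testing for zero packets is never true.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Rule = Callable[[int], bool]  # hypothetical: a predicate on the interval's packet count


def evaluate_intervals(packet_counts: List[int],
                       rules: Dict[str, Rule]) -> Tuple[Counter, Counter]:
    """Evaluate every rule once per interval and count checks and hits per rule."""
    checks: Counter = Counter()
    hits: Counter = Counter()
    for count in packet_counts:
        if count == 0:
            continue  # no packet in this interval: the trigger is not evaluated at all
        for name, rule in rules.items():
            checks[name] += 1
            if rule(count):
                hits[name] += 1
    return checks, hits


def hit_ratio(checks: Counter, hits: Counter, name: str) -> float:
    """Relative hit rate of a rule (0.0 if it was never checked)."""
    return hits[name] / checks[name] if checks[name] else 0.0


if __name__ == "__main__":
    rules = {"no_traffic": lambda c: c == 0, "busy": lambda c: c > 100}
    checks, hits = evaluate_intervals([0, 5, 0, 120], rules)
    print(hits["no_traffic"], hits["busy"])  # 0 1
```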