Alerts

FW Version: 7611+

The system can generate alerts automatically in the following forms:

  • EMail alerts

  • Syslog alerts

  • SNMP Traps

The alert configuration file is located at:

/opt/fmadio/etc/alert.lua

By default all Alert triggers are disabled.

Example alert.lua

An example alert.lua file is shown below. If the file does not exist, please create it.

local L =
{
["AlertList"] =
{
    LinkState           = true,
    ByteCache           = 1e12,
    BytesOverflow       = true,
    PacketError         = true,
    PacketDrop          = true,
    CaptureState        = true,
    DiskSMART           = true,
    DiskFreeStore0      = 1e9,
    DiskFreeStore1      = 0,
    DiskFreeRemote0     = 0,
    CPUTemperature      = 80,
    FANAlert            = true,
    PSUAlert            = true,

    Sleep               = 60,                       -- how long to sleep after an alert is triggered; prevents flooding
}
}
return L

Triggers

The system can trigger alerts on a small, well-defined list of critical events. Triggers are enabled or disabled in the following part of the configuration file; each line either enables/disables a trigger or sets its threshold.

["AlertList"] =
{
    LinkState           = true,
    CaptureState        = true,
    ByteCache           = 1e12,
    BytesOverflow       = true,
    PacketError         = true,
    PacketDrop          = true,
    DiskFreeStore0      = 1e9,
    DiskFreeStore1      = 0,
    DiskFreeRemote0     = 0,
    DiskSMART           = true,
    DiskError           = true,
    Sleep               = 60,
}
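The entries above fall into two kinds: on/off flags (`true`/`false`) and numeric thresholds. As a rough mental model, the Python sketch below shows how each kind might be evaluated. It is an illustration under stated assumptions, not FMADIO's monitor code: the function names are hypothetical, and treating a threshold of `0` as "disabled" is inferred from the example file rather than documented.

```python
# Hypothetical model of how AlertList entries could be evaluated.
# Not the FMADIO implementation; names and rules are illustrative assumptions.

def flag_trigger(enabled, prev, curr):
    """Flag triggers (e.g. LinkState, CaptureState) fire when a monitored value changes."""
    return bool(enabled) and prev != curr

def above_threshold(threshold, value):
    """ByteCache-style trigger: fire when value exceeds the threshold.
    A threshold of 0 (or false) is assumed to mean "disabled"."""
    return bool(threshold) and value > threshold

def below_threshold(threshold, value):
    """DiskFree*-style trigger: fire when free bytes fall below the threshold.
    A threshold of 0 (or false) is assumed to mean "disabled"."""
    return bool(threshold) and value < threshold

# Cache backlog of 4e12 bytes against ByteCache = 1e12
print(above_threshold(1e12, 4e12))   # True
# 1e9 bytes free against DiskFreeStore0 = 4e9
print(below_threshold(4e9, 1e9))     # True
# DiskFreeStore1 = 0: disabled regardless of free space
print(below_threshold(0, 1e9))       # False
```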

Each trigger is described below.

LinkState (Capture Port State)

Monitoring the capture link status is critical to ensure no data is lost. Enabling this option will alert when a capture link goes up or down.

Config

 LinkState       = true, 

SYSLOG

2021.07.05-07:53:49.790767 (+09:00) | fmadio20n40v3-363 | local7.alert    | fmadio    | Alert     Capture Link Cap2 State Change 1 -> 0

SNMP

fmadioCapture0Link
fmadioCapture1Link
fmadioCapture2Link
fmadioCapture3Link
fmadioCapture4Link
fmadioCapture5Link
fmadioCapture6Link
fmadioCapture7Link

CaptureState

Capture State indicates whether capture is active or inactive. When enabled as an alert, it triggers any time the capture state changes.

Config

CaptureState    = true,

SYSLOG

2021.12.25-14:33:24.849212 (+09:00) | fmadio20v3-287 | local7.alert    | fmadio    | Alert     Capture State Change 0 -> 1

SNMP

fmadioCaptureEnable
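Both LinkState and CaptureState alerts log a `State Change <prev> -> <curr>` message. If you post-process the syslog yourself, a small parser like the one below can extract the port and the old/new values. The regular expression is written against the two example lines above only; other firmware versions may format messages differently, and reading `1` as up/active and `0` as down/inactive is an assumption.

```python
import re

# Matches "Capture Link Cap2 State Change 1 -> 0" and "Capture State Change 0 -> 1".
STATE_RE = re.compile(
    r"(?:Link (?P<port>\S+) )?State Change (?P<prev>\d+) -> (?P<curr>\d+)"
)

def parse_state_change(message):
    """Return (port, prev, curr); port is None for CaptureState messages."""
    m = STATE_RE.search(message)
    if m is None:
        return None
    return m.group("port"), int(m.group("prev")), int(m.group("curr"))

print(parse_state_change("Capture Link Cap2 State Change 1 -> 0"))  # ('Cap2', 1, 0)
print(parse_state_change("Capture State Change 0 -> 1"))            # (None, 0, 1)
```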

ByteCache (Bytes Cached)

Bytes Cached indicates how much capture data has been written to SSD but not yet written back to long-term storage, i.e. the backlog that builds up when the SSD capture rate exceeds the HDD (magnetic storage) writeback rate. A trigger at, for example, 3TB is a good indication that the HDD writeback process is too slow for the sustained incoming capture rate.
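To pick a ByteCache threshold, it helps to estimate how quickly the cache grows. The sketch below assumes constant capture and writeback rates and an initially empty cache; the 40 Gbps and 32 Gbps figures are hypothetical, not FMADIO specifications.

```python
def seconds_to_cache_threshold(capture_gbps, writeback_gbps, threshold_bytes=1e12):
    """Estimate seconds until the SSD cache backlog reaches threshold_bytes.

    Assumes constant rates and an initially empty cache (a simplification).
    Returns None when writeback keeps up with capture.
    """
    growth_bytes_per_s = (capture_gbps - writeback_gbps) * 1e9 / 8  # Gbit/s -> bytes/s
    if growth_bytes_per_s <= 0:
        return None
    return threshold_bytes / growth_bytes_per_s

# Hypothetical rates: 40 Gbps sustained capture, 32 Gbps HDD writeback
print(seconds_to_cache_threshold(40, 32))  # 1000.0 (about 17 minutes)
```

If writeback is at least as fast as capture, the function returns `None`, because in this simple model the cache never grows.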

Config

(example: trigger once the cache exceeds 1TB)

 ByteCache       = 1e12,  

SYSLOG

2021.07.05-07:39:40.545686 (+09:00) | fmadio20n40v3-363 | local7.alert    | fmadio    | Alert     Bytes Cache Threadhold detected 4.000 GB (limit 1.000 G

SNMP

fmadioCaptureCache

BytesOverflow (trigger)

An alert is generated any time the Bytes Overflow counter increases. This is typically a symptom of capture rates being too high, or of HDD writeback being too slow (or failing).

Config

   BytesOverflow   = true,

SYSLOG

2021.07.05-08:08:01.038273 (+09:00) | fmadio20n40v3-363 | local7.alert    | fmadio    | Alert     Bytes Overflow detected 3000000000 (prev:0)

SNMP

fmadioCaptureOverflow

PacketError

Counts FCS errors received on the interface. An alert is generated any time the packet error count changes. This typically occurs when there are Layer 1 link stability issues.

Config

  PacketError     = true,  

SYSLOG

2021.07.05-08:09:51.481888 (+09:00) | fmadio20n40v3-363 | local7.alert    | fmadio    | Alert     Packet FCS Erors detected 0 (prev:1)

SNMP

fmadioCaptureError

PacketDrop

An alert is generated when packets are dropped on the capture device.

Config

PacketDrop      = true,

SYSLOG

2021.07.05-08:09:51.483071 (+09:00) | fmadio20n40v3-363 | local7.alert    | fmadio    | Alert     Packet Capture Drop detected 2 (prev:0)

SNMP

fmadioCaptureDrop
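BytesOverflow, PacketError and PacketDrop alerts all log the current and previous counter values as `detected N (prev:M)`. The sketch below extracts both values and the change between them; it is based only on the example lines above. Note that the PacketError example shows a change from 1 to 0, so the delta can be negative.

```python
import re

# Matches the "detected N (prev:M)" suffix used by the counter alerts above.
COUNTER_RE = re.compile(r"detected (\d+) \(prev:(\d+)\)")

def counter_change(message):
    """Return (current, previous, delta) for a counter alert message, or None."""
    m = COUNTER_RE.search(message)
    if m is None:
        return None
    curr, prev = int(m.group(1)), int(m.group(2))
    return curr, prev, curr - prev

print(counter_change("Bytes Overflow detected 3000000000 (prev:0)"))  # (3000000000, 0, 3000000000)
print(counter_change("Packet Capture Drop detected 2 (prev:0)"))      # (2, 0, 2)
```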

DiskFreeStore0

An alert is generated when free space on the /mnt/store0 partition falls below this threshold, specified in bytes (scientific notation is accepted).

In the example below, an alert is generated when less than 4e9 bytes (4GB) of space is free on the /mnt/store0 partition.

Config

    DiskFreeStore0  = 4e9,

SYSLOG

2021.07.05-08:32:10.876238 (+09:00) | fmadio20n40v3-363 | local7.alert    | fmadio    | Alert     /mnt/store0 disk space low. Free 7160.188GB (7160188243968B) [Threshold 10000.000GB]

SNMP

fmadioDiskFreeStore0

DiskFreeStore1

An alert is generated when free space on /mnt/store1 (the scratch analytics workspace) falls below this threshold, specified in bytes (scientific notation is accepted).

Config

 DiskFreeStore1    = 10e9,  

SYSLOG

2021.07.05-08:32:10.876238 (+09:00) | fmadio20n40v3-363 | local7.alert    | fmadio    | Alert     /mnt/store1 disk space low. Free 7160.188GB (7160188243968B) [Threshold 10000.000GB]

SNMP

fmadioDiskFreeStore1

DiskFreeRemote0

An alert is generated when free space on /mnt/remote0 (typically an NFS mount) falls below this threshold, specified in bytes.

Config

 DiskFreeRemote0    = 10e9,  

SYSLOG

2021.07.05-08:32:10.876238 (+09:00) | fmadio20n40v3-363 | local7.alert    | fmadio    | Alert     /mnt/remote0 disk space low. Free 7160.188GB (7160188243968B) [Threshold 10000.000GB]

SNMP

fmadioDiskFreeRemote0
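The disk-space alerts report free space in decimal gigabytes (1GB = 1e9 bytes; e.g. 7160188243968B is logged as 7160.188GB), so a threshold of 4e9 corresponds to 4GB. The sketch below parses these messages and re-checks the comparison; the regular expression is based only on the example lines above.

```python
import re

DISK_RE = re.compile(
    r"(?P<mount>/mnt/\S+) disk space low\. "
    r"Free [\d.]+GB \((?P<free_bytes>\d+)B\) \[Threshold (?P<threshold_gb>[\d.]+)GB\]"
)

def parse_disk_low(message):
    """Return (mount, free_bytes, threshold_bytes) for a disk-space-low alert, or None."""
    m = DISK_RE.search(message)
    if m is None:
        return None
    free_bytes = int(m.group("free_bytes"))
    threshold_bytes = float(m.group("threshold_gb")) * 1e9  # decimal GB -> bytes
    return m.group("mount"), free_bytes, threshold_bytes

mount, free_b, limit_b = parse_disk_low(
    "/mnt/store0 disk space low. Free 60.127GB (60127121408B) [Threshold 100.000GB]"
)
print(mount, free_b < limit_b)  # /mnt/store0 True
```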

DiskError

An alert is generated when there is a disk or RAID error on the device, for example when a disk has been lost or HDD RAID redundancy has been reduced.

Config

 DiskError       = true,   

SYSLOG

2021.07.05-08:05:34.224665 (+09:00) | fmadio20n40v3-363 | local7.alert    | fmadio    | Alert     Disk/Cache/RAID Error State 3 (prev:0)

SNMP

fmadioDiskSMART

DiskSMART

Alerts on the total number of disk SMART errors. The value is aggregated across all disks; please check the system log files for details on which specific disk has an issue.

Config

DiskSMART      = true,

SYSLOG

2021.07.05-08:05:34.224665 (+09:00) | fmadio20n40v3-363 | local7.alert    | fmadio    | Alert     Disk[0] SMART dError: 1 Total Errors 1

SNMP

fmadioDiskSMART

Sleep

Minimum number of seconds between generated alerts. This prevents a flood of alerts under unexpected system conditions.
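The effect of Sleep can be pictured as a simple rate limiter. The sketch below is a hypothetical model only: whether FMADIO applies Sleep globally or per trigger is not stated here, so this version assumes a single global interval, and the `AlertThrottle` class is an illustrative name.

```python
class AlertThrottle:
    """Hypothetical model of Sleep: drop alerts raised within sleep_s of the last one sent."""

    def __init__(self, sleep_s=60):
        self.sleep_s = sleep_s
        self.last_sent = None  # time of the last alert actually sent

    def should_send(self, now_s):
        if self.last_sent is not None and now_s - self.last_sent < self.sleep_s:
            return False  # still inside the Sleep window
        self.last_sent = now_s
        return True

throttle = AlertThrottle(sleep_s=60)
print([throttle.should_send(t) for t in (0, 30, 60, 90, 125)])
```

With `sleep_s=60`, alerts raised at 0, 60 and 125 seconds are sent, while those at 30 and 90 seconds are suppressed.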

SYSLOG Alerts

Alert events are always written to SYSLOG, regardless of which other transport modes (email, SNMP, etc.) are enabled.

The SYSLOG logfile is located at:

/mnt/store0/messages

Example syslog alerts are shown below.

2021.05.24-17:44:20.001457 (+09:00) | fmadio20v3-287 | local7.alert    | fmadio    | Alert     /mnt/store0 disk space low. Free 60.127GB (60127121408B) [Threshold 100.000GB]
2021.05.24-17:44:20.002903 (+09:00) | fmadio20v3-287 | local7.alert    | fmadio    | Alert     /mnt/store1 disk space low. Free 895.712GB (895712038912B) [Threshold 1000.000GB]
2021.05.24-17:44:25.444418 (+09:00) | fmadio20v3-287 | local7.alert    | fmadio    | Alert     /mnt/store0 disk space low. Free 60.127GB (60126965760B) [Threshold 100.000GB]
2021.05.24-17:44:25.446463 (+09:00) | fmadio20v3-287 | local7.alert    | fmadio    | Alert     /mnt/store1 disk space low. Free 895.712GB (895712038912B) [Threshold 1000.000GB]
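Each syslog line consists of pipe-separated fields: timestamp, hostname, facility/level, tag, and the alert text. The Python sketch below splits one line into those fields, which can help when filtering /mnt/store0/messages for alerts. The field names are our own labels inferred from the examples above, not an official schema.

```python
from dataclasses import dataclass

@dataclass
class AlertLine:
    timestamp: str
    host: str
    facility: str   # e.g. "local7.alert"
    tag: str
    message: str

def parse_alert_line(line):
    """Split one syslog line into its pipe-separated fields; None if it is not an alert."""
    parts = [p.strip() for p in line.split("|", 4)]  # maxsplit keeps any '|' in the message
    if len(parts) != 5:
        return None
    timestamp, host, facility, tag, rest = parts
    if not rest.startswith("Alert"):
        return None
    return AlertLine(timestamp, host, facility, tag, rest[len("Alert"):].strip())
```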

EMAIL Alerts

To set up email alerts, add an ["Email"] section to the alert configuration file:

/opt/fmadio/etc/alert.lua

An example that sends alerts to the address "alerts@fmad.io" is shown below.

local L =
{
["Email"] =
{
    Enable  = true,
    To      = "alerts@fmad.io",
    From    = "packet_capture@fmad.io",
}
,
["AlertList"] =
{
    BytesOverflow   = true,
    PacketError     = true,
    PacketDrop      = true,
    DiskFreeStore0  = 4e9,
    Sleep           = 60, 
}
}
return L

Email Server

The FMADIO packet capture system uses msmtp as its email client, which requires an SMTP configuration file:

/opt/fmadio/etc/msmtp.rc

An example configuration is shown below. Please edit it to match your SMTP email provider.

defaults
tls on
tls_certcheck off
logfile /mnt/store0/log/msmtp.log
tls_starttls on

account default
host mail.yourserver.com
port 587
auth on
user fmadio@yourserver.com
password <secret>
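Before relying on email alerts, it can be useful to confirm the SMTP settings independently of the alert system. The Python sketch below builds a test message with the standard library and pipes it to msmtp using the config file above; `-C` selects the configuration file and `-t` reads recipients from the message headers. The subject and body are placeholders, and the addresses are the example ones from this page.

```python
import subprocess
from email.message import EmailMessage

MSMTP_RC = "/opt/fmadio/etc/msmtp.rc"

def build_test_email(to_addr, from_addr):
    """Build a plain-text test message (subject/body are placeholders)."""
    msg = EmailMessage()
    msg["To"] = to_addr
    msg["From"] = from_addr
    msg["Subject"] = "FMADIO alert test"
    msg.set_content("Test message to verify msmtp settings.")
    return msg

def send_with_msmtp(msg, rc_path=MSMTP_RC):
    """Pipe the message to msmtp; -C picks the config file, -t reads recipients from headers."""
    subprocess.run(["msmtp", "-C", rc_path, "-t"], input=msg.as_bytes(), check=True)
```

For example, `send_with_msmtp(build_test_email("alerts@fmad.io", "packet_capture@fmad.io"))` sends one test message; any failure details should then appear in the msmtp logfile configured above.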

SNMP Broadcast

FW: 7611+

FMADIO devices can operate in SNMP Broadcast mode. In this mode the system will periodically broadcast all SNMP counter values at a fixed time interval to an SNMP target.

SNMP MIB

Latest MIB file is found (last updated 2021/12/25)

Config

SNMP is configured in the general configuration file:

/opt/fmadio/etc/time.lua

Please edit the section titled ["SNMP"] as follows:

["SNMP"] =
{
    ["Enable"]           = false,
    ["Trap"]             = false,
    ["Broadcast"]        = true,
    ["BroadcastPeriod"]  = 60e9,
    ["Verbose"]          = false,
    ["Target"]           = "127.0.0.1",
    ["ComName"]          = "public",
},

The above config enables SNMP Broadcast mode only, while SNMP Trap (Alert) mode is disabled. The broadcast period is 60e9 nanoseconds, i.e. once every minute.

Broadcast and Trap modes can be used simultaneously if required.

Please update the ["Target"] setting to the correct SNMP collector address. Multiple SNMP targets can be specified, separated by spaces. For example:

    ["Target"]     = "127.0.0.1 127.0.0.2 127.0.0.3",
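BroadcastPeriod is specified in nanoseconds and Target is a space-separated list of addresses. The sketch below mirrors the ["SNMP"] table as a Python dict (an illustration only; the firmware reads the Lua file itself) and shows how those two values translate into a period in seconds and a list of collectors.

```python
# Python mirror of the ["SNMP"] table above (illustration only).
snmp = {
    "Broadcast": True,
    "BroadcastPeriod": 60e9,  # nanoseconds
    "Target": "127.0.0.1 127.0.0.2 127.0.0.3",
}

def broadcast_settings(cfg):
    """Return (period in seconds, list of collector addresses)."""
    period_s = cfg["BroadcastPeriod"] / 1e9   # ns -> s
    targets = cfg["Target"].split()           # split on whitespace
    return period_s, targets

print(broadcast_settings(snmp))  # (60.0, ['127.0.0.1', '127.0.0.2', '127.0.0.3'])
```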

Broadcast mode output is recorded in the /mnt/store0/log/monitor_alert.cur logfile.

Troubleshooting

Logfiles are found in /mnt/store0/log/monitor_alert.cur

Set ["Verbose"] in the config above to true to enable additional logging.

SNMP Trap

FW: 7611+

FMADIO devices can send SNMP Traps based on the alert triggers described above. This may be preferable to email alerts for infrastructure management.

SNMP MIB

Latest MIB file is found (last updated 2021/12/25)

Config

SNMP is configured in the general configuration file:

/opt/fmadio/etc/time.lua

Please edit the section titled ["SNMP"] as follows:

["SNMP"] =
{
    ["Enable"]           = false,
    ["Trap"]             = true,
    ["Broadcast"]        = false,
    ["BroadcastPeriod"]  = 60e9,
    ["Verbose"]          = false,
    ["Target"]           = "127.0.0.1",
    ["ComName"]          = "public",
},

The above config enables SNMP TRAP mode only; SNMP Broadcast mode is disabled. With this configuration, SNMP TRAP events are sent only when a trigger fires.

Please update the ["Target"] setting to the correct SNMP collector address.

Troubleshooting

An easy way to troubleshoot traps is to set the DiskFreeStore0 threshold to a very large number. With this setting, an SNMP TRAP event is generated continuously (every minute).

Logfiles are found in /mnt/store0/log/monitor_alert.cur
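When traps do not seem to arrive, it helps to check whether any UDP datagrams reach the collector at all. The sketch below listens on UDP port 162 (the standard SNMP trap port; that FMADIO sends to the default port is an assumption) and reports the first datagram received. It does not decode the BER-encoded SNMP payload; use a full trap receiver for that. The helper names are illustrative.

```python
import socket

SNMP_TRAP_PORT = 162  # standard SNMP trap port; binding it usually requires root

def open_trap_listener(host="0.0.0.0", port=SNMP_TRAP_PORT):
    """Bind a UDP socket on the collector to receive trap datagrams."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock

def wait_for_trap(sock, timeout_s=120.0):
    """Block until one datagram arrives; return (payload_bytes, sender) or None on timeout.

    The payload is raw BER-encoded SNMP; this sketch does not decode it.
    """
    sock.settimeout(timeout_s)
    try:
        return sock.recvfrom(65535)
    except socket.timeout:
        return None
```

For example, run `wait_for_trap(open_trap_listener())` on the collector while the DiskFreeStore0 test above is active. With alerts repeating every minute, the default 120-second timeout should catch at least one trap if the network path is open. Stop any other trap receiver first, since only one process can bind the port.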
