One of our customers proposed a new check after running into an issue where a single process was eating up all the CPU time on a filer. It’s easy to identify the culprit once you are on the filer’s command line (priv-mode) by issuing the `ps` command. To automate that kind of monitoring and raise an alarm as soon as a process gets out of control, we now offer check_netapp_processes.
Example
-------
This is how one would check using the default thresholds:
```
$ ./check_netapp_process.pl -H filer -s filer-01 -u admin -p ******
NETAPP_PROCESS CRITICAL – 1064 processes checked, 2 critical and 0 warning
idle: cpu0: 102.0 (CRITICAL)
idle: cpu1: 102.0 (CRITICAL)
ontap_dead_bsd_thre: 1.0
worker_thread_38: 0.0
iswts_sockio: 0.0
SMBOff […] | worker_thread_38=0.00%;20;50;0;100 iswts_sockio=0.00%;20;50;0;100 wafl_blog_early_kickout_worke=0.00%;20;50;0;100
```
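The part after the `|` in the output is performance data in the usual `label=value;warn;crit;min;max` layout, so the default thresholds shown here are 20 (warning) and 50 (critical). The following Python sketch is not part of the plugin. It only illustrates how such a string could be parsed and classified. The function names `parse_perfdata` and `state` are hypothetical, quoted labels containing spaces are not handled, and the "value >= threshold" comparison is an assumption about how the thresholds are applied.

```python
import re

# Perfdata string copied from the example output.
PERFDATA = (
    "worker_thread_38=0.00%;20;50;0;100 "
    "iswts_sockio=0.00%;20;50;0;100 "
    "wafl_blog_early_kickout_worke=0.00%;20;50;0;100"
)

# One item looks like: label=value[unit];warn;crit (min and max are ignored here).
ITEM = re.compile(
    r"(?P<label>[^=\s]+)=(?P<value>[\d.]+)(?P<unit>[^;\s]*)"
    r";(?P<warn>[\d.]+);(?P<crit>[\d.]+)"
)


def parse_perfdata(text):
    """Return one dict per perfdata item with its value and thresholds."""
    items = []
    for match in ITEM.finditer(text):
        items.append({
            "label": match.group("label"),
            "value": float(match.group("value")),
            "warn": float(match.group("warn")),
            "crit": float(match.group("crit")),
        })
    return items


def state(item):
    """Classify one item; 'value >= threshold' is an assumed comparison."""
    if item["value"] >= item["crit"]:
        return "CRITICAL"
    if item["value"] >= item["warn"]:
        return "WARNING"
    return "OK"


for item in parse_perfdata(PERFDATA):
    print(item["label"], state(item))
```

Under these assumptions, a process reporting 102.0 (like the two _idle_ entries in the example) lands well above the critical threshold of 50, which matches the CRITICAL result shown.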
Filtering the processes
-----------------------
Since the list is quite long, filters can be set by means of `--exclude` and `--include`. For example, if you do **not** want to check the _idle_ processes, you would configure the check like this:
```
$ ./check_netapp_process.pl -H filer ... -X ^idle:
```
For other tips on how to deal with the very long output of this check, have a look at the article [Overly Long Outputs](https://blog.monitoring-plugins.pro/posts/overly-long-outputs/).
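To make the effect of a filter such as `^idle:` more concrete, here is a small Python sketch of regex-based include/exclude filtering. It is an illustration only, not the plugin's actual implementation. The helper `filter_processes` is hypothetical, and the rule that a name must match the include pattern (if one is given) and must not match the exclude pattern is an assumption about how the two options combine.

```python
import re

# Process names and CPU values taken from the example output earlier in this post.
PROCESSES = {
    "idle: cpu0": 102.0,
    "idle: cpu1": 102.0,
    "ontap_dead_bsd_thre": 1.0,
    "worker_thread_38": 0.0,
    "iswts_sockio": 0.0,
}


def filter_processes(processes, include=None, exclude=None):
    """Keep names matching `include` (if given) that do not match `exclude`."""
    inc = re.compile(include) if include else None
    exc = re.compile(exclude) if exclude else None
    kept = {}
    for name, cpu in processes.items():
        if inc and not inc.search(name):
            continue  # an include filter is set and this name does not match it
        if exc and exc.search(name):
            continue  # this name matches the exclude filter
        kept[name] = cpu
    return kept


# Same pattern as in the command above: drop everything starting with "idle:".
remaining = filter_processes(PROCESSES, exclude=r"^idle:")
print(sorted(remaining))
```

With the exclude pattern `^idle:`, the two _idle_ entries drop out and only the remaining three processes would be evaluated against the thresholds.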
Availability
------------
This check will be available for testing in the next unstable version (**3.10.1\_12**), which we will release today. At the moment it only supports cdot, not 7-mode. If you would like to get this check for 7-mode too, please provide us with the CLI commands used on 7-mode to get the process list. You can try them out with check\_netapp\_anycli and send us the `--in` values. (For your reference, these are the commands we use to get the list on **cdot**: `set advanced -confirmations off;node run -node <node-name> -command ps`)