Problem:

The exachk report showed:
 
WARNING! The data collection activity appears to be incomplete for this exachk run. Please review the “Killed Processes” and / or “Skipped Checks” section and refer to “Appendix A – Troubleshooting Scenarios” of the “Exachk User Guide” for corrective actions. 

On Infiniband Switch – No Checks reported but “Killed Processes” and / or “Skipped Checks” exist
 
Exachk has a “watchdog” process that monitors exachk execution and kills commands that exceed their default timeouts, to prevent “hangs”. On a busy system, checks are occasionally killed simply because the target of the check did not respond within the default timeout. The following environment variables can be used to lengthen the default timeouts. The most common timeout environment variables are:

  • RAT_TIMEOUT (default 90 seconds, non-root individual commands)
  • RAT_ROOT_TIMEOUT (default 300 seconds, root userid command sets)
  • RAT_PASSWORDCHECK_TIMEOUT (default 1 second, ssh login DNS handshake)

 
Solution:

Verify the connection between the DB server and the IB switches, then rerun exachk at a quiet time when the system is less busy. Alternatively, increase the default timeouts in the shell session before rerunning exachk:
 
export RAT_TIMEOUT=120
export RAT_ROOT_TIMEOUT=600
export RAT_PASSWORDCHECK_TIMEOUT=10
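
The same settings can be wrapped in a small helper so they are easy to adjust and confirm before the rerun. This is only a sketch: the function name set_exachk_timeouts is made up for illustration, the default values are the ones used in this post, and the exachk invocation is left commented out because its location depends on where it was unpacked.

```shell
#!/bin/sh
# Hypothetical helper (not part of exachk): export the three watchdog
# timeouts, falling back to the values from this post when no argument is given.
set_exachk_timeouts() {
    export RAT_TIMEOUT="${1:-120}"                # non-root individual commands
    export RAT_ROOT_TIMEOUT="${2:-600}"           # root userid command sets
    export RAT_PASSWORDCHECK_TIMEOUT="${3:-10}"   # ssh login DNS handshake
}

set_exachk_timeouts

# Print the values that a child process started from this shell will inherit.
printf 'RAT_TIMEOUT=%s\n' "$RAT_TIMEOUT"
printf 'RAT_ROOT_TIMEOUT=%s\n' "$RAT_ROOT_TIMEOUT"
printf 'RAT_PASSWORDCHECK_TIMEOUT=%s\n' "$RAT_PASSWORDCHECK_TIMEOUT"

# Rerun exachk from this same shell so it picks up the variables.
# The path is an assumption; adjust it to your installation.
# ./exachk
```

The variables only apply to the shell in which they are exported, so exachk has to be started from that same session. If checks are still killed, the helper can be called again with larger values, for example set_exachk_timeouts 180 900 15.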
