Handling watchdog resets
KB ID: 3013539 Version: 8.0 Published date: 01/15/2015 Views: 8592
Answer
A watchdog is an independent timer that monitors the progress of the main controller running Data ONTAP. Its function is to restart the system automatically if the controller encounters an unrecoverable system error. The watchdog implemented by NetApp uses a two-level timer, with a different action associated with each level.
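The two-level behavior can be pictured with a small simulation. This is only an illustrative sketch, not NetApp's implementation: the class name, the threshold values, and the mapping of level 1 to a "watchdog timeout" event and level 2 to a "watchdog reset" event are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TwoLevelWatchdog:
    """Toy model of a two-level watchdog. All values are hypothetical."""

    level1_timeout: float = 5.0   # seconds without a heartbeat before the level 1 action
    level2_timeout: float = 10.0  # seconds without a heartbeat before the level 2 action
    last_heartbeat: float = 0.0
    events: List[str] = field(default_factory=list)
    level1_fired: bool = False

    def heartbeat(self, now: float) -> None:
        """Called by a healthy controller to show it is making progress."""
        self.last_heartbeat = now
        self.level1_fired = False

    def tick(self, now: float) -> None:
        """Called by the independent timer to check how long the controller has been silent."""
        elapsed = now - self.last_heartbeat
        if elapsed >= self.level2_timeout:
            # Level 2 (assumed): force a reset, after which the timer is re-armed.
            self.events.append("watchdog reset")
            self.heartbeat(now)
        elif elapsed >= self.level1_timeout and not self.level1_fired:
            # Level 1 (assumed): record a timeout once per silent period.
            self.events.append("watchdog timeout")
            self.level1_fired = True


def simulate() -> List[str]:
    """Controller sends heartbeats for 3 seconds, then hangs."""
    wd = TwoLevelWatchdog()
    for t in range(1, 4):
        wd.heartbeat(float(t))
        wd.tick(float(t))
    for t in range(4, 15):  # no more heartbeats: the controller is hung
        wd.tick(float(t))
    return wd.events


if __name__ == "__main__":
    print(simulate())
```

In this model a hung controller first triggers the level 1 action and, if it stays silent, the level 2 action. A controller that resumes heartbeats before the first threshold triggers neither.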
It is not necessary to ‘recover’ from a watchdog timeout or watchdog reset, as both of these events are recovery mechanisms for other failures. The objective instead is to identify the failure(s) that caused the watchdog event.
If the storage appliance experiences a single watchdog reset, no action is generally needed: the condition that caused the reset is most often transient and will have been cleared by the reset itself. Perform a giveback if necessary, and monitor the appliance for repeat occurrences.
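Monitoring for repeat occurrences can be as simple as comparing the timestamps of recorded watchdog resets. The following is a minimal sketch under stated assumptions: the function name and the 30-day window are hypothetical, and the timestamps are assumed to have been collected by the administrator from whatever event records are available.

```python
from datetime import datetime, timedelta
from typing import Iterable


def repeated_resets(
    reset_times: Iterable[datetime],
    window: timedelta = timedelta(days=30),  # hypothetical threshold, tune per site
) -> bool:
    """Return True if any two watchdog resets occurred within `window` of each other."""
    ordered = sorted(reset_times)
    # Compare each reset with the next one in chronological order.
    return any(later - earlier <= window for earlier, later in zip(ordered, ordered[1:]))
```

A single reset returns False, which matches the guidance that one isolated event usually needs no action. A True result suggests the underlying failure is not transient and should be investigated.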
Note: No hardware should be replaced unless the root cause is a hardware issue.
Disclaimer
NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer’s responsibility and depends on the customer’s ability to evaluate and integrate them into the customer’s operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.