Tag Archives: netapp

Configuring CNA ports and FC ports

Configuring CNA ports

If a node has onboard CNA ports or a CNA card, you must check the configuration of the ports and possibly reconfigure them, depending on how you want to use the upgraded system.

Before you begin

You must have the correct SFP+ modules for the CNA ports.

About this task

CNA ports can be configured in native Fibre Channel (FC) mode or CNA mode. FC mode supports FC initiator and FC target; CNA mode allows concurrent NIC and FCoE traffic over the same 10GbE SFP+ interface and supports FC target.

Handling Watchdog Resets

KB ID: 3013539 Version: 8.0 Published date: 01/15/2015

Answer

  1. What is a watchdog reset?

A watchdog is an independent timer that monitors the progress of the main controller running Data ONTAP. Its function is to restart the system automatically if it encounters an unrecoverable error.

The watchdog implemented by NetApp uses a two-level timer with different actions associated with each level of time.

  • Level 1: Timeout: The storage appliance attempts to panic and dump the core in response to a non-maskable interrupt. Once an L1 watchdog is successfully issued, the system returns to service and a core file is written, allowing NetApp to determine the root cause of the hang. An L1 watchdog is issued if the timer is not reset within 1.5 seconds.
  • Level 2: Reset: The storage appliance resets through a hard reset signal sent from the timer. An L2 watchdog is issued if the watchdog timer is not reset within two seconds after the L1 watchdog.
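
The two-level behavior described above can be modeled in a few lines. This is only an illustrative sketch, not NetApp's implementation: the names `WatchdogState` and `watchdog_state` are hypothetical, and treating the thresholds as inclusive (an event fires at exactly 1.5 s and exactly 3.5 s) is an assumption the article does not confirm. Only the 1.5-second and 2-second values come from the text.

```python
from enum import Enum

# Values taken from the article.
L1_TIMEOUT_S = 1.5  # L1 is issued if the timer is not reset within 1.5 seconds
L2_EXTRA_S = 2.0    # L2 is issued if the timer is still not reset 2 seconds after L1


class WatchdogState(Enum):
    """Hypothetical labels for the three possible outcomes."""
    OK = "ok"                   # timer was reset in time
    L1_TIMEOUT = "l1_timeout"   # NMI: appliance tries to panic and dump core
    L2_RESET = "l2_reset"       # hard reset signal from the timer


def watchdog_state(seconds_since_last_reset: float) -> WatchdogState:
    """Return which watchdog level applies after the given time without a reset.

    Assumption: thresholds are inclusive.
    """
    if seconds_since_last_reset < L1_TIMEOUT_S:
        return WatchdogState.OK
    if seconds_since_last_reset < L1_TIMEOUT_S + L2_EXTRA_S:
        return WatchdogState.L1_TIMEOUT
    return WatchdogState.L2_RESET
```

Under these assumptions, a controller that stays hung for 2 seconds falls in the L1 window, while one that has not reset the timer after 3.5 seconds or more has reached the L2 hard reset.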

It is not necessary to ‘recover’ from a watchdog timeout or watchdog reset, as both of these events are recovery mechanisms for other failures. The objective instead is to identify the failure(s) that caused the watchdog event.

  2. What is the appropriate response to a watchdog timeout (L1 Watchdog Event)?

A watchdog timeout should be treated just like any other system panic. The associated backtrace and/or the core should be analyzed for the possible root cause(s). A giveback should be performed if necessary.

  3. What is the appropriate response to a watchdog reset (L2 Watchdog Event)?

If the storage appliance receives a single watchdog reset, no action is generally needed, because the condition causing the reset is most often transient and is cleared by the reset process. A giveback should be performed if necessary, and the appliance should be monitored for repeat occurrences.
If a storage appliance takes multiple watchdog resets, look for previously logged errors associated with the CPU, motherboard, memory, or I/O cards.
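
The guidance for L1 and L2 events can be summarized as a small decision helper. This is a sketch for illustration only: the function name `recommended_response`, the `"L1"`/`"L2"` event labels, and the `occurrences` parameter are hypothetical, and the returned steps simply restate the article's recommendations.

```python
from typing import List


def recommended_response(event: str, occurrences: int = 1) -> List[str]:
    """Return the article's suggested steps for a watchdog event.

    event: "L1" (watchdog timeout) or "L2" (watchdog reset); hypothetical labels.
    occurrences: how many times the event has been seen on this appliance.
    """
    if event == "L1":
        # Treated like any other system panic.
        return [
            "Analyze the backtrace and/or core for the root cause",
            "Perform a giveback if necessary",
        ]
    if event == "L2":
        if occurrences <= 1:
            # A single reset is most often a transient condition.
            return [
                "Perform a giveback if necessary",
                "Monitor the appliance for repeat occurrences",
            ]
        # Repeated resets point to an underlying fault worth investigating.
        return [
            "Review previously logged errors for the CPU, motherboard, "
            "memory, or I/O cards",
        ]
    raise ValueError(f"unknown watchdog event: {event!r}")
```

For example, `recommended_response("L2", occurrences=3)` returns only the log-review step, reflecting that repeated resets call for investigation rather than monitoring.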

  4. Data to be collected to help diagnose the cause of a watchdog reset:
  • AutoSupports
  • Console logs before, during, and after the watchdog event (if possible)
  • ssram log (/etc/log/ssram/ssram.log or /mroot/etc/log/ssram/ssram.log), FAS62xx only
  • On systems with a service processor, the output of:
      • system sensors
      • events all
      • system log
      • sp status -d
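
As a rough aid, the list above can be turned into a per-system checklist. The helper below is a hypothetical sketch: the function name, its parameters, and the rule that a model string starting with "FAS62" identifies a FAS62xx system are all assumptions made for illustration. It only builds a list of items to gather and does not run any commands.

```python
from typing import List

# Service processor commands named in the article.
SP_COMMANDS = ["system sensors", "events all", "system log", "sp status -d"]


def collection_checklist(model: str, has_service_processor: bool) -> List[str]:
    """Build the list of data to collect after a watchdog reset."""
    items = [
        "AutoSupports",
        "Console logs before, during, and after the watchdog event (if possible)",
    ]
    # Assumption: FAS62xx systems are recognized by a "FAS62" model prefix.
    if model.upper().startswith("FAS62"):
        items.append(
            "ssram log (/etc/log/ssram/ssram.log or /mroot/etc/log/ssram/ssram.log)"
        )
    if has_service_processor:
        items.extend(f"SP output: {cmd}" for cmd in SP_COMMANDS)
    return items
```

Calling `collection_checklist("FAS6240", True)` yields all seven items, while a non-FAS62xx system without a service processor gets only the AutoSupports and console log entries.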

Note: Do not replace any hardware unless the root cause has been identified as a hardware issue.

Disclaimer

NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer’s responsibility and depends on the customer’s ability to evaluate and integrate them into the customer’s operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.