NAS-134183 / 25.10 / nvme-of: wait for timeout to pass if shelf was empty #15707
Removing power from the ES24N shelf triggers no events on the discovery controller other than an Interface Link Down. The shelf drives keep retrying until a 10-minute timeout expires. After power is restored, it takes about 10 seconds to connect, yet the remote discovery log shows no entries. A discovery change event then removes all drives, since the ES24N shelf reports none. One minute after Link Up, a Link Down occurs, followed a few seconds later by another Link Up. That second Link Up triggers a change event during which some entries, notably CM7 drives, are missed.
To address this, when a discovery change event arrives on power restore and the shelf was previously empty, wait for the timeout to elapse before acting. This avoids acting on an incomplete discovery log and ensures the drives reconnect properly. In addition, connecting the first disk to an enclosure now incurs extra waiting time. The added delay is minimal, since disks already take a few seconds to appear in the discovery log after a change event.
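The decision described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the function name `settle_discovery_log`, its parameters, and the use of NQN strings to represent discovery log entries are all hypothetical, and the actual timeout value is left to the caller.

```python
import time
from typing import Callable, Iterable, Set


def settle_discovery_log(
    read_log: Callable[[], Iterable[str]],
    previously_connected: Set[str],
    settle_timeout: float,
    sleep: Callable[[float], None] = time.sleep,
) -> Set[str]:
    """Return the discovery log entries to act on after a change event.

    Hypothetical helper: if the shelf had no connected drives before the
    event, wait for the timeout first, because the log read immediately
    after power restore may be empty or incomplete.
    """
    if not previously_connected:
        # Shelf was empty before this event: let the timeout pass
        # before trusting the discovery log.
        sleep(settle_timeout)
    # Read the log only after any wait, so the result reflects the
    # settled state rather than the first partial report.
    return set(read_log())
```

With drives already connected, the log is read immediately, so the extra wait applies only to the empty-shelf case, which also covers connecting the first disk to an enclosure.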
Jira Ticket: https://ixsystems.atlassian.net/browse/NAS-134183
Validated by Jeff Ervin on f100-152