Skip to content
Commit 18f87a67 authored by Martin Peschke's avatar Martin Peschke Committed by Christoph Hellwig
Browse files

zfcp: auto port scan resiliency



This patch improves the Fibre Channel port scan behaviour of the zfcp lldd.
Without it the zfcp device driver may churn up the storage area network by
excessive scanning and scan bursts, particularly in big virtual server
environments, potentially resulting in interference of virtual servers and
reduced availability of storage connectivity.

The two main issues as to the zfcp device drivers automatic port scan in
virtual server environments are frequency and simultaneity.
On the one hand, there is no point in allowing lots of ports scans
in a row. It makes sense, though, to make sure that a scan is conducted
eventually if there has been any indication for potential SAN changes.
On the other hand, lots of virtual servers receiving the same indication
for a SAN change had better not attempt to conduct a scan instantly,
that is, at the same time.

Hence this patch has a two-fold approach for better port scanning:
the introduction of a rate limit to amend frequency issues, and the
introduction of a short random backoff to amend simultaneity issues.
Both approaches boil down to deferred port scans, with delays
comprising parts for both approaches.

The new port scan behaviour is summarised best by:

                                               NEW:    NEW:
                          no_auto_port_rescan  random  rate    flush
                                               backoff limit   =wait

adapter resume/thaw       yes                  yes     no      yes*
adapter online (user)     no                   yes     no      yes*
port rescan (user)        no                   no      no      yes
adapter recovery (user)   yes                  yes     yes     no
adapter recovery (other)  yes                  yes     yes     no
incoming ELS              yes                  yes     yes     no
incoming ELS lost         yes                  yes     yes     no

Implementation is straight-forward by converting an existing worker to
a delayed worker. But care is needed whenever that worker is going to be
flushed (in order to make sure work has been completed), since a flush
operation cancels the timer set up for deferred execution (see * above).

There is a small race window whenever a port scan work starts
running up to the point in time of storing the time stamp for that port
scan. The impact is negligible. Closing that gap isn't trivial, though, and
would the destroy the beauty of a simple work-to-delayed-work conversion.

Signed-off-by: default avatarMartin Peschke <mpeschke@linux.vnet.ibm.com>
Signed-off-by: default avatarSteffen Maier <maier@linux.vnet.ibm.com>
Reviewed-by: default avatarHannes Reinecke <hare@suse.de>
Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
parent c8bba144
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment