Hello,

I have a few questions.

From the Lustre manual:
http://build.whamcloud.com/job/lustre-manual/lastSuccessfulBuild/artifact/lustre_manual.xhtml#lustrerecovery

A]
Network failures may be transient. To _avoid invoking recovery_, the client tries, initially, to re-send any _timed out_ request to the server.

-- Which timeout does this refer to? Is it /proc/fs/lustre/timeout (obd_timeout), or
-- the _time_ after the target disconnects (due to transient network issues) and before recovery starts for the obd_timeout period? Is this _time_ defined anywhere?
IIRC, in the above scenario recovery hasn't kicked in yet, so the client hasn't been evicted either. Please correct me if I'm wrong.
-- Can you point me to the source for the above, please?

B]
If the resend also fails, the client tries to re-establish a connection to the server. "_Clients can detect harmless partition upon reconnect if the server has not had any reason to evict the client._"
-- How can _clients detect a harmless partition upon reconnect_? Can you point me to the source for the above, please?
-- What does "resend" above refer to: requests committed on the server for which the client has seen no reply, new requests with a transaction number higher than last_recvd, or something else?


Thanks for your time.

--
cheers
Akam