Hello,
I have a few questions about the text at:
http://build.whamcloud.com/job/lustre-manual/lastSuccessfulBuild/artifact...
A]
Network failures may be transient. To _avoid invoking recovery_, the
client tries, initially, to re-send any _timed out_ request to the server.
-- Which timeout does this refer to? Is it /proc/fs/lustre/timeout
(obd_timeout)?
OR
-- Is it the _time_ after the target disconnects (due to transient network
issues) and before recovery starts for the obd_timeout period? Is this
_time_ defined anywhere?
IIRC, in the above scenario recovery hasn't kicked in yet, so the client
hasn't been evicted either. Please correct me if I'm wrong.
-- Can you point me to the source for the above, please?
B]
If the resend also fails, the client tries to re-establish a connection to
the server. "_Clients can detect harmless partition upon reconnect if the
server has not had any reason to evict the client._"
-- How can clients _detect harmless partition upon reconnect_? Can you
point me to the source for the above, please?
-- What does "resend" above refer to: requests committed on the server but
without any reply seen by the client, new requests with a transaction number
higher than last_recvd, or something else?
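
To make my reading of the two quoted passages concrete, here is a toy
sketch in Python. It is only a mental model, not Lustre code: ToyServer,
send_with_resend, Outcome, and the assumption of exactly one resend before
reconnecting are all hypothetical names and choices made for illustration.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Outcome(Enum):
    REPLIED = auto()             # a reply arrived, possibly after the resend
    HARMLESS_PARTITION = auto()  # reconnect succeeded, client was not evicted
    EVICTED = auto()             # server no longer holds the client's state


@dataclass
class ToyServer:
    """Hypothetical stand-in for a server; not a Lustre interface."""
    drops_remaining: int = 0      # how many upcoming requests will time out
    evicted_client: bool = False  # whether the server has evicted the client

    def handle(self, request: str) -> bool:
        """Return True if a reply is sent, False to model a timed-out request."""
        if self.drops_remaining > 0:
            self.drops_remaining -= 1
            return False
        return True

    def reconnect(self) -> bool:
        """Return True if the server still holds this client's state."""
        return not self.evicted_client


def send_with_resend(server: ToyServer, request: str) -> Outcome:
    # Attempt 1 is the original send; attempt 2 is the resend of the
    # timed-out request (a single resend is an assumption of this sketch).
    for _attempt in range(2):
        if server.handle(request):
            return Outcome.REPLIED
    # The resend also failed, so try to re-establish the connection.
    if server.reconnect():
        return Outcome.HARMLESS_PARTITION
    return Outcome.EVICTED
```

In this model a short network blip is absorbed by the resend, a longer one
ends in a reconnect that is harmless as long as the server has not evicted
the client, and only an eviction would lead to anything more. Please tell me
where this picture differs from what the code actually does.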
Thanks for your time.
--
cheers
Akam