Mi,
A few more questions I should have asked earlier, and a request:
1. Are OST disconnects seen on any other clients? Or on any other OSSs?
2. Is the application that was executing distributed? If so, were any similar events
seen on the other nodes running the application? If not distributed, it's unlikely
that one IO node is going to overwhelm an OSS with IO.
3. Do you know the IO pattern(s) of the application? If not, please try 'strace -T
-ttt -p <pid>' - it will show the size and timing of each IO call (example below).
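If useful, something like this (replace <pid> with the application's actual
process ID; the output file name here is just an example) narrows the trace
to the IO calls - note that the sizes strace prints are in bytes:
# -T = time spent in each syscall, -ttt = microsecond timestamps
strace -T -ttt -e trace=read,write -p <pid> 2> /tmp/app-io.trace
Add pread64/pwrite64 to the trace list if the application uses those calls.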
The request:
4. Lastly - and this is because we don't know your environment (other jobs have
run since the events and have "corrupted" some Lustre data points) - might you
be able to reset the clients' "/proc/fs/lustre/*/rpc_stats" and the OSS's
"/proc/fs/lustre/*/brw_stats" files, then re-run the app and post the
relevant rpc_stats and brw_stats? Note that on the client you want the per-OST
rpc_stats, and on the OSS you want the per-client brw_stats, for any affected
(disconnected, Bulk IO write error) OST. Example commands follow below.
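Something along these lines should do it (assuming a typical 2.x-style
parameter namespace; the wildcards can be narrowed to just the affected OSTs):
# on each client, before the re-run
lctl set_param osc.*.rpc_stats=0
# on the OSS, before the re-run
lctl set_param obdfilter.*.brw_stats=0
# afterwards, collect
lctl get_param osc.*.rpc_stats
lctl get_param obdfilter.*.brw_stats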
These data points should get us a better picture. Thanks.
--
Brett Lee
Sr. Systems Engineer
Intel High Performance Data Division
-----Original Message-----
From: hpdd-discuss-bounces(a)lists.01.org [mailto:hpdd-discuss-
bounces(a)lists.01.org] On Behalf Of Lee, Brett
Sent: Tuesday, May 07, 2013 10:27 AM
To: Mi Zhou; hpdd-discuss(a)lists.01.org
Subject: Re: [HPDD-discuss] OST refused connection from client
Hi Mi,
Yes, there are tunables for threads, and other factors as well. Lustre,
however, does auto-tuning of OSS/MDS threads itself, and from what I know
it does a pretty good job.
It would be helpful to know about the CPU core configuration on the OSS's as
well as about the threads running:
lctl get_param {service}.threads_{min,max,started}
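For the OST IO service, for example, that should look something like this
(assuming the standard service names):
lctl get_param ost.OSS.ost_io.threads_min ost.OSS.ost_io.threads_max ost.OSS.ost_io.threads_started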
You know, "Lustre" often seems to have the finger poked at it first, but in
actuality, Lustre typically runs atop a complex environment (nodes, network
infrastructure, etc) and is driven by demanding applications.
Based on the emails from yesterday and today, we're seeing client/OST
connections come and go. And we're seeing at least one "Bulk IO write"
error on an OSS.
Trying to piece these two events together, not knowing the exact cause of
either, leaves lots of room for opinions/possibilities. AFAIK, we cannot yet
rule out intermittent network disruptions, which seem like an easy (and
common) explanation for the problems seen. Nor can we rule out the
application causing the client to be resource constrained and unable to
effectively "keep alive" with the OST, deliver RPCs, or respond to the
OSSs - another common condition when running HPC applications.
Lots of possibilities and few data points. Coming back to the network, yes
again: is there monitoring in place that would track disruptions? Or on the
client node, is there monitoring that would indicate that the network stack
(or memory) was resource starved (OOMs, timeouts, etc.)? The same question
applies to the OSS.
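Lacking that, a couple of quick spot checks may help (assuming an IB fabric,
per the o2ib NIDs, with the infiniband-diags tools installed):
# scan the fabric for ports with error counters
ibcheckerrors
# look for OOM-killer activity on the client and the OSS
grep -i oom-killer /var/log/messages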
Lastly, are the OSTs in question all on the same OSS? Looks like they were
0009, 000a, and 000b.
--
Brett Lee
Sr. Systems Engineer
Intel High Performance Data Division
> -----Original Message-----
> From: hpdd-discuss-bounces(a)lists.01.org [mailto:hpdd-discuss-
> bounces(a)lists.01.org] On Behalf Of Mi Zhou
> Sent: Tuesday, May 07, 2013 9:49 AM
> To: hpdd-discuss(a)lists.01.org
> Subject: Re: [HPDD-discuss] OST refused connection from client
>
> Hi,
>
> Thanks everyone for the input.
>
> I do see "connection to ... was lost" on the client side, but I did
> not see messages like "waiting_locks_callback()".
>
> Below is another instance:
>
> Error on client:
>
> May 7 01:33:18 nodem14 kernel: Lustre: 2519:0:(client.c:1780:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1367908387/real 0] req@ffff880b5d73ac00 x1434028094681259/t0(0) o101->scratch-OST000a-osc-ffff880c43e21400@192.168.100.4@o2ib:28/4 lens 296/352 e 0 to 1 dl 1367908398 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
> May 7 01:33:18 nodem14 kernel: Lustre: scratch-OST000a-osc-ffff880c43e21400: Connection to scratch-OST000a (at 192.168.100.4@o2ib) was lost; in progress operations using this service will wait for recovery to complete
> May 7 01:33:18 nodem14 kernel: Lustre: 2519:0:(client.c:1780:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1367908387/real 0] req@ffff8809a750b000 x1434028094681262/t0(0) o101->scratch-OST000b-osc-ffff880c43e21400@192.168.100.4@o2ib:28/4 lens 296/352 e 0 to 1 dl 1367908398 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
> May 7 01:33:18 nodem14 kernel: Lustre: scratch-OST000b-osc-ffff880c43e21400: Connection to scratch-OST000b (at 192.168.100.4@o2ib) was lost; in progress operations using this service will wait for recovery to complete
> May 7 01:33:20 nodem14 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.100.4@o2ib. The ost_connect operation failed with -16
> May 7 01:33:20 nodem14 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.100.4@o2ib. The ost_connect operation failed with -16
> May 7 01:33:43 nodem14 kernel: Lustre: scratch-OST0009-osc-ffff880c43e21400: Connection restored to scratch-OST0009 (at 192.168.100.4@o2ib)
>
> Error on OSS:
>
> May 7 01:33:20 lustre-oss04 kernel: Lustre: scratch-OST000a: Bulk IO write error with 77b5db75-5d82-1976-0116-5ef24f9febee (at 192.168.102.14@o2ib), client will retry: rc -110
> May 7 01:33:43 lustre-oss04 kernel: Lustre: scratch-OST0009: Client 77b5db75-5d82-1976-0116-5ef24f9febee (at 192.168.102.14@o2ib) reconnecting
>
> I agree it is caused, at least in part, by some I/O intensive application.
> I wonder if there is anything we can do on the Lustre side to alleviate the
> problem, like lowering the number of threads, etc.
>
>
> Thanks
>
> Mi
>
>
>
_______________________________________________
HPDD-discuss mailing list
HPDD-discuss(a)lists.01.org
https://lists.01.org/mailman/listinfo/hpdd-discuss