Dear All,
I seem to have resolved my issues and recovered from the problems I had been having.
Thank you, Rick, for pointing out the cache parameters; disabling them seems to have
helped the situation.
I need to understand our load a bit better to work out how to use
readcache_max_filesize so that some small I/O can still benefit from the cache.
The slowness was caused by errors on the OSTs; repairing them with e2fsck resolved
all my problems.
Regards,
Amit
-----Original Message-----
From: HPDD-discuss [mailto:hpdd-discuss-bounces@lists.01.org] On Behalf Of
Kumar, Amit
Sent: Monday, August 17, 2015 12:45 PM
To: Mohr Jr, Richard Frank (Rick Mohr)
Cc: hpdd-discuss(a)lists.01.org
Subject: Re: [HPDD-discuss] (ost_handler.c:1764:ost_blocking_ast()) Error -2
syncing data on lock cancel
Hi Rick,
>-----Original Message-----
>From: Mohr Jr, Richard Frank (Rick Mohr) [mailto:rmohr@utk.edu]
>Do you have Lustre read/write caching enabled? That could be consuming
>the memory. You can check the values by running
>
>lctl get_param obdfilter.*.read_cache_enable
>lctl get_param obdfilter.*.writethrough_cache_enable
>
>If they are enabled, you could try disabling them to see how that
>affects memory usage (and if it has any effect on your problem).
>
I have disabled them both to see the effect. %mem seems to be dropping! I
also read that setting readcache_max_filesize controls the maximum file size that
is cached in memory. I will probably try this as well once %mem drops to a certain
level.
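
For reference, a minimal sketch of what these two changes might look like on an OSS. The function names are hypothetical and the 32M size is only an assumed example value, not a recommendation; the parameter names are the ones mentioned above.

```shell
# Hypothetical helper names; intended to be run as root on each OSS.

# Turn off both OSS-side caches, as described above.
disable_oss_caches() {
    lctl set_param "obdfilter.*.read_cache_enable=0"
    lctl set_param "obdfilter.*.writethrough_cache_enable=0"
}

# Turn the read cache back on, but only cache files up to the given size.
# $1 is the size limit, e.g. 32M (an assumed example value).
enable_read_cache_small_files() {
    lctl set_param "obdfilter.*.read_cache_enable=1"
    lctl set_param "obdfilter.*.readcache_max_filesize=$1"
}
```

Something like "enable_read_cache_small_files 32M" would then keep only small files in the read cache. Note that settings made with lctl set_param do not survive a remount or reboot.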
>> Also, in the output of cat /proc/fs/lustre/ldlm/namespaces/*/*
>> I see lru_size = 36000000
>> while the others are all "0". I am not sure whether these numbers need to be tuned?
>
>I think the “0” just means the limit is dynamically determined.
>
>On our clients, we set lru_size=10000 and lru_max_age=172800 which is
>more than enough. (The dynamically determined values are crazy big…)
>
I have tweaked ours as well!!
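
A sketch of how limits like the ones Rick quotes might be applied on a client. The function name is hypothetical, and applying the values to every namespace with a wildcard is an assumption; the units of lru_max_age depend on the Lustre version, so check before copying the number.

```shell
# Hypothetical helper; intended to be run as root on a Lustre client.
# $1 = lru_size (maximum number of cached locks per namespace)
# $2 = lru_max_age (units depend on the Lustre version)
set_client_lru_limits() {
    lctl set_param "ldlm.namespaces.*.lru_size=$1"
    lctl set_param "ldlm.namespaces.*.lru_max_age=$2"
}

# Example with the values quoted above:
#   set_client_lru_limits 10000 172800
```

Running lctl get_param "ldlm.namespaces.*.lru_size" afterwards should show whether the new limit took effect.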
Another outstanding issue I am having as a consequence of these events is that my
file creation and open times have shot up. When I strace it, I see a couple of
minutes spent in ioctl/open while creating files with lfs setstripe, and a couple
of minutes spent in the select/open calls while editing files.
Example strace timings:
23994 open(".newfile_idx69_real.swp", O_RDWR|O_CREAT|O_EXCL, 0600) = 4 <20.590901>
23994 open(".newfile_idx69_real.swp", O_RDWR|O_CREAT|O_EXCL|O_NOFOLLOW, 0600) = 4 <100.007478>
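
In a longer trace, the slow calls can be picked out by filtering on the elapsed time that strace -T appends in angle brackets. A minimal sketch, assuming one syscall per line in the log; slow_calls is a hypothetical helper name and trace.log a placeholder file name.

```shell
# Print only the strace -T lines whose elapsed time (the trailing <seconds>)
# is greater than the threshold given as $1.
# Usage: slow_calls THRESHOLD_SECONDS < trace.log
slow_calls() {
    awk -v min="$1" '
        match($0, /<[0-9]+\.[0-9]+>$/) {
            # Strip the angle brackets and compare numerically.
            t = substr($0, RSTART + 1, RLENGTH - 2)
            if (t + 0 > min + 0) print
        }'
}

# A trace could be captured with something like:
#   strace -f -T -o trace.log -e trace=open,ioctl,select <command>
# and then filtered with:
#   slow_calls 1 < trace.log
```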
Any hints on troubleshooting this would be a big help.
Thank you,
Amit
_______________________________________________
HPDD-discuss mailing list
HPDD-discuss(a)lists.01.org
https://lists.01.org/mailman/listinfo/hpdd-discuss