On 2014/03/13, 11:40 AM, "Francesco De Giorgi"
<francesco.degiorgi(a)exact-lab.it> wrote:
Hi,
I was trying to run ost-survey in this environment:
mds, oss: lustre 2.4.2 (CentOS 6.5 with lustre patched kernel RPM)
client: lustre 2.4.2 (CentOS 6.5 patchless client, lustre iokit 1.4.0.1)
command line: ost-survey -s 100 /lustre
but the script hangs each time, consuming all the CPU while attempting to
echo 0 into /proc/fs/lustre/llite/*/max_cached_mb. This comes from the
cache_off subroutine in the ost-survey script.
That is probably a bug in the script, and also in the kernel for allowing
this value to be set. It doesn't make sense to prevent any pages from being
cached on the client. It probably makes sense to require at least a few MB
of cache space on the client.
If some site wants minimal memory usage at the expense of performance,
that is OK, but we shouldn't allow them to break the system completely.
But while on a Lustre 2.1.6 client max_cached_mb appears to me as
a single number
# cat /proc/fs/lustre/llite/*/max_cached_mb
18114
on a 2.4.2 client the output is different:
# cat /proc/fs/lustre/llite/*/max_cached_mb
users: 2
max_cached_mb: 96766
used_mb: 21806
unused_mb: 74960
reclaim_count: 0
Am I missing a new version of ost-survey or should I simply get rid of
those dangerous lines in the script?
It looks like the script has not been fixed to work with the new output.
Could you please file a bug at https://jira.hpdd.intel.com/ with the above
details so that it can be fixed?
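To illustrate the format difference, here is a minimal sketch of a reader that accepts both the single-number output seen on 2.1.6 and the "key: value" output seen on 2.4.2. It is written in Python purely for illustration and is not a patch to ost-survey itself. The function name parse_max_cached_mb is hypothetical, and the sketch assumes only one llite entry matches the glob.

```python
def parse_max_cached_mb(text):
    """Return max_cached_mb as an int from either output format.

    Hypothetical helper: assumes the text comes from a single
    /proc/fs/lustre/llite/*/max_cached_mb file.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]

    # Older format (as shown for 2.1.6): a single bare number.
    if len(lines) == 1 and lines[0].isdigit():
        return int(lines[0])

    # Newer format (as shown for 2.4.2): one "key: value" pair per line.
    for ln in lines:
        key, sep, value = ln.partition(":")
        if sep and key.strip() == "max_cached_mb":
            return int(value.strip())

    raise ValueError("unrecognized max_cached_mb output")


# Sample outputs copied from the message above.
old_output = "18114\n"
new_output = (
    "users: 2\n"
    "max_cached_mb: 96766\n"
    "used_mb: 21806\n"
    "unused_mb: 74960\n"
    "reclaim_count: 0\n"
)

print(parse_max_cached_mb(old_output))
print(parse_max_cached_mb(new_output))
```

A script that saves the original value before a benchmark and restores it afterwards would need something along these lines, since the multi-line output cannot be written back as-is.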
Cheers, Andreas
--
Andreas Dilger
Lustre Software Architect
Intel High Performance Data Division