Dave Chinner <david(a)fromorbit.com> writes:
On Thu, Feb 25, 2016 at 11:24:57AM -0500, Jeff Moyer wrote:
> But, it seems plausible to me that no matter how well you
> optimize your msync implementation, it will still be more expensive than
> an application that doesn't call msync at all. This obviously depends
> on how the application is using the programming model, among other
> things. I agree that we would need real data to back this up. However,
> I don't see any reason to preclude such an implementation, or to leave
> it as a last resort. I think it should be part of our planning process
> if it's reasonably feasible.
Essentially I see this situation/request as conceptually the same as
O_DIRECT for read/write - O_DIRECT bypasses the kernel dirty range
tracking and, as such, has nasty cache coherency issues when you mix
it with buffered IO. Nor does it play well with mmap; it has
different semantics for every filesystem, and the kernel code has
been optimised to the point of fragility.
And, of course, O_DIRECT requires applications to do exactly the
right things to extract performance gains and maintain data
integrity. If they get it right, they will be faster than using the
page cache, but we know that applications often get it very wrong.
And even when they get it right, data corruption can still occur
because some third party accessed the file in a different manner (e.g. a
backup) and triggered one of the known, fundamentally unfixable
coherency problems.
However, despite the fact we are stuck with O_DIRECT and its
deranged monkeys (which I am one of), we should not be ignoring the
problems that bypassing the kernel infrastructure has caused us and
continues to cause us. As such, we really need to think hard about
whether we should be repeating the development of such a bypass
feature. If we do, we stand a very good chance of ending up in the
same place - a bunch of code that does not play well with others,
and a nightmare to test because it's expected to work and not
corrupt data...
We should try very hard not to repeat the biggest mistake O_DIRECT
made: we need to define and document exactly what behaviour we
guarantee, how it works and exactly what responsibilities the
kernel and userspace have in *great detail* /before/ we add the
mechanism to the kernel.
Think it through carefully - API changes and semantics are forever.
We don't want to add something that in a couple of years we are
wishing we never added....
I agree with everything you wrote there.
Cheers,
Jeff