On Mon, 2019-09-23 at 12:08 -0700, Ira Weiny wrote:
Since the last RFC patch set much of the discussion of supporting
FS DAX has been around the semantics of the lease mechanism. Within that
thread it was suggested I try and write some documentation and/or tests for the
new mechanism being proposed. I have created a foundation to test lease
functionality within xfstests. This should be close to being accepted.
Before writing additional lease tests, or changing lots of kernel code, this
email presents documentation for the new proposed "layout lease" semantic.
At Linux Plumbers just over a week ago, I presented the current state of the
patch set and the outstanding issues. Based on the discussion there, as well as
follow up emails, I propose the following addition to the fcntl() man page.
Thank you so much for doing this, Ira. This allows us to debate the
user-visible behavior semantics without getting bogged down in the
implementation details. More comments below:
<fcntl man page addition>
Layout (F_LAYOUT) leases are special leases which can be used to control and/or
be informed about the manipulation of the underlying layout of a file.
A layout is defined as the logical file block -> physical file block mapping
including the file size and sharing of physical blocks among files. Note that
the unwritten state of a block is not considered part of file layout.
**Read layout lease F_RDLCK | F_LAYOUT**
Read layout leases can be used to be informed of layout changes by the
system or other users. This lease is similar to the standard read (F_RDLCK)
lease in that any attempt to change the _layout_ of the file will be reported to
the process through the lease break process. But this lease is different
because the file can be opened for write and data can be read and/or written to
the file as long as the underlying layout of the file does not change.
Therefore, the lease is not broken if the file is simply open for write, but
_may_ be broken if an operation such as truncate(), fallocate() or write()
results in changing the underlying layout.
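
As a rough illustration of the request described above, a read layout lease might be taken as follows. This is only a sketch: F_LAYOUT is the flag proposed in this thread and is not in mainline headers, so the value defined here is a placeholder, and the helper names are made up. A kernel without the proposed patches is expected to reject the request.

```c
#define _GNU_SOURCE
#include <fcntl.h>

/* HYPOTHETICAL: F_LAYOUT is the flag proposed in this thread. It is not part
 * of mainline headers; the value below is a placeholder for illustration. */
#ifndef F_LAYOUT
#define F_LAYOUT 0x10
#endif

/* Build the F_SETLEASE argument for a layout lease of the given base type
 * (F_RDLCK or F_WRLCK). */
int layout_lease_arg(int base_type)
{
    return base_type | F_LAYOUT;
}

/* Request a read layout lease on fd.
 * Returns 0 on success, -1 with errno set on failure. */
int take_read_layout_lease(int fd)
{
    return fcntl(fd, F_SETLEASE, layout_lease_arg(F_RDLCK));
}
```

Under the proposed semantics, the file behind fd could stay open for write; only layout-changing operations would trigger the lease break.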
**Write layout lease (F_WRLCK | F_LAYOUT)**
Write layout leases can be used to break read layout leases to indicate that
the process intends to change the underlying layout of the file.
A process which has taken a write layout lease has exclusive ownership of the
file layout and can modify that layout as long as the lease is held.
Operations which change the layout are allowed by that process. But operations
from other file descriptors which attempt to change the layout will break the
lease through the standard lease break process. The F_LAYOUT flag is used to
indicate a difference between a regular F_WRLCK and F_WRLCK with F_LAYOUT. In
the F_LAYOUT case opens for write do not break the lease. But some operations,
if they change the underlying layout, may.
The distinction between read layout leases and write layout leases is that
write layout leases can change the layout without breaking the lease within the
owning process. This is useful to guarantee a layout prior to specifying the
unbreakable flag described below.
The above sounds totally reasonable. You're essentially exposing the
behavior of nfsd's layout leases to userland. To be clear, will F_LAYOUT
leases work the same way as "normal" leases, wrt signals and timeouts?
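
For reference, the "normal" lease behavior being asked about can be sketched with the existing fcntl() interface: F_SETSIG selects the signal delivered on a lease break (SIGIO by default), and /proc/sys/fs/lease-break-time holds the number of seconds a holder has to release or downgrade before the kernel removes the lease. Whether F_LAYOUT leases would follow the same rules is the open question here; the helper names below are made up for illustration.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>

/* Take an ordinary read lease on fd and ask for lease-break notifications
 * to be delivered as `sig` instead of the default SIGIO.
 * Returns 0 on success, -1 with errno set on failure. */
int take_classic_read_lease(int fd, int sig)
{
    if (fcntl(fd, F_SETSIG, sig) == -1)
        return -1;
    return fcntl(fd, F_SETLEASE, F_RDLCK);
}

/* Read the system-wide grace period (in seconds) that a lease holder gets
 * after a break before the kernel forcibly removes the lease.
 * Returns -1 if the value cannot be read. */
int classic_lease_break_time(void)
{
    FILE *f = fopen("/proc/sys/fs/lease-break-time", "r");
    int secs = -1;

    if (!f)
        return -1;
    if (fscanf(f, "%d", &secs) != 1)
        secs = -1;
    fclose(f);
    return secs;
}
```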
I do wonder if we're better off not trying to "or" in flags for this,
and instead have a separate set of commands (maybe F_RDLAYOUT,
F_WRLAYOUT, F_UNLAYOUT). Maybe I'm just bikeshedding though -- I don't
feel terribly strongly about it.
Also, at least in NFSv4, layouts are handed out for a particular byte
range in a file. Should we consider doing this with an API that allows
for that in the future? Is this something that would be desirable for
your RDMA+DAX use-cases?
We could add a new F_SETLEASE variant that takes a struct with a byte
range (something like struct flock).
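
One possible shape for such an argument, modeled loosely on struct flock, is sketched below. Everything here is hypothetical: neither the struct name, its fields, nor any F_SETLEASE variant that would accept it exists today.

```c
#include <fcntl.h>
#include <stdio.h>      /* SEEK_SET */
#include <sys/types.h>

/* HYPOTHETICAL: a byte-range argument for a future lease command,
 * mirroring the range fields of struct flock. */
struct layout_lease_range {
    short l_type;    /* F_RDLCK, F_WRLCK or F_UNLCK */
    short l_whence;  /* how to interpret l_start: SEEK_SET, SEEK_CUR, SEEK_END */
    off_t l_start;   /* first byte covered by the lease */
    off_t l_len;     /* number of bytes; 0 means "to end of file" */
};

/* Describe a read layout lease covering the whole file. */
struct layout_lease_range whole_file_read_layout(void)
{
    struct layout_lease_range r = {
        .l_type = F_RDLCK,
        .l_whence = SEEK_SET,
        .l_start = 0,
        .l_len = 0,
    };
    return r;
}
```

Reusing the struct flock convention that a zero length means "to end of file" would let whole-file leases remain the simple case.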
**Unbreakable Layout Leases (F_UNBREAK)**
In order to support pinning of file pages by direct user space users an
unbreakable flag (F_UNBREAK) can be used to modify the read and write layout
lease. When specified, F_UNBREAK indicates that any user attempting to break
the lease will fail with ETXTBSY rather than follow the normal breaking procedure.
Both read and write layout leases can have the unbreakable flag (F_UNBREAK)
specified. The difference between an unbreakable read layout lease and an
unbreakable write layout lease is that an unbreakable read layout lease is
_not_ exclusive. This means that once a layout is established on a file,
multiple unbreakable read layout leases can be taken by multiple processes and
used to pin the underlying pages of that file.
Care must therefore be taken to ensure that the layout of the file is as the
user wants prior to using the unbreakable read layout lease. A safe mechanism
to do this would be to take a write layout lease and use fallocate() to set the
layout of the file. The layout lease can then be "downgraded" to unbreakable
read layout as long as no other user broke the write layout lease.
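
The "safe mechanism" above might look roughly like this from user space. This is a sketch under several assumptions: F_LAYOUT and F_UNBREAK are the proposed flags with placeholder values, the function name is made up, and it is assumed that the downgrade fails if the write layout lease was broken in the meantime and that a plain F_UNLCK releases a layout lease.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/* HYPOTHETICAL: proposed flags, not in mainline headers; placeholder values. */
#ifndef F_LAYOUT
#define F_LAYOUT 0x10
#endif
#ifndef F_UNBREAK
#define F_UNBREAK 0x20
#endif

/* Establish the layout for the first `len` bytes of fd, then pin it.
 * Returns 0 on success, -1 with errno set on failure. */
int pin_layout(int fd, off_t len)
{
    int saved;

    /* 1. Take exclusive ownership of the layout. */
    if (fcntl(fd, F_SETLEASE, F_WRLCK | F_LAYOUT) == -1)
        return -1;

    /* 2. Allocate blocks so the layout is what we want to pin. */
    if (fallocate(fd, 0, 0, len) == -1)
        goto fail;

    /* 3. Downgrade to an unbreakable read layout lease (assumed to fail
     *    if another user broke the write layout lease). */
    if (fcntl(fd, F_SETLEASE, F_RDLCK | F_LAYOUT | F_UNBREAK) == -1)
        goto fail;

    return 0;

fail:
    saved = errno;
    fcntl(fd, F_SETLEASE, F_UNLCK);  /* assumed way to drop the lease */
    errno = saved;
    return -1;
}
```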
Will userland require any special privileges in order to set an
F_UNBREAK lease? This seems like something that could be used for DoS. I
assume that these will never time out.
How will we deal with the case where something is squatting on an
F_UNBREAK lease and isn't letting it go?
Leases are technically "owned" by the file description -- we can't
necessarily trace it back to a single task in a threaded program. The
kernel task that set the lease may have exited by the time we go looking.
Will we be content trying to determine this using /proc/locks+lsof, etc.,
or will we need something better?
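
To make the /proc/locks option concrete, here is a minimal sketch that pulls the recorded PID out of a LEASE entry. It assumes the line format documented in proc(5); the function name is made up. As noted above, the PID is only the task that set the lease and may no longer exist.

```c
#include <stdio.h>
#include <string.h>

/* Parse one line of /proc/locks, e.g.
 *   "3: LEASE  ACTIVE    READ  1234 08:01:7864448 0 EOF"
 * Returns the PID recorded for the entry if it is a lease, or -1 if the
 * line is not a lease entry or cannot be parsed. */
int lease_holder_pid(const char *line)
{
    char cls[16], state[16], mode[16];
    int pid;

    if (sscanf(line, "%*d: %15s %15s %15s %d", cls, state, mode, &pid) != 4)
        return -1;
    if (strcmp(cls, "LEASE") != 0)
        return -1;
    return pid;
}
```

Mapping the PID (or the device and inode fields) back to an open file description still needs something like lsof, which is the limitation being raised here.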
</fcntl man page addition>
Jeff Layton <jlayton(a)kernel.org>