Greetings,
0day kernel testing robot got the below dmesg and the first bad commit is
https://github.com/0day-ci/linux/commits/Christian-Brauner/nsproxy-attach...
commit 7322464a68c444dccde385b3b696f48e6d1bb5cc
Author: Christian Brauner <christian.brauner(a)ubuntu.com>
AuthorDate: Mon Apr 27 16:36:46 2020 +0200
Commit: 0day robot <lkp(a)intel.com>
CommitDate: Tue Apr 28 08:20:37 2020 +0800
nsproxy: attach to namespaces via pidfds
For quite a while we have been thinking about using pidfds to attach to
namespaces. This patchset has existed for about a year already but we've
wanted to wait to see how the general api would be received and adopted.
Now that more and more programs in userspace have started using pidfds
for process management it's time to send this one out.
This patch makes it possible to use pidfds to attach to the namespaces
of another process, i.e. they can be passed as the first argument to the
setns() syscall. When only a single namespace type is specified the
semantics are equivalent to passing an nsfd. That means
setns(nsfd, CLONE_NEWNET) equals setns(pidfd, CLONE_NEWNET). However,
when a pidfd is passed, multiple namespace flags can be specified in the
second setns() argument and setns() will attach the caller to all the
specified namespaces all at once or to none of them. If 0 is specified
together with a pidfd then setns() will interpret it the same way 0 is
interpreted together with a nsfd argument, i.e. attach to any/all
namespaces.
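The call described above can be sketched in C roughly as follows. This is a minimal illustration, assuming a kernel that carries this patch (or a later revision of it). The helper name join_namespaces is hypothetical, and raw syscall() is used only so that the sketch does not depend on a libc wrapper for pidfd_open().

```c
#include <errno.h>
#include <linux/sched.h>   /* CLONE_NEW* flag values */
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Fallback for older headers; 434 is the pidfd_open number on most architectures. */
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif

/*
 * Hypothetical helper: attach the caller to the namespaces of `pid`
 * selected by `nstypes` (an OR of CLONE_NEW* flags) with one setns() call.
 * Returns 0 on success, -1 with errno set on failure.
 */
int join_namespaces(pid_t pid, int nstypes)
{
    /* Obtain a stable reference to the target process. */
    int pidfd = (int)syscall(SYS_pidfd_open, pid, 0);
    if (pidfd < 0)
        return -1;

    /* Pass the pidfd where an nsfd would normally go. */
    int ret = (int)syscall(SYS_setns, pidfd, nstypes);

    /* Preserve the setns() error across close(). */
    int saved_errno = errno;
    close(pidfd);
    errno = saved_errno;
    return ret;
}
```

A caller would use it as join_namespaces(pid, CLONE_NEWNET) for a single namespace, or OR several CLONE_NEW* flags together to request the all-or-nothing attach.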
The obvious example where this is useful is a standard container
manager interacting with a running container: pushing and pulling files
or directories, injecting mounts, attaching/execing any kind of process,
managing network devices. All of these operations require attaching to all
or at least multiple namespaces at the same time. Given that nowadays
most containers are spawned with all namespaces enabled we're currently
looking at at least 14 syscalls, 7 to open the /proc/<pid>/ns/<ns>
nsfds, another 7 to actually perform the namespace switch. With time
namespaces we're looking at about 16 syscalls.
(We could amortize the first 7 or 8 syscalls for opening the nsfds by
stashing them in each container's monitor process but that would mean
we need to send around those file descriptors through unix sockets
every time we want to interact with the container or keep on-disk
state. Even in scenarios where a caller wants to join a particular
namespace in a particular order callers still profit from batching
other namespaces. That mostly applies to the user namespace but
all container runtimes I found join the user namespace first no matter
if it privileges or deprivileges the container.)
With pidfds this becomes a single syscall no matter how many namespaces
are supposed to be attached to.
A decently designed, large-scale container manager usually isn't the
parent of any of the containers it spawns so the containers don't die
when it crashes or needs to update or reinitialize. This means that
for the manager, interacting with containers through pids is inherently
racy, especially on systems where the maximum pid number is not
significantly bumped. This is even more problematic since we often spawn
and manage thousands or tens of thousands of containers. Interacting with a
container through a pid thus can become risky quite quickly. Especially
since we allow for an administrator to enable advanced features such as
syscall interception where we're performing syscalls in lieu of the
container. In all of those cases we use pidfds if they are available and
we pass them around as stable references. Using them to setns() to the
target process namespaces is as reliable as using nsfds. Either the
target process is already dead and we get ESRCH or we manage to attach
to its namespaces but we can't accidentally attach to another process'
namespaces. So pidfds lend themselves to be used with this api.
Apart from significantly reducing the number of syscalls from double
digit to single digit, which is a decent reason post-spectre/meltdown,
this also allows switching to a set of namespaces atomically, i.e.
either attaching to all the specified namespaces succeeds or we fail. If
we fail we haven't changed a single namespace. There are currently three
namespaces that can fail (other than for ENOMEM which really is not
very interesting since we then have other problems anyway) for
non-trivial reasons: user, mount, and pid namespaces. We can fail to
attach to a pid namespace if it is not our current active pid namespace
or a descendant of it. We can fail to attach to a user namespace because
we are multi-threaded, because our current mount namespace shares
filesystem state with other tasks, or because we're trying to setns()
to the same user namespace, i.e. the target task has the same user
namespace as we do. We can fail to attach to a mount namespace because
it shares filesystem state with other tasks or because we fail to lookup
the new root for the new mount namespace. In most non-pathological
scenarios these issues can be somewhat mitigated. But there's e.g.
still an inherent race between trying to setns() to the mount namespace
of a task and that task spawning a child with CLONE_FS. If that process
runs in a new user namespace we must have already setns()ed into the new
user namespace otherwise we fail to attach to the mount namespace. There
are other cases similar to that and we've had issues where we're
half-attached to some namespace and failing in the middle. I've talked
about some of these problems during the hallway track (something only the
pre-COVID-19 generation will remember) of Plumbers in Los Angeles in
2018(?). Even if all these issues could be avoided with super careful
userspace coding it would be nicer to have this done in-kernel. There's
not a lot of cost associated with this extension for the kernel and
pidfds seem to lend themselves nicely for this.
Cc: Eric W. Biederman <ebiederm(a)xmission.com>
Cc: Serge Hallyn <serge(a)hallyn.com>
Cc: Aleksa Sarai <cyphar(a)cyphar.com>
Signed-off-by: Christian Brauner <christian.brauner(a)ubuntu.com>
51184ae37e Merge tag 'for-5.7-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
7322464a68 nsproxy: attach to namespaces via pidfds
+---------------------------------------------------+------------+------------+
| | 51184ae37e | 7322464a68 |
+---------------------------------------------------+------------+------------+
| boot_successes | 44 | 0 |
| boot_failures | 1 | 16 |
| Mem-Info | 1 | |
| BUG:kernel_NULL_pointer_dereference,address | 0 | 10 |
| Oops:#[##] | 0 | 16 |
| EIP:__ia32_sys_setns | 0 | 11 |
| Kernel_panic-not_syncing:Fatal_exception | 0 | 16 |
| WARNING:at_lib/refcount.c:#refcount_warn_saturate | 0 | 6 |
| EIP:refcount_warn_saturate | 0 | 6 |
| BUG:unable_to_handle_page_fault_for_address | 0 | 6 |
| EIP:get_pid_task | 0 | 5 |
+---------------------------------------------------+------------+------------+
If you fix the issue, kindly add the following tag:
Reported-by: kernel test robot <lkp(a)intel.com>
[ 9.233966] VFS: Warning: trinity-c3 using old stat() call. Recompile your binary.
[child3:813] set_mempolicy (276) returned ENOSYS, marking as inactive.
[ 9.237518] warning: process `trinity-c2' used the deprecated sysctl system call with
[ 9.238724] ------------[ cut here ]------------
[ 9.239358] refcount_t: addition on 0; use-after-free.
[ 9.240095] WARNING: CPU: 0 PID: 808 at lib/refcount.c:25 refcount_warn_saturate+0xba/0x120
[ 9.241429] Modules linked in:
[ 9.241851] CPU: 0 PID: 808 Comm: trinity-c3 Tainted: G S 5.7.0-rc3-00013-g7322464a68c44 #1
[ 9.243138] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 9.244244] EIP: refcount_warn_saturate+0xba/0x120
[ 9.244884] Code: 58 e9 87 00 00 00 8d b4 26 00 00 00 00 8d 76 00 80 3d f4 84 f2 d1 00 75 74 68 24 07 bc d1 c6 05 f4 84 f2 d1 01 e8 36 56 c9 ff <0f> 0b 58 eb 5e 90 80 3d f3 84 f2 d1 00 75 54 68 50 07 bc d1 c6 05
[ 9.247345] EAX: 0000002a EBX: f66d0b44 ECX: 0000008b EDX: f7314300
[ 9.248175] ESI: 00000000 EDI: f7314300 EBP: f7317f64 ESP: f7317f60
[ 9.249001] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068 EFLAGS: 00010292
[ 9.249909] CR0: 80050033 CR2: b6d3c000 CR3: 2c9e6000 CR4: 000006d0
[ 9.250755] Call Trace:
[ 9.251099] get_pid_task+0x5e/0xa0
[ 9.251576] __ia32_sys_setns+0xcd/0x440
[ 9.252100] ? __task_pid_nr_ns+0xb7/0xd0
[ 9.252643] do_int80_syscall_32+0x45/0xd0
[ 9.253199] entry_INT80_32+0xf4/0xf4
[ 9.253691] EIP: 0x809b132
[ 9.254065] Code: 89 c8 c3 90 8d 74 26 00 85 c0 c7 01 01 00 00 00 75 d8 a1 6c 94 a8 08 eb d1 66 90 66 90 66 90 66 90 66 90 66 90 66 90 90 cd 80 <c3> 8d b6 00 00 00 00 8d bc 27 00 00 00 00 8b 10 a3 94 94 a8 08 85
[ 9.256568] EAX: ffffffda EBX: 0000011c ECX: 00000000 EDX: 4a03bed6
[ 9.257399] ESI: fffffffb EDI: 0d258544 EBP: 0000c000 ESP: bfea7cd8
[ 9.258231] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000296
[ 9.259145] ---[ end trace 6da55ee5a0c4840d ]---
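Reading the user-mode register dump above with the i386 int 0x80 convention (EBX is the first argument, ECX the second) suggests the call was setns(0x11c, 0). What kind of file descriptor 0x11c referred to in the trinity child is not visible in this log. The sketch below only probes one possible call shape, under the assumption that the descriptor was neither an nsfd nor a pidfd; it is not a confirmed reproducer, and the helper name and the use of /dev/null are illustrative choices.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical probe: call setns(fd, 0) on a descriptor that is neither
 * an nsfd nor a pidfd (assumption: /dev/null stands in for whatever
 * trinity passed). Returns the setns() result, or -1 if open() fails.
 */
int setns_on_plain_fd(void)
{
    int fd = open("/dev/null", O_RDONLY);
    if (fd < 0)
        return -1;

    /* Same argument shape as in the register dump: (fd, 0). */
    int ret = (int)syscall(SYS_setns, fd, 0);

    /* Preserve the setns() error across close(). */
    int saved_errno = errno;
    close(fd);
    errno = saved_errno;
    return ret;
}
```

A kernel that validates the descriptor type before using it should reject this call with an error and return -1.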
# HH:MM RESULT GOOD BAD GOOD_BUT_DIRTY DIRTY_NOT_BAD
git bisect start 5d84712bd468a94c6dc944824cbe62278e9ba112 6a8b55ed4056ea5559ebe4f6a4b247f627870d4c --
git bisect bad 56a717a7c44e387d6d9ac05bdb00177ed94073c7 # 02:19 B 0 1 17 0 Merge 'linux-review/Chuck-Lever/NFS-RDMA-client-patches-for-v5-7-rc/20200421-201520' into devel-hourly-2020042818
git bisect bad 6b419cca3362544aa5b545cfd147f80268dbf995 # 02:48 B 0 4 20 0 Merge 'linux-review/Nishad-Kamdar/f2fs-Use-the-correct-style-for-SPDX-License-Identifier/20200427-163732' into devel-hourly-2020042818
git bisect good 31d4507e44281e3a526306083d7240ae9727ca59 # 03:18 G 13 0 0 0 Merge 'linux-review/Denis-Kirjanov/xen-networking-add-basic-XDP-support-for-xen-netfront/20200428-083754' into devel-hourly-2020042818
git bisect bad 39c0c195552fd7f3999a276a4d32a04f990541a4 # 03:53 B 1 3 1 1 Merge 'linux-review/Mateusz-Gorski/Add-support-for-different-DMIC-configurations/20200428-074312' into devel-hourly-2020042818
git bisect good 57e099d6b915c6fdf563859662b3c49766011ba9 # 04:29 G 13 0 6 6 Merge 'linux-review/Jin-Yao/perf-stat-Fix-uncore-event-mixed-metric-with-workload-error-issue/20200428-082344' into devel-hourly-2020042818
git bisect good 01c0ef45fd240689365da8765fe716700fac6ad6 # 05:00 G 13 0 4 4 Merge 'linux-review/Nishad-Kamdar/NFS-Use-the-correct-style-for-SPDX-License-Identifier/20200427-164819' into devel-hourly-2020042818
git bisect good 864d22c79992955a3bcb81ec9b65be07ffdfe3a4 # 08:35 G 14 0 7 7 Merge 'linux-review/Marek-Szyprowski/Minor-WM8994-MFD-codec-fixes/20200428-055602' into devel-hourly-2020042818
git bisect good 1ac57c71a6f74f5d705f4718bc0cad04b9de22b7 # 09:19 G 13 0 7 7 Merge 'linux-review/madhuparnabhowmik10-gmail-com/rapidio-Avoid-data-race-between-file-operation-callbacks-and-mport_cdev_add/20200428-031110' into devel-hourly-2020042818
git bisect good 98d171a4cb8c07fa091f8ad5cbe9187771ce15ef # 10:06 G 13 0 6 6 Merge 'linux-review/Masahiro-Yamada/kbuild-remove-target/20200427-162454' into devel-hourly-2020042818
git bisect bad 771b1c0092cd189d26703e37d6d5a62d064841d4 # 10:53 B 1 1 1 1 Merge 'linux-review/Christian-Brauner/nsproxy-attach-to-namespaces-via-pidfds/20200428-082028' into devel-hourly-2020042818
git bisect bad 7322464a68c444dccde385b3b696f48e6d1bb5cc # 15:15 B 1 3 1 1 nsproxy: attach to namespaces via pidfds
# first bad commit: [7322464a68c444dccde385b3b696f48e6d1bb5cc] nsproxy: attach to namespaces via pidfds
git bisect good 51184ae37e0518fd90cb437a2fbc953ae558cd0d # 16:08 G 47 0 5 5 Merge tag 'for-5.7-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
# extra tests with debug options
git bisect bad 7322464a68c444dccde385b3b696f48e6d1bb5cc # 16:41 B 3 1 3 3 nsproxy: attach to namespaces via pidfds
# extra tests on head commit of linux-review/Christian-Brauner/nsproxy-attach-to-namespaces-via-pidfds/20200428-082028
git bisect bad 7322464a68c444dccde385b3b696f48e6d1bb5cc # 16:48 B 0 11 32 5 nsproxy: attach to namespaces via pidfds
# bad: [7322464a68c444dccde385b3b696f48e6d1bb5cc] nsproxy: attach to namespaces via pidfds
# extra tests on revert first bad commit
git bisect good a845404448ebed36cbab42e09ce85fcebd48e701 # 17:19 G 15 0 1 1 Revert "nsproxy: attach to namespaces via pidfds"
# good: [a845404448ebed36cbab42e09ce85fcebd48e701] Revert "nsproxy: attach to namespaces via pidfds"
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/lkp@lists.01.org