commit b704883aa8dc4d1d232d3a3cdc438a64889fcc6e Author: Greg Kroah-Hartman Date: Sun Aug 15 13:08:06 2021 +0200 Linux 5.4.141 Link: https://lore.kernel.org/r/20210813150523.364549385@linuxfoundation.org Tested-by: Shuah Khan Tested-by: Sudip Mukherjee Tested-by: Linux Kernel Functional Testing Tested-by: Guenter Roeck Signed-off-by: Greg Kroah-Hartman commit 983d6a6b7e3cd6cfc05d903a08b52a99740fdf0d Author: Nikolay Borisov Date: Fri Aug 13 20:12:25 2021 +0800 btrfs: don't flush from btrfs_delayed_inode_reserve_metadata commit 4d14c5cde5c268a2bc26addecf09489cb953ef64 upstream Calling btrfs_qgroup_reserve_meta_prealloc from btrfs_delayed_inode_reserve_metadata can result in flushing delalloc while holding a transaction and delayed node locks. This is deadlock prone. In the past multiple commits: * ae5e070eaca9 ("btrfs: qgroup: don't try to wait flushing if we're already holding a transaction") * 6f23277a49e6 ("btrfs: qgroup: don't commit transaction when we already hold the handle") Tried to solve various aspects of this but this was always a whack-a-mole game. Unfortunately those 2 fixes don't solve a deadlock scenario involving btrfs_delayed_node::mutex. Namely, one thread can call btrfs_dirty_inode as a result of reading a file and modifying its atime: PID: 6963 TASK: ffff8c7f3f94c000 CPU: 2 COMMAND: "test" #0 __schedule at ffffffffa529e07d #1 schedule at ffffffffa529e4ff #2 schedule_timeout at ffffffffa52a1bdd #3 wait_for_completion at ffffffffa529eeea <-- sleeps with delayed node mutex held #4 start_delalloc_inodes at ffffffffc0380db5 #5 btrfs_start_delalloc_snapshot at ffffffffc0393836 #6 try_flush_qgroup at ffffffffc03f04b2 #7 __btrfs_qgroup_reserve_meta at ffffffffc03f5bb6 <-- tries to reserve space and starts delalloc inodes. #8 btrfs_delayed_update_inode at ffffffffc03e31aa <-- acquires delayed node mutex #9 btrfs_update_inode at ffffffffc0385ba8 #10 btrfs_dirty_inode at ffffffffc038627b <-- TRANSACTIION OPENED #11 touch_atime at ffffffffa4cf0000 #12 generic_file_read_iter at ffffffffa4c1f123 #13 new_sync_read at ffffffffa4ccdc8a #14 vfs_read at ffffffffa4cd0849 #15 ksys_read at ffffffffa4cd0bd1 #16 do_syscall_64 at ffffffffa4a052eb #17 entry_SYSCALL_64_after_hwframe at ffffffffa540008c This will cause an asynchronous work to flush the delalloc inodes to happen which can try to acquire the same delayed_node mutex: PID: 455 TASK: ffff8c8085fa4000 CPU: 5 COMMAND: "kworker/u16:30" #0 __schedule at ffffffffa529e07d #1 schedule at ffffffffa529e4ff #2 schedule_preempt_disabled at ffffffffa529e80a #3 __mutex_lock at ffffffffa529fdcb <-- goes to sleep, never wakes up. #4 btrfs_delayed_update_inode at ffffffffc03e3143 <-- tries to acquire the mutex #5 btrfs_update_inode at ffffffffc0385ba8 <-- this is the same inode that pid 6963 is holding #6 cow_file_range_inline.constprop.78 at ffffffffc0386be7 #7 cow_file_range at ffffffffc03879c1 #8 btrfs_run_delalloc_range at ffffffffc038894c #9 writepage_delalloc at ffffffffc03a3c8f #10 __extent_writepage at ffffffffc03a4c01 #11 extent_write_cache_pages at ffffffffc03a500b #12 extent_writepages at ffffffffc03a6de2 #13 do_writepages at ffffffffa4c277eb #14 __filemap_fdatawrite_range at ffffffffa4c1e5bb #15 btrfs_run_delalloc_work at ffffffffc0380987 <-- starts running delayed nodes #16 normal_work_helper at ffffffffc03b706c #17 process_one_work at ffffffffa4aba4e4 #18 worker_thread at ffffffffa4aba6fd #19 kthread at ffffffffa4ac0a3d #20 ret_from_fork at ffffffffa54001ff To fully address those cases the complete fix is to never issue any flushing while holding the transaction or the delayed node lock. This patch achieves it by calling qgroup_reserve_meta directly which will either succeed without flushing or will fail and return -EDQUOT. In the latter case that return value is going to be propagated to btrfs_dirty_inode which will fallback to start a new transaction. That's fine as the majority of time we expect the inode will have BTRFS_DELAYED_NODE_INODE_DIRTY flag set which will result in directly copying the in-memory state. Fixes: c53e9653605d ("btrfs: qgroup: try to flush qgroup space when we get -EDQUOT") CC: stable@vger.kernel.org # 5.10+ Reviewed-by: Qu Wenruo Signed-off-by: Nikolay Borisov Signed-off-by: David Sterba Signed-off-by: Anand Jain Signed-off-by: Greg Kroah-Hartman commit ea13f678a3fdd01fac59919dc64d635d654e5801 Author: Nikolay Borisov Date: Fri Aug 13 20:12:24 2021 +0800 btrfs: export and rename qgroup_reserve_meta commit 80e9baed722c853056e0c5374f51524593cb1031 upstream Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba Signed-off-by: David Sterba Signed-off-by: Anand Jain Signed-off-by: Greg Kroah-Hartman commit 41a9b8f36de75dd87caf367d3a85b9fb253873f8 Author: Qu Wenruo Date: Fri Aug 13 20:12:23 2021 +0800 btrfs: qgroup: don't commit transaction when we already hold the handle commit 6f23277a49e68f8a9355385c846939ad0b1261e7 upstream [BUG] When running the following script, btrfs will trigger an ASSERT(): #/bin/bash mkfs.btrfs -f $dev mount $dev $mnt xfs_io -f -c "pwrite 0 1G" $mnt/file sync btrfs quota enable $mnt btrfs quota rescan -w $mnt # Manually set the limit below current usage btrfs qgroup limit 512M $mnt $mnt # Crash happens touch $mnt/file The dmesg looks like this: assertion failed: refcount_read(&trans->use_count) == 1, in fs/btrfs/transaction.c:2022 ------------[ cut here ]------------ kernel BUG at fs/btrfs/ctree.h:3230! invalid opcode: 0000 [#1] SMP PTI RIP: 0010:assertfail.constprop.0+0x18/0x1a [btrfs] btrfs_commit_transaction.cold+0x11/0x5d [btrfs] try_flush_qgroup+0x67/0x100 [btrfs] __btrfs_qgroup_reserve_meta+0x3a/0x60 [btrfs] btrfs_delayed_update_inode+0xaa/0x350 [btrfs] btrfs_update_inode+0x9d/0x110 [btrfs] btrfs_dirty_inode+0x5d/0xd0 [btrfs] touch_atime+0xb5/0x100 iterate_dir+0xf1/0x1b0 __x64_sys_getdents64+0x78/0x110 do_syscall_64+0x33/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7fb5afe588db [CAUSE] In try_flush_qgroup(), we assume we don't hold a transaction handle at all. This is true for data reservation and mostly true for metadata. Since data space reservation always happens before we start a transaction, and for most metadata operation we reserve space in start_transaction(). But there is an exception, btrfs_delayed_inode_reserve_metadata(). It holds a transaction handle, while still trying to reserve extra metadata space. When we hit EDQUOT inside btrfs_delayed_inode_reserve_metadata(), we will join current transaction and commit, while we still have transaction handle from qgroup code. [FIX] Let's check current->journal before we join the transaction. If current->journal is unset or BTRFS_SEND_TRANS_STUB, it means we are not holding a transaction, thus are able to join and then commit transaction. If current->journal is a valid transaction handle, we avoid committing transaction and just end it This is less effective than committing current transaction, as it won't free metadata reserved space, but we may still free some data space before new data writes. Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1178634 Fixes: c53e9653605d ("btrfs: qgroup: try to flush qgroup space when we get -EDQUOT") Reviewed-by: Filipe Manana Signed-off-by: Qu Wenruo Signed-off-by: David Sterba Signed-off-by: Anand Jain Signed-off-by: Greg Kroah-Hartman commit 38b8485b72cbe4521fd2e0b8770e3d78f9b89e60 Author: YueHaibing Date: Wed May 19 10:47:04 2021 +0800 net: xilinx_emaclite: Do not print real IOMEM pointer commit d0d62baa7f505bd4c59cd169692ff07ec49dde37 upstream. Printing kernel pointers is discouraged because they might leak kernel memory layout. This fixes smatch warning: drivers/net/ethernet/xilinx/xilinx_emaclite.c:1191 xemaclite_of_probe() warn: argument 4 to %08lX specifier is cast from pointer Signed-off-by: YueHaibing Signed-off-by: David S. Miller Signed-off-by: Pavel Machek (CIP) Signed-off-by: Greg Kroah-Hartman commit 654c19a7e8d8702adbd9944e98cdd6cbdce2d8f0 Author: Filipe Manana Date: Fri Aug 13 17:55:30 2021 +0800 btrfs: fix lockdep splat when enabling and disabling qgroups commit a855fbe69229078cd8aecd8974fb996a5ca651e6 upstream When running test case btrfs/017 from fstests, lockdep reported the following splat: [ 1297.067385] ====================================================== [ 1297.067708] WARNING: possible circular locking dependency detected [ 1297.068022] 5.10.0-rc4-btrfs-next-73 #1 Not tainted [ 1297.068322] ------------------------------------------------------ [ 1297.068629] btrfs/189080 is trying to acquire lock: [ 1297.068929] ffff9f2725731690 (sb_internal#2){.+.+}-{0:0}, at: btrfs_quota_enable+0xaf/0xa70 [btrfs] [ 1297.069274] but task is already holding lock: [ 1297.069868] ffff9f2702b61a08 (&fs_info->qgroup_ioctl_lock){+.+.}-{3:3}, at: btrfs_quota_enable+0x3b/0xa70 [btrfs] [ 1297.070219] which lock already depends on the new lock. [ 1297.071131] the existing dependency chain (in reverse order) is: [ 1297.071721] -> #1 (&fs_info->qgroup_ioctl_lock){+.+.}-{3:3}: [ 1297.072375] lock_acquire+0xd8/0x490 [ 1297.072710] __mutex_lock+0xa3/0xb30 [ 1297.073061] btrfs_qgroup_inherit+0x59/0x6a0 [btrfs] [ 1297.073421] create_subvol+0x194/0x990 [btrfs] [ 1297.073780] btrfs_mksubvol+0x3fb/0x4a0 [btrfs] [ 1297.074133] __btrfs_ioctl_snap_create+0x119/0x1a0 [btrfs] [ 1297.074498] btrfs_ioctl_snap_create+0x58/0x80 [btrfs] [ 1297.074872] btrfs_ioctl+0x1a90/0x36f0 [btrfs] [ 1297.075245] __x64_sys_ioctl+0x83/0xb0 [ 1297.075617] do_syscall_64+0x33/0x80 [ 1297.075993] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 1297.076380] -> #0 (sb_internal#2){.+.+}-{0:0}: [ 1297.077166] check_prev_add+0x91/0xc60 [ 1297.077572] __lock_acquire+0x1740/0x3110 [ 1297.077984] lock_acquire+0xd8/0x490 [ 1297.078411] start_transaction+0x3c5/0x760 [btrfs] [ 1297.078853] btrfs_quota_enable+0xaf/0xa70 [btrfs] [ 1297.079323] btrfs_ioctl+0x2c60/0x36f0 [btrfs] [ 1297.079789] __x64_sys_ioctl+0x83/0xb0 [ 1297.080232] do_syscall_64+0x33/0x80 [ 1297.080680] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 1297.081139] other info that might help us debug this: [ 1297.082536] Possible unsafe locking scenario: [ 1297.083510] CPU0 CPU1 [ 1297.084005] ---- ---- [ 1297.084500] lock(&fs_info->qgroup_ioctl_lock); [ 1297.084994] lock(sb_internal#2); [ 1297.085485] lock(&fs_info->qgroup_ioctl_lock); [ 1297.085974] lock(sb_internal#2); [ 1297.086454] *** DEADLOCK *** [ 1297.087880] 3 locks held by btrfs/189080: [ 1297.088324] #0: ffff9f2725731470 (sb_writers#14){.+.+}-{0:0}, at: btrfs_ioctl+0xa73/0x36f0 [btrfs] [ 1297.088799] #1: ffff9f2702b60cc0 (&fs_info->subvol_sem){++++}-{3:3}, at: btrfs_ioctl+0x1f4d/0x36f0 [btrfs] [ 1297.089284] #2: ffff9f2702b61a08 (&fs_info->qgroup_ioctl_lock){+.+.}-{3:3}, at: btrfs_quota_enable+0x3b/0xa70 [btrfs] [ 1297.089771] stack backtrace: [ 1297.090662] CPU: 5 PID: 189080 Comm: btrfs Not tainted 5.10.0-rc4-btrfs-next-73 #1 [ 1297.091132] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [ 1297.092123] Call Trace: [ 1297.092629] dump_stack+0x8d/0xb5 [ 1297.093115] check_noncircular+0xff/0x110 [ 1297.093596] check_prev_add+0x91/0xc60 [ 1297.094076] ? kvm_clock_read+0x14/0x30 [ 1297.094553] ? kvm_sched_clock_read+0x5/0x10 [ 1297.095029] __lock_acquire+0x1740/0x3110 [ 1297.095510] lock_acquire+0xd8/0x490 [ 1297.095993] ? btrfs_quota_enable+0xaf/0xa70 [btrfs] [ 1297.096476] start_transaction+0x3c5/0x760 [btrfs] [ 1297.096962] ? btrfs_quota_enable+0xaf/0xa70 [btrfs] [ 1297.097451] btrfs_quota_enable+0xaf/0xa70 [btrfs] [ 1297.097941] ? btrfs_ioctl+0x1f4d/0x36f0 [btrfs] [ 1297.098429] btrfs_ioctl+0x2c60/0x36f0 [btrfs] [ 1297.098904] ? do_user_addr_fault+0x20c/0x430 [ 1297.099382] ? kvm_clock_read+0x14/0x30 [ 1297.099854] ? kvm_sched_clock_read+0x5/0x10 [ 1297.100328] ? sched_clock+0x5/0x10 [ 1297.100801] ? sched_clock_cpu+0x12/0x180 [ 1297.101272] ? __x64_sys_ioctl+0x83/0xb0 [ 1297.101739] __x64_sys_ioctl+0x83/0xb0 [ 1297.102207] do_syscall_64+0x33/0x80 [ 1297.102673] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 1297.103148] RIP: 0033:0x7f773ff65d87 This is because during the quota enable ioctl we lock first the mutex qgroup_ioctl_lock and then start a transaction, and starting a transaction acquires a fs freeze semaphore (at the VFS level). However, every other code path, except for the quota disable ioctl path, we do the opposite: we start a transaction and then lock the mutex. So fix this by making the quota enable and disable paths to start the transaction without having the mutex locked, and then, after starting the transaction, lock the mutex and check if some other task already enabled or disabled the quotas, bailing with success if that was the case. Signed-off-by: Filipe Manana Reviewed-by: David Sterba Signed-off-by: David Sterba Signed-off-by: Anand Jain Conflicts: fs/btrfs/qgroup.c Signed-off-by: Greg Kroah-Hartman commit c55442cdfdb88deda480ff11c4c9fae05b40ba4d Author: Qu Wenruo Date: Fri Aug 13 17:55:29 2021 +0800 btrfs: qgroup: remove ASYNC_COMMIT mechanism in favor of reserve retry-after-EDQUOT commit adca4d945c8dca28a85df45c5b117e6dac2e77f1 upstream commit a514d63882c3 ("btrfs: qgroup: Commit transaction in advance to reduce early EDQUOT") tries to reduce the early EDQUOT problems by checking the qgroup free against threshold and tries to wake up commit kthread to free some space. The problem of that mechanism is, it can only free qgroup per-trans metadata space, can't do anything to data, nor prealloc qgroup space. Now since we have the ability to flush qgroup space, and implemented retry-after-EDQUOT behavior, such mechanism can be completely replaced. So this patch will cleanup such mechanism in favor of retry-after-EDQUOT. Reviewed-by: Josef Bacik Signed-off-by: Qu Wenruo Reviewed-by: David Sterba Signed-off-by: David Sterba Signed-off-by: Anand Jain Signed-off-by: Greg Kroah-Hartman commit fdaf6a322fcc5b4355eadfdb32dcb0502a32804e Author: Qu Wenruo Date: Fri Aug 13 17:55:28 2021 +0800 btrfs: transaction: Cleanup unused TRANS_STATE_BLOCKED commit 3296bf562443a8ca35aaad959a76a49e9b412760 upstream The state was introduced in commit 4a9d8bdee368 ("Btrfs: make the state of the transaction more readable"), then in commit 302167c50b32 ("btrfs: don't end the transaction for delayed refs in throttle") the state is completely removed. So we can just clean up the state since it's only compared but never set. Signed-off-by: Qu Wenruo Reviewed-by: David Sterba Signed-off-by: David Sterba Signed-off-by: Anand Jain Signed-off-by: Greg Kroah-Hartman commit 36af2de520cca7c37974cc4944b47850f6c460ee Author: Qu Wenruo Date: Fri Aug 13 17:55:27 2021 +0800 btrfs: qgroup: try to flush qgroup space when we get -EDQUOT commit c53e9653605dbf708f5be02902de51831be4b009 upstream [PROBLEM] There are known problem related to how btrfs handles qgroup reserved space. One of the most obvious case is the the test case btrfs/153, which do fallocate, then write into the preallocated range. # btrfs/153 1s ... - output mismatch (see xfstests-dev/results//btrfs/153.out.bad) # --- tests/btrfs/153.out 2019-10-22 15:18:14.068965341 +0800 # +++ xfstests-dev/results//btrfs/153.out.bad 2020-07-01 20:24:40.730000089 +0800 # @@ -1,2 +1,5 @@ # QA output created by 153 # +pwrite: Disk quota exceeded # +/mnt/scratch/testfile2: Disk quota exceeded # +/mnt/scratch/testfile2: Disk quota exceeded # Silence is golden # ... # (Run 'diff -u xfstests-dev/tests/btrfs/153.out xfstests-dev/results//btrfs/153.out.bad' to see the entire diff) [CAUSE] Since commit c6887cd11149 ("Btrfs: don't do nocow check unless we have to"), we always reserve space no matter if it's COW or not. Such behavior change is mostly for performance, and reverting it is not a good idea anyway. For preallcoated extent, we reserve qgroup data space for it already, and since we also reserve data space for qgroup at buffered write time, it needs twice the space for us to write into preallocated space. This leads to the -EDQUOT in buffered write routine. And we can't follow the same solution, unlike data/meta space check, qgroup reserved space is shared between data/metadata. The EDQUOT can happen at the metadata reservation, so doing NODATACOW check after qgroup reservation failure is not a solution. [FIX] To solve the problem, we don't return -EDQUOT directly, but every time we got a -EDQUOT, we try to flush qgroup space: - Flush all inodes of the root NODATACOW writes will free the qgroup reserved at run_dealloc_range(). However we don't have the infrastructure to only flush NODATACOW inodes, here we flush all inodes anyway. - Wait for ordered extents This would convert the preallocated metadata space into per-trans metadata, which can be freed in later transaction commit. - Commit transaction This will free all per-trans metadata space. Also we don't want to trigger flush multiple times, so here we introduce a per-root wait list and a new root status, to ensure only one thread starts the flushing. Fixes: c6887cd11149 ("Btrfs: don't do nocow check unless we have to") Reviewed-by: Josef Bacik Signed-off-by: Qu Wenruo Reviewed-by: David Sterba Signed-off-by: David Sterba Signed-off-by: Anand Jain Signed-off-by: Greg Kroah-Hartman commit 5c79287c2b6d3be01c5ba0b1306e8529137ae28d Author: Qu Wenruo Date: Fri Aug 13 17:55:26 2021 +0800 btrfs: qgroup: allow to unreserve range without releasing other ranges commit 263da812e87bac4098a4778efaa32c54275641db upstream [PROBLEM] Before this patch, when btrfs_qgroup_reserve_data() fails, we free all reserved space of the changeset. For example: ret = btrfs_qgroup_reserve_data(inode, changeset, 0, SZ_1M); ret = btrfs_qgroup_reserve_data(inode, changeset, SZ_1M, SZ_1M); ret = btrfs_qgroup_reserve_data(inode, changeset, SZ_2M, SZ_1M); If the last btrfs_qgroup_reserve_data() failed, it will release the entire [0, 3M) range. This behavior is kind of OK for now, as when we hit -EDQUOT, we normally go error handling and need to release all reserved ranges anyway. But this also means the following call is not possible: ret = btrfs_qgroup_reserve_data(); if (ret == -EDQUOT) { /* Do something to free some qgroup space */ ret = btrfs_qgroup_reserve_data(); } As if the first btrfs_qgroup_reserve_data() fails, it will free all reserved qgroup space. [CAUSE] This is because we release all reserved ranges when btrfs_qgroup_reserve_data() fails. [FIX] This patch will implement a new function, qgroup_unreserve_range(), to iterate through the ulist nodes, to find any nodes in the failure range, and remove the EXTENT_QGROUP_RESERVED bits from the io_tree, and decrease the extent_changeset::bytes_changed, so that we can revert to previous state. This allows later patches to retry btrfs_qgroup_reserve_data() if EDQUOT happens. Suggested-by: Josef Bacik Reviewed-by: Josef Bacik Signed-off-by: Qu Wenruo Reviewed-by: David Sterba Signed-off-by: David Sterba Signed-off-by: Anand Jain Signed-off-by: Greg Kroah-Hartman commit b7a722fd75a1701011056b76da5020ec4b295a4e Author: Nikolay Borisov Date: Fri Aug 13 17:55:25 2021 +0800 btrfs: make btrfs_qgroup_reserve_data take btrfs_inode commit 7661a3e033ab782366e0e1f4b6aad0df3555fcbd upstream There's only a single use of vfs_inode in a tracepoint so let's take btrfs_inode directly. Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba Signed-off-by: David Sterba Signed-off-by: Anand Jain Signed-off-by: Greg Kroah-Hartman commit dfadea4061a24e2a3a4e55d1ff0e6cda59411a60 Author: Nikolay Borisov Date: Fri Aug 13 17:55:24 2021 +0800 btrfs: make qgroup_free_reserved_data take btrfs_inode commit df2cfd131fd33dbef1ce33be8b332b1f3d645f35 upstream It only uses btrfs_inode so can just as easily take it as an argument. Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba Signed-off-by: David Sterba Signed-off-by: Anand Jain Signed-off-by: Greg Kroah-Hartman commit 812f39ed5b0b7f34868736de3055c92c7c4cf459 Author: Miklos Szeredi Date: Mon Aug 9 10:19:47 2021 +0200 ovl: prevent private clone if bind mount is not allowed commit 427215d85e8d1476da1a86b8d67aceb485eb3631 upstream. Add the following checks from __do_loopback() to clone_private_mount() as well: - verify that the mount is in the current namespace - verify that there are no locked children Reported-by: Alois Wohlschlager Fixes: c771d683a62e ("vfs: introduce clone_private_mount()") Cc: # v3.18 Signed-off-by: Miklos Szeredi Signed-off-by: Greg Kroah-Hartman commit eeb4742501e09c0cc58f500ebae9efd65d33eb67 Author: Pali Rohár Date: Sat Aug 7 18:00:50 2021 +0200 ppp: Fix generating ppp unit id when ifname is not specified commit 3125f26c514826077f2a4490b75e9b1c7a644c42 upstream. When registering new ppp interface via PPPIOCNEWUNIT ioctl then kernel has to choose interface name as this ioctl API does not support specifying it. Kernel in this case register new interface with name "ppp" where is the ppp unit id, which can be obtained via PPPIOCGUNIT ioctl. This applies also in the case when registering new ppp interface via rtnl without supplying IFLA_IFNAME. PPPIOCNEWUNIT ioctl allows to specify own ppp unit id which will kernel assign to ppp interface, in case this ppp id is not already used by other ppp interface. In case user does not specify ppp unit id then kernel choose the first free ppp unit id. This applies also for case when creating ppp interface via rtnl method as it does not provide a way for specifying own ppp unit id. If some network interface (does not have to be ppp) has name "ppp" with this first free ppp id then PPPIOCNEWUNIT ioctl or rtnl call fails. And registering new ppp interface is not possible anymore, until interface which holds conflicting name is renamed. Or when using rtnl method with custom interface name in IFLA_IFNAME. As list of allocated / used ppp unit ids is not possible to retrieve from kernel to userspace, userspace has no idea what happens nor which interface is doing this conflict. So change the algorithm how ppp unit id is generated. And choose the first number which is not neither used as ppp unit id nor in some network interface with pattern "ppp". This issue can be simply reproduced by following pppd call when there is no ppp interface registered and also no interface with name pattern "ppp": pppd ifname ppp1 +ipv6 noip noauth nolock local nodetach pty "pppd +ipv6 noip noauth nolock local nodetach notty" Or by creating the one ppp interface (which gets assigned ppp unit id 0), renaming it to "ppp1" and then trying to create a new ppp interface (which will always fails as next free ppp unit id is 1, but network interface with name "ppp1" exists). This patch fixes above described issue by generating new and new ppp unit id until some non-conflicting id with network interfaces is generated. Signed-off-by: Pali Rohár Cc: stable@vger.kernel.org Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 3460f3959d1c5cba6c3806b8ba037d5e62e81d84 Author: Luke D Jones Date: Sat Aug 7 14:58:05 2021 +1200 ALSA: hda: Add quirk for ASUS Flow x13 commit 739d0959fbed23838a96c48fbce01dd2f6fb2c5f upstream. The ASUS GV301QH sound appears to work well with the quirk for ALC294_FIXUP_ASUS_DUAL_SPK. Signed-off-by: Luke D Jones Cc: Link: https://lore.kernel.org/r/20210807025805.27321-1-luke@ljones.dev Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman commit 81d1a3f97631c20ad4b68c9bb49b90b392fc348c Author: Longfang Liu Date: Fri Apr 9 16:48:01 2021 +0800 USB:ehci:fix Kunpeng920 ehci hardware problem commit 26b75952ca0b8b4b3050adb9582c8e2f44d49687 upstream. Kunpeng920's EHCI controller does not have SBRN register. Reading the SBRN register when the controller driver is initialized will get 0. When rebooting the EHCI driver, ehci_shutdown() will be called. if the sbrn flag is 0, ehci_shutdown() will return directly. The sbrn flag being 0 will cause the EHCI interrupt signal to not be turned off after reboot. this interrupt that is not closed will cause an exception to the device sharing the interrupt. Therefore, the EHCI controller of Kunpeng920 needs to skip the read operation of the SBRN register. Acked-by: Alan Stern Signed-off-by: Longfang Liu Link: https://lore.kernel.org/r/1617958081-17999-1-git-send-email-liulongfang@huawei.com Signed-off-by: Greg Kroah-Hartman commit d28adaabbbf4a6949d0f6f71daca6744979174e2 Author: Lai Jiangshan Date: Thu Jun 3 13:24:55 2021 +0800 KVM: X86: MMU: Use the correct inherited permissions to get shadow page commit b1bd5cba3306691c771d558e94baa73e8b0b96b7 upstream. When computing the access permissions of a shadow page, use the effective permissions of the walk up to that point, i.e. the logic AND of its parents' permissions. Two guest PxE entries that point at the same table gfn need to be shadowed with different shadow pages if their parents' permissions are different. KVM currently uses the effective permissions of the last non-leaf entry for all non-leaf entries. Because all non-leaf SPTEs have full ("uwx") permissions, and the effective permissions are recorded only in role.access and merged into the leaves, this can lead to incorrect reuse of a shadow page and eventually to a missing guest protection page fault. For example, here is a shared pagetable: pgd[] pud[] pmd[] virtual address pointers /->pmd1(u--)->pte1(uw-)->page1 <- ptr1 (u--) /->pud1(uw-)--->pmd2(uw-)->pte2(uw-)->page2 <- ptr2 (uw-) pgd-| (shared pmd[] as above) \->pud2(u--)--->pmd1(u--)->pte1(uw-)->page1 <- ptr3 (u--) \->pmd2(uw-)->pte2(uw-)->page2 <- ptr4 (u--) pud1 and pud2 point to the same pmd table, so: - ptr1 and ptr3 points to the same page. - ptr2 and ptr4 points to the same page. (pud1 and pud2 here are pud entries, while pmd1 and pmd2 here are pmd entries) - First, the guest reads from ptr1 first and KVM prepares a shadow page table with role.access=u--, from ptr1's pud1 and ptr1's pmd1. "u--" comes from the effective permissions of pgd, pud1 and pmd1, which are stored in pt->access. "u--" is used also to get the pagetable for pud1, instead of "uw-". - Then the guest writes to ptr2 and KVM reuses pud1 which is present. The hypervisor set up a shadow page for ptr2 with pt->access is "uw-" even though the pud1 pmd (because of the incorrect argument to kvm_mmu_get_page in the previous step) has role.access="u--". - Then the guest reads from ptr3. The hypervisor reuses pud1's shadow pmd for pud2, because both use "u--" for their permissions. Thus, the shadow pmd already includes entries for both pmd1 and pmd2. - At last, the guest writes to ptr4. This causes no vmexit or pagefault, because pud1's shadow page structures included an "uw-" page even though its role.access was "u--". Any kind of shared pagetable might have the similar problem when in virtual machine without TDP enabled if the permissions are different from different ancestors. In order to fix the problem, we change pt->access to be an array, and any access in it will not include permissions ANDed from child ptes. The test code is: https://lore.kernel.org/kvm/20210603050537.19605-1-jiangshanlai@gmail.com/ Remember to test it with TDP disabled. The problem had existed long before the commit 41074d07c78b ("KVM: MMU: Fix inherited permissions for emulated guest pte updates"), and it is hard to find which is the culprit. So there is no fixes tag here. Signed-off-by: Lai Jiangshan Message-Id: <20210603052455.21023-1-jiangshanlai@gmail.com> Cc: stable@vger.kernel.org Fixes: cea0f0e7ea54 ("[PATCH] KVM: MMU: Shadow page table caching") Signed-off-by: Paolo Bonzini [OP: - apply arch/x86/kvm/mmu/* changes to arch/x86/kvm - apply documentation changes to Documentation/virt/kvm/mmu.txt - adjusted context in arch/x86/kvm/paging_tmpl.h] Signed-off-by: Ovidiu Panait Signed-off-by: Greg Kroah-Hartman commit 5f4ab7e25fbb57a6e2e927ba5d09675b332c833e Author: Wesley Cheng Date: Thu Aug 12 20:16:52 2021 +0300 usb: dwc3: gadget: Avoid runtime resume if disabling pullup [ Upstream commit cb10f68ad8150f243964b19391711aaac5e8ff42 ] If the device is already in the runtime suspended state, any call to the pullup routine will issue a runtime resume on the DWC3 core device. If the USB gadget is disabling the pullup, then avoid having to issue a runtime resume, as DWC3 gadget has already been halted/stopped. This fixes an issue where the following condition occurs: usb_gadget_remove_driver() -->usb_gadget_disconnect() -->dwc3_gadget_pullup(0) -->pm_runtime_get_sync() -> ret = 0 -->pm_runtime_put() [async] -->usb_gadget_udc_stop() -->dwc3_gadget_stop() -->dwc->gadget_driver = NULL ... dwc3_suspend_common() -->dwc3_gadget_suspend() -->DWC3 halt/stop routine skipped, driver_data == NULL This leads to a situation where the DWC3 gadget is not properly stopped, as the runtime resume would have re-enabled EP0 and event interrupts, and since we avoided the DWC3 gadget suspend, these resources were never disabled. Fixes: 77adb8bdf422 ("usb: dwc3: gadget: Allow runtime suspend if UDC unbinded") Cc: stable Acked-by: Felipe Balbi Signed-off-by: Wesley Cheng Link: https://lore.kernel.org/r/1628058245-30692-1-git-send-email-wcheng@codeaurora.org Signed-off-by: Greg Kroah-Hartman commit 1782c4af6bd017601d30880811f8a1c17f4b2e3a Author: Wesley Cheng Date: Thu Aug 12 20:16:51 2021 +0300 usb: dwc3: gadget: Disable gadget IRQ during pullup disable [ Upstream commit 8212937305f84ef73ea81036dafb80c557583d4b ] Current sequence utilizes dwc3_gadget_disable_irq() alongside synchronize_irq() to ensure that no further DWC3 events are generated. However, the dwc3_gadget_disable_irq() API only disables device specific events. Endpoint events can still be generated. Briefly disable the interrupt line, so that the cleanup code can run to prevent device and endpoint events. (i.e. __dwc3_gadget_stop() and dwc3_stop_active_transfers() respectively) Without doing so, it can lead to both the interrupt handler and the pullup disable routine both writing to the GEVNTCOUNT register, which will cause an incorrect count being read from future interrupts. Fixes: ae7e86108b12 ("usb: dwc3: Stop active transfers before halting the controller") Signed-off-by: Wesley Cheng Link: https://lore.kernel.org/r/1621571037-1424-1-git-send-email-wcheng@codeaurora.org Signed-off-by: Greg Kroah-Hartman commit 54b7022f2878385868d321aad8410b2339c070ee Author: Wesley Cheng Date: Thu Aug 12 20:16:50 2021 +0300 usb: dwc3: gadget: Clear DEP flags after stop transfers in ep disable [ Upstream commit 5aef629704ad4d983ecf5c8a25840f16e45b6d59 ] Ensure that dep->flags are cleared until after stop active transfers is completed. Otherwise, the ENDXFER command will not be executed during ep disable. Fixes: f09ddcfcb8c5 ("usb: dwc3: gadget: Prevent EP queuing while stopping transfers") Cc: stable Reported-and-tested-by: Andy Shevchenko Tested-by: Marek Szyprowski Signed-off-by: Wesley Cheng Link: https://lore.kernel.org/r/1616610664-16495-1-git-send-email-wcheng@codeaurora.org Signed-off-by: Greg Kroah-Hartman commit e36245a68eb1e6b6fe1a64568407cfb5c9b02a69 Author: Wesley Cheng Date: Thu Aug 12 20:16:49 2021 +0300 usb: dwc3: gadget: Prevent EP queuing while stopping transfers [ Upstream commit f09ddcfcb8c569675066337adac2ac205113471f ] In the situations where the DWC3 gadget stops active transfers, once calling the dwc3_gadget_giveback(), there is a chance where a function driver can queue a new USB request in between the time where the dwc3 lock has been released and re-aquired. This occurs after we've already issued an ENDXFER command. When the stop active transfers continues to remove USB requests from all dep lists, the newly added request will also be removed, while controller still has an active TRB for it. This can lead to the controller accessing an unmapped memory address. Fix this by ensuring parameters to prevent EP queuing are set before calling the stop active transfers API. Fixes: ae7e86108b12 ("usb: dwc3: Stop active transfers before halting the controller") Signed-off-by: Wesley Cheng Link: https://lore.kernel.org/r/1615507142-23097-1-git-send-email-wcheng@codeaurora.org Cc: stable Signed-off-by: Greg Kroah-Hartman commit 823f692508634f0481cdaaf34a82346a55a05d88 Author: Wesley Cheng Date: Thu Aug 12 20:16:48 2021 +0300 usb: dwc3: gadget: Restart DWC3 gadget when enabling pullup [ Upstream commit a1383b3537a7bea1c213baa7878ccc4ecf4413b5 ] usb_gadget_deactivate/usb_gadget_activate does not execute the UDC start operation, which may leave EP0 disabled and event IRQs disabled when re-activating the function. Move the enabling/disabling of USB EP0 and device event IRQs to be performed in the pullup routine. Fixes: ae7e86108b12 ("usb: dwc3: Stop active transfers before halting the controller") Tested-by: Michael Tretter Cc: stable Reported-by: Michael Tretter Signed-off-by: Wesley Cheng Link: https://lore.kernel.org/r/1609282837-21666-1-git-send-email-wcheng@codeaurora.org Signed-off-by: Greg Kroah-Hartman commit 25a0625fa96fe76aadaddfc85f9ddc1e4396ad1f Author: Wesley Cheng Date: Thu Aug 12 20:16:47 2021 +0300 usb: dwc3: gadget: Allow runtime suspend if UDC unbinded [ Upstream commit 77adb8bdf4227257e26b7ff67272678e66a0b250 ] The DWC3 runtime suspend routine checks for the USB connected parameter to determine if the controller can enter into a low power state. The connected state is only set to false after receiving a disconnect event. However, in the case of a device initiated disconnect (i.e. UDC unbind), the controller is halted and a disconnect event is never generated. Set the connected flag to false if issuing a device initiated disconnect to allow the controller to be suspended. Signed-off-by: Wesley Cheng Link: https://lore.kernel.org/r/1609283136-22140-2-git-send-email-wcheng@codeaurora.org Signed-off-by: Greg Kroah-Hartman commit 5f081a928d5533dfaf858409ee1dfff151efb5cd Author: Wesley Cheng Date: Thu Aug 12 20:16:46 2021 +0300 usb: dwc3: Stop active transfers before halting the controller [ Upstream commit ae7e86108b12351028fa7e8796a59f9b2d9e1774 ] In the DWC3 databook, for a device initiated disconnect or bus reset, the driver is required to send dependxfer commands for any pending transfers. In addition, before the controller can move to the halted state, the SW needs to acknowledge any pending events. If the controller is not halted properly, there is a chance the controller will continue accessing stale or freed TRBs and buffers. Signed-off-by: Wesley Cheng Reviewed-by: Thinh Nguyen Signed-off-by: Felipe Balbi Signed-off-by: Greg Kroah-Hartman commit 396f29ea0cd2a89cea8a7ad39533e31b55ba478e Author: Masami Hiramatsu Date: Wed Jul 28 07:55:43 2021 +0900 tracing: Reject string operand in the histogram expression commit a9d10ca4986571bffc19778742d508cc8dd13e02 upstream. Since the string type can not be the target of the addition / subtraction operation, it must be rejected. Without this fix, the string type silently converted to digits. Link: https://lkml.kernel.org/r/162742654278.290973.1523000673366456634.stgit@devnote2 Cc: stable@vger.kernel.org Fixes: 100719dcef447 ("tracing: Add simple expression support to hist triggers") Signed-off-by: Masami Hiramatsu Signed-off-by: Steven Rostedt (VMware) Signed-off-by: Greg Kroah-Hartman commit 28276c280f2e757d3415e945f4ceab8ba3d0a9a8 Author: Alexandre Courbot Date: Thu Aug 27 14:49:45 2020 +0200 media: v4l2-mem2mem: always consider OUTPUT queue during poll commit 566463afdbc43c7744c5a1b89250fc808df03833 upstream. If poll() is called on a m2m device with the EPOLLOUT event after the last buffer of the CAPTURE queue is dequeued, any buffer available on OUTPUT queue will never be signaled because v4l2_m2m_poll_for_data() starts by checking whether dst_q->last_buffer_dequeued is set and returns EPOLLIN in this case, without looking at the state of the OUTPUT queue. Fix this by not early returning so we keep checking the state of the OUTPUT queue afterwards. Signed-off-by: Alexandre Courbot Reviewed-by: Ezequiel Garcia Signed-off-by: Hans Verkuil Signed-off-by: Mauro Carvalho Chehab Cc: Lecopzer Chen Signed-off-by: Greg Kroah-Hartman commit 236aca70929db3aa87dfb8fbc5329aa027a62e91 Author: Sumit Garg Date: Mon Jun 14 17:33:15 2021 -0500 tee: Correct inappropriate usage of TEE_SHM_DMA_BUF flag [ Upstream commit 376e4199e327a5cf29b8ec8fb0f64f3d8b429819 ] Currently TEE_SHM_DMA_BUF flag has been inappropriately used to not register shared memory allocated for private usage by underlying TEE driver: OP-TEE in this case. So rather add a new flag as TEE_SHM_PRIV that can be utilized by underlying TEE drivers for private allocation and usage of shared memory. With this corrected, allow tee_shm_alloc_kernel_buf() to allocate a shared memory region without the backing of dma-buf. Cc: stable@vger.kernel.org Signed-off-by: Sumit Garg Co-developed-by: Tyler Hicks Signed-off-by: Tyler Hicks Reviewed-by: Jens Wiklander Reviewed-by: Sumit Garg Signed-off-by: Jens Wiklander Signed-off-by: Sasha Levin commit 5b774238e8afade606657196d68092e1d2028a4f Author: Sean Christopherson Date: Tue Aug 3 09:27:46 2021 -0700 KVM: SVM: Fix off-by-one indexing when nullifying last used SEV VMCB [ Upstream commit 179c6c27bf487273652efc99acd3ba512a23c137 ] Use the raw ASID, not ASID-1, when nullifying the last used VMCB when freeing an SEV ASID. The consumer, pre_sev_run(), indexes the array by the raw ASID, thus KVM could get a false negative when checking for a different VMCB if KVM manages to reallocate the same ASID+VMCB combo for a new VM. Note, this cannot cause a functional issue _in the current code_, as pre_sev_run() also checks which pCPU last did VMRUN for the vCPU, and last_vmentry_cpu is initialized to -1 during vCPU creation, i.e. is guaranteed to mismatch on the first VMRUN. However, prior to commit 8a14fe4f0c54 ("kvm: x86: Move last_cpu into kvm_vcpu_arch as last_vmentry_cpu"), SVM tracked pCPU on its own and zero-initialized the last_cpu variable. Thus it's theoretically possible that older versions of KVM could miss a TLB flush if the first VMRUN is on pCPU0 and the ASID and VMCB exactly match those of a prior VM. Fixes: 70cd94e60c73 ("KVM: SVM: VMRUN should use associated ASID when SEV is enabled") Cc: Tom Lendacky Cc: Brijesh Singh Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson Signed-off-by: Paolo Bonzini Signed-off-by: Sasha Levin