commit 665c6ff082e214537beef2e39ec366cddf446d52 Author: Greg Kroah-Hartman Date: Wed Oct 14 11:56:00 2020 +0200 Linux 5.8.15 Tested-by: Jeffrin Jose T Tested-by: Jon Hunter Tested-by: Linux Kernel Functional Testing Tested-by: Guenter Roeck Tested-by: Shuah Khan Link: https://lore.kernel.org/r/20201012133146.834528783@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman commit 03b7311c2d351647c43ab9c4451e1e2b782c4252 Author: Cong Wang Date: Tue Sep 22 20:56:24 2020 -0700 net_sched: commit action insertions together commit 0fedc63fadf0404a729e73a35349481c8009c02f upstream. syzbot is able to trigger a failure case inside the loop in tcf_action_init(), and when this happens we clean up with tcf_action_destroy(). But, as these actions are already inserted into the global IDR, other parallel process could free them before tcf_action_destroy(), then we will trigger a use-after-free. Fix this by deferring the insertions even later, after the loop, and committing all the insertions in a separate loop, so we will never fail in the middle of the insertions any more. One side effect is that the window between alloction and final insertion becomes larger, now it is more likely that the loop in tcf_del_walker() sees the placeholder -EBUSY pointer. So we have to check for error pointer in tcf_del_walker(). Reported-and-tested-by: syzbot+2287853d392e4b42374a@syzkaller.appspotmail.com Fixes: 0190c1d452a9 ("net: sched: atomically check-allocate action") Cc: Vlad Buslov Cc: Jamal Hadi Salim Cc: Jiri Pirko Signed-off-by: Cong Wang Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 1e02bbf908d35c98aa673ea7fdd7d1de8bcc0adc Author: Cong Wang Date: Tue Sep 22 20:56:23 2020 -0700 net_sched: defer tcf_idr_insert() in tcf_action_init_1() commit e49d8c22f1261c43a986a7fdbf677ac309682a07 upstream. All TC actions call tcf_idr_insert() for new action at the end of their ->init(), so we can actually move it to a central place in tcf_action_init_1(). And once the action is inserted into the global IDR, other parallel process could free it immediately as its refcnt is still 1, so we can not fail after this, we need to move it after the goto action validation to avoid handling the failure case after insertion. This is found during code review, is not directly triggered by syzbot. And this prepares for the next patch. Cc: Vlad Buslov Cc: Jamal Hadi Salim Cc: Jiri Pirko Signed-off-by: Cong Wang Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit b6a788af71ed94281fa21e32a72eb003cb4f1edf Author: Manivannan Sadhasivam Date: Sat Sep 26 22:26:25 2020 +0530 net: qrtr: ns: Protect radix_tree_deref_slot() using rcu read locks commit a7809ff90ce6c48598d3c4ab54eb599bec1e9c42 upstream. The rcu read locks are needed to avoid potential race condition while dereferencing radix tree from multiple threads. The issue was identified by syzbot. Below is the crash report: ============================= WARNING: suspicious RCU usage 5.7.0-syzkaller #0 Not tainted ----------------------------- include/linux/radix-tree.h:176 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 2 locks held by kworker/u4:1/21: #0: ffff88821b097938 ((wq_completion)qrtr_ns_handler){+.+.}-{0:0}, at: spin_unlock_irq include/linux/spinlock.h:403 [inline] #0: ffff88821b097938 ((wq_completion)qrtr_ns_handler){+.+.}-{0:0}, at: process_one_work+0x6df/0xfd0 kernel/workqueue.c:2241 #1: ffffc90000dd7d80 ((work_completion)(&qrtr_ns.work)){+.+.}-{0:0}, at: process_one_work+0x71e/0xfd0 kernel/workqueue.c:2243 stack backtrace: CPU: 0 PID: 21 Comm: kworker/u4:1 Not tainted 5.7.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: qrtr_ns_handler qrtr_ns_worker Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1e9/0x30e lib/dump_stack.c:118 radix_tree_deref_slot include/linux/radix-tree.h:176 [inline] ctrl_cmd_new_lookup net/qrtr/ns.c:558 [inline] qrtr_ns_worker+0x2aff/0x4500 net/qrtr/ns.c:674 process_one_work+0x76e/0xfd0 kernel/workqueue.c:2268 worker_thread+0xa7f/0x1450 kernel/workqueue.c:2414 kthread+0x353/0x380 kernel/kthread.c:268 Fixes: 0c2204a4ad71 ("net: qrtr: Migrate nameservice to kernel from userspace") Reported-and-tested-by: syzbot+0f84f6eed90503da72fc@syzkaller.appspotmail.com Signed-off-by: Manivannan Sadhasivam Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 691847cc626c9c4e8465fe02d3a0659ca4e1203f Author: Anant Thazhemadam Date: Mon Oct 5 18:59:58 2020 +0530 net: usb: rtl8150: set random MAC address when set_ethernet_addr() fails commit f45a4248ea4cc13ed50618ff066849f9587226b2 upstream. When get_registers() fails in set_ethernet_addr(),the uninitialized value of node_id gets copied over as the address. So, check the return value of get_registers(). If get_registers() executed successfully (i.e., it returns sizeof(node_id)), copy over the MAC address using ether_addr_copy() (instead of using memcpy()). Else, if get_registers() failed instead, a randomly generated MAC address is set as the MAC address instead. Reported-by: syzbot+abbc768b560c84d92fd3@syzkaller.appspotmail.com Tested-by: syzbot+abbc768b560c84d92fd3@syzkaller.appspotmail.com Acked-by: Petko Manolov Signed-off-by: Anant Thazhemadam Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 624143319921b3561c8c79c135eb3d6f76c91b2b Author: Xiongfeng Wang Date: Tue Jul 21 22:24:24 2020 -0700 Input: ati_remote2 - add missing newlines when printing module parameters commit 37bd9e803daea816f2dc2c8f6dc264097eb3ebd2 upstream. When I cat some module parameters by sysfs, it displays as follows. It's better to add a newline for easy reading. root@syzkaller:~# cat /sys/module/ati_remote2/parameters/mode_mask 0x1froot@syzkaller:~# cat /sys/module/ati_remote2/parameters/channel_mask 0xffffroot@syzkaller:~# Signed-off-by: Xiongfeng Wang Link: https://lore.kernel.org/r/20200720092148.9320-1-wangxiongfeng2@huawei.com Signed-off-by: Dmitry Torokhov Signed-off-by: Greg Kroah-Hartman commit 2cdb64863860d7e47bedbf92edafb194c513880b Author: Alexey Kardashevskiy Date: Wed Jun 17 17:04:44 2020 +1000 tty/vt: Do not warn when huge selection requested commit 44c413d9a51752056d606bf6f312003ac1740fab upstream. The tty TIOCL_SETSEL ioctl allocates a memory buffer big enough for text selection area. The maximum allowed console size is VC_RESIZE_MAXCOL * VC_RESIZE_MAXROW == 32767*32767 == ~1GB and typical MAX_ORDER is set to allow allocations lot less than than (circa 16MB). So it is quite possible to trigger huge allocation (and syzkaller just did that) which is going to fail (which is fine) with a backtrace in mm/page_alloc.c at WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN)) and this may trigger panic (if panic_on_warn is enabled) and leak kernel addresses to dmesg. This passes __GFP_NOWARN to kmalloc_array to avoid unnecessary user- triggered WARN_ON. Note that the error is not ignored and the warning is still printed. Signed-off-by: Alexey Kardashevskiy Link: https://lore.kernel.org/r/20200617070444.116704-1-aik@ozlabs.ru Signed-off-by: Greg Kroah-Hartman commit af2c68e241ba6b816f1622f7a071e96e425349fa Author: Aya Levin Date: Sun Aug 9 12:34:21 2020 +0300 net/mlx5e: Fix driver's declaration to support GRE offload commit 3d093bc2369003b4ce6c3522d9b383e47c40045d upstream. Declare GRE offload support with respect to the inner protocol. Add a list of supported inner protocols on which the driver can offload checksum and GSO. For other protocols, inform the stack to do the needed operations. There is no noticeable impact on GRE performance. Fixes: 2729984149e6 ("net/mlx5e: Support TSO and TX checksum offloads for GRE tunnels") Signed-off-by: Aya Levin Reviewed-by: Moshe Shemesh Reviewed-by: Tariq Toukan Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman commit 13e623dc27722668975bc315e55903bccadb3969 Author: Rohit Maheshwari Date: Thu Sep 24 12:28:45 2020 +0530 net/tls: race causes kernel panic commit 38f7e1c0c43dd25b06513137bb6fd35476f9ec6d upstream. BUG: kernel NULL pointer dereference, address: 00000000000000b8 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 80000008b6fef067 P4D 80000008b6fef067 PUD 8b6fe6067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 12 PID: 23871 Comm: kworker/12:80 Kdump: loaded Tainted: G S 5.9.0-rc3+ #1 Hardware name: Supermicro X10SRA-F/X10SRA-F, BIOS 2.1 03/29/2018 Workqueue: events tx_work_handler [tls] RIP: 0010:tx_work_handler+0x1b/0x70 [tls] Code: dc fe ff ff e8 16 d4 a3 f6 66 0f 1f 44 00 00 0f 1f 44 00 00 55 53 48 8b 6f 58 48 8b bd a0 04 00 00 48 85 ff 74 1c 48 8b 47 28 <48> 8b 90 b8 00 00 00 83 e2 02 75 0c f0 48 0f ba b0 b8 00 00 00 00 RSP: 0018:ffffa44ace61fe88 EFLAGS: 00010286 RAX: 0000000000000000 RBX: ffff91da9e45cc30 RCX: dead000000000122 RDX: 0000000000000001 RSI: ffff91da9e45cc38 RDI: ffff91d95efac200 RBP: ffff91da133fd780 R08: 0000000000000000 R09: 000073746e657665 R10: 8080808080808080 R11: 0000000000000000 R12: ffff91dad7d30700 R13: ffff91dab6561080 R14: 0ffff91dad7d3070 R15: ffff91da9e45cc38 FS: 0000000000000000(0000) GS:ffff91dad7d00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000000b8 CR3: 0000000906478003 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: process_one_work+0x1a7/0x370 worker_thread+0x30/0x370 ? process_one_work+0x370/0x370 kthread+0x114/0x130 ? kthread_park+0x80/0x80 ret_from_fork+0x22/0x30 tls_sw_release_resources_tx() waits for encrypt_pending, which can have race, so we need similar changes as in commit 0cada33241d9de205522e3858b18e506ca5cce2c here as well. Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance") Signed-off-by: Rohit Maheshwari Acked-by: Jakub Kicinski Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit d1a1891a586561b3085bf92d6dfbc0698000cb01 Author: Nikolay Aleksandrov Date: Mon Sep 28 18:30:02 2020 +0300 net: bridge: fdb: don't flush ext_learn entries commit f2f3729fb65c5c2e6db234e6316b71a7bdc4b30b upstream. When a user-space software manages fdb entries externally it should set the ext_learn flag which marks the fdb entry as externally managed and avoids expiring it (they're treated as static fdbs). Unfortunately on events where fdb entries are flushed (STP down, netlink fdb flush etc) these fdbs are also deleted automatically by the bridge. That in turn causes trouble for the managing user-space software (e.g. in MLAG setups we lose remote fdb entries on port flaps). These entries are completely externally managed so we should avoid automatically deleting them, the only exception are offloaded entries (i.e. BR_FDB_ADDED_BY_EXT_LEARN + BR_FDB_OFFLOADED). They are flushed as before. Fixes: eb100e0e24a2 ("net: bridge: allow to add externally learned entries from user-space") Signed-off-by: Nikolay Aleksandrov Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 54d2034e1d130d90b729ddd75d58bd1ebf8bff81 Author: Guillaume Nault Date: Fri Oct 2 21:53:08 2020 +0200 net/core: check length before updating Ethertype in skb_mpls_{push,pop} commit 4296adc3e32f5d544a95061160fe7e127be1b9ff upstream. Openvswitch allows to drop a packet's Ethernet header, therefore skb_mpls_push() and skb_mpls_pop() might be called with ethernet=true and mac_len=0. In that case the pointer passed to skb_mod_eth_type() doesn't point to an Ethernet header and the new Ethertype is written at unexpected locations. Fix this by verifying that mac_len is big enough to contain an Ethernet header. Fixes: fa4e0f8855fc ("net/sched: fix corrupted L2 header with MPLS 'push' and 'pop' actions") Signed-off-by: Guillaume Nault Acked-by: Davide Caratti Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 912721b3ad7206d109162e679b0e07b496c7463e Author: Johannes Berg Date: Fri Oct 2 09:46:04 2020 +0200 netlink: fix policy dump leak commit a95bc734e60449e7b073ff7ff70c35083b290ae9 upstream. If userspace doesn't complete the policy dump, we leak the allocated state. Fix this. Fixes: d07dcf9aadd6 ("netlink: add infrastructure to expose policies to userspace") Signed-off-by: Johannes Berg Reviewed-by: Jakub Kicinski Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 85355299d6fa4207093d3cb7c9afe6f926a40e0c Author: Eric Dumazet Date: Mon Oct 5 06:48:13 2020 -0700 tcp: fix receive window update in tcp_add_backlog() commit 86bccd0367130f481ca99ba91de1c6a5aa1c78c1 upstream. We got reports from GKE customers flows being reset by netfilter conntrack unless nf_conntrack_tcp_be_liberal is set to 1. Traces seemed to suggest ACK packet being dropped by the packet capture, or more likely that ACK were received in the wrong order. wscale=7, SYN and SYNACK not shown here. This ACK allows the sender to send 1871*128 bytes from seq 51359321 : New right edge of the window -> 51359321+1871*128=51598809 09:17:23.389210 IP A > B: Flags [.], ack 51359321, win 1871, options [nop,nop,TS val 10 ecr 999], length 0 09:17:23.389212 IP B > A: Flags [.], seq 51422681:51424089, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 1408 09:17:23.389214 IP A > B: Flags [.], ack 51422681, win 1376, options [nop,nop,TS val 10 ecr 999], length 0 09:17:23.389253 IP B > A: Flags [.], seq 51424089:51488857, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 64768 09:17:23.389272 IP A > B: Flags [.], ack 51488857, win 859, options [nop,nop,TS val 10 ecr 999], length 0 09:17:23.389275 IP B > A: Flags [.], seq 51488857:51521241, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 32384 Receiver now allows to send 606*128=77568 from seq 51521241 : New right edge of the window -> 51521241+606*128=51598809 09:17:23.389296 IP A > B: Flags [.], ack 51521241, win 606, options [nop,nop,TS val 10 ecr 999], length 0 09:17:23.389308 IP B > A: Flags [.], seq 51521241:51553625, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 32384 It seems the sender exceeds RWIN allowance, since 51611353 > 51598809 09:17:23.389346 IP B > A: Flags [.], seq 51553625:51611353, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 57728 09:17:23.389356 IP B > A: Flags [.], seq 51611353:51618393, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 7040 09:17:23.389367 IP A > B: Flags [.], ack 51611353, win 0, options [nop,nop,TS val 10 ecr 999], length 0 netfilter conntrack is not happy and sends RST 09:17:23.389389 IP A > B: Flags [R], seq 92176528, win 0, length 0 09:17:23.389488 IP B > A: Flags [R], seq 174478967, win 0, length 0 Now imagine ACK were delivered out of order and tcp_add_backlog() sets window based on wrong packet. New right edge of the window -> 51521241+859*128=51631193 Normally TCP stack handles OOO packets just fine, but it turns out tcp_add_backlog() does not. It can update the window field of the aggregated packet even if the ACK sequence of the last received packet is too old. Many thanks to Alexandre Ferrieux for independently reporting the issue and suggesting a fix. Fixes: 4f693b55c3d2 ("tcp: implement coalescing on backlog queue") Signed-off-by: Eric Dumazet Reported-by: Alexandre Ferrieux Acked-by: Soheil Hassas Yeganeh Acked-by: Neal Cardwell Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit a4c5f912c926148fd424db44a844df8f6aceb717 Author: Vijay Balakrishna Date: Sat Oct 10 23:16:40 2020 -0700 mm: khugepaged: recalculate min_free_kbytes after memory hotplug as expected by khugepaged commit 4aab2be0983031a05cb4a19696c9da5749523426 upstream. When memory is hotplug added or removed the min_free_kbytes should be recalculated based on what is expected by khugepaged. Currently after hotplug, min_free_kbytes will be set to a lower default and higher default set when THP enabled is lost. This change restores min_free_kbytes as expected for THP consumers. [vijayb@linux.microsoft.com: v5] Link: https://lkml.kernel.org/r/1601398153-5517-1-git-send-email-vijayb@linux.microsoft.com Fixes: f000565adb77 ("thp: set recommended min free kbytes") Signed-off-by: Vijay Balakrishna Signed-off-by: Andrew Morton Reviewed-by: Pavel Tatashin Acked-by: Michal Hocko Cc: Allen Pais Cc: Andrea Arcangeli Cc: "Kirill A. Shutemov" Cc: Oleg Nesterov Cc: Song Liu Cc: Link: https://lkml.kernel.org/r/1600305709-2319-2-git-send-email-vijayb@linux.microsoft.com Link: https://lkml.kernel.org/r/1600204258-13683-1-git-send-email-vijayb@linux.microsoft.com Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 0d600018dde7ab4ae6d9625daa45826acef6bd61 Author: Minchan Kim Date: Sat Oct 10 23:16:37 2020 -0700 mm: validate inode in mapping_set_error() commit 8b7b2eb131d3476062ffd34358785b44be25172f upstream. The swap address_space doesn't have host. Thus, it makes kernel crash once swap write meets error. Fix it. Fixes: 735e4ae5ba28 ("vfs: track per-sb writeback errors and report them to syncfs") Signed-off-by: Minchan Kim Signed-off-by: Andrew Morton Acked-by: Jeff Layton Cc: Jan Kara Cc: Andres Freund Cc: Matthew Wilcox Cc: Al Viro Cc: Christoph Hellwig Cc: Dave Chinner Cc: David Howells Cc: Link: https://lkml.kernel.org/r/20201010000650.750063-1-minchan@kernel.org Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 270974601ea5ad2e23cdd95b3a6e71e6a762eede Author: Coly Li Date: Fri Oct 2 09:38:52 2020 +0800 mmc: core: don't set limits.discard_granularity as 0 [ Upstream commit 4243219141b67d7c2fdb2d8073c17c539b9263eb ] In mmc_queue_setup_discard() the mmc driver queue's discard_granularity might be set as 0 (when card->pref_erase > max_discard) while the mmc device still declares to support discard operation. This is buggy and triggered the following kernel warning message, WARNING: CPU: 0 PID: 135 at __blkdev_issue_discard+0x200/0x294 CPU: 0 PID: 135 Comm: f2fs_discard-17 Not tainted 5.9.0-rc6 #1 Hardware name: Google Kevin (DT) pstate: 00000005 (nzcv daif -PAN -UAO BTYPE=--) pc : __blkdev_issue_discard+0x200/0x294 lr : __blkdev_issue_discard+0x54/0x294 sp : ffff800011dd3b10 x29: ffff800011dd3b10 x28: 0000000000000000 x27: ffff800011dd3cc4 x26: ffff800011dd3e18 x25: 000000000004e69b x24: 0000000000000c40 x23: ffff0000f1deaaf0 x22: ffff0000f2849200 x21: 00000000002734d8 x20: 0000000000000008 x19: 0000000000000000 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 x14: 0000000000000394 x13: 0000000000000000 x12: 0000000000000000 x11: 0000000000000000 x10: 00000000000008b0 x9 : ffff800011dd3cb0 x8 : 000000000004e69b x7 : 0000000000000000 x6 : ffff0000f1926400 x5 : ffff0000f1940800 x4 : 0000000000000000 x3 : 0000000000000c40 x2 : 0000000000000008 x1 : 00000000002734d8 x0 : 0000000000000000 Call trace: __blkdev_issue_discard+0x200/0x294 __submit_discard_cmd+0x128/0x374 __issue_discard_cmd_orderly+0x188/0x244 __issue_discard_cmd+0x2e8/0x33c issue_discard_thread+0xe8/0x2f0 kthread+0x11c/0x120 ret_from_fork+0x10/0x1c ---[ end trace e4c8023d33dfe77a ]--- This patch fixes the issue by setting discard_granularity as SECTOR_SIZE instead of 0 when (card->pref_erase > max_discard) is true. Now no more complain from __blkdev_issue_discard() for the improper value of discard granularity. This issue is exposed after commit b35fd7422c2f ("block: check queue's limits.discard_granularity in __blkdev_issue_discard()"), a "Fixes:" tag is also added for the commit to make sure people won't miss this patch after applying the change of __blkdev_issue_discard(). Fixes: e056a1b5b67b ("mmc: queue: let host controllers specify maximum discard timeout") Fixes: b35fd7422c2f ("block: check queue's limits.discard_granularity in __blkdev_issue_discard()"). Reported-and-tested-by: Vicente Bergas Signed-off-by: Coly Li Acked-by: Adrian Hunter Cc: Ulf Hansson Link: https://lore.kernel.org/r/20201002013852.51968-1-colyli@suse.de Signed-off-by: Ulf Hansson Signed-off-by: Sasha Levin commit 23030fd9134878832601473564d02fafd0328346 Author: Kajol Jain Date: Thu Aug 27 12:17:32 2020 +0530 perf: Fix task_function_call() error handling [ Upstream commit 6d6b8b9f4fceab7266ca03d194f60ec72bd4b654 ] The error handling introduced by commit: 2ed6edd33a21 ("perf: Add cond_resched() to task_function_call()") looses any return value from smp_call_function_single() that is not {0, -EINVAL}. This is a problem because it will return -EXNIO when the target CPU is offline. Worse, in that case it'll turn into an infinite loop. Fixes: 2ed6edd33a21 ("perf: Add cond_resched() to task_function_call()") Reported-by: Srikar Dronamraju Signed-off-by: Kajol Jain Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Ingo Molnar Reviewed-by: Barret Rhoden Tested-by: Srikar Dronamraju Link: https://lkml.kernel.org/r/20200827064732.20860-1-kjain@linux.ibm.com Signed-off-by: Sasha Levin commit 02b573f11b1c141cf9f211a0a00807e17c39ffe7 Author: David Howells Date: Wed Oct 7 14:22:12 2020 +0100 afs: Fix deadlock between writeback and truncate [ Upstream commit ec0fa0b659144d9c68204d23f627b6a65fa53e50 ] The afs filesystem has a lock[*] that it uses to serialise I/O operations going to the server (vnode->io_lock), as the server will only perform one modification operation at a time on any given file or directory. This prevents the the filesystem from filling up all the call slots to a server with calls that aren't going to be executed in parallel anyway, thereby allowing operations on other files to obtain slots. [*] Note that is probably redundant for directories at least since i_rwsem is used to serialise directory modifications and lookup/reading vs modification. The server does allow parallel non-modification ops, however. When a file truncation op completes, we truncate the in-memory copy of the file to match - but we do it whilst still holding the io_lock, the idea being to prevent races with other operations. However, if writeback starts in a worker thread simultaneously with truncation (whilst notify_change() is called with i_rwsem locked, writeback pays it no heed), it may manage to set PG_writeback bits on the pages that will get truncated before afs_setattr_success() manages to call truncate_pagecache(). Truncate will then wait for those pages - whilst still inside io_lock: # cat /proc/8837/stack [<0>] wait_on_page_bit_common+0x184/0x1e7 [<0>] truncate_inode_pages_range+0x37f/0x3eb [<0>] truncate_pagecache+0x3c/0x53 [<0>] afs_setattr_success+0x4d/0x6e [<0>] afs_wait_for_operation+0xd8/0x169 [<0>] afs_do_sync_operation+0x16/0x1f [<0>] afs_setattr+0x1fb/0x25d [<0>] notify_change+0x2cf/0x3c4 [<0>] do_truncate+0x7f/0xb2 [<0>] do_sys_ftruncate+0xd1/0x104 [<0>] do_syscall_64+0x2d/0x3a [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 The writeback operation, however, stalls indefinitely because it needs to get the io_lock to proceed: # cat /proc/5940/stack [<0>] afs_get_io_locks+0x58/0x1ae [<0>] afs_begin_vnode_operation+0xc7/0xd1 [<0>] afs_store_data+0x1b2/0x2a3 [<0>] afs_write_back_from_locked_page+0x418/0x57c [<0>] afs_writepages_region+0x196/0x224 [<0>] afs_writepages+0x74/0x156 [<0>] do_writepages+0x2d/0x56 [<0>] __writeback_single_inode+0x84/0x207 [<0>] writeback_sb_inodes+0x238/0x3cf [<0>] __writeback_inodes_wb+0x68/0x9f [<0>] wb_writeback+0x145/0x26c [<0>] wb_do_writeback+0x16a/0x194 [<0>] wb_workfn+0x74/0x177 [<0>] process_one_work+0x174/0x264 [<0>] worker_thread+0x117/0x1b9 [<0>] kthread+0xec/0xf1 [<0>] ret_from_fork+0x1f/0x30 and thus deadlock has occurred. Note that whilst afs_setattr() calls filemap_write_and_wait(), the fact that the caller is holding i_rwsem doesn't preclude more pages being dirtied through an mmap'd region. Fix this by: (1) Use the vnode validate_lock to mediate access between afs_setattr() and afs_writepages(): (a) Exclusively lock validate_lock in afs_setattr() around the whole RPC operation. (b) If WB_SYNC_ALL isn't set on entry to afs_writepages(), trying to shared-lock validate_lock and returning immediately if we couldn't get it. (c) If WB_SYNC_ALL is set, wait for the lock. The validate_lock is also used to validate a file and to zap its cache if the file was altered by a third party, so it's probably a good fit for this. (2) Move the truncation outside of the io_lock in setattr, using the same hook as is used for local directory editing. This requires the old i_size to be retained in the operation record as we commit the revised status to the inode members inside the io_lock still, but we still need to know if we reduced the file size. Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation") Signed-off-by: David Howells Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin commit 29c60e82c6a5461a789f6696bf4a66ad347f7fd3 Author: Vladimir Oltean Date: Mon Oct 5 12:09:11 2020 +0300 net: mscc: ocelot: divide watermark value by 60 when writing to SYS_ATOP [ Upstream commit 601e984f23abcaa7cf3eb078c13de4db3cf6a4f0 ] Tail dropping is enabled for a port when: 1. A source port consumes more packet buffers than the watermark encoded in SYS:PORT:ATOP_CFG.ATOP. AND 2. Total memory use exceeds the consumption watermark encoded in SYS:PAUSE_CFG:ATOP_TOT_CFG. The unit of these watermarks is a 60 byte memory cell. That unit is programmed properly into ATOP_TOT_CFG, but not into ATOP. Actually when written into ATOP, it would get truncated and wrap around. Fixes: a556c76adc05 ("net: mscc: Add initial Ocelot switch support") Signed-off-by: Vladimir Oltean Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit 9fd541ad02bd2bb34a1f931068afd5313336d4e4 Author: Maxim Kochetkov Date: Mon Jul 13 19:57:08 2020 +0300 net: mscc: ocelot: extend watermark encoding function [ Upstream commit aa92d836d5c40a7e21e563a272ad177f1bfd44dd ] The ocelot_wm_encode function deals with setting thresholds for pause frame start and stop. In Ocelot and Felix the register layout is the same, but for Seville, it isn't. The easiest way to accommodate Seville hardware configuration is to introduce a function pointer for setting this up. Signed-off-by: Maxim Kochetkov Signed-off-by: Vladimir Oltean Reviewed-by: Florian Fainelli Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit 13c1167842508e38b97de2bd9c720c6e04fe0994 Author: Vladimir Oltean Date: Mon Jul 13 19:57:05 2020 +0300 net: mscc: ocelot: split writes to pause frame enable bit and to thresholds [ Upstream commit e8e6e73db14273464b374d49ca7242c0994945f3 ] We don't want ocelot_port_set_maxlen to enable pause frame TX, just to adjust the pause thresholds. Move the unconditional enabling of pause TX to ocelot_init_port. There is no good place to put such setting because it shouldn't be unconditional. But at the moment it is, we're not changing that. Signed-off-by: Vladimir Oltean Reviewed-by: Florian Fainelli Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit 43e89f7e3c98ad1bd42e1e234a60b880e97de99c Author: Vladimir Oltean Date: Sat Jun 20 18:43:39 2020 +0300 net: mscc: ocelot: rename ocelot_board.c to ocelot_vsc7514.c [ Upstream commit 589aa6e7c9de322d47eb33a5cee8cc38838319e6 ] To follow the model of felix and seville where we have one platform-specific file, rename this file to the actual SoC it serves. Signed-off-by: Vladimir Oltean Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit 78272109f44d9bf3925da12ce1fa62c644a1af6a Author: David Howells Date: Fri Oct 2 14:04:51 2020 +0100 rxrpc: Fix server keyring leak [ Upstream commit 38b1dc47a35ba14c3f4472138ea56d014c2d609b ] If someone calls setsockopt() twice to set a server key keyring, the first keyring is leaked. Fix it to return an error instead if the server key keyring is already set. Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Signed-off-by: David Howells Signed-off-by: Sasha Levin commit bf123536563755173a8879ebc9a0319189c1f1ad Author: David Howells Date: Wed Sep 30 19:52:08 2020 +0100 rxrpc: The server keyring isn't network-namespaced [ Upstream commit fea99111244bae44e7d82a973744d27ea1567814 ] The keyring containing the server's tokens isn't network-namespaced, so it shouldn't be looked up with a network namespace. It is expected to be owned specifically by the server, so namespacing is unnecessary. Fixes: a58946c158a0 ("keys: Pass the network namespace into request_key mechanism") Signed-off-by: David Howells Signed-off-by: Sasha Levin commit 0fb27a1f99c1b16fef099258b7887ea74e4a7727 Author: David Howells Date: Thu Oct 1 11:57:40 2020 +0100 rxrpc: Fix some missing _bh annotations on locking conn->state_lock [ Upstream commit fa1d113a0f96f9ab7e4fe4f8825753ba1e34a9d3 ] conn->state_lock may be taken in softirq mode, but a previous patch replaced an outer lock in the response-packet event handling code, and lost the _bh from that when doing so. Fix this by applying the _bh annotation to the state_lock locking. Fixes: a1399f8bb033 ("rxrpc: Call channels should have separate call number spaces") Signed-off-by: David Howells Signed-off-by: Sasha Levin commit 6343a701ca68d532d858cc3ca164aed8da624de4 Author: David Howells Date: Tue Sep 8 22:09:04 2020 +0100 rxrpc: Downgrade the BUG() for unsupported token type in rxrpc_read() [ Upstream commit 9a059cd5ca7d9c5c4ca5a6e755cf72f230176b6a ] If rxrpc_read() (which allows KEYCTL_READ to read a key), sees a token of a type it doesn't recognise, it can BUG in a couple of places, which is unnecessary as it can easily get back to userspace. Fix this to print an error message instead. Fixes: 99455153d067 ("RxRPC: Parse security index 5 keys (Kerberos 5)") Signed-off-by: David Howells Signed-off-by: Sasha Levin commit 3a15888ff3dfb62a6839d1dcb08bf90aa33ab1d3 Author: Marc Dionne Date: Fri Sep 4 14:01:24 2020 -0300 rxrpc: Fix rxkad token xdr encoding [ Upstream commit 56305118e05b2db8d0395bba640ac9a3aee92624 ] The session key should be encoded with just the 8 data bytes and no length; ENCODE_DATA precedes it with a 4 byte length, which confuses some existing tools that try to parse this format. Add an ENCODE_BYTES macro that does not include a length, and use it for the key. Also adjust the expected length. Note that commit 774521f353e1d ("rxrpc: Fix an assertion in rxrpc_read()") had fixed a BUG by changing the length rather than fixing the encoding. The original length was correct. Fixes: 99455153d067 ("RxRPC: Parse security index 5 keys (Kerberos 5)") Signed-off-by: Marc Dionne Signed-off-by: David Howells Signed-off-by: Sasha Levin commit 41d0598c0f437a39c10cbfc1ff12724f15568d28 Author: Tom Rix Date: Sat Oct 3 11:51:21 2020 -0700 net: mvneta: fix double free of txq->buf [ Upstream commit f4544e5361da5050ff5c0330ceea095cb5dbdd72 ] clang static analysis reports this problem: drivers/net/ethernet/marvell/mvneta.c:3465:2: warning: Attempt to free released memory kfree(txq->buf); ^~~~~~~~~~~~~~~ When mvneta_txq_sw_init() fails to alloc txq->tso_hdrs, it frees without poisoning txq->buf. The error is caught in the mvneta_setup_txqs() caller which handles the error by cleaning up all of the txqs with a call to mvneta_txq_sw_deinit which also frees txq->buf. Since mvneta_txq_sw_deinit is a general cleaner, all of the partial cleaning in mvneta_txq_sw_deinit()'s error handling is not needed. Fixes: 2adb719d74f6 ("net: mvneta: Implement software TSO") Signed-off-by: Tom Rix Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit d5c6f130b6f0f63e256ed64b37f735173b9999d9 Author: Si-Wei Liu Date: Sat Oct 3 01:02:10 2020 -0400 vhost-vdpa: fix page pinning leakage in error path [ Upstream commit 7ed9e3d97c32d969caded2dfb6e67c1a2cc5a0b1 ] Pinned pages are not properly accounted particularly when mapping error occurs on IOTLB update. Clean up dangling pinned pages for the error path. As the inflight pinned pages, specifically for memory region that strides across multiple chunks, would need more than one free page for book keeping and accounting. For simplicity, pin pages for all memory in the IOVA range in one go rather than have multiple pin_user_pages calls to make up the entire region. This way it's easier to track and account the pages already mapped, particularly for clean-up in the error path. Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend") Signed-off-by: Si-Wei Liu Link: https://lore.kernel.org/r/1601701330-16837-3-git-send-email-si-wei.liu@oracle.com Signed-off-by: Michael S. Tsirkin Signed-off-by: Sasha Levin commit ec7257845d40ee81ed18f67d97df87e7107d4cc4 Author: Si-Wei Liu Date: Sat Oct 3 01:02:09 2020 -0400 vhost-vdpa: fix vhost_vdpa_map() on error condition [ Upstream commit 1477c8aebb94a1db398c12d929a9d27bbd678d8c ] vhost_vdpa_map() should remove the iotlb entry just added if the corresponding mapping fails to set up properly. Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend") Signed-off-by: Si-Wei Liu Link: https://lore.kernel.org/r/1601701330-16837-2-git-send-email-si-wei.liu@oracle.com Signed-off-by: Michael S. Tsirkin Signed-off-by: Sasha Levin commit 72d41c97e736bfa7acf91b91d6f3c27a9499f2aa Author: Randy Dunlap Date: Thu Oct 1 10:54:49 2020 -0700 net: hinic: fix DEVLINK build errors [ Upstream commit 1f7e877c20517735bceff1535e1b7fa846b2f215 ] Fix many (lots deleted here) build errors in hinic by selecting NET_DEVLINK. ld: drivers/net/ethernet/huawei/hinic/hinic_hw_dev.o: in function `mgmt_watchdog_timeout_event_handler': hinic_hw_dev.c:(.text+0x30a): undefined reference to `devlink_health_report' ld: drivers/net/ethernet/huawei/hinic/hinic_devlink.o: in function `hinic_fw_reporter_dump': hinic_devlink.c:(.text+0x1c): undefined reference to `devlink_fmsg_u32_pair_put' ld: drivers/net/ethernet/huawei/hinic/hinic_devlink.o: in function `hinic_fw_reporter_dump': hinic_devlink.c:(.text+0x126): undefined reference to `devlink_fmsg_binary_pair_put' ld: drivers/net/ethernet/huawei/hinic/hinic_devlink.o: in function `hinic_hw_reporter_dump': hinic_devlink.c:(.text+0x1ba): undefined reference to `devlink_fmsg_string_pair_put' ld: hinic_devlink.c:(.text+0x227): undefined reference to `devlink_fmsg_u8_pair_put' ld: drivers/net/ethernet/huawei/hinic/hinic_devlink.o: in function `hinic_devlink_alloc': hinic_devlink.c:(.text+0xaee): undefined reference to `devlink_alloc' ld: drivers/net/ethernet/huawei/hinic/hinic_devlink.o: in function `hinic_devlink_free': hinic_devlink.c:(.text+0xb04): undefined reference to `devlink_free' ld: drivers/net/ethernet/huawei/hinic/hinic_devlink.o: in function `hinic_devlink_register': hinic_devlink.c:(.text+0xb26): undefined reference to `devlink_register' ld: drivers/net/ethernet/huawei/hinic/hinic_devlink.o: in function `hinic_devlink_unregister': hinic_devlink.c:(.text+0xb46): undefined reference to `devlink_unregister' ld: drivers/net/ethernet/huawei/hinic/hinic_devlink.o: in function `hinic_health_reporters_create': hinic_devlink.c:(.text+0xb75): undefined reference to `devlink_health_reporter_create' ld: hinic_devlink.c:(.text+0xb95): undefined reference to `devlink_health_reporter_create' ld: hinic_devlink.c:(.text+0xbac): undefined reference to `devlink_health_reporter_destroy' ld: drivers/net/ethernet/huawei/hinic/hinic_devlink.o: in function `hinic_health_reporters_destroy': Fixes: 51ba902a16e6 ("net-next/hinic: Initialize hw interface") Signed-off-by: Randy Dunlap Cc: Bin Luo Cc: "David S. Miller" Cc: Jakub Kicinski Cc: Aviad Krawczyk Cc: Zhao Chen Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit a974b4bddae366ae31cbf6d3d92d2d7e4d7769fb Author: Vineetha G. Jaya Kumaran Date: Thu Oct 1 23:56:09 2020 +0800 net: stmmac: Modify configuration method of EEE timers [ Upstream commit 388e201d41fa1ed8f2dce0f0567f56f8e919ffb0 ] Ethtool manual stated that the tx-timer is the "the amount of time the device should stay in idle mode prior to asserting its Tx LPI". The previous implementation for "ethtool --set-eee tx-timer" sets the LPI TW timer duration which is not correct. Hence, this patch fixes the "ethtool --set-eee tx-timer" to configure the EEE LPI timer. The LPI TW Timer will be using the defined default value instead of "ethtool --set-eee tx-timer" which follows the EEE LS timer implementation. Changelog V2 *Not removing/modifying the eee_timer. *EEE LPI timer can be configured through ethtool and also the eee_timer module param. *EEE TW Timer will be configured with default value only, not able to be configured through ethtool or module param. This follows the implementation of the EEE LS Timer. Fixes: d765955d2ae0 ("stmmac: add the Energy Efficient Ethernet support") Signed-off-by: Vineetha G. Jaya Kumaran Signed-off-by: Voon Weifeng Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit d0eb9588f724be8f192fff0760eb335434474f09 Author: Vlad Buslov Date: Sun Sep 20 19:59:08 2020 +0300 net/mlx5e: Fix race condition on nhe->n pointer in neigh update [ Upstream commit 1253935ad801485270194d5651acab04abc97b36 ] Current neigh update event handler implementation takes reference to neighbour structure, assigns it to nhe->n, tries to schedule workqueue task and releases the reference if task was already enqueued. This results potentially overwriting existing nhe->n pointer with another neighbour instance, which causes double release of the instance (once in neigh update handler that failed to enqueue to workqueue and another one in neigh update workqueue task that processes updated nhe->n pointer instead of original one): [ 3376.512806] ------------[ cut here ]------------ [ 3376.513534] refcount_t: underflow; use-after-free. [ 3376.521213] Modules linked in: act_skbedit act_mirred act_tunnel_key vxlan ip6_udp_tunnel udp_tunnel nfnetlink act_gact cls_flower sch_ingress openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 mlx5_ib mlx5_core mlxfw pci_hyperv_intf ptp pps_core nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp rpcrdma rdma_ucm ib_umad ib_ipoib ib_iser rdma_cm ib_cm iw_cm rfkill ib_uverbs ib_core sunrpc kvm_intel kvm iTCO_wdt iTCO_vendor_support virtio_net irqbypass net_failover crc32_pclmul lpc_ich i2c_i801 failover pcspkr i2c_smbus mfd_core ghash_clmulni_intel sch_fq_codel drm i2c _core ip_tables crc32c_intel serio_raw [last unloaded: mlxfw] [ 3376.529468] CPU: 8 PID: 22756 Comm: kworker/u20:5 Not tainted 5.9.0-rc5+ #6 [ 3376.530399] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 [ 3376.531975] Workqueue: mlx5e mlx5e_rep_neigh_update [mlx5_core] [ 3376.532820] RIP: 0010:refcount_warn_saturate+0xd8/0xe0 [ 3376.533589] Code: ff 48 c7 c7 e0 b8 27 82 c6 05 0b b6 09 01 01 e8 94 93 c1 ff 0f 0b c3 48 c7 c7 88 b8 27 82 c6 05 f7 b5 09 01 01 e8 7e 93 c1 ff <0f> 0b c3 0f 1f 44 00 00 8b 07 3d 00 00 00 c0 74 12 83 f8 01 74 13 [ 3376.536017] RSP: 0018:ffffc90002a97e30 EFLAGS: 00010286 [ 3376.536793] RAX: 0000000000000000 RBX: ffff8882de30d648 RCX: 0000000000000000 [ 3376.537718] RDX: ffff8882f5c28f20 RSI: ffff8882f5c18e40 RDI: ffff8882f5c18e40 [ 3376.538654] RBP: ffff8882cdf56c00 R08: 000000000000c580 R09: 0000000000001a4d [ 3376.539582] R10: 0000000000000731 R11: ffffc90002a97ccd R12: 0000000000000000 [ 3376.540519] R13: ffff8882de30d600 R14: ffff8882de30d640 R15: ffff88821e000900 [ 3376.541444] FS: 0000000000000000(0000) GS:ffff8882f5c00000(0000) knlGS:0000000000000000 [ 3376.542732] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 3376.543545] CR2: 0000556e5504b248 CR3: 00000002c6f10005 CR4: 0000000000770ee0 [ 3376.544483] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 3376.545419] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 3376.546344] PKRU: 55555554 [ 3376.546911] Call Trace: [ 3376.547479] mlx5e_rep_neigh_update.cold+0x33/0xe2 [mlx5_core] [ 3376.548299] process_one_work+0x1d8/0x390 [ 3376.548977] worker_thread+0x4d/0x3e0 [ 3376.549631] ? rescuer_thread+0x3e0/0x3e0 [ 3376.550295] kthread+0x118/0x130 [ 3376.550914] ? kthread_create_worker_on_cpu+0x70/0x70 [ 3376.551675] ret_from_fork+0x1f/0x30 [ 3376.552312] ---[ end trace d84e8f46d2a77eec ]--- Fix the bug by moving work_struct to dedicated dynamically-allocated structure. This enabled every event handler to work on its own private neighbour pointer and removes the need for handling the case when task is already enqueued. Fixes: 232c001398ae ("net/mlx5e: Add support to neighbour update flow") Signed-off-by: Vlad Buslov Reviewed-by: Roi Dayan Signed-off-by: Saeed Mahameed Signed-off-by: Sasha Levin commit eef0da1560402615a513ee9cc6d46bb1d3afffab Author: Aya Levin Date: Sun Sep 13 18:05:40 2020 +0300 net/mlx5e: Fix VLAN create flow [ Upstream commit d4a16052bccdd695982f89d815ca075825115821 ] When interface is attached while in promiscuous mode and with VLAN filtering turned off, both configurations are not respected and VLAN filtering is performed. There are 2 flows which add the any-vid rules during interface attach: VLAN creation table and set rx mode. Each is relaying on the other to add any-vid rules, eventually non of them does. Fix this by adding any-vid rules on VLAN creation regardless of promiscuous mode. Fixes: 9df30601c843 ("net/mlx5e: Restore vlan filter after seamless reset") Signed-off-by: Aya Levin Reviewed-by: Moshe Shemesh Signed-off-by: Saeed Mahameed Signed-off-by: Sasha Levin commit b6dc435f360340d574a8c7b2f3e42142bc99bbbc Author: Aya Levin Date: Sun Sep 13 17:57:23 2020 +0300 net/mlx5e: Fix VLAN cleanup flow [ Upstream commit 8c7353b6f716436ad0bfda2b5c5524ab2dde5894 ] Prior to this patch unloading an interface in promiscuous mode with RX VLAN filtering feature turned off - resulted in a warning. This is due to a wrong condition in the VLAN rules cleanup flow, which left the any-vid rules in the VLAN steering table. These rules prevented destroying the flow group and the flow table. The any-vid rules are removed in 2 flows, but none of them remove it in case both promiscuous is set and VLAN filtering is off. Fix the issue by changing the condition of the VLAN table cleanup flow to clean also in case of promiscuous mode. mlx5_core 0000:00:08.0: mlx5_destroy_flow_group:2123:(pid 28729): Flow group 20 wasn't destroyed, refcount > 1 mlx5_core 0000:00:08.0: mlx5_destroy_flow_group:2123:(pid 28729): Flow group 19 wasn't destroyed, refcount > 1 mlx5_core 0000:00:08.0: mlx5_destroy_flow_table:2112:(pid 28729): Flow table 262149 wasn't destroyed, refcount > 1 ... ... ------------[ cut here ]------------ FW pages counter is 11560 after reclaiming all pages WARNING: CPU: 1 PID: 28729 at drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c:660 mlx5_reclaim_startup_pages+0x178/0x230 [mlx5_core] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 Call Trace: mlx5_function_teardown+0x2f/0x90 [mlx5_core] mlx5_unload_one+0x71/0x110 [mlx5_core] remove_one+0x44/0x80 [mlx5_core] pci_device_remove+0x3e/0xc0 device_release_driver_internal+0xfb/0x1c0 device_release_driver+0x12/0x20 pci_stop_bus_device+0x68/0x90 pci_stop_and_remove_bus_device+0x12/0x20 hv_eject_device_work+0x6f/0x170 [pci_hyperv] ? __schedule+0x349/0x790 process_one_work+0x206/0x400 worker_thread+0x34/0x3f0 ? process_one_work+0x400/0x400 kthread+0x126/0x140 ? kthread_park+0x90/0x90 ret_from_fork+0x22/0x30 ---[ end trace 6283bde8d26170dc ]--- Fixes: 9df30601c843 ("net/mlx5e: Restore vlan filter after seamless reset") Signed-off-by: Aya Levin Reviewed-by: Moshe Shemesh Signed-off-by: Saeed Mahameed Signed-off-by: Sasha Levin commit f2140d0c6b93ae21ba90ca6977b97987eabebd30 Author: Aya Levin Date: Wed Aug 12 10:44:36 2020 +0300 net/mlx5e: Fix return status when setting unsupported FEC mode [ Upstream commit 2608a2f831c47dfdf18885a7289be5af97182b05 ] Verify the configured FEC mode is supported by at least a single link mode before applying the command. Otherwise fail the command and return "Operation not supported". Prior to this patch, the command was successful, yet it falsely set all link modes to FEC auto mode - like configuring FEC mode to auto. Auto mode is the default configuration if a link mode doesn't support the configured FEC mode. Fixes: b5ede32d3329 ("net/mlx5e: Add support for FEC modes based on 50G per lane links") Signed-off-by: Aya Levin Reviewed-by: Eran Ben Elisha Reviewed-by: Moshe Shemesh Signed-off-by: Saeed Mahameed Signed-off-by: Sasha Levin commit 96e80a3466347105d8e04515219347b0d6525397 Author: Aya Levin Date: Mon Jul 20 16:53:18 2020 +0300 net/mlx5e: Add resiliency in Striding RQ mode for packets larger than MTU [ Upstream commit c3c9402373fe20e2d08c04f437ce4dcd252cffb2 ] Prior to this fix, in Striding RQ mode the driver was vulnerable when receiving packets in the range (stride size - headroom, stride size]. Where stride size is calculated by mtu+headroom+tailroom aligned to the closest power of 2. Usually, this filtering is performed by the HW, except for a few cases: - Between 2 VFs over the same PF with different MTUs - On bluefield, when the host physical function sets a larger MTU than the ARM has configured on its representor and uplink representor. When the HW filtering is not present, packets that are larger than MTU might be harmful for the RQ's integrity, in the following impacts: 1) Overflow from one WQE to the next, causing a memory corruption that in most cases is unharmful: as the write happens to the headroom of next packet, which will be overwritten by build_skb(). In very rare cases, high stress/load, this is harmful. When the next WQE is not yet reposted and points to existing SKB head. 2) Each oversize packet overflows to the headroom of the next WQE. On the last WQE of the WQ, where addresses wrap-around, the address of the remainder headroom does not belong to the next WQE, but it is out of the memory region range. This results in a HW CQE error that moves the RQ into an error state. Solution: Add a page buffer at the end of each WQE to absorb the leak. Actually the maximal overflow size is headroom but since all memory units must be of the same size, we use page size to comply with UMR WQEs. The increase in memory consumption is of a single page per RQ. Initialize the mkey with all MTTs pointing to a default page. When the channels are activated, UMR WQEs will redirect the RX WQEs to the actual memory from the RQ's pool, while the overflow MTTs remain mapped to the default page. Fixes: 73281b78a37a ("net/mlx5e: Derive Striding RQ size from MTU") Signed-off-by: Aya Levin Reviewed-by: Tariq Toukan Signed-off-by: Saeed Mahameed Signed-off-by: Sasha Levin commit 4dc4c132f27fb1ba430f55039068d4f7d5becfc4 Author: Maor Gottlieb Date: Mon Aug 31 21:37:31 2020 +0300 net/mlx5: Fix request_irqs error flow [ Upstream commit 732ebfab7fe96b7ac9a3df3208f14752a4bb6db3 ] Fix error flow handling in request_irqs which try to free irq that we failed to request. It fixes the below trace. WARNING: CPU: 1 PID: 7587 at kernel/irq/manage.c:1684 free_irq+0x4d/0x60 CPU: 1 PID: 7587 Comm: bash Tainted: G W OE 4.15.15-1.el7MELLANOXsmp-x86_64 #1 Hardware name: Advantech SKY-6200/SKY-6200, BIOS F2.00 08/06/2020 RIP: 0010:free_irq+0x4d/0x60 RSP: 0018:ffffc9000ef47af0 EFLAGS: 00010282 RAX: ffff88001476ae00 RBX: 0000000000000655 RCX: 0000000000000000 RDX: ffff88001476ae00 RSI: ffffc9000ef47ab8 RDI: ffff8800398bb478 RBP: ffff88001476a838 R08: ffff88001476ae00 R09: 000000000000156d R10: 0000000000000000 R11: 0000000000000004 R12: ffff88001476a838 R13: 0000000000000006 R14: ffff88001476a888 R15: 00000000ffffffe4 FS: 00007efeadd32740(0000) GS:ffff88047fc40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fc9cc010008 CR3: 00000001a2380004 CR4: 00000000007606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: mlx5_irq_table_create+0x38d/0x400 [mlx5_core] ? atomic_notifier_chain_register+0x50/0x60 mlx5_load_one+0x7ee/0x1130 [mlx5_core] init_one+0x4c9/0x650 [mlx5_core] pci_device_probe+0xb8/0x120 driver_probe_device+0x2a1/0x470 ? driver_allows_async_probing+0x30/0x30 bus_for_each_drv+0x54/0x80 __device_attach+0xa3/0x100 pci_bus_add_device+0x4a/0x90 pci_iov_add_virtfn+0x2dc/0x2f0 pci_enable_sriov+0x32e/0x420 mlx5_core_sriov_configure+0x61/0x1b0 [mlx5_core] ? kstrtoll+0x22/0x70 num_vf_store+0x4b/0x70 [mlx5_core] kernfs_fop_write+0x102/0x180 __vfs_write+0x26/0x140 ? rcu_all_qs+0x5/0x80 ? _cond_resched+0x15/0x30 ? __sb_start_write+0x41/0x80 vfs_write+0xad/0x1a0 SyS_write+0x42/0x90 do_syscall_64+0x60/0x110 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 Fixes: 24163189da48 ("net/mlx5: Separate IRQ request/free from EQ life cycle") Signed-off-by: Maor Gottlieb Reviewed-by: Eran Ben Elisha Signed-off-by: Saeed Mahameed Signed-off-by: Sasha Levin commit 91ddbc505218f20fd858dec7f7152beca64e928c Author: Eran Ben Elisha Date: Mon Aug 31 15:04:35 2020 +0300 net/mlx5: Add retry mechanism to the command entry index allocation [ Upstream commit 410bd754cd73c4a2ac3856d9a03d7b08f9c906bf ] It is possible that new command entry index allocation will temporarily fail. The new command holds the semaphore, so it means that a free entry should be ready soon. Add one second retry mechanism before returning an error. Patch "net/mlx5: Avoid possible free of command entry while timeout comp handler" increase the possibility to bump into this temporarily failure as it delays the entry index release for non-callback commands. Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Eran Ben Elisha Reviewed-by: Moshe Shemesh Signed-off-by: Saeed Mahameed Signed-off-by: Sasha Levin commit 963f9da027303820d43cbf469b20dfdde48b1207 Author: Eran Ben Elisha Date: Tue Jul 21 10:25:52 2020 +0300 net/mlx5: poll cmd EQ in case of command timeout [ Upstream commit 1d5558b1f0de81f54ddee05f3793acc5260d107f ] Once driver detects a command interface command timeout, it warns the user and returns timeout error to the caller. In such case, the entry of the command is not evacuated (because only real event interrupt is allowed to clear command interface entry). If the HW event interrupt of this entry will never arrive, this entry will be left unused forever. Command interface entries are limited and eventually we can end up without the ability to post a new command. In addition, if driver will not consume the EQE of the lost interrupt and rearm the EQ, no new interrupts will arrive for other commands. Add a resiliency mechanism for manually polling the command EQ in case of a command timeout. In case resiliency mechanism will find non-handled EQE, it will consume it, and the command interface will be fully functional again. Once the resiliency flow finished, wait another 5 seconds for the command interface to complete for this command entry. Define mlx5_cmd_eq_recover() to manage the cmd EQ polling resiliency flow. Add an async EQ spinlock to avoid races between resiliency flows and real interrupts that might run simultaneously. Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Eran Ben Elisha Signed-off-by: Saeed Mahameed Signed-off-by: Sasha Levin commit da87ea137373689dec9d3fafa34a57787320a4b3 Author: Eran Ben Elisha Date: Tue Aug 4 10:40:21 2020 +0300 net/mlx5: Avoid possible free of command entry while timeout comp handler [ Upstream commit 50b2412b7e7862c5af0cbf4b10d93bc5c712d021 ] Upon command completion timeout, driver simulates a forced command completion. In a rare case where real interrupt for that command arrives simultaneously, it might release the command entry while the forced handler might still access it. Fix that by adding an entry refcount, to track current amount of allowed handlers. Command entry to be released only when this refcount is decremented to zero. Command refcount is always initialized to one. For callback commands, command completion handler is the symmetric flow to decrement it. For non-callback commands, it is wait_func(). Before ringing the doorbell, increment the refcount for the real completion handler. Once the real completion handler is called, it will decrement it. For callback commands, once the delayed work is scheduled, increment the refcount. Upon callback command completion handler, we will try to cancel the timeout callback. In case of success, we need to decrement the callback refcount as it will never run. In addition, gather the entry index free and the entry free into a one flow for all command types release. Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Eran Ben Elisha Reviewed-by: Moshe Shemesh Signed-off-by: Saeed Mahameed Signed-off-by: Sasha Levin commit eb50f5c289e60638a897de0c8fc6417630b064b0 Author: Eran Ben Elisha Date: Thu Aug 13 16:55:20 2020 +0300 net/mlx5: Fix a race when moving command interface to polling mode [ Upstream commit 432161ea26d6d5e5c3f7306d9407d26ed1e1953e ] As part of driver unload, it destroys the commands EQ (via FW command). As the commands EQ is destroyed, FW will not generate EQEs for any command that driver sends afterwards. Driver should poll for later commands status. Driver commands mode metadata is updated before the commands EQ is actually destroyed. This can lead for double completion handle by the driver (polling and interrupt), if a command is executed and completed by FW after the mode was changed, but before the EQ was destroyed. Fix that by using the mlx5_cmd_allowed_opcode mechanism to guarantee that only DESTROY_EQ command can be executed during this time period. Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Eran Ben Elisha Reviewed-by: Moshe Shemesh Signed-off-by: Saeed Mahameed Signed-off-by: Sasha Levin commit 04f31610f34fb9a8493347e9b5fe5b77e8fd90ac Author: Qian Cai Date: Thu Oct 1 08:50:55 2020 -0400 pipe: Fix memory leaks in create_pipe_files() [ Upstream commit 8a018eb55e3ac033592afbcb476b0ffe64465b12 ] Calling pipe2() with O_NOTIFICATION_PIPE could results in memory leaks unless watch_queue_init() is successful. In case of watch_queue_init() failure in pipe2() we are left with inode and pipe_inode_info instances that need to be freed. That failure exit has been introduced in commit c73be61cede5 ("pipe: Add general notification queue support") and its handling should've been identical to nearby treatment of alloc_file_pseudo() failures - it is dealing with the same situation. As it is, the mainline kernel leaks in that case. Another problem is that CONFIG_WATCH_QUEUE and !CONFIG_WATCH_QUEUE cases are treated differently (and the former leaks just pipe_inode_info, the latter - both pipe_inode_info and inode). Fixed by providing a dummy wacth_queue_init() in !CONFIG_WATCH_QUEUE case and by having failures of wacth_queue_init() handled the same way we handle alloc_file_pseudo() ones. Fixes: c73be61cede5 ("pipe: Add general notification queue support") Signed-off-by: Qian Cai Signed-off-by: Al Viro Signed-off-by: Sasha Levin commit ce1dde1980792a4e383ed9be12363a2e8bbe55a5 Author: Hariprasad Kelam Date: Wed Sep 30 21:39:35 2020 +0530 octeontx2-pf: Fix synchnorization issue in mbox [ Upstream commit 66a5209b53418111757716d71e52727b782eabd4 ] Mbox implementation in octeontx2 driver has three states alloc, send and reset in mbox response. VF allocate and sends message to PF for processing, PF ACKs them back and reset the mbox memory. In some case we see synchronization issue where after msgs_acked is incremented and before mbox_reset API is called, if current execution is scheduled out and a different thread is scheduled in which checks for msgs_acked. Since the new thread sees msgs_acked == msgs_sent it will try to allocate a new message and to send a new mbox message to PF.Now if mbox_reset is scheduled in, PF will see '0' in msgs_send. This patch fixes the issue by calling mbox_reset before incrementing msgs_acked flag for last processing message and checks for valid message size. Fixes: d424b6c02 ("octeontx2-pf: Enable SRIOV and added VF mbox handling") Signed-off-by: Hariprasad Kelam Signed-off-by: Geetha sowjanya Signed-off-by: Sunil Goutham Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit 5cfc870ede16e7c3d5ee0b82b9eb7badcbdf12e7 Author: Hariprasad Kelam Date: Wed Sep 30 21:39:14 2020 +0530 octeontx2-pf: Fix the device state on error [ Upstream commit 1ea0166da0509e987caa42c30a6a71f2c6ca1875 ] Currently in otx2_open on failure of nix_lf_start transmit queues are not stopped which are already started in link_event. Since the tx queues are not stopped network stack still try's to send the packets leading to driver crash while access the device resources. Fixes: 50fe6c02e ("octeontx2-pf: Register and handle link notifications") Signed-off-by: Hariprasad Kelam Signed-off-by: Geetha sowjanya Signed-off-by: Sunil Goutham Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit 7778b88602289d26c8ec61f01abea34f2504c283 Author: Geetha sowjanya Date: Wed Sep 30 21:38:52 2020 +0530 octeontx2-pf: Fix TCP/UDP checksum offload for IPv6 frames [ Upstream commit 89eae5e87b4fa799726a3e8911c90d418cb5d2b1 ] For TCP/UDP checksum offload feature in Octeontx2 expects L3TYPE to be set irrespective of IP header checksum is being offloaded or not. Currently for IPv6 frames L3TYPE is not being set resulting in packet drop with checksum error. This patch fixes this issue. Fixes: 3ca6c4c88 ("octeontx2-pf: Add packet transmission support") Signed-off-by: Geetha sowjanya Signed-off-by: Sunil Goutham Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit 921dfb5fec6bf8687aadf069f9481af130f2f1b2 Author: Subbaraya Sundeep Date: Wed Sep 30 21:38:27 2020 +0530 octeontx2-af: Fix enable/disable of default NPC entries [ Upstream commit e154b5b70368a84a19505a0be9b0096c66562b56 ] Packet replication feature present in Octeontx2 is a hardware linked list of PF and its VF interfaces so that broadcast packets are sent to all interfaces present in the list. It is driver job to add and delete a PF/VF interface to/from the list when the interface is brought up and down. This patch fixes the npc_enadis_default_entries function to handle broadcast replication properly if packet replication feature is present. Fixes: 40df309e4166 ("octeontx2-af: Support to enable/disable default MCAM entries") Signed-off-by: Subbaraya Sundeep Signed-off-by: Geetha sowjanya Signed-off-by: Sunil Goutham Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit b9f0dcfbfc07719be7cc732cda4e609280704605 Author: Willy Liu Date: Tue Sep 29 10:10:49 2020 +0800 net: phy: realtek: fix rtl8211e rx/tx delay config [ Upstream commit bbc4d71d63549bcd003a430de18a72a742d8c91e ] There are two chip pins named TXDLY and RXDLY which actually adds the 2ns delays to TXC and RXC for TXD/RXD latching. These two pins can config via 4.7k-ohm resistor to 3.3V hw setting, but also config via software setting (extension page 0xa4 register 0x1c bit13 12 and 11). The configuration register definitions from table 13 official PHY datasheet: PHYAD[2:0] = PHY Address AN[1:0] = Auto-Negotiation Mode = Interface Mode Select RX Delay = RX Delay TX Delay = TX Delay SELRGV = RGMII/GMII Selection This table describes how to config these hw pins via external pull-high or pull- low resistor. It is a misunderstanding that mapping it as register bits below: 8:6 = PHY Address 5:4 = Auto-Negotiation 3 = Interface Mode Select 2 = RX Delay 1 = TX Delay 0 = SELRGV So I removed these descriptions above and add related settings as below: 14 = reserved 13 = force Tx RX Delay controlled by bit12 bit11 12 = Tx Delay 11 = Rx Delay 10:0 = Test && debug settings reserved by realtek Test && debug settings are not recommend to modify by default. Fixes: f81dadbcf7fd ("net: phy: realtek: Add rtl8211e rx/tx delays config") Signed-off-by: Willy Liu Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit 9d41929ceea93bfd20a143a343d4382133a9f455 Author: Tonghao Zhang Date: Tue Sep 29 09:58:06 2020 +0800 virtio-net: don't disable guest csum when disable LRO [ Upstream commit 1a03b8a35a957f9f38ecb8a97443b7380bbf6a8b ] Open vSwitch and Linux bridge will disable LRO of the interface when this interface added to them. Now when disable the LRO, the virtio-net csum is disable too. That drops the forwarding performance. Fixes: a02e8964eaf9 ("virtio-net: ethtool configurable LRO") Cc: Michael S. Tsirkin Cc: Jason Wang Cc: Willem de Bruijn Signed-off-by: Tonghao Zhang Acked-by: Willem de Bruijn Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit f5f8861d01d35f1401a00fd26800c1121b27a9cc Author: Wilken Gottwalt Date: Mon Sep 28 11:01:04 2020 +0200 net: usb: ax88179_178a: fix missing stop entry in driver_info [ Upstream commit 9666ea66a74adfe295cb3a8760c76e1ef70f9caf ] Adds the missing .stop entry in the Belkin driver_info structure. Fixes: e20bd60bf62a ("net: usb: asix88179_178a: Add support for the Belkin B2B128") Signed-off-by: Wilken Gottwalt Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit fb4fb78d23fc4c99aa337e0a9b0780aea0b9fb53 Author: Heiner Kallweit Date: Sun Sep 27 19:44:29 2020 +0200 r8169: fix RTL8168f/RTL8411 EPHY config [ Upstream commit 709a16be0593c08190982cfbdca6df95e6d5823b ] Mistakenly bit 2 was set instead of bit 3 as in the vendor driver. Fixes: a7a92cf81589 ("r8169: sync PCIe PHY init with vendor driver 8.047.01") Signed-off-by: Heiner Kallweit Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit 0ea7fe7c26efc467c8e73e30ab9187c6920bf163 Author: Ido Schimmel Date: Sun Sep 27 09:42:11 2020 +0300 mlxsw: spectrum_acl: Fix mlxsw_sp_acl_tcam_group_add()'s error path [ Upstream commit 72865028582a678be1e05240e55d452e5c258eca ] If mlxsw_sp_acl_tcam_group_id_get() fails, the mutex initialized earlier is not destroyed. Fix this by initializing the mutex after calling the function. This is symmetric to mlxsw_sp_acl_tcam_group_del(). Fixes: 5ec2ee28d27b ("mlxsw: spectrum_acl: Introduce a mutex to guard region list updates") Signed-off-by: Ido Schimmel Reviewed-by: Jiri Pirko Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit 698075baae0b9790623360657918245a6f2e77a4 Author: Randy Dunlap Date: Sat Sep 26 21:33:43 2020 -0700 mdio: fix mdio-thunder.c dependency & build error [ Upstream commit 7dbbcf496f2a4b6d82cfc7810a0746e160b79762 ] Fix build error by selecting MDIO_DEVRES for MDIO_THUNDER. Fixes this build error: ld: drivers/net/phy/mdio-thunder.o: in function `thunder_mdiobus_pci_probe': drivers/net/phy/mdio-thunder.c:78: undefined reference to `devm_mdiobus_alloc_size' Fixes: 379d7ac7ca31 ("phy: mdio-thunder: Add driver for Cavium Thunder SoC MDIO buses.") Reported-by: kernel test robot Signed-off-by: Randy Dunlap Cc: Bartosz Golaszewski Cc: Andrew Lunn Cc: Heiner Kallweit Cc: netdev@vger.kernel.org Cc: David Daney Reviewed-by: Andrew Lunn Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit c83ed7bb74698493f7d873af08d9a2d4bcf2f410 Author: Eric Dumazet Date: Fri Sep 25 06:38:07 2020 -0700 bonding: set dev->needed_headroom in bond_setup_by_slave() [ Upstream commit f32f19339596b214c208c0dba716f4b6cc4f6958 ] syzbot managed to crash a host by creating a bond with a GRE device. For non Ethernet device, bonding calls bond_setup_by_slave() instead of ether_setup(), and unfortunately dev->needed_headroom was not copied from the new added member. [ 171.243095] skbuff: skb_under_panic: text:ffffffffa184b9ea len:116 put:20 head:ffff883f84012dc0 data:ffff883f84012dbc tail:0x70 end:0xd00 dev:bond0 [ 171.243111] ------------[ cut here ]------------ [ 171.243112] kernel BUG at net/core/skbuff.c:112! [ 171.243117] invalid opcode: 0000 [#1] SMP KASAN PTI [ 171.243469] gsmi: Log Shutdown Reason 0x03 [ 171.243505] Call Trace: [ 171.243506] [ 171.243512] [] skb_push+0x49/0x50 [ 171.243516] [] ipgre_header+0x2a/0xf0 [ 171.243520] [] neigh_connected_output+0xb7/0x100 [ 171.243524] [] ip6_finish_output2+0x383/0x490 [ 171.243528] [] __ip6_finish_output+0xa2/0x110 [ 171.243531] [] ip6_finish_output+0x2c/0xa0 [ 171.243534] [] ip6_output+0x69/0x110 [ 171.243537] [] ? ip6_output+0x110/0x110 [ 171.243541] [] mld_sendpack+0x1b2/0x2d0 [ 171.243544] [] ? mld_send_report+0xf0/0xf0 [ 171.243548] [] mld_ifc_timer_expire+0x2d7/0x3b0 [ 171.243551] [] ? mld_gq_timer_expire+0x50/0x50 [ 171.243556] [] call_timer_fn+0x30/0x130 [ 171.243559] [] expire_timers+0x4c/0x110 [ 171.243563] [] __run_timers+0x213/0x260 [ 171.243566] [] ? ktime_get+0x3d/0xa0 [ 171.243570] [] ? clockevents_program_event+0x7e/0xe0 [ 171.243574] [] ? sched_clock_cpu+0x15/0x190 [ 171.243577] [] run_timer_softirq+0x1d/0x40 [ 171.243581] [] __do_softirq+0x152/0x2f0 [ 171.243585] [] irq_exit+0x9f/0xb0 [ 171.243588] [] smp_apic_timer_interrupt+0xfd/0x1a0 [ 171.243591] [] apic_timer_interrupt+0x86/0x90 Fixes: f5184d267c1a ("net: Allow netdevices to specify needed head/tailroom") Signed-off-by: Eric Dumazet Reported-by: syzbot Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit 665298cbd6bdd950d4b8a4aab3ccc6e2e3fd96a4 Author: Ivan Khoronzhuk Date: Fri Sep 25 15:44:39 2020 +0300 net: ethernet: cavium: octeon_mgmt: use phy_start and phy_stop [ Upstream commit 4663ff60257aec4ee1e2e969a7c046f0aff35ab8 ] To start also "phy state machine", with UP state as it should be, the phy_start() has to be used, in another case machine even is not triggered. After this change negotiation is supposed to be triggered by SM workqueue. It's not correct usage, but it appears after the following patch, so add it as a fix. Fixes: 74a992b3598a ("net: phy: add phy_check_link_status") Signed-off-by: Ivan Khoronzhuk Reviewed-by: Andrew Lunn Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit 2cb43007e0601498245f96fcfda4a983dddea6fb Author: Wong Vee Khee Date: Fri Sep 25 17:54:06 2020 +0800 net: stmmac: Fix clock handling on remove path [ Upstream commit ac322f86b56cb99d1c4224c209095aa67647c967 ] While unloading the dwmac-intel driver, clk_disable_unprepare() is being called twice in stmmac_dvr_remove() and intel_eth_pci_remove(). This causes kernel panic on the second call. Removing the second call of clk_disable_unprepare() in intel_eth_pci_remove(). Fixes: 09f012e64e4b ("stmmac: intel: Fix clock handling on error and remove paths") Cc: Andy Shevchenko Reviewed-by: Voon Weifeng Signed-off-by: Wong Vee Khee Reviewed-by: Andy Shevchenko Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit 39d93de64749f1ab3573cb1ceee54bfb80e79001 Author: Ronak Doshi Date: Thu Sep 24 23:11:29 2020 -0700 vmxnet3: fix cksum offload issues for non-udp tunnels [ Upstream commit 1dac3b1bc66dc68dbb0c9f43adac71a7d0a0331a ] Commit dacce2be3312 ("vmxnet3: add geneve and vxlan tunnel offload support") added support for encapsulation offload. However, the inner offload capability is to be restrictued to UDP tunnels. This patch fixes the issue for non-udp tunnels by adding features check capability and filtering appropriate features for non-udp tunnels. Fixes: dacce2be3312 ("vmxnet3: add geneve and vxlan tunnel offload support") Signed-off-by: Ronak Doshi Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit 6ececc888c0c8ba01755b8147baf7083fb23940b Author: Jacob Keller Date: Wed Sep 2 08:53:47 2020 -0700 ice: fix memory leak in ice_vsi_setup [ Upstream commit f6a07271bb1535d9549380461437cc48d9e19958 ] During ice_vsi_setup, if ice_cfg_vsi_lan fails, it does not properly release memory associated with the VSI rings. If we had used devres allocations for the rings, this would be ok. However, we use kzalloc and kfree_rcu for these ring structures. Using the correct label to cleanup the rings during ice_vsi_setup highlights an issue in the ice_vsi_clear_rings function: it can leave behind stale ring pointers in the q_vectors structure. When releasing rings, we must also ensure that no q_vector associated with the VSI will point to this ring again. To resolve this, loop over all q_vectors and release their ring mapping. Because we are about to free all rings, no q_vector should remain pointing to any of the rings in this VSI. Fixes: 5513b920a4f7 ("ice: Update Tx scheduler tree for VSI multi-Tx queue support") Signed-off-by: Jacob Keller Tested-by: Aaron Brown Signed-off-by: Tony Nguyen Signed-off-by: Sasha Levin commit c4b9b9d7eb107794c341f89d1ae2ee3382a53f79 Author: Jacob Keller Date: Wed Sep 2 08:53:46 2020 -0700 ice: fix memory leak if register_netdev_fails [ Upstream commit 135f4b9e9340dadb78e9737bb4eb9817b9c89dac ] The ice_setup_pf_sw function can cause a memory leak if register_netdev fails, due to accidentally failing to free the VSI rings. Fix the memory leak by using ice_vsi_release, ensuring we actually go through the full teardown process. This should be safe even if the netdevice is not registered because we will have set the netdev pointer to NULL, ensuring ice_vsi_release won't call unregister_netdev. An alternative fix would be moving management of the PF VSI netdev into the main VSI setup code. This is complicated and likely requires significant refactor in how we manage VSIs Fixes: 3a858ba392c3 ("ice: Add support for VSI allocation and deallocation") Signed-off-by: Jacob Keller Tested-by: Aaron Brown Signed-off-by: Tony Nguyen Signed-off-by: Sasha Levin commit 33e948635e65fab8dbdc3e987dc858b48a481226 Author: Sylwester Dziedziuch Date: Wed Sep 2 12:54:59 2020 +0000 iavf: Fix incorrect adapter get in iavf_resume [ Upstream commit 75598a8fc0e0dff2aa5d46c62531b36a595f1d4f ] When calling iavf_resume there was a crash because wrong function was used to get iavf_adapter and net_device pointers. Changed how iavf_resume is getting iavf_adapter and net_device pointers from pci_dev. Fixes: 5eae00c57f5e ("i40evf: main driver core") Signed-off-by: Sylwester Dziedziuch Reviewed-by: Aleksandr Loktionov Tested-by: Aaron Brown Signed-off-by: Tony Nguyen Signed-off-by: Sasha Levin commit 1e0cdecfb8967a67342fb463f9d599a295af8b7d Author: Vaibhav Gupta Date: Mon Jun 29 14:59:39 2020 +0530 iavf: use generic power management [ Upstream commit bc5cbd73eb493944b8665dc517f684c40eb18a4a ] With the support of generic PM callbacks, drivers no longer need to use legacy .suspend() and .resume() in which they had to maintain PCI states changes and device's power state themselves. The required operations are done by PCI core. PCI drivers are not expected to invoke PCI helper functions like pci_save/restore_state(), pci_enable/disable_device(), pci_set_power_state(), etc. Their tasks are completed by PCI core itself. Compile-tested only. Signed-off-by: Vaibhav Gupta Tested-by: Andrew Bowers Signed-off-by: Tony Nguyen Signed-off-by: Sasha Levin commit 13685508abf3fc84eaaa1b3a6456ac455e5c76d8 Author: Herbert Xu Date: Fri Sep 25 14:42:56 2020 +1000 xfrm: Use correct address family in xfrm_state_find [ Upstream commit e94ee171349db84c7cfdc5fefbebe414054d0924 ] The struct flowi must never be interpreted by itself as its size depends on the address family. Therefore it must always be grouped with its original family value. In this particular instance, the original family value is lost in the function xfrm_state_find. Therefore we get a bogus read when it's coupled with the wrong family which would occur with inter- family xfrm states. This patch fixes it by keeping the original family value. Note that the same bug could potentially occur in LSM through the xfrm_state_pol_flow_match hook. I checked the current code there and it seems to be safe for now as only secid is used which is part of struct flowi_common. But that API should be changed so that so that we don't get new bugs in the future. We could do that by replacing fl with just secid or adding a family field. Reported-by: syzbot+577fbac3145a6eb2e7a5@syzkaller.appspotmail.com Fixes: 48b8d78315bf ("[XFRM]: State selection update to use inner...") Signed-off-by: Herbert Xu Signed-off-by: Steffen Klassert Signed-off-by: Sasha Levin commit 3e835221d6709df98a02bbc05c89c835a16e17ee Author: Xiaoliang Yang Date: Thu Sep 24 09:57:46 2020 +0800 net: dsa: felix: convert TAS link speed based on phylink speed [ Upstream commit dba1e4660a87927bdc03c23e36fd2c81a16a7ab1 ] state->speed holds a value of 10, 100, 1000 or 2500, but QSYS_TAG_CONFIG_LINK_SPEED expects a value of 0, 1, 2, 3. So convert the speed to a proper value. Fixes: de143c0e274b ("net: dsa: felix: Configure Time-Aware Scheduler via taprio offload") Signed-off-by: Xiaoliang Yang Reviewed-by: Vladimir Oltean Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit 24bc1ec457c8dc9c2da6eeb7dda2742762c4ca59 Author: Luo bin Date: Thu Sep 24 09:31:51 2020 +0800 hinic: fix wrong return value of mac-set cmd [ Upstream commit f68910a8056f9451ee9fe7e1b962f7d90d326ad3 ] It should also be regarded as an error when hw return status=4 for PF's setting mac cmd. Only if PF return status=4 to VF should this cmd be taken special treatment. Fixes: 7dd29ee12865 ("hinic: add sriov feature support") Signed-off-by: Luo bin Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit 43b7d340cb3a87aba552f9e9b6710b33addf2752 Author: Luo bin Date: Sat Jul 25 15:11:19 2020 +0800 hinic: add log in exception handling processes [ Upstream commit 90f86b8a36c065286b743eed29661fc5cd00d342 ] improve the error message when functions return failure and dump relevant registers in some exception handling processes Signed-off-by: Luo bin Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit 5f8c48c299bc227d0e9e92a16d56219103e9ed0e Author: Necip Fazil Yildiran Date: Thu Sep 17 19:16:53 2020 +0300 platform/x86: fix kconfig dependency warning for FUJITSU_LAPTOP [ Upstream commit afdd1ebb72051e8b6b83c4d7dc542a9be0e1352d ] When FUJITSU_LAPTOP is enabled and NEW_LEDS is disabled, it results in the following Kbuild warning: WARNING: unmet direct dependencies detected for LEDS_CLASS Depends on [n]: NEW_LEDS [=n] Selected by [y]: - FUJITSU_LAPTOP [=y] && X86 [=y] && X86_PLATFORM_DEVICES [=y] && ACPI [=y] && INPUT [=y] && BACKLIGHT_CLASS_DEVICE [=y] && (ACPI_VIDEO [=n] || ACPI_VIDEO [=n]=n) The reason is that FUJITSU_LAPTOP selects LEDS_CLASS without depending on or selecting NEW_LEDS while LEDS_CLASS is subordinate to NEW_LEDS. Honor the kconfig menu hierarchy to remove kconfig dependency warnings. Reported-by: Hans de Goede Fixes: d89bcc83e709 ("platform/x86: fujitsu-laptop: select LEDS_CLASS") Signed-off-by: Necip Fazil Yildiran Signed-off-by: Andy Shevchenko Signed-off-by: Sasha Levin commit 6d9886e6081ba95941b8ddc976bc1668305b1632 Author: Necip Fazil Yildiran Date: Tue Sep 15 12:09:23 2020 +0300 platform/x86: fix kconfig dependency warning for LG_LAPTOP [ Upstream commit 8f0c01e666685c4d2e1a233e6f4d7ab16c9f8b2a ] When LG_LAPTOP is enabled and NEW_LEDS is disabled, it results in the following Kbuild warning: WARNING: unmet direct dependencies detected for LEDS_CLASS Depends on [n]: NEW_LEDS [=n] Selected by [y]: - LG_LAPTOP [=y] && X86 [=y] && X86_PLATFORM_DEVICES [=y] && ACPI [=y] && ACPI_WMI [=y] && INPUT [=y] The reason is that LG_LAPTOP selects LEDS_CLASS without depending on or selecting NEW_LEDS while LEDS_CLASS is subordinate to NEW_LEDS. Honor the kconfig menu hierarchy to remove kconfig dependency warnings. Fixes: dbf0c5a6b1f8 ("platform/x86: Add LG Gram laptop special features driver") Signed-off-by: Necip Fazil Yildiran Reviewed-by: Hans de Goede Acked-by: mark gross Signed-off-by: Andy Shevchenko Signed-off-by: Sasha Levin commit 046add2ce07c7bbc107cb82065dabf2a64873eb2 Author: Voon Weifeng Date: Wed Sep 23 16:56:14 2020 +0800 net: stmmac: removed enabling eee in EEE set callback [ Upstream commit 7241c5a697479c7d0c5a96595822cdab750d41ae ] EEE should be only be enabled during stmmac_mac_link_up() when the link are up and being set up properly. set_eee should only do settings configuration and disabling the eee. Without this fix, turning on EEE using ethtool will return "Operation not supported". This is due to the driver is in a dead loop waiting for eee to be advertised in the for eee to be activated but the driver will only configure the EEE advertisement after the eee is activated. Ethtool should only return "Operation not supported" if there is no EEE capbility in the MAC controller. Fixes: 8a7493e58ad6 ("net: stmmac: Fix a race in EEE enable callback") Signed-off-by: Voon Weifeng Acked-by: Mark Gross Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit ac25c357463b1dc5b88caa55139d996ae39b15e2 Author: Magnus Karlsson Date: Wed Sep 16 14:00:25 2020 +0200 xsk: Do not discard packet when NETDEV_TX_BUSY [ Upstream commit 642e450b6b5955f2059d0ae372183f7c6323f951 ] In the skb Tx path, transmission of a packet is performed with dev_direct_xmit(). When NETDEV_TX_BUSY is set in the drivers, it signifies that it was not possible to send the packet right now, please try later. Unfortunately, the xsk transmit code discarded the packet and returned EBUSY to the application. Fix this unnecessary packet loss, by not discarding the packet in the Tx ring and return EAGAIN. As EAGAIN is returned to the application, it can then retry the send operation later and the packet will then likely be sent as the driver will then likely have space/resources to send the packet. In summary, EAGAIN tells the application that the packet was not discarded from the Tx ring and that it needs to call send() again. EBUSY, on the other hand, signifies that the packet was not sent and discarded from the Tx ring. The application needs to put the packet on the Tx ring again if it wants it to be sent. Fixes: 35fcde7f8deb ("xsk: support for Tx") Reported-by: Arkadiusz Zema Suggested-by: Arkadiusz Zema Suggested-by: Daniel Borkmann Signed-off-by: Magnus Karlsson Signed-off-by: Daniel Borkmann Reviewed-by: Jesse Brandeburg Link: https://lore.kernel.org/bpf/1600257625-2353-1-git-send-email-magnus.karlsson@gmail.com Signed-off-by: Sasha Levin commit 38dd384ce429ff0bbd4c7e7750e159397b79f0de Author: Antony Antony Date: Fri Sep 4 08:50:29 2020 +0200 xfrm: clone whole liftime_cur structure in xfrm_do_migrate [ Upstream commit 8366685b2883e523f91e9816d7be371eb1144749 ] When we clone state only add_time was cloned. It missed values like bytes, packets. Now clone the all members of the structure. v1->v3: - use memcpy to copy the entire structure Fixes: 80c9abaabf42 ("[XFRM]: Extension for dynamic update of endpoint address(es)") Signed-off-by: Antony Antony Signed-off-by: Steffen Klassert Signed-off-by: Sasha Levin commit 8baab80240288ad0ccefcca9f638f7cc412089f2 Author: Antony Antony Date: Fri Sep 4 08:50:11 2020 +0200 xfrm: clone XFRMA_SEC_CTX in xfrm_do_migrate [ Upstream commit 7aa05d304785204703a67a6aa7f1db402889a172 ] XFRMA_SEC_CTX was not cloned from the old to the new. Migrate this attribute during XFRMA_MSG_MIGRATE v1->v2: - return -ENOMEM on error v2->v3: - fix return type to int Fixes: 80c9abaabf42 ("[XFRM]: Extension for dynamic update of endpoint address(es)") Signed-off-by: Antony Antony Signed-off-by: Steffen Klassert Signed-off-by: Sasha Levin commit 3ab37554e6ce52f2ed330872d4949775efbebaf2 Author: Antony Antony Date: Fri Sep 4 08:49:55 2020 +0200 xfrm: clone XFRMA_REPLAY_ESN_VAL in xfrm_do_migrate [ Upstream commit 91a46c6d1b4fcbfa4773df9421b8ad3e58088101 ] XFRMA_REPLAY_ESN_VAL was not cloned completely from the old to the new. Migrate this attribute during XFRMA_MSG_MIGRATE v1->v2: - move curleft cloning to a separate patch Fixes: af2f464e326e ("xfrm: Assign esn pointers when cloning a state") Signed-off-by: Antony Antony Signed-off-by: Steffen Klassert Signed-off-by: Sasha Levin commit 958c224a99d39347700fcbea83959e3453d44f6c Author: Antony Antony Date: Fri Sep 4 08:49:38 2020 +0200 xfrm: clone XFRMA_SET_MARK in xfrm_do_migrate [ Upstream commit 545e5c571662b1cd79d9588f9d3b6e36985b8007 ] XFRMA_SET_MARK and XFRMA_SET_MARK_MASK was not cloned from the old to the new. Migrate these two attributes during XFRMA_MSG_MIGRATE Fixes: 9b42c1f179a6 ("xfrm: Extend the output_mark to support input direction and masking.") Signed-off-by: Antony Antony Signed-off-by: Steffen Klassert Signed-off-by: Sasha Levin commit 954adf70118902179e284934f234067a2a965426 Author: Lu Baolu Date: Sun Sep 27 14:24:28 2020 +0800 iommu/vt-d: Fix lockdep splat in iommu_flush_dev_iotlb() [ Upstream commit 1a3f2fd7fc4e8f24510830e265de2ffb8e3300d2 ] Lock(&iommu->lock) without disabling irq causes lockdep warnings. [ 12.703950] ======================================================== [ 12.703962] WARNING: possible irq lock inversion dependency detected [ 12.703975] 5.9.0-rc6+ #659 Not tainted [ 12.703983] -------------------------------------------------------- [ 12.703995] systemd-udevd/284 just changed the state of lock: [ 12.704007] ffffffffbd6ff4d8 (device_domain_lock){..-.}-{2:2}, at: iommu_flush_dev_iotlb.part.57+0x2e/0x90 [ 12.704031] but this lock took another, SOFTIRQ-unsafe lock in the past: [ 12.704043] (&iommu->lock){+.+.}-{2:2} [ 12.704045] and interrupts could create inverse lock ordering between them. [ 12.704073] other info that might help us debug this: [ 12.704085] Possible interrupt unsafe locking scenario: [ 12.704097] CPU0 CPU1 [ 12.704106] ---- ---- [ 12.704115] lock(&iommu->lock); [ 12.704123] local_irq_disable(); [ 12.704134] lock(device_domain_lock); [ 12.704146] lock(&iommu->lock); [ 12.704158] [ 12.704164] lock(device_domain_lock); [ 12.704174] *** DEADLOCK *** Signed-off-by: Lu Baolu Link: https://lore.kernel.org/r/20200927062428.13713-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel Signed-off-by: Sasha Levin commit 31bc10ac6d01902586fb3c6535e46bb899bc7b5f Author: Josef Bacik Date: Thu Aug 20 11:18:27 2020 -0400 btrfs: move btrfs_rm_dev_replace_free_srcdev outside of all locks [ Upstream commit a466c85edc6fbe845facc8f57c408c544f42899e ] When closing and freeing the source device we could end up doing our final blkdev_put() on the bdev, which will grab the bd_mutex. As such we want to be holding as few locks as possible, so move this call outside of the dev_replace->lock_finishing_cancel_unmount lock. Since we're modifying the fs_devices we need to make sure we're holding the uuid_mutex here, so take that as well. There's a report from syzbot probably hitting one of the cases where the bd_mutex and device_list_mutex are taken in the wrong order, however it's not with device replace, like this patch fixes. As there's no reproducer available so far, we can't verify the fix. https://lore.kernel.org/lkml/000000000000fc04d105afcf86d7@google.com/ dashboard link: https://syzkaller.appspot.com/bug?extid=84a0634dc5d21d488419 WARNING: possible circular locking dependency detected 5.9.0-rc5-syzkaller #0 Not tainted ------------------------------------------------------ syz-executor.0/6878 is trying to acquire lock: ffff88804c17d780 (&bdev->bd_mutex){+.+.}-{3:3}, at: blkdev_put+0x30/0x520 fs/block_dev.c:1804 but task is already holding lock: ffff8880908cfce0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: close_fs_devices.part.0+0x2e/0x800 fs/btrfs/volumes.c:1159 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #4 (&fs_devs->device_list_mutex){+.+.}-{3:3}: __mutex_lock_common kernel/locking/mutex.c:956 [inline] __mutex_lock+0x134/0x10e0 kernel/locking/mutex.c:1103 btrfs_finish_chunk_alloc+0x281/0xf90 fs/btrfs/volumes.c:5255 btrfs_create_pending_block_groups+0x2f3/0x700 fs/btrfs/block-group.c:2109 __btrfs_end_transaction+0xf5/0x690 fs/btrfs/transaction.c:916 find_free_extent_update_loop fs/btrfs/extent-tree.c:3807 [inline] find_free_extent+0x23b7/0x2e60 fs/btrfs/extent-tree.c:4127 btrfs_reserve_extent+0x166/0x460 fs/btrfs/extent-tree.c:4206 cow_file_range+0x3de/0x9b0 fs/btrfs/inode.c:1063 btrfs_run_delalloc_range+0x2cf/0x1410 fs/btrfs/inode.c:1838 writepage_delalloc+0x150/0x460 fs/btrfs/extent_io.c:3439 __extent_writepage+0x441/0xd00 fs/btrfs/extent_io.c:3653 extent_write_cache_pages.constprop.0+0x69d/0x1040 fs/btrfs/extent_io.c:4249 extent_writepages+0xcd/0x2b0 fs/btrfs/extent_io.c:4370 do_writepages+0xec/0x290 mm/page-writeback.c:2352 __writeback_single_inode+0x125/0x1400 fs/fs-writeback.c:1461 writeback_sb_inodes+0x53d/0xf40 fs/fs-writeback.c:1721 wb_writeback+0x2ad/0xd40 fs/fs-writeback.c:1894 wb_do_writeback fs/fs-writeback.c:2039 [inline] wb_workfn+0x2dc/0x13e0 fs/fs-writeback.c:2080 process_one_work+0x94c/0x1670 kernel/workqueue.c:2269 worker_thread+0x64c/0x1120 kernel/workqueue.c:2415 kthread+0x3b5/0x4a0 kernel/kthread.c:292 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294 -> #3 (sb_internal#2){.+.+}-{0:0}: percpu_down_read include/linux/percpu-rwsem.h:51 [inline] __sb_start_write+0x234/0x470 fs/super.c:1672 sb_start_intwrite include/linux/fs.h:1690 [inline] start_transaction+0xbe7/0x1170 fs/btrfs/transaction.c:624 find_free_extent_update_loop fs/btrfs/extent-tree.c:3789 [inline] find_free_extent+0x25e1/0x2e60 fs/btrfs/extent-tree.c:4127 btrfs_reserve_extent+0x166/0x460 fs/btrfs/extent-tree.c:4206 cow_file_range+0x3de/0x9b0 fs/btrfs/inode.c:1063 btrfs_run_delalloc_range+0x2cf/0x1410 fs/btrfs/inode.c:1838 writepage_delalloc+0x150/0x460 fs/btrfs/extent_io.c:3439 __extent_writepage+0x441/0xd00 fs/btrfs/extent_io.c:3653 extent_write_cache_pages.constprop.0+0x69d/0x1040 fs/btrfs/extent_io.c:4249 extent_writepages+0xcd/0x2b0 fs/btrfs/extent_io.c:4370 do_writepages+0xec/0x290 mm/page-writeback.c:2352 __writeback_single_inode+0x125/0x1400 fs/fs-writeback.c:1461 writeback_sb_inodes+0x53d/0xf40 fs/fs-writeback.c:1721 wb_writeback+0x2ad/0xd40 fs/fs-writeback.c:1894 wb_do_writeback fs/fs-writeback.c:2039 [inline] wb_workfn+0x2dc/0x13e0 fs/fs-writeback.c:2080 process_one_work+0x94c/0x1670 kernel/workqueue.c:2269 worker_thread+0x64c/0x1120 kernel/workqueue.c:2415 kthread+0x3b5/0x4a0 kernel/kthread.c:292 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294 -> #2 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}: __flush_work+0x60e/0xac0 kernel/workqueue.c:3041 wb_shutdown+0x180/0x220 mm/backing-dev.c:355 bdi_unregister+0x174/0x590 mm/backing-dev.c:872 del_gendisk+0x820/0xa10 block/genhd.c:933 loop_remove drivers/block/loop.c:2192 [inline] loop_control_ioctl drivers/block/loop.c:2291 [inline] loop_control_ioctl+0x3b1/0x480 drivers/block/loop.c:2257 vfs_ioctl fs/ioctl.c:48 [inline] __do_sys_ioctl fs/ioctl.c:753 [inline] __se_sys_ioctl fs/ioctl.c:739 [inline] __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:739 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #1 (loop_ctl_mutex){+.+.}-{3:3}: __mutex_lock_common kernel/locking/mutex.c:956 [inline] __mutex_lock+0x134/0x10e0 kernel/locking/mutex.c:1103 lo_open+0x19/0xd0 drivers/block/loop.c:1893 __blkdev_get+0x759/0x1aa0 fs/block_dev.c:1507 blkdev_get fs/block_dev.c:1639 [inline] blkdev_open+0x227/0x300 fs/block_dev.c:1753 do_dentry_open+0x4b9/0x11b0 fs/open.c:817 do_open fs/namei.c:3251 [inline] path_openat+0x1b9a/0x2730 fs/namei.c:3368 do_filp_open+0x17e/0x3c0 fs/namei.c:3395 do_sys_openat2+0x16d/0x420 fs/open.c:1168 do_sys_open fs/open.c:1184 [inline] __do_sys_open fs/open.c:1192 [inline] __se_sys_open fs/open.c:1188 [inline] __x64_sys_open+0x119/0x1c0 fs/open.c:1188 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #0 (&bdev->bd_mutex){+.+.}-{3:3}: check_prev_add kernel/locking/lockdep.c:2496 [inline] check_prevs_add kernel/locking/lockdep.c:2601 [inline] validate_chain kernel/locking/lockdep.c:3218 [inline] __lock_acquire+0x2a96/0x5780 kernel/locking/lockdep.c:4426 lock_acquire+0x1f3/0xae0 kernel/locking/lockdep.c:5006 __mutex_lock_common kernel/locking/mutex.c:956 [inline] __mutex_lock+0x134/0x10e0 kernel/locking/mutex.c:1103 blkdev_put+0x30/0x520 fs/block_dev.c:1804 btrfs_close_bdev fs/btrfs/volumes.c:1117 [inline] btrfs_close_bdev fs/btrfs/volumes.c:1107 [inline] btrfs_close_one_device fs/btrfs/volumes.c:1133 [inline] close_fs_devices.part.0+0x1a4/0x800 fs/btrfs/volumes.c:1161 close_fs_devices fs/btrfs/volumes.c:1193 [inline] btrfs_close_devices+0x95/0x1f0 fs/btrfs/volumes.c:1179 close_ctree+0x688/0x6cb fs/btrfs/disk-io.c:4149 generic_shutdown_super+0x144/0x370 fs/super.c:464 kill_anon_super+0x36/0x60 fs/super.c:1108 btrfs_kill_super+0x38/0x50 fs/btrfs/super.c:2265 deactivate_locked_super+0x94/0x160 fs/super.c:335 deactivate_super+0xad/0xd0 fs/super.c:366 cleanup_mnt+0x3a3/0x530 fs/namespace.c:1118 task_work_run+0xdd/0x190 kernel/task_work.c:141 tracehook_notify_resume include/linux/tracehook.h:188 [inline] exit_to_user_mode_loop kernel/entry/common.c:163 [inline] exit_to_user_mode_prepare+0x1e1/0x200 kernel/entry/common.c:190 syscall_exit_to_user_mode+0x7e/0x2e0 kernel/entry/common.c:265 entry_SYSCALL_64_after_hwframe+0x44/0xa9 other info that might help us debug this: Chain exists of: &bdev->bd_mutex --> sb_internal#2 --> &fs_devs->device_list_mutex Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&fs_devs->device_list_mutex); lock(sb_internal#2); lock(&fs_devs->device_list_mutex); lock(&bdev->bd_mutex); *** DEADLOCK *** 3 locks held by syz-executor.0/6878: #0: ffff88809070c0e0 (&type->s_umount_key#70){++++}-{3:3}, at: deactivate_super+0xa5/0xd0 fs/super.c:365 #1: ffffffff8a5b37a8 (uuid_mutex){+.+.}-{3:3}, at: btrfs_close_devices+0x23/0x1f0 fs/btrfs/volumes.c:1178 #2: ffff8880908cfce0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: close_fs_devices.part.0+0x2e/0x800 fs/btrfs/volumes.c:1159 stack backtrace: CPU: 0 PID: 6878 Comm: syz-executor.0 Not tainted 5.9.0-rc5-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x198/0x1fd lib/dump_stack.c:118 check_noncircular+0x324/0x3e0 kernel/locking/lockdep.c:1827 check_prev_add kernel/locking/lockdep.c:2496 [inline] check_prevs_add kernel/locking/lockdep.c:2601 [inline] validate_chain kernel/locking/lockdep.c:3218 [inline] __lock_acquire+0x2a96/0x5780 kernel/locking/lockdep.c:4426 lock_acquire+0x1f3/0xae0 kernel/locking/lockdep.c:5006 __mutex_lock_common kernel/locking/mutex.c:956 [inline] __mutex_lock+0x134/0x10e0 kernel/locking/mutex.c:1103 blkdev_put+0x30/0x520 fs/block_dev.c:1804 btrfs_close_bdev fs/btrfs/volumes.c:1117 [inline] btrfs_close_bdev fs/btrfs/volumes.c:1107 [inline] btrfs_close_one_device fs/btrfs/volumes.c:1133 [inline] close_fs_devices.part.0+0x1a4/0x800 fs/btrfs/volumes.c:1161 close_fs_devices fs/btrfs/volumes.c:1193 [inline] btrfs_close_devices+0x95/0x1f0 fs/btrfs/volumes.c:1179 close_ctree+0x688/0x6cb fs/btrfs/disk-io.c:4149 generic_shutdown_super+0x144/0x370 fs/super.c:464 kill_anon_super+0x36/0x60 fs/super.c:1108 btrfs_kill_super+0x38/0x50 fs/btrfs/super.c:2265 deactivate_locked_super+0x94/0x160 fs/super.c:335 deactivate_super+0xad/0xd0 fs/super.c:366 cleanup_mnt+0x3a3/0x530 fs/namespace.c:1118 task_work_run+0xdd/0x190 kernel/task_work.c:141 tracehook_notify_resume include/linux/tracehook.h:188 [inline] exit_to_user_mode_loop kernel/entry/common.c:163 [inline] exit_to_user_mode_prepare+0x1e1/0x200 kernel/entry/common.c:190 syscall_exit_to_user_mode+0x7e/0x2e0 kernel/entry/common.c:265 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x460027 RSP: 002b:00007fff59216328 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 RAX: 0000000000000000 RBX: 0000000000076035 RCX: 0000000000460027 RDX: 0000000000403188 RSI: 0000000000000002 RDI: 00007fff592163d0 RBP: 0000000000000333 R08: 0000000000000000 R09: 000000000000000b R10: 0000000000000005 R11: 0000000000000246 R12: 00007fff59217460 R13: 0000000002df2a60 R14: 0000000000000000 R15: 00007fff59217460 Signed-off-by: Josef Bacik [ add syzbot reference ] Signed-off-by: David Sterba Signed-off-by: Sasha Levin commit b50aa502610f0b7246beaa743d353eb3a9773d4d Author: Flora Cui Date: Wed Sep 23 14:42:59 2020 +0800 drm/amd/display: fix return value check for hdcp_work [ Upstream commit 898c7302f4de1d91065e80fc46552b3ec70894ff ] max_caps might be 0, thus hdcp_work might be ZERO_SIZE_PTR Signed-off-by: Flora Cui Reviewed-by: Feifei Xu Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin commit b02b690b4bb3a6771396f8d3647f30ca38b54301 Author: Sudheesh Mavila Date: Tue Sep 15 12:48:20 2020 +0530 drm/amd/pm: Removed fixed clock in auto mode DPM [ Upstream commit 97cf32996c46d9935cc133d910a75fb687dd6144 ] SMU10_UMD_PSTATE_PEAK_FCLK value should not be used to set the DPM. Suggested-by: Evan Quan Reviewed-by: Evan Quan Signed-off-by: Sudheesh Mavila Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin commit 9e184961ddb71783a8ffe363a20bf49deb56d075 Author: Jens Axboe Date: Mon Sep 28 08:57:48 2020 -0600 io_uring: fix potential ABBA deadlock in ->show_fdinfo() [ Upstream commit fad8e0de4426a776c9bcb060555e7c09e2d08db6 ] syzbot reports a potential lock deadlock between the normal IO path and ->show_fdinfo(): ====================================================== WARNING: possible circular locking dependency detected 5.9.0-rc6-syzkaller #0 Not tainted ------------------------------------------------------ syz-executor.2/19710 is trying to acquire lock: ffff888098ddc450 (sb_writers#4){.+.+}-{0:0}, at: io_write+0x6b5/0xb30 fs/io_uring.c:3296 but task is already holding lock: ffff8880a11b8428 (&ctx->uring_lock){+.+.}-{3:3}, at: __do_sys_io_uring_enter+0xe9a/0x1bd0 fs/io_uring.c:8348 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (&ctx->uring_lock){+.+.}-{3:3}: __mutex_lock_common kernel/locking/mutex.c:956 [inline] __mutex_lock+0x134/0x10e0 kernel/locking/mutex.c:1103 __io_uring_show_fdinfo fs/io_uring.c:8417 [inline] io_uring_show_fdinfo+0x194/0xc70 fs/io_uring.c:8460 seq_show+0x4a8/0x700 fs/proc/fd.c:65 seq_read+0x432/0x1070 fs/seq_file.c:208 do_loop_readv_writev fs/read_write.c:734 [inline] do_loop_readv_writev fs/read_write.c:721 [inline] do_iter_read+0x48e/0x6e0 fs/read_write.c:955 vfs_readv+0xe5/0x150 fs/read_write.c:1073 kernel_readv fs/splice.c:355 [inline] default_file_splice_read.constprop.0+0x4e6/0x9e0 fs/splice.c:412 do_splice_to+0x137/0x170 fs/splice.c:871 splice_direct_to_actor+0x307/0x980 fs/splice.c:950 do_splice_direct+0x1b3/0x280 fs/splice.c:1059 do_sendfile+0x55f/0xd40 fs/read_write.c:1540 __do_sys_sendfile64 fs/read_write.c:1601 [inline] __se_sys_sendfile64 fs/read_write.c:1587 [inline] __x64_sys_sendfile64+0x1cc/0x210 fs/read_write.c:1587 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #1 (&p->lock){+.+.}-{3:3}: __mutex_lock_common kernel/locking/mutex.c:956 [inline] __mutex_lock+0x134/0x10e0 kernel/locking/mutex.c:1103 seq_read+0x61/0x1070 fs/seq_file.c:155 pde_read fs/proc/inode.c:306 [inline] proc_reg_read+0x221/0x300 fs/proc/inode.c:318 do_loop_readv_writev fs/read_write.c:734 [inline] do_loop_readv_writev fs/read_write.c:721 [inline] do_iter_read+0x48e/0x6e0 fs/read_write.c:955 vfs_readv+0xe5/0x150 fs/read_write.c:1073 kernel_readv fs/splice.c:355 [inline] default_file_splice_read.constprop.0+0x4e6/0x9e0 fs/splice.c:412 do_splice_to+0x137/0x170 fs/splice.c:871 splice_direct_to_actor+0x307/0x980 fs/splice.c:950 do_splice_direct+0x1b3/0x280 fs/splice.c:1059 do_sendfile+0x55f/0xd40 fs/read_write.c:1540 __do_sys_sendfile64 fs/read_write.c:1601 [inline] __se_sys_sendfile64 fs/read_write.c:1587 [inline] __x64_sys_sendfile64+0x1cc/0x210 fs/read_write.c:1587 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #0 (sb_writers#4){.+.+}-{0:0}: check_prev_add kernel/locking/lockdep.c:2496 [inline] check_prevs_add kernel/locking/lockdep.c:2601 [inline] validate_chain kernel/locking/lockdep.c:3218 [inline] __lock_acquire+0x2a96/0x5780 kernel/locking/lockdep.c:4441 lock_acquire+0x1f3/0xaf0 kernel/locking/lockdep.c:5029 percpu_down_read include/linux/percpu-rwsem.h:51 [inline] __sb_start_write+0x228/0x450 fs/super.c:1672 io_write+0x6b5/0xb30 fs/io_uring.c:3296 io_issue_sqe+0x18f/0x5c50 fs/io_uring.c:5719 __io_queue_sqe+0x280/0x1160 fs/io_uring.c:6175 io_queue_sqe+0x692/0xfa0 fs/io_uring.c:6254 io_submit_sqe fs/io_uring.c:6324 [inline] io_submit_sqes+0x1761/0x2400 fs/io_uring.c:6521 __do_sys_io_uring_enter+0xeac/0x1bd0 fs/io_uring.c:8349 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 other info that might help us debug this: Chain exists of: sb_writers#4 --> &p->lock --> &ctx->uring_lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&ctx->uring_lock); lock(&p->lock); lock(&ctx->uring_lock); lock(sb_writers#4); *** DEADLOCK *** 1 lock held by syz-executor.2/19710: #0: ffff8880a11b8428 (&ctx->uring_lock){+.+.}-{3:3}, at: __do_sys_io_uring_enter+0xe9a/0x1bd0 fs/io_uring.c:8348 stack backtrace: CPU: 0 PID: 19710 Comm: syz-executor.2 Not tainted 5.9.0-rc6-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x198/0x1fd lib/dump_stack.c:118 check_noncircular+0x324/0x3e0 kernel/locking/lockdep.c:1827 check_prev_add kernel/locking/lockdep.c:2496 [inline] check_prevs_add kernel/locking/lockdep.c:2601 [inline] validate_chain kernel/locking/lockdep.c:3218 [inline] __lock_acquire+0x2a96/0x5780 kernel/locking/lockdep.c:4441 lock_acquire+0x1f3/0xaf0 kernel/locking/lockdep.c:5029 percpu_down_read include/linux/percpu-rwsem.h:51 [inline] __sb_start_write+0x228/0x450 fs/super.c:1672 io_write+0x6b5/0xb30 fs/io_uring.c:3296 io_issue_sqe+0x18f/0x5c50 fs/io_uring.c:5719 __io_queue_sqe+0x280/0x1160 fs/io_uring.c:6175 io_queue_sqe+0x692/0xfa0 fs/io_uring.c:6254 io_submit_sqe fs/io_uring.c:6324 [inline] io_submit_sqes+0x1761/0x2400 fs/io_uring.c:6521 __do_sys_io_uring_enter+0xeac/0x1bd0 fs/io_uring.c:8349 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x45e179 Code: 3d b2 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 0b b2 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007f1194e74c78 EFLAGS: 00000246 ORIG_RAX: 00000000000001aa RAX: ffffffffffffffda RBX: 00000000000082c0 RCX: 000000000045e179 RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000004 RBP: 000000000118cf98 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 000000000118cf4c R13: 00007ffd1aa5756f R14: 00007f1194e759c0 R15: 000000000118cf4c Fix this by just not diving into details if we fail to trylock the io_uring mutex. We know the ctx isn't going away during this operation, but we cannot safely iterate buffers/files/personalities if we don't hold the io_uring mutex. Reported-by: syzbot+2f8fa4e860edc3066aba@syzkaller.appspotmail.com Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin commit 287d8f00338d6b7dac3e13984e1b7940112c50bf Author: Josef Bacik Date: Thu Aug 20 11:18:26 2020 -0400 btrfs: move btrfs_scratch_superblocks into btrfs_dev_replace_finishing [ Upstream commit 313b085851c13ca08320372a05a7047ea25d3dd4 ] We need to move the closing of the src_device out of all the device replace locking, but we definitely want to zero out the superblock before we commit the last time to make sure the device is properly removed. Handle this by pushing btrfs_scratch_superblocks into btrfs_dev_replace_finishing, and then later on we'll move the src_device closing and freeing stuff where we need it to be. Reviewed-by: Nikolay Borisov Signed-off-by: Josef Bacik Signed-off-by: David Sterba Signed-off-by: Sasha Levin commit cefd370cb723f8a36a335b9406688c2eb554359f Author: Philip Yang Date: Tue Sep 15 17:07:35 2020 -0400 drm/amdgpu: prevent double kfree ttm->sg [ Upstream commit 1d0e16ac1a9e800598dcfa5b6bc53b704a103390 ] Set ttm->sg to NULL after kfree, to avoid memory corruption backtrace: [ 420.932812] kernel BUG at /build/linux-do9eLF/linux-4.15.0/mm/slub.c:295! [ 420.934182] invalid opcode: 0000 [#1] SMP NOPTI [ 420.935445] Modules linked in: xt_conntrack ipt_MASQUERADE [ 420.951332] Hardware name: Dell Inc. PowerEdge R7525/0PYVT1, BIOS 1.5.4 07/09/2020 [ 420.952887] RIP: 0010:__slab_free+0x180/0x2d0 [ 420.954419] RSP: 0018:ffffbe426291fa60 EFLAGS: 00010246 [ 420.955963] RAX: ffff9e29263e9c30 RBX: ffff9e29263e9c30 RCX: 000000018100004b [ 420.957512] RDX: ffff9e29263e9c30 RSI: fffff3d33e98fa40 RDI: ffff9e297e407a80 [ 420.959055] RBP: ffffbe426291fb00 R08: 0000000000000001 R09: ffffffffc0d39ade [ 420.960587] R10: ffffbe426291fb20 R11: ffff9e49ffdd4000 R12: ffff9e297e407a80 [ 420.962105] R13: fffff3d33e98fa40 R14: ffff9e29263e9c30 R15: ffff9e2954464fd8 [ 420.963611] FS: 00007fa2ea097780(0000) GS:ffff9e297e840000(0000) knlGS:0000000000000000 [ 420.965144] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 420.966663] CR2: 00007f16bfffefb8 CR3: 0000001ff0c62000 CR4: 0000000000340ee0 [ 420.968193] Call Trace: [ 420.969703] ? __page_cache_release+0x3c/0x220 [ 420.971294] ? amdgpu_ttm_tt_unpopulate+0x5e/0x80 [amdgpu] [ 420.972789] kfree+0x168/0x180 [ 420.974353] ? amdgpu_ttm_tt_set_user_pages+0x64/0xc0 [amdgpu] [ 420.975850] ? kfree+0x168/0x180 [ 420.977403] amdgpu_ttm_tt_unpopulate+0x5e/0x80 [amdgpu] [ 420.978888] ttm_tt_unpopulate.part.10+0x53/0x60 [amdttm] [ 420.980357] ttm_tt_destroy.part.11+0x4f/0x60 [amdttm] [ 420.981814] ttm_tt_destroy+0x13/0x20 [amdttm] [ 420.983273] ttm_bo_cleanup_memtype_use+0x36/0x80 [amdttm] [ 420.984725] ttm_bo_release+0x1c9/0x360 [amdttm] [ 420.986167] amdttm_bo_put+0x24/0x30 [amdttm] [ 420.987663] amdgpu_bo_unref+0x1e/0x30 [amdgpu] [ 420.989165] amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu+0x9ca/0xb10 [amdgpu] [ 420.990666] kfd_ioctl_alloc_memory_of_gpu+0xef/0x2c0 [amdgpu] Signed-off-by: Philip Yang Reviewed-by: Felix Kuehling Reviewed-by: Christian König Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin commit 9c6944b53f1db4fafbdf13fe0ab222e0ca563887 Author: Dumitru Ceara Date: Wed Oct 7 17:48:03 2020 +0200 openvswitch: handle DNAT tuple collision commit 8aa7b526dc0b5dbf40c1b834d76a667ad672a410 upstream. With multiple DNAT rules it's possible that after destination translation the resulting tuples collide. For example, two openvswitch flows: nw_dst=10.0.0.10,tp_dst=10, actions=ct(commit,table=2,nat(dst=20.0.0.1:20)) nw_dst=10.0.0.20,tp_dst=10, actions=ct(commit,table=2,nat(dst=20.0.0.1:20)) Assuming two TCP clients initiating the following connections: 10.0.0.10:5000->10.0.0.10:10 10.0.0.10:5000->10.0.0.20:10 Both tuples would translate to 10.0.0.10:5000->20.0.0.1:20 causing nf_conntrack_confirm() to fail because of tuple collision. Netfilter handles this case by allocating a null binding for SNAT at egress by default. Perform the same operation in openvswitch for DNAT if no explicit SNAT is requested by the user and allocate a null binding for SNAT for packets in the "original" direction. Reported-at: https://bugzilla.redhat.com/1877128 Suggested-by: Florian Westphal Fixes: 05752523e565 ("openvswitch: Interface with NAT.") Signed-off-by: Dumitru Ceara Signed-off-by: Jakub Kicinski Signed-off-by: Greg Kroah-Hartman commit 0388ffce1059898aa5fdd3803c496f7a9da7592f Author: Anant Thazhemadam Date: Mon Oct 5 02:25:36 2020 +0530 net: team: fix memory leak in __team_options_register commit 9a9e77495958c7382b2438bc19746dd3aaaabb8e upstream. The variable "i" isn't initialized back correctly after the first loop under the label inst_rollback gets executed. The value of "i" is assigned to be option_count - 1, and the ensuing loop (under alloc_rollback) begins by initializing i--. Thus, the value of i when the loop begins execution will now become i = option_count - 2. Thus, when kfree(dst_opts[i]) is called in the second loop in this order, (i.e., inst_rollback followed by alloc_rollback), dst_optsp[option_count - 2] is the first element freed, and dst_opts[option_count - 1] does not get freed, and thus, a memory leak is caused. This memory leak can be fixed, by assigning i = option_count (instead of option_count - 1). Fixes: 80f7c6683fe0 ("team: add support for per-port options") Reported-by: syzbot+69b804437cfec30deac3@syzkaller.appspotmail.com Tested-by: syzbot+69b804437cfec30deac3@syzkaller.appspotmail.com Signed-off-by: Anant Thazhemadam Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 70af9c28d42350a91e56c738f0f7594c5b2bc365 Author: Eric Dumazet Date: Fri Sep 25 06:38:08 2020 -0700 team: set dev->needed_headroom in team_setup_by_port() commit 89d01748b2354e210b5d4ea47bc25a42a1b42c82 upstream. Some devices set needed_headroom. If we ignore it, we might end up crashing in various skb_push() for example in ipgre_header() since some layers assume enough headroom has been reserved. Fixes: 1d76efe1577b ("team: add support for non-ethernet devices") Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 9360901e714d702d3a89ec29d7159d1578764565 Author: Eric Dumazet Date: Thu Oct 8 01:38:31 2020 -0700 sctp: fix sctp_auth_init_hmacs() error path commit d42ee76ecb6c49d499fc5eb32ca34468d95dbc3e upstream. After freeing ep->auth_hmacs we have to clear the pointer or risk use-after-free as reported by syzbot: BUG: KASAN: use-after-free in sctp_auth_destroy_hmacs net/sctp/auth.c:509 [inline] BUG: KASAN: use-after-free in sctp_auth_destroy_hmacs net/sctp/auth.c:501 [inline] BUG: KASAN: use-after-free in sctp_auth_free+0x17e/0x1d0 net/sctp/auth.c:1070 Read of size 8 at addr ffff8880a8ff52c0 by task syz-executor941/6874 CPU: 0 PID: 6874 Comm: syz-executor941 Not tainted 5.9.0-rc8-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x198/0x1fd lib/dump_stack.c:118 print_address_description.constprop.0.cold+0xae/0x497 mm/kasan/report.c:383 __kasan_report mm/kasan/report.c:513 [inline] kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530 sctp_auth_destroy_hmacs net/sctp/auth.c:509 [inline] sctp_auth_destroy_hmacs net/sctp/auth.c:501 [inline] sctp_auth_free+0x17e/0x1d0 net/sctp/auth.c:1070 sctp_endpoint_destroy+0x95/0x240 net/sctp/endpointola.c:203 sctp_endpoint_put net/sctp/endpointola.c:236 [inline] sctp_endpoint_free+0xd6/0x110 net/sctp/endpointola.c:183 sctp_destroy_sock+0x9c/0x3c0 net/sctp/socket.c:4981 sctp_v6_destroy_sock+0x11/0x20 net/sctp/socket.c:9415 sk_common_release+0x64/0x390 net/core/sock.c:3254 sctp_close+0x4ce/0x8b0 net/sctp/socket.c:1533 inet_release+0x12e/0x280 net/ipv4/af_inet.c:431 inet6_release+0x4c/0x70 net/ipv6/af_inet6.c:475 __sock_release+0xcd/0x280 net/socket.c:596 sock_close+0x18/0x20 net/socket.c:1277 __fput+0x285/0x920 fs/file_table.c:281 task_work_run+0xdd/0x190 kernel/task_work.c:141 exit_task_work include/linux/task_work.h:25 [inline] do_exit+0xb7d/0x29f0 kernel/exit.c:806 do_group_exit+0x125/0x310 kernel/exit.c:903 __do_sys_exit_group kernel/exit.c:914 [inline] __se_sys_exit_group kernel/exit.c:912 [inline] __x64_sys_exit_group+0x3a/0x50 kernel/exit.c:912 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x43f278 Code: Bad RIP value. RSP: 002b:00007fffe0995c38 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000000000043f278 RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000 RBP: 00000000004bf068 R08: 00000000000000e7 R09: ffffffffffffffd0 R10: 0000000020000000 R11: 0000000000000246 R12: 0000000000000001 R13: 00000000006d1180 R14: 0000000000000000 R15: 0000000000000000 Allocated by task 6874: kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48 kasan_set_track mm/kasan/common.c:56 [inline] __kasan_kmalloc.constprop.0+0xbf/0xd0 mm/kasan/common.c:461 kmem_cache_alloc_trace+0x174/0x300 mm/slab.c:3554 kmalloc include/linux/slab.h:554 [inline] kmalloc_array include/linux/slab.h:593 [inline] kcalloc include/linux/slab.h:605 [inline] sctp_auth_init_hmacs+0xdb/0x3b0 net/sctp/auth.c:464 sctp_auth_init+0x8a/0x4a0 net/sctp/auth.c:1049 sctp_setsockopt_auth_supported net/sctp/socket.c:4354 [inline] sctp_setsockopt+0x477e/0x97f0 net/sctp/socket.c:4631 __sys_setsockopt+0x2db/0x610 net/socket.c:2132 __do_sys_setsockopt net/socket.c:2143 [inline] __se_sys_setsockopt net/socket.c:2140 [inline] __x64_sys_setsockopt+0xba/0x150 net/socket.c:2140 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Freed by task 6874: kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48 kasan_set_track+0x1c/0x30 mm/kasan/common.c:56 kasan_set_free_info+0x1b/0x30 mm/kasan/generic.c:355 __kasan_slab_free+0xd8/0x120 mm/kasan/common.c:422 __cache_free mm/slab.c:3422 [inline] kfree+0x10e/0x2b0 mm/slab.c:3760 sctp_auth_destroy_hmacs net/sctp/auth.c:511 [inline] sctp_auth_destroy_hmacs net/sctp/auth.c:501 [inline] sctp_auth_init_hmacs net/sctp/auth.c:496 [inline] sctp_auth_init_hmacs+0x2b7/0x3b0 net/sctp/auth.c:454 sctp_auth_init+0x8a/0x4a0 net/sctp/auth.c:1049 sctp_setsockopt_auth_supported net/sctp/socket.c:4354 [inline] sctp_setsockopt+0x477e/0x97f0 net/sctp/socket.c:4631 __sys_setsockopt+0x2db/0x610 net/socket.c:2132 __do_sys_setsockopt net/socket.c:2143 [inline] __se_sys_setsockopt net/socket.c:2140 [inline] __x64_sys_setsockopt+0xba/0x150 net/socket.c:2140 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 1f485649f529 ("[SCTP]: Implement SCTP-AUTH internals") Signed-off-by: Eric Dumazet Cc: Vlad Yasevich Cc: Neil Horman Cc: Marcelo Ricardo Leitner Acked-by: Marcelo Ricardo Leitner Signed-off-by: Jakub Kicinski Signed-off-by: Greg Kroah-Hartman commit d63492ab001bfaf7f2bdf2b3be23c40835affc5a Author: Cristian Ciocaltea Date: Fri Oct 9 00:44:39 2020 +0300 i2c: owl: Clear NACK and BUS error bits commit f5b3f433641c543ebe5171285a42aa6adcdb2d22 upstream. When the NACK and BUS error bits are set by the hardware, the driver is responsible for clearing them by writing "1" into the corresponding status registers. Hence perform the necessary operations in owl_i2c_interrupt(). Fixes: d211e62af466 ("i2c: Add Actions Semiconductor Owl family S900 I2C driver") Reported-by: Manivannan Sadhasivam Signed-off-by: Cristian Ciocaltea Signed-off-by: Wolfram Sang Signed-off-by: Greg Kroah-Hartman commit 08a1313bfca0751bc2a03448d9abb9a5e69228bb Author: Nicolas Belin Date: Wed Oct 7 10:07:51 2020 +0200 i2c: meson: fixup rate calculation with filter delay commit 1334d3b4e49e35d8912a7c37ffca4c5afb9a0516 upstream. Apparently, 15 cycles of the peripheral clock are used by the controller for sampling and filtering. Because this was not known before, the rate calculation is slightly off. Clean up and fix the calculation taking this filtering delay into account. Fixes: 30021e3707a7 ("i2c: add support for Amlogic Meson I2C controller") Signed-off-by: Nicolas Belin Signed-off-by: Jerome Brunet Signed-off-by: Wolfram Sang Signed-off-by: Greg Kroah-Hartman commit 3531df70c312a0d9996f6800d2e55de1ed48e9f7 Author: Jerome Brunet Date: Wed Oct 7 10:07:50 2020 +0200 i2c: meson: keep peripheral clock enabled commit 79e137b1540165f788394658442284d55a858984 upstream. SCL rate appears to be different than what is expected. For example, We get 164kHz on i2c3 of the vim3 when 400kHz is expected. This is partially due to the peripheral clock being disabled when the clock is set. Let's keep the peripheral clock on after probe to fix the problem. This does not affect the SCL output which is still gated when i2c is idle. Fixes: 09af1c2fa490 ("i2c: meson: set clock divider in probe instead of setting it for each transfer") Signed-off-by: Jerome Brunet Signed-off-by: Wolfram Sang Signed-off-by: Greg Kroah-Hartman commit fe6124585cfe77a38c785cd5acbf9abc6a543a6a Author: Jerome Brunet Date: Wed Oct 7 10:07:49 2020 +0200 i2c: meson: fix clock setting overwrite commit 28683e847e2f20eed22cdd24f185d7783db396d3 upstream. When the slave address is written in do_start(), SLAVE_ADDR is written completely. This may overwrite some setting related to the clock rate or signal filtering. Fix this by writing only the bits related to slave address. To avoid causing unexpected changed, explicitly disable filtering or high/low clock mode which may have been left over by the bootloader. Fixes: 30021e3707a7 ("i2c: add support for Amlogic Meson I2C controller") Signed-off-by: Jerome Brunet Signed-off-by: Wolfram Sang Signed-off-by: Greg Kroah-Hartman commit d681bce5bc031135a184ec0b3fb6aefcaffef4bd Author: Vladimir Zapolskiy Date: Sat Oct 10 21:25:54 2020 +0300 cifs: Fix incomplete memory allocation on setxattr path commit 64b7f674c292207624b3d788eda2dde3dc1415df upstream. On setxattr() syscall path due to an apprent typo the size of a dynamically allocated memory chunk for storing struct smb2_file_full_ea_info object is computed incorrectly, to be more precise the first addend is the size of a pointer instead of the wanted object size. Coincidentally it makes no difference on 64-bit platforms, however on 32-bit targets the following memcpy() writes 4 bytes of data outside of the dynamically allocated memory. ============================================================================= BUG kmalloc-16 (Not tainted): Redzone overwritten ----------------------------------------------------------------------------- Disabling lock debugging due to kernel taint INFO: 0x79e69a6f-0x9e5cdecf @offset=368. First byte 0x73 instead of 0xcc INFO: Slab 0xd36d2454 objects=85 used=51 fp=0xf7d0fc7a flags=0x35000201 INFO: Object 0x6f171df3 @offset=352 fp=0x00000000 Redzone 5d4ff02d: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc ................ Object 6f171df3: 00 00 00 00 00 05 06 00 73 6e 72 75 62 00 66 69 ........snrub.fi Redzone 79e69a6f: 73 68 32 0a sh2. Padding 56254d82: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ CPU: 0 PID: 8196 Comm: attr Tainted: G B 5.9.0-rc8+ #3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014 Call Trace: dump_stack+0x54/0x6e print_trailer+0x12c/0x134 check_bytes_and_report.cold+0x3e/0x69 check_object+0x18c/0x250 free_debug_processing+0xfe/0x230 __slab_free+0x1c0/0x300 kfree+0x1d3/0x220 smb2_set_ea+0x27d/0x540 cifs_xattr_set+0x57f/0x620 __vfs_setxattr+0x4e/0x60 __vfs_setxattr_noperm+0x4e/0x100 __vfs_setxattr_locked+0xae/0xd0 vfs_setxattr+0x4e/0xe0 setxattr+0x12c/0x1a0 path_setxattr+0xa4/0xc0 __ia32_sys_lsetxattr+0x1d/0x20 __do_fast_syscall_32+0x40/0x70 do_fast_syscall_32+0x29/0x60 do_SYSENTER_32+0x15/0x20 entry_SYSENTER_32+0x9f/0xf2 Fixes: 5517554e4313 ("cifs: Add support for writing attributes on SMB2+") Signed-off-by: Vladimir Zapolskiy Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 80683929112b453970b9d6234fcf1c26aa079c53 Author: Sabrina Dubroca Date: Thu Aug 13 16:24:04 2020 +0200 espintcp: restore IP CB before handing the packet to xfrm commit 4eb2e13415757a2bce5bb0d580d22bbeef1f5346 upstream. Xiumei reported a bug with espintcp over IPv6 in transport mode, because xfrm6_transport_finish expects to find IP6CB data (struct inet6_skb_cb). Currently, espintcp zeroes the CB, but the relevant part is actually preserved by previous layers (first set up by tcp, then strparser only zeroes a small part of tcp_skb_tb), so we can just relocate it to the start of skb->cb. Fixes: e27cca96cd68 ("xfrm: add espintcp (RFC 8229)") Reported-by: Xiumei Mu Signed-off-by: Sabrina Dubroca Signed-off-by: Steffen Klassert Signed-off-by: Greg Kroah-Hartman commit 1427c13cc16fd402fb42753c5e39d64a9c085dd4 Author: Sabrina Dubroca Date: Tue Aug 4 11:37:29 2020 +0200 xfrmi: drop ignore_df check before updating pmtu commit 45a36a18d01907710bad5258d81f76c18882ad88 upstream. xfrm interfaces currently test for !skb->ignore_df when deciding whether to update the pmtu on the skb's dst. Because of this, no pmtu exception is created when we do something like: ping -s 1438 By dropping this check, the pmtu exception will be created and the next ping attempt will work. Fixes: f203b76d7809 ("xfrm: Add virtual xfrm interfaces") Reported-by: Xiumei Mu Signed-off-by: Sabrina Dubroca Signed-off-by: Steffen Klassert Signed-off-by: Greg Kroah-Hartman commit c2a55388badaa6dee08121b3091ca006e1b9711f Author: Coly Li Date: Fri Oct 2 16:27:30 2020 +0800 nvme-tcp: check page by sendpage_ok() before calling kernel_sendpage() commit 7d4194abfc4de13a2663c7fee6891de8360f7a52 upstream. Currently nvme_tcp_try_send_data() doesn't use kernel_sendpage() to send slab pages. But for pages allocated by __get_free_pages() without __GFP_COMP, which also have refcount as 0, they are still sent by kernel_sendpage() to remote end, this is problematic. The new introduced helper sendpage_ok() checks both PageSlab tag and page_count counter, and returns true if the checking page is OK to be sent by kernel_sendpage(). This patch fixes the page checking issue of nvme_tcp_try_send_data() with sendpage_ok(). If sendpage_ok() returns true, send this page by kernel_sendpage(), otherwise use sock_no_sendpage to handle this page. Signed-off-by: Coly Li Cc: Chaitanya Kulkarni Cc: Christoph Hellwig Cc: Hannes Reinecke Cc: Jan Kara Cc: Jens Axboe Cc: Mikhail Skorzhinskii Cc: Philipp Reisner Cc: Sagi Grimberg Cc: Vlastimil Babka Cc: stable@vger.kernel.org Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit f4abc5911a9ecdbe4c03d3204a3d71c6355bd7c0 Author: Coly Li Date: Fri Oct 2 16:27:31 2020 +0800 tcp: use sendpage_ok() to detect misused .sendpage commit cf83a17edeeb36195596d2dae060a7c381db35f1 upstream. commit a10674bf2406 ("tcp: detecting the misuse of .sendpage for Slab objects") adds the checks for Slab pages, but the pages don't have page_count are still missing from the check. Network layer's sendpage method is not designed to send page_count 0 pages neither, therefore both PageSlab() and page_count() should be both checked for the sending page. This is exactly what sendpage_ok() does. This patch uses sendpage_ok() in do_tcp_sendpages() to detect misused .sendpage, to make the code more robust. Fixes: a10674bf2406 ("tcp: detecting the misuse of .sendpage for Slab objects") Suggested-by: Eric Dumazet Signed-off-by: Coly Li Cc: Vasily Averin Cc: David S. Miller Cc: stable@vger.kernel.org Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 854828e10e2d90061f38c2f3cd2bdb7a56d12130 Author: Coly Li Date: Fri Oct 2 16:27:28 2020 +0800 net: introduce helper sendpage_ok() in include/linux/net.h commit c381b07941adc2274ce552daf86c94701c5e265a upstream. The original problem was from nvme-over-tcp code, who mistakenly uses kernel_sendpage() to send pages allocated by __get_free_pages() without __GFP_COMP flag. Such pages don't have refcount (page_count is 0) on tail pages, sending them by kernel_sendpage() may trigger a kernel panic from a corrupted kernel heap, because these pages are incorrectly freed in network stack as page_count 0 pages. This patch introduces a helper sendpage_ok(), it returns true if the checking page, - is not slab page: PageSlab(page) is false. - has page refcount: page_count(page) is not zero All drivers who want to send page to remote end by kernel_sendpage() may use this helper to check whether the page is OK. If the helper does not return true, the driver should try other non sendpage method (e.g. sock_no_sendpage()) to handle the page. Signed-off-by: Coly Li Cc: Chaitanya Kulkarni Cc: Christoph Hellwig Cc: Hannes Reinecke Cc: Jan Kara Cc: Jens Axboe Cc: Mikhail Skorzhinskii Cc: Philipp Reisner Cc: Sagi Grimberg Cc: Vlastimil Babka Cc: stable@vger.kernel.org Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 89bec0adbf5054ed056efcb8f49e96d112aee9e0 Author: Hugh Dickins Date: Fri Oct 9 20:07:59 2020 -0700 mm/khugepaged: fix filemap page_to_pgoff(page) != offset commit 033b5d77551167f8c24ca862ce83d3e0745f9245 upstream. There have been elusive reports of filemap_fault() hitting its VM_BUG_ON_PAGE(page_to_pgoff(page) != offset, page) on kernels built with CONFIG_READ_ONLY_THP_FOR_FS=y. Suren has hit it on a kernel with CONFIG_READ_ONLY_THP_FOR_FS=y and CONFIG_NUMA is not set: and he has analyzed it down to how khugepaged without NUMA reuses the same huge page after collapse_file() failed (whereas NUMA targets its allocation to the respective node each time). And most of us were usually testing with CONFIG_NUMA=y kernels. collapse_file(old start) new_page = khugepaged_alloc_page(hpage) __SetPageLocked(new_page) new_page->index = start // hpage->index=old offset new_page->mapping = mapping xas_store(&xas, new_page) filemap_fault page = find_get_page(mapping, offset) // if offset falls inside hpage then // compound_head(page) == hpage lock_page_maybe_drop_mmap() __lock_page(page) // collapse fails xas_store(&xas, old page) new_page->mapping = NULL unlock_page(new_page) collapse_file(new start) new_page = khugepaged_alloc_page(hpage) __SetPageLocked(new_page) new_page->index = start // hpage->index=new offset new_page->mapping = mapping // mapping becomes valid again // since compound_head(page) == hpage // page_to_pgoff(page) got changed VM_BUG_ON_PAGE(page_to_pgoff(page) != offset) An initial patch replaced __SetPageLocked() by lock_page(), which did fix the race which Suren illustrates above. But testing showed that it's not good enough: if the racing task's __lock_page() gets delayed long after its find_get_page(), then it may follow collapse_file(new start)'s successful final unlock_page(), and crash on the same VM_BUG_ON_PAGE. It could be fixed by relaxing filemap_fault()'s VM_BUG_ON_PAGE to a check and retry (as is done for mapping), with similar relaxations in find_lock_entry() and pagecache_get_page(): but it's not obvious what else might get caught out; and khugepaged non-NUMA appears to be unique in exposing a page to page cache, then revoking, without going through a full cycle of freeing before reuse. Instead, non-NUMA khugepaged_prealloc_page() release the old page if anyone else has a reference to it (1% of cases when I tested). Although never reported on huge tmpfs, I believe its find_lock_entry() has been at similar risk; but huge tmpfs does not rely on khugepaged for its normal working nearly so much as READ_ONLY_THP_FOR_FS does. Reported-by: Denis Lisov Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=206569 Link: https://lore.kernel.org/linux-mm/?q=20200219144635.3b7417145de19b65f258c943%40linux-foundation.org Reported-by: Qian Cai Link: https://lore.kernel.org/linux-xfs/?q=20200616013309.GB815%40lca.pw Reported-and-analyzed-by: Suren Baghdasaryan Fixes: 87c460a0bded ("mm/khugepaged: collapse_shmem() without freezing new_page") Signed-off-by: Hugh Dickins Cc: stable@vger.kernel.org # v4.9+ Reviewed-by: Matthew Wilcox (Oracle) Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit f994c81fe4c58166be5ec508400b1a695852da0a Author: Andy Shevchenko Date: Mon Oct 5 16:10:44 2020 +0300 gpiolib: Disable compat ->read() code in UML case commit 47e538d86d5776ac8152146c3ed3d22326243190 upstream. It appears that UML (arch/um) has no compat.h header defined and hence can't compile a recently provided piece of code in GPIO library. Disable compat ->read() code in UML case to avoid compilation errors. While at it, use pattern which is already being used in the kernel elsewhere. Fixes: 5ad284ab3a01 ("gpiolib: Fix line event handling in syscall compatible mode") Reported-by: Geert Uytterhoeven Signed-off-by: Andy Shevchenko Link: https://lore.kernel.org/r/20201005131044.87276-1-andriy.shevchenko@linux.intel.com Signed-off-by: Linus Walleij Signed-off-by: Greg Kroah-Hartman commit 987c12d5640285219f66689a3552d6248fba5350 Author: Atish Patra Date: Thu Oct 1 12:04:56 2020 -0700 RISC-V: Make sure memblock reserves the memory containing DT commit a78c6f5956a949b496a5b087188dde52483edf51 upstream. Currently, the memory containing DT is not reserved. Thus, that region of memory can be reallocated or reused for other purposes. This may result in corrupted DT for nommu virt board in Qemu. We may not face any issue in kendryte as DT is embedded in the kernel image for that. Fixes: 6bd33e1ece52 ("riscv: add nommu support") Cc: stable@vger.kernel.org Signed-off-by: Atish Patra Signed-off-by: Palmer Dabbelt Signed-off-by: Greg Kroah-Hartman commit 659a68b11df3cb48c5837834770a59ca2053c7cd Author: Eric Dumazet Date: Wed Oct 7 01:42:46 2020 -0700 macsec: avoid use-after-free in macsec_handle_frame() commit c7cc9200e9b4a2ac172e990ef1975cd42975dad6 upstream. De-referencing skb after call to gro_cells_receive() is not allowed. We need to fetch skb->len earlier. Fixes: 5491e7c6b1a9 ("macsec: enable GRO and RPS on macsec devices") Signed-off-by: Eric Dumazet Cc: Paolo Abeni Acked-by: Paolo Abeni Signed-off-by: Jakub Kicinski Signed-off-by: Greg Kroah-Hartman commit 8c995b27d066dd30552f022de303b91c202f113d Author: Chaitanya Kulkarni Date: Tue Oct 6 16:36:47 2020 -0700 nvme-core: put ctrl ref when module ref get fail commit 4bab69093044ca81f394bd0780be1b71c5a4d308 upstream. When try_module_get() fails in the nvme_dev_open() it returns without releasing the ctrl reference which was taken earlier. Put the ctrl reference which is taken before calling the try_module_get() in the error return code path. Fixes: 52a3974feb1a "nvme-core: get/put ctrl and transport module in nvme_dev_open/release()" Signed-off-by: Chaitanya Kulkarni Reviewed-by: Logan Gunthorpe Signed-off-by: Christoph Hellwig Signed-off-by: Greg Kroah-Hartman commit 3113391293be4df46d771542f4677134547a5fab Author: Aaron Ma Date: Sat Oct 3 01:09:16 2020 +0800 platform/x86: thinkpad_acpi: re-initialize ACPI buffer size when reuse commit 720ef73d1a239e33c3ad8fac356b9b1348e68aaf upstream. Evaluating ACPI _BCL could fail, then ACPI buffer size will be set to 0. When reuse this ACPI buffer, AE_BUFFER_OVERFLOW will be triggered. Re-initialize buffer size will make ACPI evaluate successfully. Fixes: 46445b6b896fd ("thinkpad-acpi: fix handle locate for video and query of _BCL") Signed-off-by: Aaron Ma Signed-off-by: Andy Shevchenko Signed-off-by: Greg Kroah-Hartman commit 46a00e3e9275564a9f0eb4b185ab904f76880780 Author: Hans de Goede Date: Wed Sep 30 15:19:05 2020 +0200 platform/x86: intel-vbtn: Switch to an allow-list for SW_TABLET_MODE reporting commit 8169bd3e6e193497cab781acddcff8fde5d0c416 upstream. 2 recent commits: cfae58ed681c ("platform/x86: intel-vbtn: Only blacklist SW_TABLET_MODE on the 9 / "Laptop" chasis-type") 1fac39fd0316 ("platform/x86: intel-vbtn: Also handle tablet-mode switch on "Detachable" and "Portable" chassis-types") Enabled reporting of SW_TABLET_MODE on more devices since the vbtn ACPI interface is used by the firmware on some of those devices to report this. Testing has shown that unconditionally enabling SW_TABLET_MODE reporting on all devices with a chassis type of 8 ("Portable") or 10 ("Notebook") which support the VGBS method is a very bad idea. Many of these devices are normal laptops (non 2-in-1) models with a VGBS which always returns 0, which we translate to SW_TABLET_MODE=1. This in turn causes userspace (libinput) to suppress events from the builtin keyboard and touchpad, making the laptop essentially unusable. Since the problem of wrongly reporting SW_TABLET_MODE=1 in combination with libinput, leads to a non-usable system. Where as OTOH many people will not even notice when SW_TABLET_MODE is not being reported, this commit changes intel_vbtn_has_switches() to use a DMI based allow-list. The new DMI based allow-list matches on the 31 ("Convertible") and 32 ("Detachable") chassis-types, as these clearly are 2-in-1s and so far if they support the intel-vbtn ACPI interface they all have properly working SW_TABLET_MODE reporting. Besides these 2 generic matches, it also contains model specific matches for 2-in-1 models which use a different chassis-type and which are known to have properly working SW_TABLET_MODE reporting. This has been tested on the following 2-in-1 devices: Dell Venue 11 Pro 7130 vPro HP Pavilion X2 10-p002nd HP Stream x360 Convertible PC 11 Medion E1239T Fixes: cfae58ed681c ("platform/x86: intel-vbtn: Only blacklist SW_TABLET_MODE on the 9 / "Laptop" chasis-type") BugLink: https://forum.manjaro.org/t/keyboard-and-touchpad-only-work-on-kernel-5-6/22668 BugLink: https://bugzilla.opensuse.org/show_bug.cgi?id=1175599 Cc: Barnabás Pőcze Cc: Takashi Iwai Signed-off-by: Hans de Goede Signed-off-by: Andy Shevchenko Signed-off-by: Greg Kroah-Hartman commit 402ee2f96fb910bc996e8d6dae9f69f0beeee43f Author: Heiner Kallweit Date: Wed Oct 7 13:34:51 2020 +0200 r8169: consider that PHY reset may still be in progress after applying firmware commit 47dda78671a3d5cee3fb2229e37997d2ac8a3b54 upstream. Some firmware files trigger a PHY soft reset and don't wait for it to be finished. PHY register writes directly after applying the firmware may fail or provide unexpected results therefore. Fix this by waiting for bit BMCR_RESET to be cleared after applying firmware. There's nothing wrong with the referenced change, it's just that the fix will apply cleanly only after this change. Fixes: 89fbd26cca7e ("r8169: fix firmware not resetting tp->ocp_base") Signed-off-by: Heiner Kallweit Signed-off-by: Jakub Kicinski Signed-off-by: Greg Kroah-Hartman commit a73bb4ddee832cc0160dbcffba65d984a53d0a86 Author: Tony Ambardar Date: Sat Sep 19 22:01:34 2020 -0700 bpf: Prevent .BTF section elimination commit 65c204398928f9c79f1a29912b410439f7052635 upstream. Systems with memory or disk constraints often reduce the kernel footprint by configuring LD_DEAD_CODE_DATA_ELIMINATION. However, this can result in removal of any BTF information. Use the KEEP() macro to preserve the BTF data as done with other important sections, while still allowing for smaller kernels. Fixes: 90ceddcb4950 ("bpf: Support llvm-objcopy for vmlinux BTF") Signed-off-by: Tony Ambardar Signed-off-by: Daniel Borkmann Acked-by: John Fastabend Acked-by: Andrii Nakryiko Link: https://lore.kernel.org/bpf/a635b5d3e2da044e7b51ec1315e8910fbce0083f.1600417359.git.Tony.Ambardar@gmail.com Signed-off-by: Greg Kroah-Hartman commit bc33b9bb075744edcf29ef9ce63c3b7d911a282e Author: Tony Ambardar Date: Sat Sep 19 22:01:33 2020 -0700 bpf: Fix sysfs export of empty BTF section commit e23bb04b0c938588eae41b7f4712b722290ed2b8 upstream. If BTF data is missing or removed from the ELF section it is still exported via sysfs as a zero-length file: root@OpenWrt:/# ls -l /sys/kernel/btf/vmlinux -r--r--r-- 1 root root 0 Jul 18 02:59 /sys/kernel/btf/vmlinux Moreover, reads from this file succeed and leak kernel data: root@OpenWrt:/# hexdump -C /sys/kernel/btf/vmlinux|head -10 000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * 000cc0 00 00 00 00 00 00 00 00 00 00 00 00 80 83 b0 80 |................| 000cd0 00 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 000ce0 00 00 00 00 00 00 00 00 00 00 00 00 57 ac 6e 9d |............W.n.| 000cf0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * 002650 00 00 00 00 00 00 00 10 00 00 00 01 00 00 00 01 |................| 002660 80 82 9a c4 80 85 97 80 81 a9 51 68 00 00 00 02 |..........Qh....| 002670 80 25 44 dc 80 85 97 80 81 a9 50 24 81 ab c4 60 |.%D.......P$...`| This situation was first observed with kernel 5.4.x, cross-compiled for a MIPS target system. Fix by adding a sanity-check for export of zero-length data sections. Fixes: 341dfcf8d78e ("btf: expose BTF info through sysfs") Signed-off-by: Tony Ambardar Signed-off-by: Daniel Borkmann Acked-by: John Fastabend Acked-by: Andrii Nakryiko Link: https://lore.kernel.org/bpf/b38db205a66238f70823039a8c531535864eaac5.1600417359.git.Tony.Ambardar@gmail.com Signed-off-by: Greg Kroah-Hartman commit 944e354acfc3f4cd8d5a1e701ca490a47122b299 Author: Hans de Goede Date: Wed Sep 16 16:14:39 2020 +0200 platform/x86: asus-wmi: Fix SW_TABLET_MODE always reporting 1 on many different models commit 1797d588af15174d4a4e7159dac8c800538e4f8c upstream. Commit b0dbd97de1f1 ("platform/x86: asus-wmi: Add support for SW_TABLET_MODE") added support for reporting SW_TABLET_MODE using the Asus 0x00120063 WMI-device-id to see if various transformer models were docked into their keyboard-dock (SW_TABLET_MODE=0) or if they were being used as a tablet. The new SW_TABLET_MODE support (naively?) assumed that non Transformer devices would either not support the 0x00120063 WMI-device-id at all, or would NOT set ASUS_WMI_DSTS_PRESENCE_BIT in their reply when querying the device-id. Unfortunately this is not true and we have received many bug reports about this change causing the asus-wmi driver to always report SW_TABLET_MODE=1 on non Transformer devices. This causes libinput to think that these are 360 degree hinges style 2-in-1s folded into tablet-mode. Making libinput suppress keyboard and touchpad events from the builtin keyboard and touchpad. So effectively this causes the keyboard and touchpad to not work on many non Transformer Asus models. This commit fixes this by using the existing DMI based quirk mechanism in asus-nb-wmi.c to allow using the 0x00120063 device-id for reporting SW_TABLET_MODE on Transformer models and ignoring it on all other models. Fixes: b0dbd97de1f1 ("platform/x86: asus-wmi: Add support for SW_TABLET_MODE") Link: https://patchwork.kernel.org/patch/11780901/ BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=209011 BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1876997 Reported-by: Samuel Čavoj Signed-off-by: Hans de Goede Signed-off-by: Greg Kroah-Hartman commit 88ddba3ebc3cbcc8fa998c5833c18663745a19c1 Author: Tom Rix Date: Sun Sep 13 12:02:03 2020 -0700 platform/x86: thinkpad_acpi: initialize tp_nvram_state variable commit 5f38b06db8af3ed6c2fc1b427504ca56fae2eacc upstream. clang static analysis flags this represenative problem thinkpad_acpi.c:2523:7: warning: Branch condition evaluates to a garbage value if (!oldn->mute || ^~~~~~~~~~~ In hotkey_kthread() mute is conditionally set by hotkey_read_nvram() but unconditionally checked by hotkey_compare_and_issue_event(). So the tp_nvram_state variable s[2] needs to be initialized. Fixes: 01e88f25985d ("ACPI: thinkpad-acpi: add CMOS NVRAM polling for hot keys (v9)") Signed-off-by: Tom Rix Reviewed-by: Hans de Goede Acked-by: mark gross Signed-off-by: Andy Shevchenko Signed-off-by: Greg Kroah-Hartman commit b9c0333ac6c82b1269790c3039f80ead9bedd782 Author: Hans de Goede Date: Sat Sep 12 11:35:32 2020 +0200 platform/x86: intel-vbtn: Fix SW_TABLET_MODE always reporting 1 on the HP Pavilion 11 x360 commit d823346876a970522ff9e4d2b323c9b734dcc4de upstream. Commit cfae58ed681c ("platform/x86: intel-vbtn: Only blacklist SW_TABLET_MODE on the 9 / "Laptop" chasis-type") restored SW_TABLET_MODE reporting on the HP stream x360 11 series on which it was previously broken by commit de9647efeaa9 ("platform/x86: intel-vbtn: Only activate tablet mode switch on 2-in-1's"). It turns out that enabling SW_TABLET_MODE reporting on devices with a chassis-type of 10 ("Notebook") causes SW_TABLET_MODE to always report 1 at boot on the HP Pavilion 11 x360, which causes libinput to disable the kbd and touchpad. The HP Pavilion 11 x360's ACPI VGBS method sets bit 4 instead of bit 6 when NOT in tablet mode at boot. Inspecting all the DSDTs in my DSDT collection shows only one other model, the Medion E1239T ever setting bit 4 and it always sets this together with bit 6. So lets treat bit 4 as a second bit which when set indicates the device not being in tablet-mode, as we already do for bit 6. While at it also prefix all VGBS constant defines with "VGBS_". Fixes: cfae58ed681c ("platform/x86: intel-vbtn: Only blacklist SW_TABLET_MODE on the 9 / "Laptop" chasis-type") Signed-off-by: Hans de Goede Acked-by: Mark Gross Signed-off-by: Andy Shevchenko Signed-off-by: Greg Kroah-Hartman commit 6b010ed04d505121d4608ea21a64868eb75fbe9a Author: Dinghao Liu Date: Sun Aug 23 19:12:11 2020 +0800 Platform: OLPC: Fix memleak in olpc_ec_probe commit 4fd9ac6bd3044734a7028bd993944c3617d1eede upstream. When devm_regulator_register() fails, ec should be freed just like when olpc_ec_cmd() fails. Fixes: 231c0c216172a ("Platform: OLPC: Add a regulator for the DCON") Signed-off-by: Dinghao Liu Signed-off-by: Andy Shevchenko Signed-off-by: Greg Kroah-Hartman commit 6ad52d3ee27885d1e9273ed0bf7cd01b85063198 Author: Linus Torvalds Date: Mon Oct 5 11:26:27 2020 -0700 splice: teach splice pipe reading about empty pipe buffers commit d1a819a2ec2d3b5e6a8f8a9f67386bda0ad315bc upstream. Tetsuo Handa reports that splice() can return 0 before the real EOF, if the data in the splice source pipe is an empty pipe buffer. That empty pipe buffer case doesn't happen in any normal situation, but you can trigger it by doing a write to a pipe that fails due to a page fault. Tetsuo has a test-case to show the behavior: #define _GNU_SOURCE #include #include #include #include int main(int argc, char *argv[]) { const int fd = open("/tmp/testfile", O_WRONLY | O_CREAT, 0600); int pipe_fd[2] = { -1, -1 }; pipe(pipe_fd); write(pipe_fd[1], NULL, 4096); /* This splice() should wait unless interrupted. */ return !splice(pipe_fd[0], NULL, fd, NULL, 65536, 0); } which results in write(5, NULL, 4096) = -1 EFAULT (Bad address) splice(4, NULL, 3, NULL, 65536, 0) = 0 and this can confuse splice() users into believing they have hit EOF prematurely. The issue was introduced when the pipe write code started pre-allocating the pipe buffers before copying data from user space. This is modified verion of Tetsuo's original patch. Fixes: a194dfe6e6f6 ("pipe: Rearrange sequence in pipe_write() to preallocate slot") Link:https://lore.kernel.org/linux-fsdevel/20201005121339.4063-1-penguin-kernel@I-love.SAKURA.ne.jp/ Reported-by: Tetsuo Handa Acked-by: Tetsuo Handa Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit c679280057ee6f7d3779ab1e782d7e990b75ea67 Author: Linus Torvalds Date: Mon Oct 5 10:56:22 2020 -0700 usermodehelper: reset umask to default before executing user process commit 4013c1496c49615d90d36b9d513eee8e369778e9 upstream. Kernel threads intentionally do CLONE_FS in order to follow any changes that 'init' does to set up the root directory (or cwd). It is admittedly a bit odd, but it avoids the situation where 'init' does some extensive setup to initialize the system environment, and then we execute a usermode helper program, and it uses the original FS setup from boot time that may be very limited and incomplete. [ Both Al Viro and Eric Biederman point out that 'pivot_root()' will follow the root regardless, since it fixes up other users of root (see chroot_fs_refs() for details), but overmounting root and doing a chroot() would not. ] However, Vegard Nossum noticed that the CLONE_FS not only means that we follow the root and current working directories, it also means we share umask with whatever init changed it to. That wasn't intentional. Just reset umask to the original default (0022) before actually starting the usermode helper program. Reported-by: Vegard Nossum Cc: Al Viro Acked-by: Eric W. Biederman Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 3d36be053e584ce364a4dc374700ffe61944d1d1 Author: Greg Kurz Date: Sat Oct 3 12:02:03 2020 +0200 vhost: Use vhost_get_used_size() in vhost_vring_set_addr() commit 71878fa46c7e3b40fa7b3f1b6e4ba3f92f1ac359 upstream. The open-coded computation of the used size doesn't take the event into account when the VIRTIO_RING_F_EVENT_IDX feature is present. Fix that by using vhost_get_used_size(). Fixes: 8ea8cf89e19a ("vhost: support event index") Cc: stable@vger.kernel.org Signed-off-by: Greg Kurz Link: https://lore.kernel.org/r/160171932300.284610.11846106312938909461.stgit@bahia.lan Signed-off-by: Michael S. Tsirkin Signed-off-by: Greg Kroah-Hartman commit 3480587d9b9d7643b618c97a2d9ba81e57ff8c53 Author: Greg Kurz Date: Sat Oct 3 12:01:52 2020 +0200 vhost: Don't call access_ok() when using IOTLB commit 0210a8db2aeca393fb3067e234967877e3146266 upstream. When the IOTLB device is enabled, the vring addresses we get from userspace are GIOVAs. It is thus wrong to pass them down to access_ok() which only takes HVAs. Access validation is done at prefetch time with IOTLB. Teach vq_access_ok() about that by moving the (vq->iotlb) check from vhost_vq_access_ok() to vq_access_ok(). This prevents vhost_vring_set_addr() to fail when verifying the accesses. No behavior change for vhost_vq_access_ok(). BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1883084 Fixes: 6b1e6cc7855b ("vhost: new device IOTLB API") Cc: jasowang@redhat.com CC: stable@vger.kernel.org # 4.14+ Signed-off-by: Greg Kurz Acked-by: Jason Wang Link: https://lore.kernel.org/r/160171931213.284610.2052489816407219136.stgit@bahia.lan Signed-off-by: Michael S. Tsirkin Signed-off-by: Greg Kroah-Hartman commit 145a5510ef6af8e4cf818375eb3b8144d0b25f3f Author: Peilin Ye Date: Fri Oct 2 10:22:23 2020 -0400 block/scsi-ioctl: Fix kernel-infoleak in scsi_put_cdrom_generic_arg() commit 6d53a9fe5a1983490bc14b3a64d49fabb4ccc651 upstream. scsi_put_cdrom_generic_arg() is copying uninitialized stack memory to userspace, since the compiler may leave a 3-byte hole in the middle of `cgc32`. Fix it by adding a padding field to `struct compat_cdrom_generic_command`. Cc: stable@vger.kernel.org Fixes: f3ee6e63a9df ("compat_ioctl: move CDROM_SEND_PACKET handling into scsi") Suggested-by: Dan Carpenter Suggested-by: Arnd Bergmann Reported-by: syzbot+85433a479a646a064ab3@syzkaller.appspotmail.com Signed-off-by: Peilin Ye Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman commit 128f5fe7c1020f663650058ed7526075f5fdc36d Author: Christoph Hellwig Date: Wed Oct 7 14:40:09 2020 +0200 partitions/ibm: fix non-DASD devices commit 7370997d48520ad923e8eb4deb59ebf290396202 upstream. Don't error out if the dasd_biodasdinfo symbol is not available. Cc: stable@vger.kernel.org Fixes: 26d7e28e3820 ("s390/dasd: remove ioctl_by_bdev calls") Reported-by: Christian Borntraeger Signed-off-by: Christoph Hellwig Tested-by: Christian Borntraeger Reviewed-by: Stefan Haberland Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman commit ef29249b066f64c2e9ea9bb854a2d486902b2bb4 Author: Karol Herbst Date: Wed Oct 7 00:05:28 2020 +0200 drm/nouveau/mem: guard against NULL pointer access in mem_del commit d10285a25e29f13353bbf7760be8980048c1ef2f upstream. other drivers seems to do something similar Signed-off-by: Karol Herbst Cc: dri-devel Cc: Dave Airlie Cc: stable@vger.kernel.org Signed-off-by: Dave Airlie Link: https://patchwork.freedesktop.org/patch/msgid/20201006220528.13925-2-kherbst@redhat.com Signed-off-by: Greg Kroah-Hartman commit e82867e1c2b4baf8b4139e5a42d3b1ab19d2de41 Author: Karol Herbst Date: Wed Oct 7 00:05:27 2020 +0200 drm/nouveau/device: return error for unknown chipsets commit c3e0276c31ca8c7b8615da890727481260d4676f upstream. Previously the code relied on device->pri to be NULL and to fail probing later. We really should just return an error inside nvkm_device_ctor for unsupported GPUs. Fixes: 24d5ff40a732 ("drm/nouveau/device: rework mmio mapping code to get rid of second map") Signed-off-by: Karol Herbst Cc: dann frazier Cc: dri-devel Cc: Dave Airlie Cc: stable@vger.kernel.org Reviewed-by: Jeremy Cline Signed-off-by: Dave Airlie Link: https://patchwork.freedesktop.org/patch/msgid/20201006220528.13925-1-kherbst@redhat.com Signed-off-by: Greg Kroah-Hartman commit bc7382371b2d888bd1274bcc0a3d2356155a7bf3 Author: Anant Thazhemadam Date: Wed Oct 7 09:24:01 2020 +0530 net: wireless: nl80211: fix out-of-bounds access in nl80211_del_key() commit 3dc289f8f139997f4e9d3cfccf8738f20d23e47b upstream. In nl80211_parse_key(), key.idx is first initialized as -1. If this value of key.idx remains unmodified and gets returned, and nl80211_key_allowed() also returns 0, then rdev_del_key() gets called with key.idx = -1. This causes an out-of-bounds array access. Handle this issue by checking if the value of key.idx after nl80211_parse_key() is called and return -EINVAL if key.idx < 0. Cc: stable@vger.kernel.org Reported-by: syzbot+b1bb342d1d097516cbda@syzkaller.appspotmail.com Tested-by: syzbot+b1bb342d1d097516cbda@syzkaller.appspotmail.com Signed-off-by: Anant Thazhemadam Link: https://lore.kernel.org/r/20201007035401.9522-1-anant.thazhemadam@gmail.com Signed-off-by: Johannes Berg Signed-off-by: Greg Kroah-Hartman commit 82dfd230b0c08299a9e28ab2777a488ebc9f1bd3 Author: Namjae Jeon Date: Tue Sep 29 09:09:49 2020 +0900 exfat: fix use of uninitialized spinlock on error path commit 8ff006e57ad3a25f909c456d053aa498b6673a39 upstream. syzbot reported warning message: Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1d6/0x29e lib/dump_stack.c:118 register_lock_class+0xf06/0x1520 kernel/locking/lockdep.c:893 __lock_acquire+0xfd/0x2ae0 kernel/locking/lockdep.c:4320 lock_acquire+0x148/0x720 kernel/locking/lockdep.c:5029 __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline] _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:151 spin_lock include/linux/spinlock.h:354 [inline] exfat_cache_inval_inode+0x30/0x280 fs/exfat/cache.c:226 exfat_evict_inode+0x124/0x270 fs/exfat/inode.c:660 evict+0x2bb/0x6d0 fs/inode.c:576 exfat_fill_super+0x1e07/0x27d0 fs/exfat/super.c:681 get_tree_bdev+0x3e9/0x5f0 fs/super.c:1342 vfs_get_tree+0x88/0x270 fs/super.c:1547 do_new_mount fs/namespace.c:2875 [inline] path_mount+0x179d/0x29e0 fs/namespace.c:3192 do_mount fs/namespace.c:3205 [inline] __do_sys_mount fs/namespace.c:3413 [inline] __se_sys_mount+0x126/0x180 fs/namespace.c:3390 do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 If exfat_read_root() returns an error, spinlock is used in exfat_evict_inode() without initialization. This patch combines exfat_cache_init_inode() with exfat_inode_init_once() to initialize spinlock by slab constructor. Fixes: c35b6810c495 ("exfat: add exfat cache") Cc: stable@vger.kernel.org # v5.7+ Reported-by: syzbot Signed-off-by: Namjae Jeon Signed-off-by: Greg Kroah-Hartman commit 6a4bf26a176daa46297aa476efc0ffd3f830de96 Author: Jeremy Linton Date: Tue Oct 6 11:33:26 2020 -0500 crypto: arm64: Use x16 with indirect branch to bti_c commit 39e4716caa598a07a98598b2e7cd03055ce25fb9 upstream. The AES code uses a 'br x7' as part of a function called by a macro. That branch needs a bti_j as a target. This results in a panic as seen below. Using x16 (or x17) with an indirect branch keeps the target bti_c. Bad mode in Synchronous Abort handler detected on CPU1, code 0x34000003 -- BTI CPU: 1 PID: 265 Comm: cryptomgr_test Not tainted 5.8.11-300.fc33.aarch64 #1 pstate: 20400c05 (nzCv daif +PAN -UAO BTYPE=j-) pc : aesbs_encrypt8+0x0/0x5f0 [aes_neon_bs] lr : aesbs_xts_encrypt+0x48/0xe0 [aes_neon_bs] sp : ffff80001052b730 aesbs_encrypt8+0x0/0x5f0 [aes_neon_bs] __xts_crypt+0xb0/0x2dc [aes_neon_bs] xts_encrypt+0x28/0x3c [aes_neon_bs] crypto_skcipher_encrypt+0x50/0x84 simd_skcipher_encrypt+0xc8/0xe0 crypto_skcipher_encrypt+0x50/0x84 test_skcipher_vec_cfg+0x224/0x5f0 test_skcipher+0xbc/0x120 alg_test_skcipher+0xa0/0x1b0 alg_test+0x3dc/0x47c cryptomgr_test+0x38/0x60 Fixes: 0e89640b640d ("crypto: arm64 - Use modern annotations for assembly functions") Cc: # 5.6.x- Signed-off-by: Jeremy Linton Suggested-by: Dave P Martin Reviewed-by: Ard Biesheuvel Reviewed-by: Mark Brown Link: https://lore.kernel.org/r/20201006163326.2780619-1-jeremy.linton@arm.com Signed-off-by: Catalin Marinas Signed-off-by: Greg Kroah-Hartman commit fc5b5ae8ac3cf955bebd0c219a93219bf9c3f4a2 Author: Daniel Borkmann Date: Wed Oct 7 15:48:58 2020 +0200 bpf: Fix scalar32_min_max_or bounds tracking commit 5b9fbeb75b6a98955f628e205ac26689bcb1383e upstream. Simon reported an issue with the current scalar32_min_max_or() implementation. That is, compared to the other 32 bit subreg tracking functions, the code in scalar32_min_max_or() stands out that it's using the 64 bit registers instead of 32 bit ones. This leads to bounds tracking issues, for example: [...] 8: R0=map_value(id=0,off=0,ks=4,vs=48,imm=0) R10=fp0 fp-8=mmmmmmmm 8: (79) r1 = *(u64 *)(r0 +0) R0=map_value(id=0,off=0,ks=4,vs=48,imm=0) R10=fp0 fp-8=mmmmmmmm 9: R0=map_value(id=0,off=0,ks=4,vs=48,imm=0) R1_w=inv(id=0) R10=fp0 fp-8=mmmmmmmm 9: (b7) r0 = 1 10: R0_w=inv1 R1_w=inv(id=0) R10=fp0 fp-8=mmmmmmmm 10: (18) r2 = 0x600000002 12: R0_w=inv1 R1_w=inv(id=0) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm 12: (ad) if r1 < r2 goto pc+1 R0_w=inv1 R1_w=inv(id=0,umin_value=25769803778) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm 13: R0_w=inv1 R1_w=inv(id=0,umin_value=25769803778) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm 13: (95) exit 14: R0_w=inv1 R1_w=inv(id=0,umax_value=25769803777,var_off=(0x0; 0x7ffffffff)) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm 14: (25) if r1 > 0x0 goto pc+1 R0_w=inv1 R1_w=inv(id=0,umax_value=0,var_off=(0x0; 0x7fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm 15: R0_w=inv1 R1_w=inv(id=0,umax_value=0,var_off=(0x0; 0x7fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm 15: (95) exit 16: R0_w=inv1 R1_w=inv(id=0,umin_value=1,umax_value=25769803777,var_off=(0x0; 0x77fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm 16: (47) r1 |= 0 17: R0_w=inv1 R1_w=inv(id=0,umin_value=1,umax_value=32212254719,var_off=(0x1; 0x700000000),s32_max_value=1,u32_max_value=1) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm [...] The bound tests on the map value force the upper unsigned bound to be 25769803777 in 64 bit (0b11000000000000000000000000000000001) and then lower one to be 1. By using OR they are truncated and thus result in the range [1,1] for the 32 bit reg tracker. This is incorrect given the only thing we know is that the value must be positive and thus 2147483647 (0b1111111111111111111111111111111) at max for the subregs. Fix it by using the {u,s}32_{min,max}_value vars instead. This also makes sense, for example, for the case where we update dst_reg->s32_{min,max}_value in the else branch we need to use the newly computed dst_reg->u32_{min,max}_value as we know that these are positive. Previously, in the else branch the 64 bit values of umin_value=1 and umax_value=32212254719 were used and latter got truncated to be 1 as upper bound there. After the fix the subreg range is now correct: [...] 8: R0=map_value(id=0,off=0,ks=4,vs=48,imm=0) R10=fp0 fp-8=mmmmmmmm 8: (79) r1 = *(u64 *)(r0 +0) R0=map_value(id=0,off=0,ks=4,vs=48,imm=0) R10=fp0 fp-8=mmmmmmmm 9: R0=map_value(id=0,off=0,ks=4,vs=48,imm=0) R1_w=inv(id=0) R10=fp0 fp-8=mmmmmmmm 9: (b7) r0 = 1 10: R0_w=inv1 R1_w=inv(id=0) R10=fp0 fp-8=mmmmmmmm 10: (18) r2 = 0x600000002 12: R0_w=inv1 R1_w=inv(id=0) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm 12: (ad) if r1 < r2 goto pc+1 R0_w=inv1 R1_w=inv(id=0,umin_value=25769803778) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm 13: R0_w=inv1 R1_w=inv(id=0,umin_value=25769803778) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm 13: (95) exit 14: R0_w=inv1 R1_w=inv(id=0,umax_value=25769803777,var_off=(0x0; 0x7ffffffff)) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm 14: (25) if r1 > 0x0 goto pc+1 R0_w=inv1 R1_w=inv(id=0,umax_value=0,var_off=(0x0; 0x7fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm 15: R0_w=inv1 R1_w=inv(id=0,umax_value=0,var_off=(0x0; 0x7fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm 15: (95) exit 16: R0_w=inv1 R1_w=inv(id=0,umin_value=1,umax_value=25769803777,var_off=(0x0; 0x77fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm 16: (47) r1 |= 0 17: R0_w=inv1 R1_w=inv(id=0,umin_value=1,umax_value=32212254719,var_off=(0x0; 0x77fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm [...] Fixes: 3f50f132d840 ("bpf: Verifier, do explicit ALU32 bounds tracking") Reported-by: Simon Scannell Signed-off-by: Daniel Borkmann Reviewed-by: John Fastabend Acked-by: Alexei Starovoitov Signed-off-by: Greg Kroah-Hartman commit 849d01ef1894d9495d7405746ed948fbce4b831b Author: Geert Uytterhoeven Date: Tue Sep 22 09:29:31 2020 +0200 Revert "ravb: Fixed to be able to unload modules" commit 77972b55fb9d35d4a6b0abca99abffaa4ec6a85b upstream. This reverts commit 1838d6c62f57836639bd3d83e7855e0ee4f6defc. This commit moved the ravb_mdio_init() call (and thus the of_mdiobus_register() call) from the ravb_probe() to the ravb_open() call. This causes a regression during system resume (s2idle/s2ram), as new PHY devices cannot be bound while suspended. During boot, the Micrel PHY is detected like this: Micrel KSZ9031 Gigabit PHY e6800000.ethernet-ffffffff:00: attached PHY driver [Micrel KSZ9031 Gigabit PHY] (mii_bus:phy_addr=e6800000.ethernet-ffffffff:00, irq=228) ravb e6800000.ethernet eth0: Link is Up - 1Gbps/Full - flow control off During system suspend, (A) defer_all_probes is set to true, and (B) usermodehelper_disabled is set to UMH_DISABLED, to avoid drivers being probed while suspended. A. If CONFIG_MODULES=n, phy_device_register() calling device_add() merely adds the device, but does not probe it yet, as really_probe() returns early due to defer_all_probes being set: dpm_resume+0x128/0x4f8 device_resume+0xcc/0x1b0 dpm_run_callback+0x74/0x340 ravb_resume+0x190/0x1b8 ravb_open+0x84/0x770 of_mdiobus_register+0x1e0/0x468 of_mdiobus_register_phy+0x1b8/0x250 of_mdiobus_phy_device_register+0x178/0x1e8 phy_device_register+0x114/0x1b8 device_add+0x3d4/0x798 bus_probe_device+0x98/0xa0 device_initial_probe+0x10/0x18 __device_attach+0xe4/0x140 bus_for_each_drv+0x64/0xc8 __device_attach_driver+0xb8/0xe0 driver_probe_device.part.11+0xc4/0xd8 really_probe+0x32c/0x3b8 Later, phy_attach_direct() notices no PHY driver has been bound, and falls back to the Generic PHY, leading to degraded operation: Generic PHY e6800000.ethernet-ffffffff:00: attached PHY driver [Generic PHY] (mii_bus:phy_addr=e6800000.ethernet-ffffffff:00, irq=POLL) ravb e6800000.ethernet eth0: Link is Up - 1Gbps/Full - flow control off B. If CONFIG_MODULES=y, request_module() returns early with -EBUSY due to UMH_DISABLED, and MDIO initialization fails completely: mdio_bus e6800000.ethernet-ffffffff:00: error -16 loading PHY driver module for ID 0x00221622 ravb e6800000.ethernet eth0: failed to initialize MDIO PM: dpm_run_callback(): ravb_resume+0x0/0x1b8 returns -16 PM: Device e6800000.ethernet failed to resume: error -16 Ignoring -EBUSY in phy_request_driver_module(), like was done for -ENOENT in commit 21e194425abd65b5 ("net: phy: fix issue with loading PHY driver w/o initramfs"), would makes it fall back to the Generic PHY, like in the CONFIG_MODULES=n case. Signed-off-by: Geert Uytterhoeven Cc: stable@vger.kernel.org Reviewed-by: Sergei Shtylyov Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit e57db2fee8b123b180b7829e1edf17311877b86b Author: Peilin Ye Date: Thu Sep 24 09:43:48 2020 -0400 fbcon: Fix global-out-of-bounds read in fbcon_get_font() commit 5af08640795b2b9a940c9266c0260455377ae262 upstream. fbcon_get_font() is reading out-of-bounds. A malicious user may resize `vc->vc_font.height` to a large value, causing fbcon_get_font() to read out of `fontdata`. fbcon_get_font() handles both built-in and user-provided fonts. Fortunately, recently we have added FONT_EXTRA_WORDS support for built-in fonts, so fix it by adding range checks using FNTSIZE(). This patch depends on patch "fbdev, newport_con: Move FONT_EXTRA_WORDS macros into linux/font.h", and patch "Fonts: Support FONT_EXTRA_WORDS macros for built-in fonts". Cc: stable@vger.kernel.org Reported-and-tested-by: syzbot+29d4ed7f3bdedf2aa2fd@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?id=08b8be45afea11888776f897895aef9ad1c3ecfd Signed-off-by: Peilin Ye Reviewed-by: Greg Kroah-Hartman Signed-off-by: Daniel Vetter Link: https://patchwork.freedesktop.org/patch/msgid/b34544687a1a09d6de630659eb7a773f4953238b.1600953813.git.yepeilin.cs@gmail.com Signed-off-by: Greg Kroah-Hartman commit 34873e40e8d864e9f1b9f524a84b9a622b78bdf0 Author: Peilin Ye Date: Thu Sep 24 09:42:22 2020 -0400 Fonts: Support FONT_EXTRA_WORDS macros for built-in fonts commit 6735b4632def0640dbdf4eb9f99816aca18c4f16 upstream. syzbot has reported an issue in the framebuffer layer, where a malicious user may overflow our built-in font data buffers. In order to perform a reliable range check, subsystems need to know `FONTDATAMAX` for each built-in font. Unfortunately, our font descriptor, `struct console_font` does not contain `FONTDATAMAX`, and is part of the UAPI, making it infeasible to modify it. For user-provided fonts, the framebuffer layer resolves this issue by reserving four extra words at the beginning of data buffers. Later, whenever a function needs to access them, it simply uses the following macros: Recently we have gathered all the above macros to . Let us do the same thing for built-in fonts, prepend four extra words (including `FONTDATAMAX`) to their data buffers, so that subsystems can use these macros for all fonts, no matter built-in or user-provided. This patch depends on patch "fbdev, newport_con: Move FONT_EXTRA_WORDS macros into linux/font.h". Cc: stable@vger.kernel.org Link: https://syzkaller.appspot.com/bug?id=08b8be45afea11888776f897895aef9ad1c3ecfd Signed-off-by: Peilin Ye Reviewed-by: Greg Kroah-Hartman Signed-off-by: Daniel Vetter Link: https://patchwork.freedesktop.org/patch/msgid/ef18af00c35fb3cc826048a5f70924ed6ddce95b.1600953813.git.yepeilin.cs@gmail.com Signed-off-by: Greg Kroah-Hartman commit 3714c5596a9d93f7c69c24130b685e0b4c2e3870 Author: Peilin Ye Date: Thu Sep 24 09:40:53 2020 -0400 fbdev, newport_con: Move FONT_EXTRA_WORDS macros into linux/font.h commit bb0890b4cd7f8203e3aa99c6d0f062d6acdaad27 upstream. drivers/video/console/newport_con.c is borrowing FONT_EXTRA_WORDS macros from drivers/video/fbdev/core/fbcon.h. To keep things simple, move all definitions into . Since newport_con now uses four extra words, initialize the fourth word in newport_set_font() properly. Cc: stable@vger.kernel.org Signed-off-by: Peilin Ye Reviewed-by: Greg Kroah-Hartman Signed-off-by: Daniel Vetter Link: https://patchwork.freedesktop.org/patch/msgid/7fb8bc9b0abc676ada6b7ac0e0bd443499357267.1600953813.git.yepeilin.cs@gmail.com Signed-off-by: Greg Kroah-Hartman