Skip to content
  1. Aug 08, 2021
  2. Aug 04, 2021
    • Greg Kroah-Hartman's avatar
      Linux 4.19.201 · 6ca2f514
      Greg Kroah-Hartman authored
      
      
      Link: https://lore.kernel.org/r/20210802134334.081433902@linuxfoundation.org
      Tested-by: default avatarPavel Machek (CIP) <pavel@denx.de>
      Tested-by: default avatarLinux Kernel Functional Testing <lkft@linaro.org>
      Tested-by: default avatarSudip Mukherjee <sudip.mukherjee@codethink.co.uk>
      Tested-by: default avatarJon Hunter <jonathanh@nvidia.com>
      Tested-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      v4.19.201
      6ca2f514
    • Lukasz Cieplicki's avatar
      i40e: Add additional info to PHY type error · fbb04f7c
      Lukasz Cieplicki authored
      commit dc614c46 upstream.
      
      In case of PHY type error occurs, the message was too generic.
      Add additional info to PHY type error indicating that it can be
      wrong cable connected.
      
      Fixes: 124ed15b
      
       ("i40e: Add dual speed module support")
      Signed-off-by: default avatarLukasz Cieplicki <lukaszx.cieplicki@intel.com>
      Signed-off-by: default avatarMichal Maloszewski <michal.maloszewski@intel.com>
      Tested-by: default avatarTony Brelinski <tonyx.brelinski@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fbb04f7c
    • Arnaldo Carvalho de Melo's avatar
      Revert "perf map: Fix dso->nsinfo refcounting" · 265c33f7
      Arnaldo Carvalho de Melo authored
      commit 9bac1bd6 upstream.
      
      This makes 'perf top' abort in some cases, and the right fix will
      involve surgery that is too much to do at this stage, so revert for now
      and fix it in the next merge window.
      
      This reverts commit 2d6b74ba
      
      .
      
      Cc: Riccardo Mancini <rickyman7@gmail.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Krister Johansen <kjlx@templeofstupid.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      265c33f7
    • Srikar Dronamraju's avatar
      powerpc/pseries: Fix regression while building external modules · 91bbeacf
      Srikar Dronamraju authored
      commit 333cf507 upstream.
      
      With commit c9f34013 ("powerpc: Always enable queued spinlocks for
      64s, disable for others") CONFIG_PPC_QUEUED_SPINLOCKS is always
      enabled on ppc64le, external modules that use spinlock APIs are
      failing.
      
        ERROR: modpost: GPL-incompatible module XXX.ko uses GPL-only symbol 'shared_processor'
      
      Before the above commit, modules were able to build without any
      issues. Also this problem is not seen on other architectures. This
      problem can be workaround if CONFIG_UNINLINE_SPIN_UNLOCK is enabled in
      the config. However CONFIG_UNINLINE_SPIN_UNLOCK is not enabled by
      default and only enabled in certain conditions like
      CONFIG_DEBUG_SPINLOCKS is set in the kernel config.
      
        #include <linux/module.h>
        spinlock_t spLock;
      
        static int __init spinlock_test_init(void)
        {
                spin_lock_init(&spLock);
                spin_lock(&spLock);
                spin_unlock(&spLock);
                return 0;
        }
      
        static void __exit spinlock_test_exit(void)
        {
        	printk("spinlock_test unloaded\n");
        }
        module_init(spinlock_test_init);
        module_exit(spinlock_test_exit);
      
        MODULE_DESCRIPTION ("spinlock_test");
        MODULE_LICENSE ("non-GPL");
        MODULE_AUTHOR ("Srikar Dronamraju");
      
      Given that spin locks are one of the basic facilities for module code,
      this effectively makes it impossible to build/load almost any non GPL
      modules on ppc64le.
      
      This was first reported at https://github.com/openzfs/zfs/issues/11172
      
      Currently shared_processor is exported as GPL only symbol.
      Fix this for parity with other architectures by exposing
      shared_processor to non-GPL modules too.
      
      Fixes: 14c73bd3
      
       ("powerpc/vcpu: Assume dedicated processors as non-preempt")
      Cc: stable@vger.kernel.org # v5.5+
      Reported-by: default avatar <marc.c.dionne@gmail.com>
      Signed-off-by: default avatarSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210729060449.292780-1-srikar@linux.vnet.ibm.com
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      91bbeacf
    • Dan Carpenter's avatar
      can: hi311x: fix a signedness bug in hi3110_cmd() · 5a6c550c
      Dan Carpenter authored
      [ Upstream commit f6b3c784 ]
      
      The hi3110_cmd() is supposed to return zero on success and negative
      error codes on failure, but it was accidentally declared as a u8 when
      it needs to be an int type.
      
      Fixes: 57e83fb9
      
       ("can: hi311x: Add Holt HI-311x CAN driver")
      Link: https://lore.kernel.org/r/20210729141246.GA1267@kili
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarMarc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      5a6c550c
    • Wang Hai's avatar
      sis900: Fix missing pci_disable_device() in probe and remove · 6bee3885
      Wang Hai authored
      [ Upstream commit 89fb62fd ]
      
      Replace pci_enable_device() with pcim_enable_device(),
      pci_disable_device() and pci_release_regions() will be
      called in release automatically.
      
      Fixes: 1da177e4
      
       ("Linux-2.6.12-rc2")
      Reported-by: default avatarHulk Robot <hulkci@huawei.com>
      Signed-off-by: default avatarWang Hai <wanghai38@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      6bee3885
    • Wang Hai's avatar
      tulip: windbond-840: Fix missing pci_disable_device() in probe and remove · fdb90238
      Wang Hai authored
      [ Upstream commit 76a16be0 ]
      
      Replace pci_enable_device() with pcim_enable_device(),
      pci_disable_device() and pci_release_regions() will be
      called in release automatically.
      
      Fixes: 1da177e4
      
       ("Linux-2.6.12-rc2")
      Reported-by: default avatarHulk Robot <hulkci@huawei.com>
      Signed-off-by: default avatarWang Hai <wanghai38@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      fdb90238
    • Marcelo Ricardo Leitner's avatar
      sctp: fix return value check in __sctp_rcv_asconf_lookup · d9b6f0a2
      Marcelo Ricardo Leitner authored
      [ Upstream commit 557fb586
      
       ]
      
      As Ben Hutchings noticed, this check should have been inverted: the call
      returns true in case of success.
      
      Reported-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Fixes: 0c5dc070
      
       ("sctp: validate from_addr_param return")
      Signed-off-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Reviewed-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      d9b6f0a2
    • Maor Gottlieb's avatar
      net/mlx5: Fix flow table chaining · 2a7b6a52
      Maor Gottlieb authored
      [ Upstream commit 8b54874e ]
      
      Fix a bug when flow table is created in priority that already
      has other flow tables as shown in the below diagram.
      If the new flow table (FT-B) has the lowest level in the priority,
      we need to connect the flow tables from the previous priority (p0)
      to this new table. In addition when this flow table is destroyed
      (FT-B), we need to connect the flow tables from the previous
      priority (p0) to the next level flow table (FT-C) in the same
      priority of the destroyed table (if exists).
      
                             ---------
                             |root_ns|
                             ---------
                                  |
                  --------------------------------
                  |               |              |
             ----------      ----------      ---------
             |p(prio)-x|     |   p-y  |      |   p-n |
             ----------      ----------      ---------
                  |               |
           ----------------  ------------------
           |ns(e.g bypass)|  |ns(e.g. kernel) |
           ----------------  ------------------
                  |            |           |
      	-------	       ------       ----
              |  p0 |        | p1 |       |p2|
              -------        ------       ----
                 |             |    \
              --------       ------- ------
              | FT-A |       |FT-B | |FT-C|
              --------       ------- ------
      
      Fixes: f90edfd2
      
       ("net/mlx5_core: Connect flow tables")
      Signed-off-by: default avatarMaor Gottlieb <maorg@nvidia.com>
      Reviewed-by: default avatarMark Bloch <mbloch@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      2a7b6a52
    • Pavel Skripkin's avatar
      net: llc: fix skb_over_panic · 9a27cd7f
      Pavel Skripkin authored
      [ Upstream commit c7c9d210 ]
      
      Syzbot reported skb_over_panic() in llc_pdu_init_as_xid_cmd(). The
      problem was in wrong LCC header manipulations.
      
      Syzbot's reproducer tries to send XID packet. llc_ui_sendmsg() is
      doing following steps:
      
      	1. skb allocation with size = len + header size
      		len is passed from userpace and header size
      		is 3 since addr->sllc_xid is set.
      
      	2. skb_reserve() for header_len = 3
      	3. filling all other space with memcpy_from_msg()
      
      Ok, at this moment we have fully loaded skb, only headers needs to be
      filled.
      
      Then code comes to llc_sap_action_send_xid_c(). This function pushes 3
      bytes for LLC PDU header and initializes it. Then comes
      llc_pdu_init_as_xid_cmd(). It initalizes next 3 bytes *AFTER* LLC PDU
      header and call skb_push(skb, 3). This looks wrong for 2 reasons:
      
      	1. Bytes rigth after LLC header are user data, so this function
      	   was overwriting payload.
      
      	2. skb_push(skb, 3) call can cause skb_over_panic() since
      	   all free space was filled in llc_ui_sendmsg(). (This can
      	   happen is user passed 686 len: 686 + 14 (eth header) + 3 (LLC
      	   header) = 703. SKB_DATA_ALIGN(703) = 704)
      
      So, in this patch I added 2 new private constansts: LLC_PDU_TYPE_U_XID
      and LLC_PDU_LEN_U_XID. LLC_PDU_LEN_U_XID is used to correctly reserve
      header size to handle LLC + XID case. LLC_PDU_TYPE_U_XID is used by
      llc_pdu_header_init() function to push 6 bytes instead of 3. And finally
      I removed skb_push() call from llc_pdu_init_as_xid_cmd().
      
      This changes should not affect other parts of LLC, since after
      all steps we just transmit buffer.
      
      Fixes: 1da177e4
      
       ("Linux-2.6.12-rc2")
      Reported-and-tested-by: default avatar <syzbot+5e5a981ad7cc54c4b2b4@syzkaller.appspotmail.com>
      Signed-off-by: default avatarPavel Skripkin <paskripkin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      9a27cd7f
    • Jiapeng Chong's avatar
      mlx4: Fix missing error code in mlx4_load_one() · 6f243cef
      Jiapeng Chong authored
      [ Upstream commit 7e4960b3
      
       ]
      
      The error code is missing in this code scenario, add the error code
      '-EINVAL' to the return value 'err'.
      
      Eliminate the follow smatch warning:
      
      drivers/net/ethernet/mellanox/mlx4/main.c:3538 mlx4_load_one() warn:
      missing error code 'err'.
      
      Reported-by: default avatarAbaci Robot <abaci@linux.alibaba.com>
      Fixes: 7ae0e400
      
       ("net/mlx4_core: Flexible (asymmetric) allocation of EQs and MSI-X vectors for PF/VFs")
      Signed-off-by: default avatarJiapeng Chong <jiapeng.chong@linux.alibaba.com>
      Reviewed-by: default avatarTariq Toukan <tariqt@nvidia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      6f243cef
    • Hoang Le's avatar
      tipc: fix sleeping in tipc accept routine · 01f178e5
      Hoang Le authored
      [ Upstream commit d237a7f1 ]
      
      The release_sock() is blocking function, it would change the state
      after sleeping. In order to evaluate the stated condition outside
      the socket lock context, switch to use wait_woken() instead.
      
      Fixes: 6398e23c
      
       ("tipc: standardize accept routine")
      Acked-by: default avatarJon Maloy <jmaloy@redhat.com>
      Signed-off-by: default avatarHoang Le <hoang.h.le@dektech.com.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      01f178e5
    • Jedrzej Jagielski's avatar
      i40e: Fix log TC creation failure when max num of queues is exceeded · 498d7ab1
      Jedrzej Jagielski authored
      [ Upstream commit ea52faae ]
      
      Fix missing failed message if driver does not have enough queues to
      complete TC command. Without this fix no message is displayed in dmesg.
      
      Fixes: a9ce82f7
      
       ("i40e: Enable 'channel' mode in mqprio for TC configs")
      Signed-off-by: default avatarGrzegorz Szczurek <grzegorzx.szczurek@intel.com>
      Signed-off-by: default avatarJedrzej Jagielski <jedrzej.jagielski@intel.com>
      Tested-by: default avatarImam Hassan Reza Biswas <imam.hassan.reza.biswas@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      498d7ab1
    • Arkadiusz Kubalewski's avatar
      i40e: Fix logic of disabling queues · 6d51a5fb
      Arkadiusz Kubalewski authored
      [ Upstream commit 65662a8d ]
      
      Correct the message flow between driver and firmware when disabling
      queues.
      
      Previously in case of PF reset (due to required reinit after reconfig),
      the error like: "VSI seid 397 Tx ring 60 disable timeout" could show up
      occasionally. The error was not a real issue of hardware or firmware,
      it was caused by wrong sequence of messages invoked by the driver.
      
      Fixes: 41c445ff
      
       ("i40e: main driver core")
      Signed-off-by: default avatarAleksandr Loktionov <aleksandr.loktionov@intel.com>
      Signed-off-by: default avatarArkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
      Tested-by: default avatarTony Brelinski <tonyx.brelinski@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      6d51a5fb
    • Pablo Neira Ayuso's avatar
      netfilter: nft_nat: allow to specify layer 4 protocol NAT only · 1cb5995a
      Pablo Neira Ayuso authored
      [ Upstream commit a33f387e ]
      
      nft_nat reports a bogus EAFNOSUPPORT if no layer 3 information is specified.
      
      Fixes: d07db988
      
       ("netfilter: nf_tables: introduce nft_validate_register_load()")
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      1cb5995a
    • Florian Westphal's avatar
      netfilter: conntrack: adjust stop timestamp to real expiry value · 512fd52e
      Florian Westphal authored
      [ Upstream commit 30a56a2b ]
      
      In case the entry is evicted via garbage collection there is
      delay between the timeout value and the eviction event.
      
      This adjusts the stop value based on how much time has passed.
      
      Fixes: b87a2f91
      
       ("netfilter: conntrack: add gc worker to remove timed-out entries")
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      512fd52e
    • Nguyen Dinh Phi's avatar
      cfg80211: Fix possible memory leak in function cfg80211_bss_update · 672f6ea5
      Nguyen Dinh Phi authored
      commit f9a5c358
      
       upstream.
      
      When we exceed the limit of BSS entries, this function will free the
      new entry, however, at this time, it is the last door to access the
      inputed ies, so these ies will be unreferenced objects and cause memory
      leak.
      Therefore we should free its ies before deallocating the new entry, beside
      of dropping it from hidden_list.
      
      Signed-off-by: default avatarNguyen Dinh Phi <phind.uet@gmail.com>
      Link: https://lore.kernel.org/r/20210628132334.851095-1-phind.uet@gmail.com
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      672f6ea5
    • Krzysztof Kozlowski's avatar
      nfc: nfcsim: fix use after free during module unload · 0bf3eb2e
      Krzysztof Kozlowski authored
      commit 5e7b30d2 upstream.
      
      There is a use after free memory corruption during module exit:
       - nfcsim_exit()
        - nfcsim_device_free(dev0)
          - nfc_digital_unregister_device()
            This iterates over command queue and frees all commands,
          - dev->up = false
          - nfcsim_link_shutdown()
            - nfcsim_link_recv_wake()
              This wakes the sleeping thread nfcsim_link_recv_skb().
      
       - nfcsim_link_recv_skb()
         Wake from wait_event_interruptible_timeout(),
         call directly the deb->cb callback even though (dev->up == false),
         - digital_send_cmd_complete()
           Dereference of "struct digital_cmd" cmd which was freed earlier by
           nfc_digital_unregister_device().
      
      This causes memory corruption shortly after (with unrelated stack
      trace):
      
        nfc nfc0: NFC: nfcsim_recv_wq: Device is down
        llcp: nfc_llcp_recv: err -19
        nfc nfc1: NFC: nfcsim_recv_wq: Device is down
        BUG: unable to handle page fault for address: ffffffffffffffed
        Call Trace:
         fsnotify+0x54b/0x5c0
         __fsnotify_parent+0x1fe/0x300
         ? vfs_write+0x27c/0x390
         vfs_write+0x27c/0x390
         ksys_write+0x63/0xe0
         do_syscall_64+0x3b/0x90
         entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      KASAN report:
      
        BUG: KASAN: use-after-free in digital_send_cmd_complete+0x16/0x50
        Write of size 8 at addr ffff88800a05f720 by task kworker/0:2/71
        Workqueue: events nfcsim_recv_wq [nfcsim]
        Call Trace:
         dump_stack_lvl+0x45/0x59
         print_address_description.constprop.0+0x21/0x140
         ? digital_send_cmd_complete+0x16/0x50
         ? digital_send_cmd_complete+0x16/0x50
         kasan_report.cold+0x7f/0x11b
         ? digital_send_cmd_complete+0x16/0x50
         ? digital_dep_link_down+0x60/0x60
         digital_send_cmd_complete+0x16/0x50
         nfcsim_recv_wq+0x38f/0x3d5 [nfcsim]
         ? nfcsim_in_send_cmd+0x4a/0x4a [nfcsim]
         ? lock_is_held_type+0x98/0x110
         ? finish_wait+0x110/0x110
         ? rcu_read_lock_sched_held+0x9c/0xd0
         ? rcu_read_lock_bh_held+0xb0/0xb0
         ? lockdep_hardirqs_on_prepare+0x12e/0x1f0
      
      This flow of calling digital_send_cmd_complete() callback on driver exit
      is specific to nfcsim which implements reading and sending work queues.
      Since the NFC digital device was unregistered, the callback should not
      be called.
      
      Fixes: 204bddcb
      
       ("NFC: nfcsim: Make use of the Digital layer")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarKrzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0bf3eb2e
    • Paul Jakma's avatar
      NIU: fix incorrect error return, missed in previous revert · 4cd2534c
      Paul Jakma authored
      commit 15bbf8bb upstream.
      
      Commit 7930742d, reverting 26fd962b, missed out on reverting an incorrect
      change to a return value.  The niu_pci_vpd_scan_props(..) == 1 case appears
      to be a normal path - treating it as an error and return -EINVAL was
      breaking VPD_SCAN and causing the driver to fail to load.
      
      Fix, so my Neptune card works again.
      
      Cc: Kangjie Lu <kjlu@umn.edu>
      Cc: Shannon Nelson <shannon.lee.nelson@gmail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: stable <stable@vger.kernel.org>
      Fixes: 7930742d
      
       ('Revert "niu: fix missing checks of niu_pci_eeprom_read"')
      Signed-off-by: default avatarPaul Jakma <paul@jakma.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4cd2534c
    • Pavel Skripkin's avatar
      can: esd_usb2: fix memory leak · 9c9e4511
      Pavel Skripkin authored
      commit 928150fa upstream.
      
      In esd_usb2_setup_rx_urbs() MAX_RX_URBS coherent buffers are allocated
      and there is nothing, that frees them:
      
      1) In callback function the urb is resubmitted and that's all
      2) In disconnect function urbs are simply killed, but URB_FREE_BUFFER
         is not set (see esd_usb2_setup_rx_urbs) and this flag cannot be used
         with coherent buffers.
      
      So, all allocated buffers should be freed with usb_free_coherent()
      explicitly.
      
      Side note: This code looks like a copy-paste of other can drivers. The
      same patch was applied to mcba_usb driver and it works nice with real
      hardware. There is no change in functionality, only clean-up code for
      coherent buffers.
      
      Fixes: 96d8e903
      
       ("can: Add driver for esd CAN-USB/2 device")
      Link: https://lore.kernel.org/r/b31b096926dcb35998ad0271aac4b51770ca7cc8.1627404470.git.paskripkin@gmail.com
      Cc: linux-stable <stable@vger.kernel.org>
      Signed-off-by: default avatarPavel Skripkin <paskripkin@gmail.com>
      Signed-off-by: default avatarMarc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9c9e4511
    • Pavel Skripkin's avatar
      can: ems_usb: fix memory leak · 71758cd0
      Pavel Skripkin authored
      commit 9969e3c5 upstream.
      
      In ems_usb_start() MAX_RX_URBS coherent buffers are allocated and
      there is nothing, that frees them:
      
      1) In callback function the urb is resubmitted and that's all
      2) In disconnect function urbs are simply killed, but URB_FREE_BUFFER
         is not set (see ems_usb_start) and this flag cannot be used with
         coherent buffers.
      
      So, all allocated buffers should be freed with usb_free_coherent()
      explicitly.
      
      Side note: This code looks like a copy-paste of other can drivers. The
      same patch was applied to mcba_usb driver and it works nice with real
      hardware. There is no change in functionality, only clean-up code for
      coherent buffers.
      
      Fixes: 702171ad
      
       ("ems_usb: Added support for EMS CPC-USB/ARM7 CAN/USB interface")
      Link: https://lore.kernel.org/r/59aa9fbc9a8cbf9af2bbd2f61a659c480b415800.1627404470.git.paskripkin@gmail.com
      Cc: linux-stable <stable@vger.kernel.org>
      Signed-off-by: default avatarPavel Skripkin <paskripkin@gmail.com>
      Signed-off-by: default avatarMarc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      71758cd0
    • Pavel Skripkin's avatar
      can: usb_8dev: fix memory leak · 6c9d6198
      Pavel Skripkin authored
      commit 0e865f0c upstream.
      
      In usb_8dev_start() MAX_RX_URBS coherent buffers are allocated and
      there is nothing, that frees them:
      
      1) In callback function the urb is resubmitted and that's all
      2) In disconnect function urbs are simply killed, but URB_FREE_BUFFER
         is not set (see usb_8dev_start) and this flag cannot be used with
         coherent buffers.
      
      So, all allocated buffers should be freed with usb_free_coherent()
      explicitly.
      
      Side note: This code looks like a copy-paste of other can drivers. The
      same patch was applied to mcba_usb driver and it works nice with real
      hardware. There is no change in functionality, only clean-up code for
      coherent buffers.
      
      Fixes: 0024d8ad
      
       ("can: usb_8dev: Add support for USB2CAN interface from 8 devices")
      Link: https://lore.kernel.org/r/d39b458cd425a1cf7f512f340224e6e9563b07bd.1627404470.git.paskripkin@gmail.com
      Cc: linux-stable <stable@vger.kernel.org>
      Signed-off-by: default avatarPavel Skripkin <paskripkin@gmail.com>
      Signed-off-by: default avatarMarc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6c9d6198
    • Pavel Skripkin's avatar
      can: mcba_usb_start(): add missing urb->transfer_dma initialization · ab9597bc
      Pavel Skripkin authored
      commit fc43fb69 upstream.
      
      Yasushi reported, that his Microchip CAN Analyzer stopped working
      since commit 91c02557 ("can: mcba_usb: fix memory leak in
      mcba_usb"). The problem was in missing urb->transfer_dma
      initialization.
      
      In my previous patch to this driver I refactored mcba_usb_start() code
      to avoid leaking usb coherent buffers. To archive it, I passed local
      stack variable to usb_alloc_coherent() and then saved it to private
      array to correctly free all coherent buffers on ->close() call. But I
      forgot to initialize urb->transfer_dma with variable passed to
      usb_alloc_coherent().
      
      All of this was causing device to not work, since dma addr 0 is not
      valid and following log can be found on bug report page, which points
      exactly to problem described above.
      
      | DMAR: [DMA Write] Request device [00:14.0] PASID ffffffff fault addr 0 [fault reason 05] PTE Write access is not set
      
      Fixes: 91c02557
      
       ("can: mcba_usb: fix memory leak in mcba_usb")
      Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=990850
      Link: https://lore.kernel.org/r/20210725103630.23864-1-paskripkin@gmail.com
      Cc: linux-stable <stable@vger.kernel.org>
      Reported-by: default avatarYasushi SHOJI <yasushi.shoji@gmail.com>
      Signed-off-by: default avatarPavel Skripkin <paskripkin@gmail.com>
      Tested-by: default avatarYasushi SHOJI <yashi@spacecubics.com>
      [mkl: fixed typos in commit message - thanks Yasushi SHOJI]
      Signed-off-by: default avatarMarc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ab9597bc
    • Ziyang Xuan's avatar
      can: raw: raw_setsockopt(): fix raw_rcv panic for sock UAF · 96faa82a
      Ziyang Xuan authored
      commit 54f93336 upstream.
      
      We get a bug during ltp can_filter test as following.
      
      ===========================================
      [60919.264984] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
      [60919.265223] PGD 8000003dda726067 P4D 8000003dda726067 PUD 3dda727067 PMD 0
      [60919.265443] Oops: 0000 [#1] SMP PTI
      [60919.265550] CPU: 30 PID: 3638365 Comm: can_filter Kdump: loaded Tainted: G        W         4.19.90+ #1
      [60919.266068] RIP: 0010:selinux_socket_sock_rcv_skb+0x3e/0x200
      [60919.293289] RSP: 0018:ffff8d53bfc03cf8 EFLAGS: 00010246
      [60919.307140] RAX: 0000000000000000 RBX: 000000000000001d RCX: 0000000000000007
      [60919.320756] RDX: 0000000000000001 RSI: ffff8d5104a8ed00 RDI: ffff8d53bfc03d30
      [60919.334319] RBP: ffff8d9338056800 R08: ffff8d53bfc29d80 R09: 0000000000000001
      [60919.347969] R10: ffff8d53bfc03ec0 R11: ffffb8526ef47c98 R12: ffff8d53bfc03d30
      [60919.350320] perf: interrupt took too long (3063 > 2500), lowering kernel.perf_event_max_sample_rate to 65000
      [60919.361148] R13: 0000000000000001 R14: ffff8d53bcf90000 R15: 0000000000000000
      [60919.361151] FS:  00007fb78b6b3600(0000) GS:ffff8d53bfc00000(0000) knlGS:0000000000000000
      [60919.400812] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [60919.413730] CR2: 0000000000000010 CR3: 0000003e3f784006 CR4: 00000000007606e0
      [60919.426479] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [60919.439339] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [60919.451608] PKRU: 55555554
      [60919.463622] Call Trace:
      [60919.475617]  <IRQ>
      [60919.487122]  ? update_load_avg+0x89/0x5d0
      [60919.498478]  ? update_load_avg+0x89/0x5d0
      [60919.509822]  ? account_entity_enqueue+0xc5/0xf0
      [60919.520709]  security_sock_rcv_skb+0x2a/0x40
      [60919.531413]  sk_filter_trim_cap+0x47/0x1b0
      [60919.542178]  ? kmem_cache_alloc+0x38/0x1b0
      [60919.552444]  sock_queue_rcv_skb+0x17/0x30
      [60919.562477]  raw_rcv+0x110/0x190 [can_raw]
      [60919.572539]  can_rcv_filter+0xbc/0x1b0 [can]
      [60919.582173]  can_receive+0x6b/0xb0 [can]
      [60919.591595]  can_rcv+0x31/0x70 [can]
      [60919.600783]  __netif_receive_skb_one_core+0x5a/0x80
      [60919.609864]  process_backlog+0x9b/0x150
      [60919.618691]  net_rx_action+0x156/0x400
      [60919.627310]  ? sched_clock_cpu+0xc/0xa0
      [60919.635714]  __do_softirq+0xe8/0x2e9
      [60919.644161]  do_softirq_own_stack+0x2a/0x40
      [60919.652154]  </IRQ>
      [60919.659899]  do_softirq.part.17+0x4f/0x60
      [60919.667475]  __local_bh_enable_ip+0x60/0x70
      [60919.675089]  __dev_queue_xmit+0x539/0x920
      [60919.682267]  ? finish_wait+0x80/0x80
      [60919.689218]  ? finish_wait+0x80/0x80
      [60919.695886]  ? sock_alloc_send_pskb+0x211/0x230
      [60919.702395]  ? can_send+0xe5/0x1f0 [can]
      [60919.708882]  can_send+0xe5/0x1f0 [can]
      [60919.715037]  raw_sendmsg+0x16d/0x268 [can_raw]
      
      It's because raw_setsockopt() concurrently with
      unregister_netdevice_many(). Concurrent scenario as following.
      
      	cpu0						cpu1
      raw_bind
      raw_setsockopt					unregister_netdevice_many
      						unlist_netdevice
      dev_get_by_index				raw_notifier
      raw_enable_filters				......
      can_rx_register
      can_rcv_list_find(..., net->can.rx_alldev_list)
      
      ......
      
      sock_close
      raw_release(sock_a)
      
      ......
      
      can_receive
      can_rcv_filter(net->can.rx_alldev_list, ...)
      raw_rcv(skb, sock_a)
      BUG
      
      After unlist_netdevice(), dev_get_by_index() return NULL in
      raw_setsockopt(). Function raw_enable_filters() will add sock
      and can_filter to net->can.rx_alldev_list. Then the sock is closed.
      Followed by, we sock_sendmsg() to a new vcan device use the same
      can_filter. Protocol stack match the old receiver whose sock has
      been released on net->can.rx_alldev_list in can_rcv_filter().
      Function raw_rcv() uses the freed sock. UAF BUG is triggered.
      
      We can find that the key issue is that net_device has not been
      protected in raw_setsockopt(). Use rtnl_lock to protect net_device
      in raw_setsockopt().
      
      Fixes: c18ce101
      
       ("[CAN]: Add raw protocol")
      Link: https://lore.kernel.org/r/20210722070819.1048263-1-william.xuanziyang@huawei.com
      Cc: linux-stable <stable@vger.kernel.org>
      Signed-off-by: default avatarZiyang Xuan <william.xuanziyang@huawei.com>
      Acked-by: default avatarOliver Hartkopp <socketcan@hartkopp.net>
      Signed-off-by: default avatarMarc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      96faa82a
    • Junxiao Bi's avatar
      ocfs2: issue zeroout to EOF blocks · 40b5e3f5
      Junxiao Bi authored
      commit 9449ad33 upstream.
      
      For punch holes in EOF blocks, fallocate used buffer write to zero the
      EOF blocks in last cluster.  But since ->writepage will ignore EOF
      pages, those zeros will not be flushed.
      
      This "looks" ok as commit 6bba4471
      
       ("ocfs2: fix data corruption by
      fallocate") will zero the EOF blocks when extend the file size, but it
      isn't.  The problem happened on those EOF pages, before writeback, those
      pages had DIRTY flag set and all buffer_head in them also had DIRTY flag
      set, when writeback run by write_cache_pages(), DIRTY flag on the page
      was cleared, but DIRTY flag on the buffer_head not.
      
      When next write happened to those EOF pages, since buffer_head already
      had DIRTY flag set, it would not mark page DIRTY again.  That made
      writeback ignore them forever.  That will cause data corruption.  Even
      directio write can't work because it will fail when trying to drop pages
      caches before direct io, as it found the buffer_head for those pages
      still had DIRTY flag set, then it will fall back to buffer io mode.
      
      To make a summary of the issue, as writeback ingores EOF pages, once any
      EOF page is generated, any write to it will only go to the page cache,
      it will never be flushed to disk even file size extends and that page is
      not EOF page any more.  The fix is to avoid zero EOF blocks with buffer
      write.
      
      The following code snippet from qemu-img could trigger the corruption.
      
        656   open("6b3711ae-3306-4bdd-823c-cf1c0060a095.conv.2", O_RDWR|O_DIRECT|O_CLOEXEC) = 11
        ...
        660   fallocate(11, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 2275868672, 327680 <unfinished ...>
        660   fallocate(11, 0, 2275868672, 327680) = 0
        658   pwrite64(11, "
      
      Link: https://lkml.kernel.org/r/20210722054923.24389-2-junxiao.bi@oracle.com
      Signed-off-by: default avatarJunxiao Bi <junxiao.bi@oracle.com>
      Reviewed-by: default avatarJoseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Gang He <ghe@suse.com>
      Cc: Jun Piao <piaojun@huawei.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      40b5e3f5
    • Junxiao Bi's avatar
      ocfs2: fix zero out valid data · 4c97ed43
      Junxiao Bi authored
      commit f267aeb6 upstream.
      
      If append-dio feature is enabled, direct-io write and fallocate could
      run in parallel to extend file size, fallocate used "orig_isize" to
      record i_size before taking "ip_alloc_sem", when
      ocfs2_zeroout_partial_cluster() zeroout EOF blocks, i_size maybe already
      extended by ocfs2_dio_end_io_write(), that will cause valid data zeroed
      out.
      
      Link: https://lkml.kernel.org/r/20210722054923.24389-1-junxiao.bi@oracle.com
      Fixes: 6bba4471
      
       ("ocfs2: fix data corruption by fallocate")
      Signed-off-by: default avatarJunxiao Bi <junxiao.bi@oracle.com>
      Reviewed-by: default avatarJoseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Gang He <ghe@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Jun Piao <piaojun@huawei.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4c97ed43
    • Juergen Gross's avatar
      x86/kvm: fix vcpu-id indexed array sizes · ce1fd5a2
      Juergen Gross authored
      commit 76b4f357
      
       upstream.
      
      KVM_MAX_VCPU_ID is the maximum vcpu-id of a guest, and not the number
      of vcpu-ids. Fix array indexed by vcpu-id to have KVM_MAX_VCPU_ID+1
      elements.
      
      Note that this is currently no real problem, as KVM_MAX_VCPU_ID is
      an odd number, resulting in always enough padding being available at
      the end of those arrays.
      
      Nevertheless this should be fixed in order to avoid rare problems in
      case someone is using an even number for KVM_MAX_VCPU_ID.
      
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Message-Id: <20210701154105.23215-2-jgross@suse.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ce1fd5a2
    • Desmond Cheong Zhi Xi's avatar
      btrfs: fix rw device counting in __btrfs_free_extra_devids · d96a56a7
      Desmond Cheong Zhi Xi authored
      commit b2a61667 upstream.
      
      When removing a writeable device in __btrfs_free_extra_devids, the rw
      device count should be decremented.
      
      This error was caught by Syzbot which reported a warning in
      close_fs_devices:
      
        WARNING: CPU: 1 PID: 9355 at fs/btrfs/volumes.c:1168 close_fs_devices+0x763/0x880 fs/btrfs/volumes.c:1168
        Modules linked in:
        CPU: 0 PID: 9355 Comm: syz-executor552 Not tainted 5.13.0-rc1-syzkaller #0
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        RIP: 0010:close_fs_devices+0x763/0x880 fs/btrfs/volumes.c:1168
        RSP: 0018:ffffc9000333f2f0 EFLAGS: 00010293
        RAX: ffffffff8365f5c3 RBX: 0000000000000001 RCX: ffff888029afd4c0
        RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000
        RBP: ffff88802846f508 R08: ffffffff8365f525 R09: ffffed100337d128
        R10: ffffed100337d128 R11: 0000000000000000 R12: dffffc0000000000
        R13: ffff888019be8868 R14: 1ffff1100337d10d R15: 1ffff1100337d10a
        FS:  00007f6f53828700(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 000000000047c410 CR3: 00000000302a6000 CR4: 00000000001506f0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         btrfs_close_devices+0xc9/0x450 fs/btrfs/volumes.c:1180
         open_ctree+0x8e1/0x3968 fs/btrfs/disk-io.c:3693
         btrfs_fill_super fs/btrfs/super.c:1382 [inline]
         btrfs_mount_root+0xac5/0xc60 fs/btrfs/super.c:1749
         legacy_get_tree+0xea/0x180 fs/fs_context.c:592
         vfs_get_tree+0x86/0x270 fs/super.c:1498
         fc_mount fs/namespace.c:993 [inline]
         vfs_kern_mount+0xc9/0x160 fs/namespace.c:1023
         btrfs_mount+0x3d3/0xb50 fs/btrfs/super.c:1809
         legacy_get_tree+0xea/0x180 fs/fs_context.c:592
         vfs_get_tree+0x86/0x270 fs/super.c:1498
         do_new_mount fs/namespace.c:2905 [inline]
         path_mount+0x196f/0x2be0 fs/namespace.c:3235
         do_mount fs/namespace.c:3248 [inline]
         __do_sys_mount fs/namespace.c:3456 [inline]
         __se_sys_mount+0x2f9/0x3b0 fs/namespace.c:3433
         do_syscall_64+0x3f/0xb0 arch/x86/entry/common.c:47
         entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Because fs_devices->rw_devices was not 0 after
      closing all devices. Here is the call trace that was observed:
      
        btrfs_mount_root():
          btrfs_scan_one_device():
            device_list_add();   <---------------- device added
          btrfs_open_devices():
            open_fs_devices():
              btrfs_open_one_device();   <-------- writable device opened,
      	                                     rw device count ++
          btrfs_fill_super():
            open_ctree():
              btrfs_free_extra_devids():
      	  __btrfs_free_extra_devids();  <--- writable device removed,
      	                              rw device count not decremented
      	  fail_tree_roots:
      	    btrfs_close_devices():
      	      close_fs_devices();   <------- rw device count off by 1
      
      As a note, prior to commit cf89af14 ("btrfs: dev-replace: fail
      mount if we don't have replace item with target device"), rw_devices
      was decremented on removing a writable device in
      __btrfs_free_extra_devids only if the BTRFS_DEV_STATE_REPLACE_TGT bit
      was not set for the device. However, this check does not need to be
      reinstated as it is now redundant and incorrect.
      
      In __btrfs_free_extra_devids, we skip removing the device if it is the
      target for replacement. This is done by checking whether device->devid
      == BTRFS_DEV_REPLACE_DEVID. Since BTRFS_DEV_STATE_REPLACE_TGT is set
      only on the device with devid BTRFS_DEV_REPLACE_DEVID, no devices
      should have the BTRFS_DEV_STATE_REPLACE_TGT bit set after the check,
      and so it's redundant to test for that bit.
      
      Additionally, following commit 82372bc8
      
       ("Btrfs: make
      the logic of source device removing more clear"), rw_devices is
      incremented whenever a writeable device is added to the alloc
      list (including the target device in btrfs_dev_replace_finishing), so
      all removals of writable devices from the alloc list should also be
      accompanied by a decrement to rw_devices.
      
      Reported-by: default avatar <syzbot+a70e2ad0879f160b9217@syzkaller.appspotmail.com>
      Fixes: cf89af14
      
       ("btrfs: dev-replace: fail mount if we don't have replace item with target device")
      CC: stable@vger.kernel.org # 5.10+
      Tested-by: default avatar <syzbot+a70e2ad0879f160b9217@syzkaller.appspotmail.com>
      Reviewed-by: default avatarAnand Jain <anand.jain@oracle.com>
      Signed-off-by: default avatarDesmond Cheong Zhi Xi <desmondcheongzx@gmail.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d96a56a7
    • Jan Kiszka's avatar
      x86/asm: Ensure asm/proto.h can be included stand-alone · 6c8c88a4
      Jan Kiszka authored
      [ Upstream commit f7b21a0e
      
       ]
      
      Fix:
      
        ../arch/x86/include/asm/proto.h:14:30: warning: ‘struct task_struct’ declared \
          inside parameter list will not be visible outside of this definition or declaration
        long do_arch_prctl_64(struct task_struct *task, int option, unsigned long arg2);
                                     ^~~~~~~~~~~
      
        .../arch/x86/include/asm/proto.h:40:34: warning: ‘struct task_struct’ declared \
          inside parameter list will not be visible outside of this definition or declaration
         long do_arch_prctl_common(struct task_struct *task, int option,
                                          ^~~~~~~~~~~
      
      if linux/sched.h hasn't be included previously. This fixes a build error
      when this header is used outside of the kernel tree.
      
       [ bp: Massage commit message. ]
      
      Signed-off-by: default avatarJan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Link: https://lkml.kernel.org/r/b76b4be3-cf66-f6b2-9a6c-3e7ef54f9845@web.de
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      6c8c88a4
    • Eric Dumazet's avatar
      gro: ensure frag0 meets IP header alignment · d94d95ae
      Eric Dumazet authored
      commit 38ec4944 upstream.
      
      After commit 0f6925b3 ("virtio_net: Do not pull payload in skb->head")
      Guenter Roeck reported one failure in his tests using sh architecture.
      
      After much debugging, we have been able to spot silent unaligned accesses
      in inet_gro_receive()
      
      The issue at hand is that upper networking stacks assume their header
      is word-aligned. Low level drivers are supposed to reserve NET_IP_ALIGN
      bytes before the Ethernet header to make that happen.
      
      This patch hardens skb_gro_reset_offset() to not allow frag0 fast-path
      if the fragment is not properly aligned.
      
      Some arches like x86, arm64 and powerpc do not care and define NET_IP_ALIGN
      as 0, this extra check will be a NOP for them.
      
      Note that if frag0 is not used, GRO will call pskb_may_pull()
      as many times as needed to pull network and transport headers.
      
      Fixes: 0f6925b3 ("virtio_net: Do not pull payload in skb->head")
      Fixes: 78a478d0
      
       ("gro: Inline skb_gro_header and cache frag0 virtual address")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Acked-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Tested-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarMatthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d94d95ae
    • Eric Dumazet's avatar
      virtio_net: Do not pull payload in skb->head · 16851e34
      Eric Dumazet authored
      commit 0f6925b3 upstream.
      
      Xuan Zhuo reported that commit 3226b158 ("net: avoid 32 x truesize
      under-estimation for tiny skbs") brought  a ~10% performance drop.
      
      The reason for the performance drop was that GRO was forced
      to chain sk_buff (using skb_shinfo(skb)->frag_list), which
      uses more memory but also cause packet consumers to go over
      a lot of overhead handling all the tiny skbs.
      
      It turns out that virtio_net page_to_skb() has a wrong strategy :
      It allocates skbs with GOOD_COPY_LEN (128) bytes in skb->head, then
      copies 128 bytes from the page, before feeding the packet to GRO stack.
      
      This was suboptimal before commit 3226b158 ("net: avoid 32 x truesize
      under-estimation for tiny skbs") because GRO was using 2 frags per MSS,
      meaning we were not packing MSS with 100% efficiency.
      
      Fix is to pull only the ethernet header in page_to_skb()
      
      Then, we change virtio_net_hdr_to_skb() to pull the missing
      headers, instead of assuming they were already pulled by callers.
      
      This fixes the performance regression, but could also allow virtio_net
      to accept packets with more than 128bytes of headers.
      
      Many thanks to Xuan Zhuo for his report, and his tests/help.
      
      Fixes: 3226b158
      
       ("net: avoid 32 x truesize under-estimation for tiny skbs")
      Reported-by: default avatarXuan Zhuo <xuanzhuo@linux.alibaba.com>
      Link: https://www.spinics.net/lists/netdev/msg731397.html
      Co-Developed-by: default avatarXuan Zhuo <xuanzhuo@linux.alibaba.com>
      Signed-off-by: default avatarXuan Zhuo <xuanzhuo@linux.alibaba.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: virtualization@lists.linux-foundation.org
      Acked-by: default avatarJason Wang <jasowang@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarMatthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      16851e34
  3. Jul 31, 2021