  1. Jul 02, 2022
  2. Jun 25, 2022
    • Linux 4.9.320 · 4ffa4be5
      Greg Kroah-Hartman authored
      
      
      Link: https://lore.kernel.org/r/20220623164344.053938039@linuxfoundation.org
      Tested-by: Florian Fainelli <f.fainelli@gmail.com>
      Tested-by: Pavel Machek (CIP) <pavel@denx.de>
      Tested-by: Shuah Khan <skhan@linuxfoundation.org>
      Tested-by: Jon Hunter <jonathanh@nvidia.com>
      Tested-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      v4.9.320
    • tcp: drop the hash_32() part from the index calculation · a81a6b20
      Willy Tarreau authored
      commit e8161345 upstream.
      
      In commit 190cc824 ("tcp: change source port randomizarion at
      connect() time"), the table_perturb[] array was introduced and an
      index was taken from the port_offset via hash_32(). But it turns
      out that hash_32() performs a multiplication while the input here
      comes from the output of SipHash in secure_seq, that is well
      distributed enough to avoid the need for yet another hash.
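
For illustration only, a minimal Python sketch of the two ways of deriving the table index. It assumes a 2^16-entry table and models hash_32() as the kernel's generic multiplicative hash; the function names are hypothetical, and the exact bits the kernel masks are an assumption.

```python
GOLDEN_RATIO_32 = 0x61C88647  # multiplier used by the kernel's generic hash_32()
TABLE_SHIFT = 16              # assumed: table of 2^16 entries


def index_with_hash32(port_offset: int) -> int:
    """Old style: 32-bit multiply, then keep the top TABLE_SHIFT bits."""
    product = ((port_offset & 0xFFFFFFFF) * GOLDEN_RATIO_32) & 0xFFFFFFFF
    return product >> (32 - TABLE_SHIFT)


def index_direct(port_offset: int) -> int:
    """New style: the input is already SipHash output, so just mask low bits."""
    return port_offset & ((1 << TABLE_SHIFT) - 1)
```

Both functions return a value in the range 0..65535; the second one simply avoids the multiplication.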
      
      Suggested-by: Amit Klein <aksecurity@gmail.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tcp: increase source port perturb table to 2^16 · 3c78eea6
      Willy Tarreau authored
      commit 4c2c8f03 upstream.
      
      Moshe Kol, Amit Klein, and Yossi Gilad reported being able to accurately
      identify a client by forcing it to emit only 40 times more connections
      than there are entries in the table_perturb[] table. The previous two
      improvements consisting in resalting the secret every 10s and adding
      randomness to each port selection only slightly improved the situation,
      and the current value of 2^8 was too small as it's not very difficult
      to make a client emit 10k connections in less than 10 seconds.
      
      Thus we're increasing the perturb table from 2^8 to 2^16 so that the
      same precision now requires 2.6M connections, which is more difficult in
      this time frame and harder to hide as a background activity. The impact
      is that the table now uses 256 kB instead of 1 kB, which could mostly
      affect devices making frequent outgoing connections. However such
      components usually target a small set of destinations (load balancers,
      database clients, perf assessment tools), and in practice only a few
      entries will be visited, like before.
      
      A live test at 1 million connections per second showed no performance
      difference from the previous value.
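
The figures above can be checked with a short Python sketch. It assumes each table entry is a 32-bit value (which is what the 1 kB and 256 kB sizes imply) and uses the factor of 40 connections per entry quoted in the report; the function name is hypothetical.

```python
ENTRY_BYTES = 4              # assumed: one 32-bit value per table entry
CONNECTIONS_PER_ENTRY = 40   # factor quoted in the report


def attack_cost(table_shift: int) -> tuple[int, int]:
    """Return (connections needed, table size in bytes) for a 2^shift table."""
    entries = 1 << table_shift
    return entries * CONNECTIONS_PER_ENTRY, entries * ENTRY_BYTES
```

With a shift of 8 this gives roughly 10k connections and 1 kB; with a shift of 16 it gives roughly 2.6M connections and 256 kB.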
      
      Reported-by: Moshe Kol <moshe.kol@mail.huji.ac.il>
      Reported-by: Yossi Gilad <yossi.gilad@mail.huji.ac.il>
      Reported-by: Amit Klein <aksecurity@gmail.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tcp: dynamically allocate the perturb table used by source ports · dd82067b
      Willy Tarreau authored
      commit e9261476 upstream.
      
      We'll need to further increase the size of this table and it's likely
      that at some point its size will not be suitable anymore for a static
      table. Let's allocate it on boot from inet_hashinfo2_init(), which is
      called from tcp_init().
      
      Cc: Moshe Kol <moshe.kol@mail.huji.ac.il>
      Cc: Yossi Gilad <yossi.gilad@mail.huji.ac.il>
      Cc: Amit Klein <aksecurity@gmail.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      [bwh: Backported to 4.9:
       - There is no inet_hashinfo2_init(), so allocate the table in
         inet_hashinfo_init() when called by TCP
       - Adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tcp: add small random increments to the source port · aa772252
      Willy Tarreau authored
      commit ca7af040 upstream.
      
      Here we're adding between 0 and 7 extra random increments to the
      selected source port in order to add some noise to the source port
      selection, which makes the next port less predictable.
      
      With the default port range of 32768-60999 this means a worst case
      reuse scenario of 14116/8=1764 connections between two consecutive
      uses of the same port, with an average of 14116/4.5=3137. This code
      was stressed at more than 800000 connections per second to a fixed
      target with all connections closed by the client using RSTs (worst
      condition) and only 2 connections failed among 13 billion, despite
      the hash being reseeded every 10 seconds, indicating a perfectly
      safe situation.
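
The reuse figures can be reproduced with a small Python sketch. It assumes that only half of the default range is available to connect() (which is what the 14116 figure implies) and that each selection advances by one base step plus 0 to 7 extra steps; the function name is hypothetical.

```python
PORT_LOW, PORT_HIGH = 32768, 60999  # default port range quoted above


def reuse_distance() -> tuple[int, int, int]:
    """Return (usable ports, worst-case reuse distance, average reuse distance)."""
    total_ports = PORT_HIGH - PORT_LOW + 1     # 28232 ports in the range
    usable = total_ports // 2                  # assumed: half are usable by connect()
    steps = range(1, 9)                        # 1 base step + 0..7 random extra steps
    worst = usable // max(steps)               # every selection takes the largest step
    average = usable / (sum(steps) / len(steps))  # mean step is 4.5
    return usable, worst, round(average)
```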
      
      Cc: Moshe Kol <moshe.kol@mail.huji.ac.il>
      Cc: Yossi Gilad <yossi.gilad@mail.huji.ac.il>
      Cc: Amit Klein <aksecurity@gmail.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tcp: use different parts of the port_offset for index and offset · 2ed413f1
      Willy Tarreau authored
      commit 9e9b70ae upstream.
      
      Amit Klein suggests that we use different parts of port_offset for the
      table's index and the port offset so that there is no direct relation
      between them.
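
A minimal Python sketch of the idea, for illustration only. The message does not say which bits are used for which purpose; the split below (low bits for the index, high 32 bits for the offset) and the function name are assumptions.

```python
TABLE_SHIFT = 16  # assumed table size of 2^16 entries


def split_port_offset(port_offset64: int) -> tuple[int, int]:
    """Derive the table index and the offset seed from disjoint bit ranges."""
    index = port_offset64 & ((1 << TABLE_SHIFT) - 1)    # low bits pick the table slot
    offset_seed = (port_offset64 >> 32) & 0xFFFFFFFF    # high 32 bits feed the offset
    return index, offset_seed
```

Because the two values come from non-overlapping bits of the hash, observing one does not directly reveal the other.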
      
      Cc: Jason A. Donenfeld <Jason@zx2c4.com>
      Cc: Moshe Kol <moshe.kol@mail.huji.ac.il>
      Cc: Yossi Gilad <yossi.gilad@mail.huji.ac.il>
      Cc: Amit Klein <aksecurity@gmail.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • secure_seq: use the 64 bits of the siphash for port offset calculation · 576696ed
      Willy Tarreau authored
      commit b2d05756 upstream.
      
      SipHash replaced MD5 in secure_ipv{4,6}_port_ephemeral() via commit
      7cd23e53 ("secure_seq: use SipHash in place of MD5"), but the output
      remained truncated to 32-bit only. In order to exploit more bits from the
      hash, let's make the functions return the full 64-bit of siphash_3u32().
      We also make sure the port offset calculation in __inet_hash_connect()
      remains done on 32-bit to avoid the need for div_u64_rem() and an extra
      cost on 32-bit systems.
      
      Cc: Jason A. Donenfeld <Jason@zx2c4.com>
      Cc: Moshe Kol <moshe.kol@mail.huji.ac.il>
      Cc: Yossi Gilad <yossi.gilad@mail.huji.ac.il>
      Cc: Amit Klein <aksecurity@gmail.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tcp: add some entropy in __inet_hash_connect() · 05a12e5c
      Eric Dumazet authored
      commit c579bd1b upstream.
      
      Even when implementing RFC 6056 3.3.4 (Algorithm 4: Double-Hash
      Port Selection Algorithm), a patient attacker could still be able
      to collect enough state from an otherwise idle host.
      
      The idea of this patch is to inject some noise in the
      cases where __inet_hash_connect() finds a candidate on the first
      attempt.
      
      This noise should not significantly reduce the collision
      avoidance, and should be zero if connection table
      is already well used.
      
      Note that this is not implementing RFC 6056 3.3.5
      because we think Algorithm 5 could hurt typical
      workloads.
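
A rough Python sketch of the noise injection described above. The message does not give the probability or the step sizes; the 1-in-16 chance, the step of 2, and the function name are all assumptions made for illustration.

```python
def counter_advance(steps_tried: int, rnd: int, one_in: int = 16) -> int:
    """Return how far the per-slot counter moves after a successful connect().

    steps_tried is 0 when the first candidate port was free. Only in that
    case, and only occasionally, one extra candidate is skipped as noise.
    """
    i = steps_tried
    if i == 0 and rnd % one_in == 0:
        i = 2          # assumed: skip one extra candidate
    return i + 2       # assumed: counter always moves past the chosen port
```

When the search already had to step past busy ports (steps_tried > 0), no noise is added, which matches the statement that the noise should be zero on a well-used connection table.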
      
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: David Dworken <ddworken@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tcp: change source port randomizarion at connect() time · 136b4799
      Eric Dumazet authored
      commit 190cc824 upstream.
      
      RFC 6056 (Recommendations for Transport-Protocol Port Randomization)
      provides a good summary of why source port selection needs extra care.
      
      David Dworken reminded us that Linux implements Algorithm 3,
      as described in RFC 6056 3.3.3.
      
      Quoting David :
         In the context of the web, this creates an interesting info leak where
         websites can count how many TCP connections a user's computer is
         establishing over time. For example, this allows a website to count
         exactly how many subresources a third party website loaded.
         This also allows:
         - Distinguishing between different users behind a VPN based on
             distinct source port ranges.
         - Tracking users over time across multiple networks.
         - Covert communication channels between different browsers/browser
             profiles running on the same computer
         - Tracking what applications are running on a computer based on
             the pattern of how fast source ports are getting incremented.
      
      Section 3.3.4 describes an enhancement that reduces
      an attacker's ability to use the basic information currently
      stored in the shared 'u32 hint'.
      
      This change also decreases collision rate when
      multiple applications need to connect() to
      different destinations.
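
For readers unfamiliar with RFC 6056 Algorithm 4, here is a simplified Python sketch of the double-hash scheme. It is not the kernel code: SHA-256 stands in for the keyed hash, the class and helper names are hypothetical, and the check that the chosen port is actually free is omitted.

```python
import hashlib


def _keyed_hash(secret: bytes, *parts) -> int:
    """Stand-in keyed hash (assumption: the kernel uses a different function)."""
    data = secret + b"|".join(str(p).encode() for p in parts)
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


class DoubleHashPorts:
    """Algorithm 4 sketch: per-destination offset plus a table of counters."""

    def __init__(self, low=32768, high=60999, table_size=256):
        self.low = low
        self.num_ports = high - low + 1
        self.table = [0] * table_size        # replaces the single shared hint
        self.secret_f = b"secret-1"          # placeholder secrets
        self.secret_g = b"secret-2"

    def next_port(self, laddr, raddr, rport) -> int:
        offset = _keyed_hash(self.secret_f, laddr, raddr, rport)
        index = _keyed_hash(self.secret_g, laddr, raddr, rport) % len(self.table)
        count = self.table[index]
        self.table[index] += 1               # only this slot's counter advances
        return self.low + (offset + count) % self.num_ports
```

Connections to one destination still get consecutive ports, but connections to other destinations usually advance a different counter, so a remote observer learns much less about overall connection activity.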
      
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: David Dworken <ddworken@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      [bwh: Backported to 4.9: adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • fuse: fix pipe buffer lifetime for direct_io · b79d4d0d
      Miklos Szeredi authored
      commit 0c4bcfde upstream.
      
      In FOPEN_DIRECT_IO mode, fuse_file_write_iter() calls
      fuse_direct_write_iter(), which normally calls fuse_direct_io(), which then
      imports the write buffer with fuse_get_user_pages(), which uses
      iov_iter_get_pages() to grab references to userspace pages instead of
      actually copying memory.
      
      On the filesystem device side, these pages can then either be read to
      userspace (via fuse_dev_read()), or splice()d over into a pipe using
      fuse_dev_splice_read() as pipe buffers with &nosteal_pipe_buf_ops.
      
      This is wrong because after fuse_dev_do_read() unlocks the FUSE request,
      the userspace filesystem can mark the request as completed, causing write()
      to return. At that point, the userspace filesystem should no longer have
      access to the pipe buffer.
      
      Fix by copying pages coming from the user address space to new pipe
      buffers.
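
The lifetime problem can be pictured with a user-space analogy in Python. This is only a model: a memoryview stands in for a pipe buffer that references the user's pages, a bytes copy stands in for a freshly allocated pipe buffer, and the function name is hypothetical.

```python
def daemon_view_after_reuse(copy_pages: bool) -> bytes:
    """Return what the filesystem daemon sees after the writer reuses its buffer."""
    user_buf = bytearray(b"secret-1")
    # Referencing the pages (old behavior) vs copying them (the fix).
    pipe_buf = bytes(user_buf) if copy_pages else memoryview(user_buf)
    # write() has returned; the application now reuses the same memory.
    for i, byte in enumerate(b"secret-2"):
        user_buf[i] = byte
    return bytes(pipe_buf)
```

Without the copy, the daemon's pipe buffer reflects data written after the request completed; with the copy, it only ever sees the data that belonged to the request.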
      
      Reported-by: Jann Horn <jannh@google.com>
      Fixes: c3021629 ("fuse: support splice() reading from fuse device")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Reinstate some of "swiotlb: rework "fix info leak with DMA_FROM_DEVICE"" · fd97de9c
      Linus Torvalds authored
      commit 901c7280 upstream.
      
      Halil Pasic points out [1] that the full revert of that commit (the
      revert in bddac7c1) goes further than needed, and that a partial revert
      that only reverts the problematic case, but still keeps some of the
      cleanups, is probably better.
      
      And that partial revert [2] had already been verified by Oleksandr
      Natalenko to also fix the issue, I had just missed that in the long
      discussion.
      
      So let's reinstate the cleanups from commit aa6f8dcb ("swiotlb:
      rework "fix info leak with DMA_FROM_DEVICE""), and effectively only
      revert the part that caused problems.
      
      Link: https://lore.kernel.org/all/20220328013731.017ae3e3.pasic@linux.ibm.com/ [1]
      Link: https://lore.kernel.org/all/20220324055732.GB12078@lst.de/ [2]
      Link: https://lore.kernel.org/all/4386660.LvFx2qVVIh@natalenko.name/ [3]
      Suggested-by: Halil Pasic <pasic@linux.ibm.com>
      Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      [OP: backport to 4.14: apply swiotlb_tbl_map_single() changes in lib/swiotlb.c]
      Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      [bwh: Backported to 4.9: adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • swiotlb: fix info leak with DMA_FROM_DEVICE · c132f2ba
      Halil Pasic authored
      commit ddbd89de upstream.
      
      The problem I'm addressing was discovered by the LTP test covering
      CVE-2018-1000204.

      A short description of what happens follows:
      1) The test case issues a command code 00 (TEST UNIT READY) via the SG_IO
         interface with: dxfer_len == 524288, dxfer_direction == SG_DXFER_FROM_DEV
         and a corresponding dxferp. The peculiar thing about this is that TUR
         is not reading from the device.
      2) In sg_start_req() the invocation of blk_rq_map_user() effectively
         bounces the user-space buffer, as if the device were to transfer into
         it. Since commit a45b599a ("scsi: sg: allocate with __GFP_ZERO in
         sg_build_indirect()") we make sure this first bounce buffer is
         allocated with __GFP_ZERO.
      3) For the rest of the story we keep ignoring that we have a TUR, so the
         device won't touch the buffer we prepare as if we had a
         DMA_FROM_DEVICE type of situation. My setup uses a virtio-scsi device
         and the buffer allocated by SG is mapped by the function
         virtqueue_add_split() which uses DMA_FROM_DEVICE for the "in" sgs (here
         scatter-gather and not scsi generics). This mapping involves bouncing
         via the swiotlb (we need swiotlb to do virtio in protected guest like
         s390 Secure Execution, or AMD SEV).
      4) When the SCSI TUR is done, we first copy back the content of the second
         (that is swiotlb) bounce buffer (which most likely contains some
         previous IO data), to the first bounce buffer, which contains all
         zeros.  Then we copy back the content of the first bounce buffer to
         the user-space buffer.
      5) The test case detects that the buffer, which it zero-initialized,
        ain't all zeros and fails.
      
      One can argue that this is an swiotlb problem, because without swiotlb
      we leak all zeros, and the swiotlb should be transparent in a sense that
      it does not affect the outcome (if all other participants are well
      behaved).
      
      Copying the content of the original buffer into the swiotlb buffer is
      the only way I can think of to make swiotlb transparent in such
      scenarios. So let's do just that if in doubt, but allow the driver
      to tell us that the whole mapped buffer is going to be overwritten,
      in which case we can preserve the old behavior and avoid the performance
      impact of the extra bounce.
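
The sequence above can be modeled in a few lines of Python. This is a toy model, not the swiotlb code: byte arrays stand in for the two bounce buffers, 0xAA stands in for leftover data from an earlier I/O, and the function name is hypothetical.

```python
STALE = 0xAA  # stand-in for data left behind in the slot by an earlier I/O


def tur_roundtrip(copy_in_on_map: bool, size: int = 8) -> bytes:
    """Model a DMA_FROM_DEVICE mapping where the device never writes."""
    first_bounce = bytearray(size)               # zero-filled buffer from step 2
    swiotlb_slot = bytearray([STALE]) * size     # swiotlb slot with stale contents
    if copy_in_on_map:
        swiotlb_slot[:] = first_bounce           # the fix: copy the original in on map
    # The device does nothing here: TEST UNIT READY transfers no data.
    first_bounce[:] = swiotlb_slot               # unmap: copy the slot back
    return bytes(first_bounce)
```

Without the copy on map, the stale bytes end up in the buffer handed back to user space; with it, the buffer keeps its zeros, which is what "making swiotlb transparent" means here.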
      
      Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      [OP: backport to 4.14: apply swiotlb_tbl_map_single() changes in lib/swiotlb.c]
      Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      [bwh: Backported to 4.9: adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • xprtrdma: fix incorrect header size calculations · ca6226b5
      Colin Ian King authored
      commit 91228844 upstream.
      
      Currently the header size calculations are using an assignment
      operator instead of a += operator when accumulating the header
      size leading to incorrect sizes.  Fix this by using the correct
      operator.
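
The class of bug is easy to show in isolation. The Python sketch below uses hypothetical function names and made-up component sizes; it is not the xprtrdma code.

```python
def header_size_buggy(part_sizes: list[int]) -> int:
    """Assignment overwrites the running total, so only the last part counts."""
    size = 0
    for part in part_sizes:
        size = part       # bug: '=' where '+=' was intended
    return size


def header_size_fixed(part_sizes: list[int]) -> int:
    """Accumulate every part into the total."""
    size = 0
    for part in part_sizes:
        size += part
    return size
```

For parts of 28, 16 and 4 bytes, the buggy version reports 4 while the fixed version reports 48.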
      
      Addresses-Coverity: ("Unused value")
      Fixes: 302d3deb ("xprtrdma: Prevent inline overflow")
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      [bwh: Backported to 4.9: adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • s390/mm: use non-quiescing sske for KVM switch to keyed guest · 26b31915
      Christian Borntraeger authored
      commit 3ae11dbc upstream.
      
      The switch to a keyed guest does not require a classic sske as the other
      guest CPUs are not accessing the key before the switch is complete.
      By using the NQ SSKE things are faster especially with multiple guests.
      
      Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
      Suggested-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
      Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
      Link: https://lore.kernel.org/r/20220530092706.11637-3-borntraeger@linux.ibm.com
      Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • l2tp: fix race in pppol2tp_release with session object destroy · 267b8fa3
      James Chapman authored
      commit d02ba2a6 upstream.
      
      pppol2tp_release uses call_rcu to put the final ref on its socket. But
      the session object doesn't hold a ref on the session socket so may be
      freed while the pppol2tp_put_sk RCU callback is scheduled. Fix this by
      having the session hold a ref on its socket until the session is
      destroyed. It is this ref that is dropped via call_rcu.
      
      Sessions are also deleted via l2tp_tunnel_closeall. This must now also put
      the final ref via call_rcu. So move the call_rcu call site into
      pppol2tp_session_close so that this happens in both destroy paths. A
      common destroy path should really be implemented, perhaps with
      l2tp_tunnel_closeall calling l2tp_session_delete like pppol2tp_release
      does, but this will be looked at later.
      
      ODEBUG: activate active (active state 1) object type: rcu_head hint:           (null)
      WARNING: CPU: 3 PID: 13407 at lib/debugobjects.c:291 debug_print_object+0x166/0x220
      Modules linked in:
      CPU: 3 PID: 13407 Comm: syzbot_19c09769 Not tainted 4.16.0-rc2+ #38
      Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      RIP: 0010:debug_print_object+0x166/0x220
      RSP: 0018:ffff880013647a00 EFLAGS: 00010082
      RAX: dffffc0000000008 RBX: 0000000000000003 RCX: ffffffff814d3333
      RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff88001a59f6d0
      RBP: ffff880013647a40 R08: 0000000000000000 R09: 0000000000000001
      R10: ffff8800136479a8 R11: 0000000000000000 R12: 0000000000000001
      R13: ffffffff86161420 R14: ffffffff85648b60 R15: 0000000000000000
      FS:  0000000000000000(0000) GS:ffff88001a580000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020e77000 CR3: 0000000006022000 CR4: 00000000000006e0
      Call Trace:
       debug_object_activate+0x38b/0x530
       ? debug_object_assert_init+0x3b0/0x3b0
       ? __mutex_unlock_slowpath+0x85/0x8b0
       ? pppol2tp_session_destruct+0x110/0x110
       __call_rcu.constprop.66+0x39/0x890
       ? __call_rcu.constprop.66+0x39/0x890
       call_rcu_sched+0x17/0x20
       pppol2tp_release+0x2c7/0x440
       ? fcntl_setlk+0xca0/0xca0
       ? sock_alloc_file+0x340/0x340
       sock_release+0x92/0x1e0
       sock_close+0x1b/0x20
       __fput+0x296/0x6e0
       ____fput+0x1a/0x20
       task_work_run+0x127/0x1a0
       do_exit+0x7f9/0x2ce0
       ? SYSC_connect+0x212/0x310
       ? mm_update_next_owner+0x690/0x690
       ? up_read+0x1f/0x40
       ? __do_page_fault+0x3c8/0xca0
       do_group_exit+0x10d/0x330
       ? do_group_exit+0x330/0x330
       SyS_exit_group+0x22/0x30
       do_syscall_64+0x1e0/0x730
       ? trace_hardirqs_off_thunk+0x1a/0x1c
       entry_SYSCALL_64_after_hwframe+0x42/0xb7
      RIP: 0033:0x7f362e471259
      RSP: 002b:00007ffe389abe08 EFLAGS: 00000202 ORIG_RAX: 00000000000000e7
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f362e471259
      RDX: 00007f362e471259 RSI: 000000000000002e RDI: 0000000000000000
      RBP: 00007ffe389abe30 R08: 0000000000000000 R09: 00007f362e944270
      R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000400b60
      R13: 00007ffe389abf50 R14: 0000000000000000 R15: 0000000000000000
      Code: 8d 3c dd a0 8f 64 85 48 89 fa 48 c1 ea 03 80 3c 02 00 75 7b 48 8b 14 dd a0 8f 64 85 4c 89 f6 48 c7 c7 20 85 64 85 e8
      2a 55 14 ff <0f> 0b 83 05 ad 2a 68 04 01 48 83 c4 18 5b 41 5c 41 5d 41 5e 41
      
      Fixes: ee40fb2e ("l2tp: protect sock pointer of struct pppol2tp_session with RCU")
      Signed-off-by: James Chapman <jchapman@katalix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Cc: Lee Jones <lee.jones@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • l2tp: don't use inet_shutdown on ppp session destroy · 357fa382
      James Chapman authored
      commit 225eb264 upstream.
      
      Previously, if a ppp session was closed, we called inet_shutdown to mark
      the socket as unconnected such that userspace would get errors and
      then close the socket. This could race with userspace closing the
      socket. Instead, leave userspace to close the socket in its own time
      (our session will be detached anyway).
      
      BUG: KASAN: use-after-free in inet_shutdown+0x5d/0x1c0
      Read of size 4 at addr ffff880010ea3ac0 by task syzbot_347bd5ac/8296
      
      CPU: 3 PID: 8296 Comm: syzbot_347bd5ac Not tainted 4.16.0-rc1+ #91
      Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      Call Trace:
       dump_stack+0x101/0x157
       ? inet_shutdown+0x5d/0x1c0
       print_address_description+0x78/0x260
       ? inet_shutdown+0x5d/0x1c0
       kasan_report+0x240/0x360
       __asan_load4+0x78/0x80
       inet_shutdown+0x5d/0x1c0
       ? pppol2tp_show+0x80/0x80
       pppol2tp_session_close+0x68/0xb0
       l2tp_tunnel_closeall+0x199/0x210
       ? udp_v6_flush_pending_frames+0x90/0x90
       l2tp_udp_encap_destroy+0x6b/0xc0
       ? l2tp_tunnel_del_work+0x2e0/0x2e0
       udpv6_destroy_sock+0x8c/0x90
       sk_common_release+0x47/0x190
       udp_lib_close+0x15/0x20
       inet_release+0x85/0xd0
       inet6_release+0x43/0x60
       sock_release+0x53/0x100
       ? sock_alloc_file+0x260/0x260
       sock_close+0x1b/0x20
       __fput+0x19f/0x380
       ____fput+0x1a/0x20
       task_work_run+0xd2/0x110
       exit_to_usermode_loop+0x18d/0x190
       do_syscall_64+0x389/0x3b0
       entry_SYSCALL_64_after_hwframe+0x26/0x9b
      RIP: 0033:0x7fe240a45259
      RSP: 002b:00007fe241132df8 EFLAGS: 00000297 ORIG_RAX: 0000000000000003
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00007fe240a45259
      RDX: 00007fe240a45259 RSI: 0000000000000000 RDI: 00000000000000a5
      RBP: 00007fe241132e20 R08: 00007fe241133700 R09: 0000000000000000
      R10: 00007fe241133700 R11: 0000000000000297 R12: 0000000000000000
      R13: 00007ffc49aff84f R14: 0000000000000000 R15: 00007fe241141040
      
      Allocated by task 8331:
       save_stack+0x43/0xd0
       kasan_kmalloc+0xad/0xe0
       kasan_slab_alloc+0x12/0x20
       kmem_cache_alloc+0x144/0x3e0
       sock_alloc_inode+0x22/0x130
       alloc_inode+0x3d/0xf0
       new_inode_pseudo+0x1c/0x90
       sock_alloc+0x30/0x110
       __sock_create+0xaa/0x4c0
       SyS_socket+0xbe/0x130
       do_syscall_64+0x128/0x3b0
       entry_SYSCALL_64_after_hwframe+0x26/0x9b
      
      Freed by task 8314:
       save_stack+0x43/0xd0
       __kasan_slab_free+0x11a/0x170
       kasan_slab_free+0xe/0x10
       kmem_cache_free+0x88/0x2b0
       sock_destroy_inode+0x49/0x50
       destroy_inode+0x77/0xb0
       evict+0x285/0x340
       iput+0x429/0x530
       dentry_unlink_inode+0x28c/0x2c0
       __dentry_kill+0x1e3/0x2f0
       dput.part.21+0x500/0x560
       dput+0x24/0x30
       __fput+0x2aa/0x380
       ____fput+0x1a/0x20
       task_work_run+0xd2/0x110
       exit_to_usermode_loop+0x18d/0x190
       do_syscall_64+0x389/0x3b0
       entry_SYSCALL_64_after_hwframe+0x26/0x9b
      
      Fixes: fd558d18 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
      Signed-off-by: James Chapman <jchapman@katalix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Cc: Lee Jones <lee.jones@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: add reserved GDT blocks check · 0dc2fca8
      Zhang Yi authored
      commit b55c3cd1 upstream.
      
      We captured a NULL pointer dereference when resizing a corrupt ext4
      image whose resize_inode feature had just been cleared (without running
      e2fsck). It can be reproduced with the steps below. The problem is that,
      because the resize_inode feature was cleared, ext4_resize_fs() converts
      the filesystem to meta_bg mode, but es->s_reserved_gdt_blocks was not
      reduced to zero, so we could mistakenly call reserve_backup_gdb()
      and pass an uninitialized resize_inode to it when adding new group
      descriptors.
      
       mkfs.ext4 /dev/sda 3G
       tune2fs -O ^resize_inode /dev/sda #forget to run requested e2fsck
       mount /dev/sda /mnt
       resize2fs /dev/sda 8G
      
       ========
       BUG: kernel NULL pointer dereference, address: 0000000000000028
       CPU: 19 PID: 3243 Comm: resize2fs Not tainted 5.18.0-rc7-00001-gfde086c5ebfd #748
       ...
       RIP: 0010:ext4_flex_group_add+0xe08/0x2570
       ...
       Call Trace:
        <TASK>
        ext4_resize_fs+0xbec/0x1660
        __ext4_ioctl+0x1749/0x24e0
        ext4_ioctl+0x12/0x20
        __x64_sys_ioctl+0xa6/0x110
        do_syscall_64+0x3b/0x90
        entry_SYSCALL_64_after_hwframe+0x44/0xae
       RIP: 0033:0x7f2dd739617b
       ========
      
      The fix is simple: add a check in ext4_resize_begin() to make sure that
      es->s_reserved_gdt_blocks is zero when the resize_inode feature is
      disabled.
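
The check amounts to a consistency test on two superblock fields. The Python sketch below only models that logic; the function name is hypothetical and the choice of -EFSCORRUPTED as the error value is an assumption, since the message does not state what is returned.

```python
EFSCORRUPTED = 117  # same value as EUCLEAN on Linux


def resize_begin_check(has_resize_inode: bool, reserved_gdt_blocks: int) -> int:
    """Refuse to resize when reserved GDT blocks exist without resize_inode."""
    if not has_resize_inode and reserved_gdt_blocks != 0:
        return -EFSCORRUPTED   # assumed error code: inconsistent superblock
    return 0
```

An image produced by the reproducer above (feature cleared, reserved blocks still non-zero) would be rejected before any group descriptors are added.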
      
      Cc: stable@kernel.org
      Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
      Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20220601092717.763694-1-yi.zhang@huawei.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: make variable "count" signed · 984ceb2f
      Ding Xiang authored
      commit bc75a6eb upstream.
      
      Since dx_make_map() may now return -EFSCORRUPTED, change "count" to
      be a signed integer so we can correctly check for an error code returned
      by dx_make_map().
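
Why the signedness matters can be shown by emulating 32-bit integers in Python. The helper names are hypothetical and the sketch only models the comparison, not the ext4 code.

```python
EFSCORRUPTED = 117  # same value as EUCLEAN on Linux


def as_u32(value: int) -> int:
    """Interpret a value as an unsigned 32-bit integer."""
    return value & 0xFFFFFFFF


def as_s32(value: int) -> int:
    """Interpret a value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def error_seen_unsigned(ret: int) -> bool:
    return as_u32(ret) < 0   # never true: the error looks like a huge count


def error_seen_signed(ret: int) -> bool:
    return as_s32(ret) < 0   # true for negative error codes
```

Stored in an unsigned variable, -EFSCORRUPTED becomes 4294967179, so a "count < 0" test can never fire.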
      
      Fixes: 46c116b9 ("ext4: verify dir block before splitting it")
      Cc: stable@kernel.org
      Signed-off-by: Ding Xiang <dingxiang@cmss.chinamobile.com>
      Link: https://lore.kernel.org/r/20220530100047.537598-1-dingxiang@cmss.chinamobile.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: fix bug_on ext4_mb_use_inode_pa · 6880fb2e
      Baokun Li authored
      commit a08f789d upstream.
      
      Hulk Robot reported a BUG_ON:
      ==================================================================
      kernel BUG at fs/ext4/mballoc.c:3211!
      [...]
      RIP: 0010:ext4_mb_mark_diskspace_used.cold+0x85/0x136f
      [...]
      Call Trace:
       ext4_mb_new_blocks+0x9df/0x5d30
       ext4_ext_map_blocks+0x1803/0x4d80
       ext4_map_blocks+0x3a4/0x1a10
       ext4_writepages+0x126d/0x2c30
       do_writepages+0x7f/0x1b0
       __filemap_fdatawrite_range+0x285/0x3b0
       file_write_and_wait_range+0xb1/0x140
       ext4_sync_file+0x1aa/0xca0
       vfs_fsync_range+0xfb/0x260
       do_fsync+0x48/0xa0
      [...]
      ==================================================================
      
      Above issue may happen as follows:
      -------------------------------------
      do_fsync
       vfs_fsync_range
        ext4_sync_file
         file_write_and_wait_range
          __filemap_fdatawrite_range
           do_writepages
            ext4_writepages
             mpage_map_and_submit_extent
              mpage_map_one_extent
               ext4_map_blocks
                ext4_mb_new_blocks
                 ext4_mb_normalize_request
                  >>> start + size <= ac->ac_o_ex.fe_logical
                 ext4_mb_regular_allocator
                  ext4_mb_simple_scan_group
                   ext4_mb_use_best_found
                    ext4_mb_new_preallocation
                     ext4_mb_new_inode_pa
                      ext4_mb_use_inode_pa
                       >>> set ac->ac_b_ex.fe_len <= 0
                 ext4_mb_mark_diskspace_used
                  >>> BUG_ON(ac->ac_b_ex.fe_len <= 0);
      
      we can easily reproduce this problem with the following commands:
      	`fallocate -l100M disk`
      	`mkfs.ext4 -b 1024 -g 256 disk`
      	`mount disk /mnt`
      	`fsstress -d /mnt -l 0 -n 1000 -p 1`
      
      The size must be smaller than or equal to EXT4_BLOCKS_PER_GROUP.
      Therefore, "start + size <= ac->ac_o_ex.fe_logical" may occur
      when the size is truncated. So start should be the start position of
      the group where ac_o_ex.fe_logical is located after alignment.
      In addition, when the value of fe_logical or EXT4_BLOCKS_PER_GROUP
      is very large, the value calculated by start_off is more accurate.
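
A numeric illustration in Python of how truncating the size can leave the requested block outside the allocation window. The numbers and function names are made up, and aligning start down to a multiple of the group size is one reading of the paragraph above, not the actual patch.

```python
def old_window(start: int, size: int, blocks_per_group: int) -> tuple[int, int]:
    """Truncate size to the group size but leave start where it was."""
    return start, min(size, blocks_per_group)


def aligned_window(fe_logical: int, size: int, blocks_per_group: int) -> tuple[int, int]:
    """Truncate size and move start to the aligned block containing fe_logical."""
    size = min(size, blocks_per_group)
    start = (fe_logical // blocks_per_group) * blocks_per_group
    return start, size


def covers(window: tuple[int, int], fe_logical: int) -> bool:
    """True when fe_logical lies inside [start, start + size)."""
    start, size = window
    return start <= fe_logical < start + size
```

With 256 blocks per group, a normalized request of start 0 and size 1024 for logical block 700 is truncated to [0, 256), so start + size <= fe_logical holds; the aligned variant yields [512, 768), which contains block 700.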
      
      Cc: stable@kernel.org
      Fixes: cd648b8a ("ext4: trim allocation requests to group size")
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Baokun Li <libaokun1@huawei.com>
      Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com>
      Link: https://lore.kernel.org/r/20220528110017.354175-2-libaokun1@huawei.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>