  1. Aug 22, 2020
  2. Aug 20, 2020
    • Merge tag 'vfio-v5.9-rc2' of git://github.com/awilliam/linux-vfio · 7eac66d0
      Linus Torvalds authored
      Pull VFIO fixes from Alex Williamson:
      
       - Fix lockdep issue reported for recursive read-lock (Alex Williamson)
      
       - Fix missing unwind in type1 replay function (Alex Williamson)
      
      * tag 'vfio-v5.9-rc2' of git://github.com/awilliam/linux-vfio:
        vfio/type1: Add proper error unwind for vfio_iommu_replay()
        vfio-pci: Avoid recursive read-lock usage
    • lib/string.c: Use freestanding environment · 33d0f96f
      Arvind Sankar authored
      
      
      gcc can transform the loop in a naive implementation of memset/memcpy
      etc into a call to the function itself.  This optimization is enabled by
      -ftree-loop-distribute-patterns.
      
      This has been the case for a while, but gcc-10.x enables this option at
      -O2 rather than -O3 as in previous versions.
      
      Add -ffreestanding, which implicitly disables this optimization with
      gcc.  It is unclear whether clang performs such optimizations, but
      hopefully it will also not do so in a freestanding environment.
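
As a rough illustration of the pattern in question, a naive byte-at-a-time fill looks like the loop below. With -ftree-loop-distribute-patterns, gcc may recognise such a loop as a memset idiom and emit a call to memset(); inside the function that is itself memset(), that becomes a call to itself. The name naive_memset is hypothetical and this is not the code in lib/string.c.

```c
#include <stddef.h>

/*
 * Hypothetical byte-at-a-time fill, similar in shape to a generic memset.
 * A compiler that recognises the loop idiom may replace it with a call to
 * memset(); building with -ffreestanding is meant to prevent that with gcc.
 */
void *naive_memset(void *dst, int c, size_t count)
{
	unsigned char *p = dst;

	while (count--)
		*p++ = (unsigned char)c;
	return dst;
}
```

In a hosted user-space build the transformation is harmless, because the emitted call resolves to the C library's memset() rather than back to this function.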
      
      Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
      Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56888
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • x86/boot/compressed: Use builtin mem functions for decompressor · 394b19d6
      Arvind Sankar authored
      
      
      Since commits
      
        c041b5ad ("x86, boot: Create a separate string.h file to provide standard string functions")
        fb4cac57 ("x86, boot: Move memcmp() into string.h and string.c")
      
      the decompressor stub has been using the compiler's builtin memcpy,
      memset and memcmp functions, _except_ where it would likely have the
      largest impact, in the decompression code itself.
      
      Remove the #undef's of memcpy and memset in misc.c so that the
      decompressor code also uses the compiler builtins.
      
      The rationale given in the comment doesn't really apply: just because
      some functions use the out-of-line version is no reason to not use the
      builtin version in the rest.
      
      Replace the comment with an explanation of why memzero and memmove are
      being #define'd.
      
      Drop the suggestion to #undef in boot/string.h as well: the out-of-line
      versions are not really optimized versions, they're generic code that's
      good enough for the preboot environment. The compiler will likely
      generate better code for constant-size memcpy/memset/memcmp if it is
      allowed to.
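
To illustrate why constant-size calls matter, consider a 4-byte unaligned load and store of the kind a decompressor might perform in its inner loop. This is a sketch, not code from the decompressor; read32 and write32 are hypothetical helpers. When the compiler is allowed to treat memcpy() as a builtin, a fixed-size copy like this can typically be compiled to a single load or store instead of an out-of-line call.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: load 32 bits from a possibly unaligned address. */
static inline uint32_t read32(const void *src)
{
	uint32_t v;

	memcpy(&v, src, sizeof(v));	/* constant size: candidate for inlining */
	return v;
}

/* Hypothetical helper: store 32 bits to a possibly unaligned address. */
static inline void write32(void *dst, uint32_t v)
{
	memcpy(dst, &v, sizeof(v));	/* constant size: candidate for inlining */
}
```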
      
      Most decompressors' performance is unchanged, with the exception of LZ4
      and 64-bit ZSTD.
      
              Before   After   Arch
      LZ4       73ms    10ms   32-bit
      LZ4      120ms    10ms   64-bit
      ZSTD      90ms    74ms   64-bit
      
      Measurements on QEMU on 2.2GHz Broadwell Xeon, using defconfig kernels.
      
      Decompressor code size has small differences, with the largest being
      that 64-bit ZSTD decreases just over 2k. The largest code size increase
      was on 64-bit XZ, of about 400 bytes.
      
      Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
      Suggested-by: Nick Terrell <nickrterrell@gmail.com>
      Tested-by: Nick Terrell <nickrterrell@gmail.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. Aug 19, 2020
  4. Aug 18, 2020
    • Merge tag 'pstore-v5.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · 06a4ec1d
      Linus Torvalds authored
      Pull mailmap update from Kees Cook:
       "This was originally part of my pstore tree, but when I realized that
        mailmap needed re-alphabetizing, I decided to wait until -rc1 to send
        this, as I saw a lot of mailmap additions pending in -next for the
        merge window.
      
        It's a programmatic reordering and the addition of a pstore
        contributor's preferred email address"
      
      * tag 'pstore-v5.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
        mailmap: Add WeiXiong Liao
        mailmap: Restore dictionary sorting
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 4cf75621
      Linus Torvalds authored
      Pull networking fixes from David Miller:
       "Another batch of fixes:
      
        1) Remove nft_compat counter flush optimization, it generates warnings
           from the refcount infrastructure. From Florian Westphal.
      
        2) Fix BPF to search for build id more robustly, from Jiri Olsa.
      
        3) Handle bogus getopt lengths in ebtables, from Florian Westphal.
      
        4) Infoleak and other fixes to j1939 CAN driver, from Eric Dumazet and
           Oleksij Rempel.
      
        5) Reset iter properly on mptcp sendmsg() error, from Florian
           Westphal.
      
        6) Show a saner speed in bonding broadcast mode, from Jarod Wilson.
      
        7) Various kerneldoc fixes in bonding and elsewhere, from Lee Jones.
      
        8) Fix double unregister in bonding during namespace tear down, from
           Cong Wang.
      
        9) Disable RP filter during icmp_redirect selftest, from David Ahern"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (75 commits)
        otx2_common: Use devm_kcalloc() in otx2_config_npa()
        net: qrtr: fix usage of idr in port assignment to socket
        selftests: disable rp_filter for icmp_redirect.sh
        Revert "net: xdp: pull ethernet header off packet after computing skb->protocol"
        phylink: <linux/phylink.h>: fix function prototype kernel-doc warning
        mptcp: sendmsg: reset iter on error redux
        net: devlink: Remove overzealous WARN_ON with snapshots
        tipc: not enable tipc when ipv6 works as a module
        tipc: fix uninit skb->data in tipc_nl_compat_dumpit()
        net: Fix potential wrong skb->protocol in skb_vlan_untag()
        net: xdp: pull ethernet header off packet after computing skb->protocol
        ipvlan: fix device features
        bonding: fix a potential double-unregister
        can: j1939: add rxtimer for multipacket broadcast session
        can: j1939: abort multipacket broadcast session when timeout occurs
        can: j1939: cancel rxtimer on multipacket broadcast session complete
        can: j1939: fix support for multipacket broadcast message
        net: fddi: skfp: cfm: Remove seemingly unused variable 'ID_sccs'
        net: fddi: skfp: cfm: Remove set but unused variable 'oldstate'
        net: fddi: skfp: smt: Remove seemingly unused variable 'ID_sccs'
        ...
    • otx2_common: Use devm_kcalloc() in otx2_config_npa() · bf2bcd6f
      vulab authored
      
      
      A multiplication for the size determination of a memory allocation
      indicated that an array data structure should be processed.
      Thus use the corresponding function "devm_kcalloc".
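
One practical reason to prefer an array-allocating function over an open-coded multiplication is that it can check the product n * size for overflow before allocating. A user-space sketch of that check follows. alloc_array is a hypothetical stand-in, not devm_kcalloc() itself, and __builtin_mul_overflow assumes gcc or clang.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical user-space analogue of an array allocator: returns zeroed
 * memory for n elements of the given size, or NULL if n * size overflows
 * or the allocation fails.
 */
void *alloc_array(size_t n, size_t size)
{
	size_t bytes;
	void *p;

	if (__builtin_mul_overflow(n, size, &bytes))
		return NULL;	/* the product does not fit in size_t */

	p = malloc(bytes ? bytes : 1);
	if (p)
		memset(p, 0, bytes);
	return p;
}
```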
      
      Signed-off-by: Xu Wang <vulab@iscas.ac.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: qrtr: fix usage of idr in port assignment to socket · 8dfddfb7
      Necip Fazil Yildiran authored
      
      
      Passing large uint32 sockaddr_qrtr.port numbers for port allocation
      triggers a warning within idr_alloc() since the port number is cast
      to int, and thus interpreted as a negative number. This leads to
      the rejection of such valid port numbers in qrtr_port_assign() as
      idr_alloc() fails.
      
      To avoid the problem, switch to idr_alloc_u32() instead.
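
The arithmetic behind the failure can be shown in isolation. The sketch below assumes a 32-bit int and the wrap-around conversion that gcc and clang implement. port_fits_int and port_as_int are hypothetical helpers and not part of the qrtr code.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: true if a u32 port number survives a cast to int.
 * Values above INT_MAX become negative on common ABIs, which is what an
 * int-based ID allocator would then see.
 */
bool port_fits_int(uint32_t port)
{
	return port <= (uint32_t)INT_MAX;
}

/* The cast itself, as an int-based interface would receive the value. */
int port_as_int(uint32_t port)
{
	return (int)port;
}
```

An interface that takes the ID as an unsigned 32-bit value avoids the conversion altogether, so the full port range stays valid.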
      
      Fixes: bdabad3e ("net: Add Qualcomm IPC router")
      Reported-by: <syzbot+f31428628ef672716ea8@syzkaller.appspotmail.com>
      Signed-off-by: Necip Fazil Yildiran <necip@google.com>
      Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: disable rp_filter for icmp_redirect.sh · bcf7ddb0
      David Ahern authored
      
      
      h1 is initially configured to reach h2 via r1 rather than the
      more direct path through r2. If rp_filter is set and inherited
      for r2, forwarding fails since the source address of h1 is
      reachable from eth0 vs the packet coming to it via r1 and eth1.
      Since rp_filter setting affects the test, explicitly reset it.
      
      Signed-off-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mailmap: Add WeiXiong Liao · 5a4fe062
      Kees Cook authored
      
      
      WeiXiong Liao noted to me offlist that his preference for email address
      had changed and that he'd like it updated in the mailmap so people
      discussing pstore/blk would be able to reach him.
      
      Cc: WeiXiong Liao <gmpy.liaowx@gmail.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
    • mailmap: Restore dictionary sorting · d6bd5201
      Kees Cook authored
      
      
      Several names had been recently appended (instead of inserted). While
      git-shortlog doesn't need this file to be sorted, it helps humans to
      keep it organized this way. Sort the entire file (which includes some
      minor shuffling for dictionary order).
      
      Done with the following commands:
      
      	grep -E '^(#|$)' .mailmap > .mailmap.head
      	grep -Ev '^(#|$)' .mailmap > .mailmap.body
      	sort -f .mailmap.body > .mailmap.body.sort
      	cat .mailmap.head .mailmap.body.sort > .mailmap
      	rm .mailmap.head .mailmap.body.sort
      
      Signed-off-by: Kees Cook <keescook@chromium.org>
    • arch/ia64: Restore arch-specific pgd_offset_k implementation · bd05220c
      Jessica Clarke authored
      
      
      IA-64 is special and treats pgd_offset_k() differently to pgd_offset(),
      using different formulae to calculate the indices into the kernel and user
      PGDs.  The index into the user PGDs takes into account the region number,
      but the index into the kernel (init_mm) PGD always assumes a predefined
      kernel region number. Commit 974b9b2c ("mm: consolidate pte_index() and
      pte_offset_*() definitions") made IA-64 use a generic pgd_offset_k() which
      incorrectly used pgd_index() for kernel page tables.  As a result, the
      index into the kernel PGD was going out of bounds and the kernel hung
      during early boot.
      
      Allow overrides of pgd_offset_k() and override it on IA-64 with the old
      implementation that will correctly index the kernel PGD.
      
      Fixes: 974b9b2c ("mm: consolidate pte_index() and pte_offset_*() definitions")
      Reported-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Signed-off-by: Jessica Clarke <jrtc27@jrtc27.com>
      Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Acked-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
    • Revert "net: xdp: pull ethernet header off packet after computing skb->protocol" · 7f9bf6e8
      David S. Miller authored
      
      
      This reverts commit f8414a8d.
      
      eth_type_trans() does the necessary pull on the skb.
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • phylink: <linux/phylink.h>: fix function prototype kernel-doc warning · 0b76e642
      Randy Dunlap authored
      
      
      Fix a kernel-doc warning for the pcs_config() function prototype:
      
      ../include/linux/phylink.h:406: warning: Excess function parameter 'permit_pause_to_mac' description in 'pcs_config'
      
      Fixes: 7137e18f ("net: phylink: add struct phylink_pcs")
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: netdev@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vfio/type1: Add proper error unwind for vfio_iommu_replay() · aae7a75a
      Alex Williamson authored
      
      
      The vfio_iommu_replay() function does not currently unwind on error,
      yet it does pin pages, perform IOMMU mapping, and modify the vfio_dma
      structure to indicate IOMMU mapping.  The IOMMU mappings are torn down
      when the domain is destroyed, but the other actions go on to cause
      trouble later.  For example, the iommu->domain_list can be empty if we
      only have a non-IOMMU backed mdev attached.  We don't currently check
      if the list is empty before getting the first entry in the list, which
      leads to a bogus domain pointer.  If a vfio_dma entry is erroneously
      marked as iommu_mapped, we'll attempt to use that bogus pointer to
      retrieve the existing physical page addresses.
      
      This is the scenario that uncovered this issue, attempting to hot-add
      a vfio-pci device to a container with an existing mdev device and DMA
      mappings, one of which could not be pinned, causing a failure adding
      the new group to the existing container and setting the conditions
      for a subsequent attempt to explode.
      
      To resolve this, we can first check if the domain_list is empty so
      that we can reject replay of a bogus domain, should we ever encounter
      this inconsistent state again in the future.  The real fix though is
      to add the necessary unwind support, which means cleaning up the
      current pinning if an IOMMU mapping fails, then walking back through
      the r-b tree of DMA entries, reading from the IOMMU which ranges are
      mapped, and unmapping and unpinning those ranges.  To be able to do
      this, we also defer marking the DMA entry as IOMMU mapped until all
      entries are processed, in order to allow the unwind to know the
      disposition of each entry.
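
The general shape of such an unwind, with the final marker deferred until every entry has succeeded, can be sketched as follows. This is a simplified user-space model under stated assumptions, not the vfio code: struct entry, its fields, and replay() are hypothetical, and the "fail" flag exists only to force an error for demonstration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a DMA entry; not the kernel's struct vfio_dma. */
struct entry {
	bool pinned;	/* resources currently held for this entry */
	bool mapped;	/* final "mapped" marker, set only on full success */
	bool fail;	/* demonstration hook: force this entry to fail */
};

/* Returns 0 on success, -1 on failure with this call's work undone. */
int replay(struct entry *e, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (e[i].mapped)
			continue;	/* already set up by an earlier call */
		if (e[i].fail)
			goto unwind;
		e[i].pinned = true;	/* take resources, but do not mark yet */
	}

	/* Everything succeeded: only now mark the entries as mapped. */
	for (i = 0; i < n; i++)
		e[i].mapped = true;
	return 0;

unwind:
	/* Walk back over the entries already visited and release them. */
	while (i-- > 0)
		if (!e[i].mapped)
			e[i].pinned = false;
	return -1;
}
```

Because the marker is still clear for everything this call touched, the unwind path can tell its own partial work apart from entries that were fully set up earlier.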
      
      Fixes: a54eb550 ("vfio iommu type1: Add support for mediated devices")
      Reported-by: Zhiyi Guo <zhguo@redhat.com>
      Tested-by: Zhiyi Guo <zhguo@redhat.com>
      Reviewed-by: Cornelia Huck <cohuck@redhat.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
    • vfio-pci: Avoid recursive read-lock usage · bc93b9ae
      Alex Williamson authored
      
      
      A down_read on memory_lock is held when performing read/write accesses
      to MMIO BAR space, including across the copy_to/from_user() callouts
      which may fault.  If the user buffer for these copies resides in an
      mmap of device MMIO space, the mmap fault handler will acquire a
      recursive read-lock on memory_lock.  Avoid this by reducing the lock
      granularity.  Sequential accesses requiring multiple ioread/iowrite
      cycles are expected to be rare, therefore typical accesses should not
      see additional overhead.
      
      VGA MMIO accesses are expected to be non-fatal regardless of the PCI
      memory enable bit, to allow legacy probing; this behavior remains, with
      a comment added.  ioeventfds are now included in memory access testing,
      with writes dropped while memory space is disabled.
      
      Fixes: abafbc55 ("vfio-pci: Invalidate mmaps and block MMIO access on disabled memory")
      Reported-by: Zhiyi Guo <zhguo@redhat.com>
      Tested-by: Zhiyi Guo <zhguo@redhat.com>
      Reviewed-by: Cornelia Huck <cohuck@redhat.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
    • watch_queue: Limit the number of watches a user can hold · 29e44f45
      David Howells authored
      
      
      Impose a limit on the number of watches that a user can hold so that
      they can't use this mechanism to fill up all the available memory.
      
      This is done by putting a counter in user_struct that's incremented when
      a watch is allocated and decreased when it is released.  If the number
      exceeds the RLIMIT_NOFILE limit, the watch is rejected with EAGAIN.
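
The accounting described above can be modelled in user space roughly as follows. This is a sketch under stated assumptions: struct user_acct, watch_charge() and watch_uncharge() are hypothetical names, the limit parameter stands in for the RLIMIT_NOFILE value, and C11 atomics stand in for the kernel's atomic counters.

```c
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical per-user accounting; the kernel keeps this in user_struct. */
struct user_acct {
	atomic_int nr_watches;
};

/* Charge one watch to the user; returns 0, or -EAGAIN if over the limit. */
int watch_charge(struct user_acct *u, int limit)
{
	/* atomic_fetch_add returns the old value, so add 1 for the new count. */
	if (atomic_fetch_add(&u->nr_watches, 1) + 1 > limit) {
		atomic_fetch_sub(&u->nr_watches, 1);	/* undo the charge */
		return -EAGAIN;
	}
	return 0;
}

/* Release a previously charged watch. */
void watch_uncharge(struct user_acct *u)
{
	atomic_fetch_sub(&u->nr_watches, 1);
}
```

Incrementing first and backing out on failure keeps the check and the charge in one atomic step, so two concurrent callers cannot both slip past the limit.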
      
      This can be tested by the following means:
      
       (1) Create a watch queue and attach it to fd 5 in the program given - in
           this case, bash:
      
      	keyctl watch_session /tmp/nlog /tmp/gclog 5 bash
      
       (2) In the shell, set the maximum number of files to, say, 99:
      
      	ulimit -n 99
      
       (3) Add 200 keyrings:
      
      	for ((i=0; i<200; i++)); do keyctl newring a$i @s || break; done
      
       (4) Try to watch all of the keyrings:
      
      	for ((i=0; i<200; i++)); do echo $i; keyctl watch_add 5 %:a$i || break; done
      
           This should fail when the number of watches belonging to the user hits
           99.
      
       (5) Remove all the keyrings and all of those watches should go away:
      
      	for ((i=0; i<200; i++)); do keyctl unlink %:a$i; done
      
       (6) Kill off the watch queue by exiting the shell spawned by
           watch_session.
      
      Fixes: c73be61c ("pipe: Add general notification queue support")
      Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. Aug 17, 2020
  6. Aug 16, 2020
    • Merge tag 'block-5.9-2020-08-14' of git://git.kernel.dk/linux-block · 4b6c093e
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "A few fixes on the block side of things:
      
         - Discard granularity fix (Coly)
      
         - rnbd cleanups (Guoqing)
      
         - md error handling fix (Dan)
      
         - md sysfs fix (Junxiao)
      
         - Fix flush request accounting, which caused an IO slowdown for some
           configurations (Ming)
      
         - Properly propagate loop flag for partition scanning (Lennart)"
      
      * tag 'block-5.9-2020-08-14' of git://git.kernel.dk/linux-block:
        block: fix double account of flush request's driver tag
        loop: unset GENHD_FL_NO_PART_SCAN on LOOP_CONFIGURE
        rnbd: no need to set bi_end_io in rnbd_bio_map_kern
        rnbd: remove rnbd_dev_submit_io
        md-cluster: Fix potential error pointer dereference in resize_bitmaps()
        block: check queue's limits.discard_granularity in __blkdev_issue_discard()
        md: get sysfs entry after redundancy attr group create
    • Merge tag 'riscv-for-linus-5.9-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · d84835b1
      Linus Torvalds authored
      Pull RISC-V fix from Palmer Dabbelt:
       "I collected a single fix during the merge window: we managed to break
        the early trap setup on !MMU, this fixes it"
      
      * tag 'riscv-for-linus-5.9-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
        riscv: Setup exception vector for nommu platform
    • Merge tag 'sh-for-5.9' of git://git.libc.org/linux-sh · 5bbec3cf
      Linus Torvalds authored
      Pull arch/sh updates from Rich Felker:
       "Cleanup, SECCOMP_FILTER support, message printing fixes, and other
        changes to arch/sh"
      
      * tag 'sh-for-5.9' of git://git.libc.org/linux-sh: (34 commits)
        sh: landisk: Add missing initialization of sh_io_port_base
        sh: bring syscall_set_return_value in line with other architectures
        sh: Add SECCOMP_FILTER
        sh: Rearrange blocks in entry-common.S
        sh: switch to copy_thread_tls()
        sh: use the generic dma coherent remap allocator
        sh: don't allow non-coherent DMA for NOMMU
        dma-mapping: consolidate the NO_DMA definition in kernel/dma/Kconfig
        sh: unexport register_trapped_io and match_trapped_io_handler
        sh: don't include <asm/io_trapped.h> in <asm/io.h>
        sh: move the ioremap implementation out of line
        sh: move ioremap_fixed details out of <asm/io.h>
        sh: remove __KERNEL__ ifdefs from non-UAPI headers
        sh: sort the selects for SUPERH alphabetically
        sh: remove -Werror from Makefiles
        sh: Replace HTTP links with HTTPS ones
        arch/sh/configs: remove obsolete CONFIG_SOC_CAMERA*
        sh: stacktrace: Remove stacktrace_ops.stack()
        sh: machvec: Modernize printing of kernel messages
        sh: pci: Modernize printing of kernel messages
        ...
    • io_uring: short circuit -EAGAIN for blocking read attempt · f91daf56
      Jens Axboe authored
      
      
      One case was missed in the short IO retry handling, and that's hitting
      -EAGAIN on a blocking attempt read (eg from io-wq context). This is a
      problem on sockets that are marked as non-blocking when created, they
      don't carry any REQ_F_NOWAIT information to help us terminate them
      instead of perpetually retrying.
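
The decision being described can be reduced to a small sketch. It is a deliberately simplified model, not the io_uring code: enum read_action and after_read() are hypothetical, and short-read handling is left out entirely.

```c
#include <errno.h>
#include <stdbool.h>

enum read_action { READ_DONE, READ_RETRY_ASYNC };

/*
 * Hypothetical decision helper: given a read result and whether the attempt
 * was allowed to block, decide between completing and retrying.
 */
enum read_action after_read(long ret, bool blocking_attempt)
{
	if (ret == -EAGAIN && !blocking_attempt)
		return READ_RETRY_ASYNC;	/* non-blocking try: retry later */

	/* A blocking attempt that still got -EAGAIN completes with the error. */
	return READ_DONE;
}
```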
      
      Fixes: 227c0c96 ("io_uring: internally retry short reads")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>