  1. Aug 05, 2017
    • Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · 65f4740e
      Linus Torvalds authored
      Pull ARM SoC fixes from Arnd Bergmann:
       "This comes a bit later than I planned, and as a consequence is
        larger than it should be.
      
        Most of the changes are devicetree fixes, across lots of platforms:
        Renesas, Samsung Exynos, Marvell EBU, TI OMAP, Rockchips, Amlogic
        Meson, Sigma Designs Tango, Allwinner SUNxi and TI Davinci.
      
        Also across many platforms, I applied an older series of simple
        randconfig build fixes. This includes making the CONFIG_MTD_XIP option
        compile again, which had been broken for many years and probably has
        not been missed, but it felt wrong to just remove it completely.
      
        The only other changes are:
      
         - We enable HWSPINLOCK in defconfig to get some Qualcomm boards to
           work out of the box.
      
         - A few regression fixes for Texas Instruments OMAP2+.
      
         - A boot regression fix for the Renesas regulator quirk.
      
         - A suspend/resume fix for Uniphier SoCs, fixing the resume of the
           system bus"
      
      * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (43 commits)
        ARM: dts: tango4: Request RGMII RX and TX clock delays
        bus: uniphier-system-bus: set up registers when resuming
        ARM64: dts: marvell: armada-37xx: Fix the number of GPIO on south bridge
        ARM: shmobile: rcar-gen2: Fix deadlock in regulator quirk
        arm64: defconfig: enable missing HWSPINLOCK
        ARM: pxa: select both FB and FB_W100 for eseries
        ARM: ixp4xx: fix ioport_unmap definition
        ARM: ep93xx: use ARM_PATCH_PHYS_VIRT correctly
        ARM: mmp: mark usb_dma_mask as __maybe_unused
        ARM: omap2: mark unused functions as __maybe_unused
        ARM: omap1: avoid unused variable warning
        ARM: sirf: mark sirfsoc_init_late as __maybe_unused
        ARM: ixp4xx: use normal prototype for {read,write}s{b,w,l}
        ARM: omap1/ams-delta: warn about failed regulator enable
        ARM: rpc: rename RAM_SIZE macro
        ARM: w90x900: normalize clk API
        ARM: ep93xx: normalize clk API
        ARM: dts: sun8i: a83t: Switch to CCU device tree binding macros
        arm64: allwinner: sun50i-a64: Correct emac register size
        ARM: dts: sunxi: h3/h5: Correct emac register size
        ...
      65f4740e
    • Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · b3c6858f
      Linus Torvalds authored
      Pull arm64 fixes from Will Deacon:
       "Here are some more arm64 fixes for 4.13. The main one is the PTE race
        with the hardware walker, but there are a couple of other things too.
      
         - Report correct timer frequency to userspace when trapping
           CNTFRQ_EL0
      
         - Fix race with hardware page table updates when updating access
           flags
      
         - Silence clang overflow warning in VA_START and PAGE_OFFSET
           calculations"
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: avoid overflow in VA_START and PAGE_OFFSET
        arm64: Fix potential race with hardware DBM in ptep_set_access_flags()
        arm64: Use arch_timer_get_rate when trapping CNTFRQ_EL0
      b3c6858f
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc · 0a23ea65
      Linus Torvalds authored
      Pull sparc fixes from David Miller:
      
       - block interrupts properly across the entire MMU context change (both
         the hw MMU context change and the TSB table change) so that we don't
         get a perf event interrupt in the middle. From Rob Gardner.
      
       - be sure to register hugepages early enough, from Nitin Gupta.
      
       - UltraSPARC-III user copy exception handling would return garbage for
         the copied length in some circumstances.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
        sparc64: Fix exception handling in UltraSPARC-III memcpy.
        sbus: Convert to using %pOF instead of full_name
        sparc: defconfig: Cleanup from old Kconfig options
        sparc64: Register hugepages during arch init
        sparc64: Prevent perf from running during super critical sections
      0a23ea65
    • Merge tag 'ceph-for-4.13-rc4' of git://github.com/ceph/ceph-client · c63716ab
      Linus Torvalds authored
      Pull ceph fixes from Ilya Dryomov:
       "A bunch of fixes and follow-ups for -rc1 Luminous patches: issues with
        ->reencode_message() and last minute RADOS semantic changes in
        v12.1.2"
      
      * tag 'ceph-for-4.13-rc4' of git://github.com/ceph/ceph-client:
        libceph: make RECOVERY_DELETES feature create a new interval
        libceph: upmap semantic changes
        crush: assume weight_set != null imples weight_set_size > 0
        libceph: fallback for when there isn't a pool-specific choose_arg
        libceph: don't call ->reencode_message() more than once per message
        libceph: make encode_request_*() work with r_mempool requests
      c63716ab
    • Merge tag 'sound-4.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · a64c40e7
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "Now we hit the usual ASoC-fix-flood in the middle of release.
      
        Most of the changes are trivial and device-specific, while one
        significant change is the fix for unbalanced of_graph_*() refcounts.
        This involved a change in the graph API itself that had been a bit
        messy"
      
      * tag 'sound-4.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: hda - Fix speaker output from VAIO VPCL14M1R
        device property: Fix usecount for of_graph_get_port_parent()
        ASoC: rt5665: fix wrong register for bclk ratio control
        ASoC: Intel: Use MCLK instead of BLCK as the sysclock for RT5514 codec on kabylake platform
        ASoC: Intel: Enabling ASRC for RT5663 codec on kabylake platform
        ASoC: codecs: msm8916-analog: fix DIG_CLK_CTL_RXD3_CLK_EN define
        ASoC: Intel: Skylake: Fix missing sentinels in sst_acpi_mach
        ASoC: sh: hac: add missing "int ret"
        ASoC: samsung: odroid: Fix EPLL frequency values
        ASoC: sgtl5000: Use snd_soc_kcontrol_codec()
        ASoC: rt5665: fix GPIO6 pin function define
        ASoC: ux500: Restore platform DAI assignments
        ASoC: fix pcm-creation regression
        ASoC: do not close shared backend dailink
        ASoC: pxa: SND_PXA2XX_SOC should depend on HAS_DMA
        ASoC: Intel: Skylake: Fix default dma_buffer_size
        ASoC: rt5663: Update the HW default values based on the shipping version
        ASoC: imx-ssi: add check on platform_get_irq return value
      a64c40e7
    • Merge tag 'iommu-fixes-v4.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu · 83b89ea4
      Linus Torvalds authored
      Pull IOMMU fixes from Joerg Roedel:
      
       - fix a scheduling-while-atomic bug in the AMD IOMMU driver. It was
         found after the checker was enabled earlier.
      
       - a fix for the virtual APIC code in the AMD IOMMU driver which
         delivers device interrupts directly into KVM guests for assigned
         devices.
      
       - fixes for the recently merged lock-less page-table code for ARM. The
         redundant TLB syncs got reverted and locks added again around the TLB
         sync code.
      
       - fix for error handling in arm_smmu_add_device()
      
       - address sanitization fix for arm io-pgtable code
      
      * tag 'iommu-fixes-v4.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
        iommu/amd: Fix schedule-while-atomic BUG in initialization code
        iommu/amd: Enable ga_log_intr when enabling guest_mode
        iommu/io-pgtable: Sanitise map/unmap addresses
        iommu/arm-smmu: Fix the error path in arm_smmu_add_device
        Revert "iommu/io-pgtable: Avoid redundant TLB syncs"
        iommu/mtk: Avoid redundant TLB syncs locally
        iommu/arm-smmu: Reintroduce locking around TLB sync operations
      83b89ea4
    • Merge tag 'mmc-v4.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc · 8145f373
      Linus Torvalds authored
      Pull MMC fixes from Ulf Hansson:
       "A couple of mmc fixes intended for v4.13-rc4.
      
        MMC core:
         - Fix NULL pointer dereference for block I/O during hotplug
      
        MMC host:
         - sdhci-of-at91: Fix card detect for non-removable cards"
      
      * tag 'mmc-v4.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
        mmc: block: bypass the queue even if usage is present for hotplug
        mmc: sdhci-of-at91: force card detect value for non removable devices
      8145f373
    • Merge tag 'drm-fixes-for-v4.13-rc4' of git://people.freedesktop.org/~airlied/linux · 47d47585
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "Either my email ate everything or everyone is on holidays, either way
        all I can find is some lonely AMD fixes"
      
      [ Europe might be on vacation, and the Pacific NW is too hot for work. ]
      
      * tag 'drm-fixes-for-v4.13-rc4' of git://people.freedesktop.org/~airlied/linux:
        drm/amdgpu: Use list_del_init in amdgpu_mn_unregister
        drm/amdgpu: Fix undue fallthroughs in golden registers initialization
        drm/amdgpu: fix header on gfx9 clear state
      47d47585
    • Merge tag 'powerpc-4.13-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 841fe953
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
       "Fixes for recently merged code:
         - a fix for the _PAGE_DEVMAP support, which was breaking KVM on
           Power9 radix
         - avoid a (harmless) lockdep warning in the early SMP code
         - return failure for some uses of dma_set_mask() rather than falling
           back to 32-bits
         - fix stack setup in watchdog soft_nmi_common() to use emergency
           stack
         - fix of_irq_to_resource() error check in of_fsl_spi_probe()
      
        Two fixes going to stable:
         - fix saving of Transactional Memory SPRs in core dump
         - fix __check_irq_replay missing decrementer interrupt
      
        And two misc:
         - fix 64-bit boot wrapper build with non-biarch compiler
         - work around a POWER9 PMU hang after state-loss idle
      
        Thanks to: Alistair Popple, Aneesh Kumar K.V, Cyril Bur, Gustavo
        Romero, Jose Ricardo Ziviani, Laurent Vivier, Nicholas Piggin, Oliver
        O'Halloran, Sergei Shtylyov, Suraj Jitindar Singh, Thomas Gleixner"
      
      * tag 'powerpc-4.13-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/64: Fix __check_irq_replay missing decrementer interrupt
        powerpc/perf: POWER9 PMU stops after idle workaround
        powerpc/83xx/mpc832x_rdb: fix of_irq_to_resource() error check
        powerpc/64s: Fix stack setup in watchdog soft_nmi_common()
        powerpc/powernv/pci: Return failure for some uses of dma_set_mask()
        powerpc/boot: Fix 64-bit boot wrapper build with non-biarch compiler
        powerpc/smp: Call smp_ops->setup_cpu() directly on the boot CPU
        powerpc/tm: Fix saving of TM SPRs in core dump
        powerpc/mm: Fix pmd/pte_devmap() on non-leaf entries
      841fe953
    • sparc64: Fix exception handling in UltraSPARC-III memcpy. · 0ede1c40
      David S. Miller authored
      
      
      Mikael Pettersson reported that some test programs in the strace-4.18
      testsuite cause an OOPS.
      
      After some debugging it turns out that garbage values are returned
      when an exception occurs, causing the fixup memset() to be run with
      bogus arguments.
      
      The problem is that two of the exception handler stubs write the
      successfully copied length into the wrong register.
      
      Fixes: ee841d0a ("sparc64: Convert U3copy_{from,to}_user to accurate exception reporting.")
      Reported-by: Mikael Pettersson <mikpelinux@gmail.com>
      Tested-by: Mikael Pettersson <mikpelinux@gmail.com>
      Reviewed-by: Sam Ravnborg <sam@ravnborg.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0ede1c40
  2. Aug 04, 2017
  3. Aug 03, 2017
    • mmc: block: bypass the queue even if usage is present for hotplug · 7c84b8b4
      Shawn Lin authored
      
      
      Commit 304419d8 ("mmc: core: Allocate per-request data using the
      block layer core") refactored the queue handling mechanism, which
      allows mmc_init_request() to be called just after mmc_cleanup_queue(),
      causing a NULL pointer dereference.
      
      Another commit, bbdc74dc ("mmc: block: Prevent new req entering queue
      after its cleanup"), tried to fix the problem. However, it misses one
      corner case.
      
      We can still reproduce the issue with these steps:
      (1) insert an SD card and mount it
      (2) hotplug it, which leaves md->usage still counted
      (3) reboot the system, which syncs data and unmounts the card
      
      [Unable to handle kernel NULL pointer dereference at virtual address
      00000000
      [user pgtable: 4k pages, 48-bit VAs, pgd = ffff80007bab3000
      [[0000000000000000] *pgd=000000007a828003, *pud=0000000078dce003,
      *pmd=000000007aab6003, *pte=0000000000000000
      [Internal error: Oops: 96000007 [#1] PREEMPT SMP
      [Modules linked in:
      [CPU: 3 PID: 3507 Comm: umount Tainted: G        W
      4.13.0-rc1-next-20170720-00012-g9d9bf45 #33
      [Hardware name: Firefly-RK3399 Board (DT)
      [task: ffff80007a1de200 task.stack: ffff80007a01c000
      [PC is at mmc_init_request+0x14/0xc4
      [LR is at alloc_request_size+0x4c/0x74
      [pc : [<ffff0000087d7150>] lr : [<ffff000008378fe0>] pstate: 600001c5
      [sp : ffff80007a01f8f0
      
      ....
      
      [[<ffff0000087d7150>] mmc_init_request+0x14/0xc4
      [[<ffff000008378fe0>] alloc_request_size+0x4c/0x74
      [[<ffff00000817ac28>] mempool_create_node+0xb8/0x17c
      [[<ffff00000837aadc>] blk_init_rl+0x9c/0x120
      [[<ffff000008396580>] blkg_alloc+0x110/0x234
      [[<ffff000008396ac8>] blkg_create+0x424/0x468
      [[<ffff00000839877c>] blkg_lookup_create+0xd8/0x14c
      [[<ffff0000083796bc>] generic_make_request_checks+0x368/0x3b0
      [[<ffff00000837b050>] generic_make_request+0x1c/0x240
      
      So mmc_blk_put() would not call blk_cleanup_queue(), which is what
      actually sets QUEUE_FLAG_DYING and QUEUE_FLAG_BYPASS, and those flags
      should be set. The block core expects blk_queue_bypass_{start, end} to be
      used internally to bypass/drain the queue before actually marking the
      queue as dying, so it does not expose an API to set queue bypass.
      I think we should set QUEUE_FLAG_BYPASS whenever the queue is removed,
      even though md->usage is still counted, as no dispatch queue can be found
      then.
      
      Fixes: 304419d8 ("mmc: core: Allocate per-request data using the block layer core")
      Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
      Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      7c84b8b4
    • mmc: sdhci-of-at91: force card detect value for non removable devices · 7a1e3f14
      Ludovic Desroches authored
      
      
      When the device is non-removable, the card detect signal is often used
      for another purpose, i.e. muxed to another SoC peripheral or used as a
      GPIO. This can lead to wrong behavior, depending on the default value of
      this signal, if it is not muxed to the SDHCI controller.
      
      Fixes: bb5f8ea4 ("mmc: sdhci-of-at91: introduce driver for the Atmel SDMMC")
      Signed-off-by: Ludovic Desroches <ludovic.desroches@microchip.com>
      Acked-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      7a1e3f14
    • Merge tag 'nfs-for-4.13-4' of git://git.linux-nfs.org/projects/anna/linux-nfs · 19ec50a4
      Linus Torvalds authored
      Pull NFS client fixes from Anna Schumaker:
       "Two fixes from Trond this time, now that he's back from his vacation.
        The first is a stable fix for the EXCHANGE_ID issue on the mailing
        list, and the other fixes a double-free situation that he found at the
        same time.
      
        Stable fix:
         - Fix EXCHANGE_ID corrupt verifier issue
      
        Other fix:
         - Fix double frees in nfs4_test_session_trunk()"
      
      * tag 'nfs-for-4.13-4' of git://git.linux-nfs.org/projects/anna/linux-nfs:
        NFSv4: Fix double frees in nfs4_test_session_trunk()
        NFSv4: Fix EXCHANGE_ID corrupt verifier issue
      19ec50a4
    • isdn/i4l: fix buffer overflow · 9f5af546
      Annie Cherkaev authored
      
      
      This fixes a potential buffer overflow in isdn_net.c caused by an
      unbounded strcpy.
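
To illustrate the class of bug described above, here is a minimal userspace sketch of a bounded copy that always NUL-terminates. It is not the change made by this commit: NAME_LEN and bounded_copy() are hypothetical names invented for the example, and in-kernel code would normally rely on a kernel helper such as strscpy() rather than snprintf().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical size of a fixed-length name field (assumption for this sketch). */
#define NAME_LEN 10

/*
 * Copy src into dst, which has room for dst_size bytes, and always
 * NUL-terminate the result. Returns 0 if the whole string fit and
 * -1 if it had to be truncated.
 */
static int bounded_copy(char *dst, size_t dst_size, const char *src)
{
	int n;

	assert(dst != NULL && src != NULL && dst_size > 0);

	/* snprintf() never writes more than dst_size bytes, terminator included. */
	n = snprintf(dst, dst_size, "%s", src);

	return (n >= 0 && (size_t)n < dst_size) ? 0 : -1;
}
```

Unlike strcpy(), the helper reports truncation to the caller instead of writing past the end of the destination buffer.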
      
      [ ISDN seems to be effectively unmaintained, and the I4L driver in
        particular is long deprecated, but in case somebody uses this..
          - Linus ]
      
      Signed-off-by: Jiten Thakkar <jitenmt@gmail.com>
      Signed-off-by: Annie Cherkaev <annie.cherk@gmail.com>
      Cc: Karsten Keil <isdn@linux-pingi.de>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9f5af546
    • ocfs2: don't clear SGID when inheriting ACLs · 19ec8e48
      Jan Kara authored
      When new directory 'DIR1' is created in a directory 'DIR0' with SGID bit
      set, DIR1 is expected to have SGID bit set (and owning group equal to
      the owning group of 'DIR0').  However when 'DIR0' also has some default
      ACLs that 'DIR1' inherits, setting these ACLs will result in SGID bit on
      'DIR1' to get cleared if user is not member of the owning group.
      
      Fix the problem by moving posix_acl_update_mode() out of ocfs2_set_acl()
      into ocfs2_iop_set_acl().  That way the function will not be called when
      inheriting ACLs which is what we want as it prevents SGID bit clearing
      and the mode has been properly set by posix_acl_create() anyway.  Also
      posix_acl_chmod() that is calling ocfs2_set_acl() takes care of updating
      mode itself.
      
      Fixes: 07393101 ("posix_acl: Clear SGID bit when setting file permissions")
      Link: http://lkml.kernel.org/r/20170801141252.19675-3-jack@suse.cz
      
      
      Signed-off-by: Jan Kara <jack@suse.cz>
      Cc: Mark Fasheh <mfasheh@versity.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Joseph Qi <jiangqi903@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      19ec8e48
    • mm: allow page_cache_get_speculative in interrupt context · 1ee1c3f5
      Kan Liang authored
      The kernel panics when the IRQ-safe __get_user_pages_fast is called in
      an NMI handler.
      
      The bug was introduced by commit 2947ba05 ("x86/mm/gup: Switch GUP
      to the generic get_user_page_fast() implementation").
      
      The original x86 __get_user_pages_fast used plain get_page() or
      page_ref_add().  However, the generic __get_user_pages_fast uses
      page_cache_get_speculative(), which has VM_BUG_ON(in_interrupt()).
      
      There is no reason to prevent page_cache_get_speculative from being
      used in interrupt context.  According to the author, the BUG_ON is there
      only because the code had not been verified for correctness against
      interrupt races.  I did some tests in interrupt context and found no
      issues.
      
      Remove the VM_BUG_ON(in_interrupt()) from page_cache_get_speculative().
      
      Link: http://lkml.kernel.org/r/1501609146-59730-1-git-send-email-kan.liang@intel.com
      
      
      Fixes: 2947ba05 ("x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation")
      Signed-off-by: Kan Liang <kan.liang@intel.com>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Ying Huang <ying.huang@intel.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1ee1c3f5
    • userfaultfd: non-cooperative: flush event_wqh at release time · 5a18b64e
      Mike Rapoport authored
      There may still be threads waiting on event_wqh at the time the
      userfault file descriptor is closed.  Flush the events wait-queue to
      prevent waiting threads from hanging.
      
      Link: http://lkml.kernel.org/r/1501398127-30419-1-git-send-email-rppt@linux.vnet.ibm.com
      
      
      Fixes: 9cd75c3c ("userfaultfd: non-cooperative: add ability to report
      non-PF events from uffd descriptor")
      Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
      Cc: Pavel Emelyanov <xemul@virtuozzo.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5a18b64e
    • ipc: add missing container_of()s for randstruct · ade9f91b
      Kees Cook authored
      When building with the randstruct gcc plugin, the layout of the IPC
      structs will be randomized, which requires any sub-structure accesses to
      use container_of().  The proc display handlers were missing the needed
      container_of()s since the iterator is passing in the top-level struct
      kern_ipc_perm.
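
The reason container_of() is required can be sketched in userspace C. The structures below are hypothetical, simplified stand-ins (perm_demo and shm_demo are not the kernel's types), and the macro is a plain offsetof()-based equivalent of the kernel one. The embedded member is deliberately not first in the outer struct, which is the situation layout randomization can produce.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for the top-level permission struct. */
struct perm_demo {
	int id;
};

/* Hypothetical stand-in for an IPC object embedding the permission struct. */
struct shm_demo {
	unsigned long segsz;   /* placed first on purpose */
	struct perm_demo perm; /* not at offset 0, as randomization may cause */
};

/*
 * Recover the outer object from a pointer to its embedded member.
 * A direct cast of the pointer would only be correct if perm sat at offset 0.
 */
static struct shm_demo *shm_from_perm(struct perm_demo *p)
{
	assert(p != NULL);
	return container_of(p, struct shm_demo, perm);
}
```

Because the offset is computed by the compiler from the actual layout, the lookup stays correct wherever the member ends up in the struct.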
      
      This would lead to crashes when running the "lsipc" program after the
      system had IPC registered (e.g. after starting up Gnome):
      
        general protection fault: 0000 [#1] PREEMPT SMP
        ...
        RIP: 0010:shm_add_rss_swap.isra.1+0x13/0xa0
        ...
        Call Trace:
          sysvipc_shm_proc_show+0x5e/0x150
          sysvipc_proc_show+0x1a/0x30
          seq_read+0x2e9/0x3f0
        ...
      
      Link: http://lkml.kernel.org/r/20170730205950.GA55841@beast
      
      
      Fixes: 3859a271 ("randstruct: Mark various structs for randomization")
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Reported-by: Dominik Brodowski <linux@dominikbrodowski.net>
      Acked-by: Davidlohr Bueso <dave@stgolabs.net>
      Acked-by: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ade9f91b
    • cpuset: fix a deadlock due to incomplete patching of cpusets_enabled() · 89affbf5
      Dima Zavin authored
      In codepaths that use the begin/retry interface for reading
      mems_allowed_seq with irqs disabled, there exists a race condition that
      stalls the patch process after only modifying a subset of the
      static_branch call sites.
      
      This problem manifested itself as a deadlock in the slub allocator,
      inside get_any_partial.  The loop reads the mems_allowed_seq value (via
      read_mems_allowed_begin), performs the defrag operation, and then
      verifies the consistency of mems_allowed via read_mems_allowed_retry
      and the cookie returned by xxx_begin.
      
      The issue here is that both begin and retry first check if cpusets are
      enabled via the cpusets_enabled() static branch.  This branch can be
      rewritten dynamically (via cpuset_inc) if a new cpuset is created.  The
      x86 jump label code fully synchronizes across all CPUs for every entry
      it rewrites.  If it rewrites only one of the callsites (specifically the
      one in read_mems_allowed_retry) and then waits for the
      smp_call_function(do_sync_core) to complete while a CPU is inside the
      begin/retry section with IRQs off and the mems_allowed value is changed,
      we can hang.
      
      This is because begin() will always return 0 (since it wasn't patched
      yet) while retry() will test the 0 against the actual value of the seq
      counter.
      
      The fix is to use two different static keys: one for begin
      (pre_enable_key) and one for retry (enable_key).  In cpuset_inc(), we
      first bump the pre_enable key to ensure that cpuset_mems_allowed_begin()
      always returns a valid seqcount if we are enabling cpusets.  Similarly,
      when disabling cpusets via cpuset_dec(), we first ensure that callers of
      cpuset_mems_allowed_retry() will start ignoring the seqcount value
      before we let cpuset_mems_allowed_begin() return 0.
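
The ordering argument above can be modelled in a few lines of userspace C. This is only a sketch under loud assumptions: plain booleans stand in for the two static keys, an unsigned counter stands in for mems_allowed_seq, every model_* name is hypothetical, and the real seqcount and jump-label patching machinery is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state: booleans stand in for static keys. */
static bool pre_enable_key;            /* consulted by begin */
static bool enable_key;                /* consulted by retry */
static unsigned int mems_allowed_seq;  /* stand-in for the sequence counter */

/* begin: returns the current sequence once its key is on, else 0. */
static unsigned int model_begin(void)
{
	return pre_enable_key ? mems_allowed_seq : 0;
}

/* retry: true if the caller must loop again; ignores the cookie while off. */
static bool model_retry(unsigned int cookie)
{
	if (!enable_key)
		return false;
	return cookie != mems_allowed_seq;
}

/* Enabling: switch begin on first, then retry. */
static void model_inc_step1(void) { pre_enable_key = true; }
static void model_inc_step2(void) { enable_key = true; }

/* Disabling: switch retry off first, then begin. */
static void model_dec_step1(void) { enable_key = false; }
static void model_dec_step2(void) { pre_enable_key = false; }

/* The unsafe window is retry being active while begin still returns 0. */
static bool model_state_is_safe(void)
{
	return !(enable_key && !pre_enable_key);
}
```

With this ordering, every intermediate state passes model_state_is_safe(): retry is never active while begin still hands out the placeholder value 0, which is the combination the commit message identifies as the cause of the hang.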
      
      The relevant stack traces of the two stuck threads:
      
        CPU: 1 PID: 1415 Comm: mkdir Tainted: G L  4.9.36-00104-g540c51286237 #4
        Hardware name: Default string Default string/Hardware, BIOS 4.29.1-20170526215256 05/26/2017
        task: ffff8817f9c28000 task.stack: ffffc9000ffa4000
        RIP: smp_call_function_many+0x1f9/0x260
        Call Trace:
          smp_call_function+0x3b/0x70
          on_each_cpu+0x2f/0x90
          text_poke_bp+0x87/0xd0
          arch_jump_label_transform+0x93/0x100
          __jump_label_update+0x77/0x90
          jump_label_update+0xaa/0xc0
          static_key_slow_inc+0x9e/0xb0
          cpuset_css_online+0x70/0x2e0
          online_css+0x2c/0xa0
          cgroup_apply_control_enable+0x27f/0x3d0
          cgroup_mkdir+0x2b7/0x420
          kernfs_iop_mkdir+0x5a/0x80
          vfs_mkdir+0xf6/0x1a0
          SyS_mkdir+0xb7/0xe0
          entry_SYSCALL_64_fastpath+0x18/0xad
      
        ...
      
        CPU: 2 PID: 1 Comm: init Tainted: G L  4.9.36-00104-g540c51286237 #4
        Hardware name: Default string Default string/Hardware, BIOS 4.29.1-20170526215256 05/26/2017
        task: ffff8818087c0000 task.stack: ffffc90000030000
        RIP: int3+0x39/0x70
        Call Trace:
          <#DB> ? ___slab_alloc+0x28b/0x5a0
          <EOE> ? copy_process.part.40+0xf7/0x1de0
          __slab_alloc.isra.80+0x54/0x90
          copy_process.part.40+0xf7/0x1de0
          copy_process.part.40+0xf7/0x1de0
          kmem_cache_alloc_node+0x8a/0x280
          copy_process.part.40+0xf7/0x1de0
          _do_fork+0xe7/0x6c0
          _raw_spin_unlock_irq+0x2d/0x60
          trace_hardirqs_on_caller+0x136/0x1d0
          entry_SYSCALL_64_fastpath+0x5/0xad
          do_syscall_64+0x27/0x350
          SyS_clone+0x19/0x20
          do_syscall_64+0x60/0x350
          entry_SYSCALL64_slow_path+0x25/0x25
      
      Link: http://lkml.kernel.org/r/20170731040113.14197-1-dmitriyz@waymo.com
      
      
      Fixes: 46e700ab ("mm, page_alloc: remove unnecessary taking of a seqlock when cpusets are disabled")
      Signed-off-by: Dima Zavin <dmitriyz@waymo.com>
      Reported-by: Cliff Spradlin <cspradlin@waymo.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Christopher Lameter <cl@linux.com>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      89affbf5