  1. Aug 26, 2020
  2. Aug 21, 2020
    • Greg Kroah-Hartman
      v4.9.233
      8d71b611
    • mm: Avoid calling build_all_zonelists_init under hotplug context · 23feab18
      Oscar Salvador authored

      Recently a customer of ours experienced a crash when booting the
      system while enabling memory-hotplug.
      
      The problem is that Normal zones on different nodes don't get their private
      zone->pageset allocated, and keep sharing the initial boot_pageset.
      The sharing between zones is normally safe as explained by the comment for
      boot_pageset - it's a percpu structure, and manipulations are done with
      disabled interrupts, and boot_pageset is set up in a way that any page placed
      on its pcplist is immediately flushed to shared zone's freelist, because
      pcp->high == 1.
      However, the hotplug operation updates pcp->high to a higher value as it
      expects to be operating on a private pageset.
      
      The problem is in build_all_zonelists(), which is called when the first range
      of pages is onlined for the Normal zone of node X or Y:
      
      	if (system_state == SYSTEM_BOOTING) {
      		build_all_zonelists_init();
      	} else {
      	#ifdef CONFIG_MEMORY_HOTPLUG
      		if (zone)
      			setup_zone_pageset(zone);
      	#endif
      		/* we have to stop all cpus to guarantee there is no user
      		of zonelist */
      		stop_machine(__build_all_zonelists, pgdat, NULL);
      		/* cpuset refresh routine should be here */
      	}
      
      When called during hotplug, it should execute the setup_zone_pageset(zone)
      which allocates the private pageset.
      However, with memhp_default_state=online, this happens early, while
      system_state == SYSTEM_BOOTING is still true, so this step is skipped
      (and build_all_zonelists_init() is probably unsafe at this point anyway).
      
      Another hotplug operation on the same zone then leads to zone_pcp_update(zone)
      being called from online_pages(), which updates pcp->high for the shared
      boot_pageset to a value higher than 1.
      At that point, pages freed from the Node X and Node Y Normal zones can end up
      on the same pcplist, and from there they can be freed to the wrong zone's
      freelist, leading to corruption and crashes.
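
      The failure mode can be illustrated with a small user-space model. This is
      a sketch, not kernel code: the toy_* names, the fixed-size list and the
      page/zone structs are hypothetical stand-ins. It assumes only what is
      described above, namely that a freed page is placed on the pageset's list
      and that the list is flushed to the freeing zone's freelist once the count
      reaches pcp->high.

```c
#include <assert.h>

#define TOY_PCP_MAX 8

/* Toy zone: counts pages that landed on its freelist. */
struct toy_zone {
	int id;
	int free_pages;     /* pages that really belong to this zone */
	int foreign_pages;  /* pages that belong to some other zone */
};

struct toy_page {
	int zone_id;        /* zone the page actually belongs to */
};

/* Toy per-cpu pageset: a small stack plus the 'high' watermark. */
struct toy_pageset {
	struct toy_page *list[TOY_PCP_MAX];
	int count;
	int high;
};

/* Flush every cached page to the freelist of the zone doing the flush. */
void toy_flush(struct toy_pageset *pcp, struct toy_zone *zone)
{
	while (pcp->count > 0) {
		struct toy_page *page = pcp->list[--pcp->count];

		if (page->zone_id == zone->id)
			zone->free_pages++;
		else
			zone->foreign_pages++;
	}
}

/* Free one page through the pageset; flush once count reaches high. */
void toy_free_page(struct toy_pageset *pcp, struct toy_zone *zone,
		   struct toy_page *page)
{
	assert(pcp->count < TOY_PCP_MAX);
	pcp->list[pcp->count++] = page;
	if (pcp->count >= pcp->high)
		toy_flush(pcp, zone);
}

/*
 * Free one page of zone X, then one page of zone Y, through a single
 * shared pageset. Returns how many pages ended up on the wrong freelist.
 */
int toy_misplaced(int high)
{
	struct toy_zone x = { .id = 0 }, y = { .id = 1 };
	struct toy_page px = { .zone_id = 0 }, py = { .zone_id = 1 };
	struct toy_pageset shared = { .count = 0, .high = high };

	toy_free_page(&shared, &x, &px);
	toy_free_page(&shared, &y, &py);
	return x.foreign_pages + y.foreign_pages;
}
```

      In this model, toy_misplaced(1) reports no misplaced pages, which matches
      the boot_pageset design. With high raised to 2, the page from zone X is
      still on the shared list when zone Y frees its page, so it is flushed to
      zone Y's freelist.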
      
      Please note that upstream fixed this differently (and unintentionally) by
      adding another boot state (SYSTEM_SCHEDULING), which is set before smp_init().
      That should happen before memory hotplug events, even with
      memhp_default_state=online.
      Backporting that would be too intrusive.
      
      Signed-off-by: Oscar Salvador <osalvador@suse.de>
      Debugged-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Michal Hocko <mhocko@suse.com> # for stable trees
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • khugepaged: retract_page_tables() remember to test exit · dc3ff4f6
      Hugh Dickins authored
      commit 18e77600 upstream.
      
      Only once have I seen this scenario (and forgot even to notice what forced
      the eventual crash): a sequence of "BUG: Bad page map" alerts from
      vm_normal_page(), from zap_pte_range() servicing exit_mmap();
      pmd:00000000, pte values corresponding to data in physical page 0.
      
      The pte mappings being zapped in this case were supposed to be from a huge
      page of ext4 text (but could as well have been shmem): my belief is that
      it was racing with collapse_file()'s retract_page_tables(), found *pmd
      pointing to a page table, locked it, but *pmd had become 0 by the time
      start_pte was decided.
      
      In most cases, that possibility is excluded by holding mmap lock; but
      exit_mmap() proceeds without mmap lock.  Most of what's run by khugepaged
      checks khugepaged_test_exit() after acquiring mmap lock:
      khugepaged_collapse_pte_mapped_thps() and hugepage_vma_revalidate() do so,
      for example.  But retract_page_tables() did not: fix that.
      
      The fix is for retract_page_tables() to check khugepaged_test_exit(),
      after acquiring mmap lock, before doing anything to the page table.
      Getting the mmap lock serializes with __mmput(), which briefly takes and
      drops it in __khugepaged_exit(); then the khugepaged_test_exit() check on
      mm_users makes sure we don't touch the page table once exit_mmap() might
      reach it, since exit_mmap() will be proceeding without mmap lock, not
      expecting anyone to be racing with it.
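
      The ordering described above (take the lock, then test for exit, and only
      then touch the page table) can be sketched in user space. The toy_* names
      and the boolean "lock" are hypothetical. The model assumes only what the
      text says: the exit test looks at mm_users, and it is performed after the
      mmap lock has been acquired.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy mm: only the fields this check needs. */
struct toy_mm {
	int mm_users;             /* 0 once the last user has gone away */
	bool mmap_locked;         /* stand-in for the mmap lock */
	int page_tables_touched;  /* how often we modified the page table */
};

/* Models the exit test: true once mm_users has dropped to zero. */
bool toy_test_exit(const struct toy_mm *mm)
{
	return mm->mm_users == 0;
}

bool toy_mmap_trylock(struct toy_mm *mm)
{
	if (mm->mmap_locked)
		return false;
	mm->mmap_locked = true;
	return true;
}

void toy_mmap_unlock(struct toy_mm *mm)
{
	assert(mm->mmap_locked);
	mm->mmap_locked = false;
}

/* Returns true only if the page table was actually retracted. */
bool toy_retract_page_tables(struct toy_mm *mm)
{
	bool retracted = false;

	if (!toy_mmap_trylock(mm))
		return false;
	/* The added check: skip an mm that exit may already be tearing down. */
	if (!toy_test_exit(mm)) {
		mm->page_tables_touched++;
		retracted = true;
	}
	toy_mmap_unlock(mm);
	return retracted;
}
```

      For an mm with mm_users == 1 the model retracts the page table; for one
      with mm_users == 0 it takes and drops the lock without touching anything.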
      
      Fixes: f3f0e1d2 ("khugepaged: add support of collapse for tmpfs/shmem pages")
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: <stable@vger.kernel.org>	[4.8+]
      Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008021215400.27773@eggly.anvils
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • sh: landisk: Add missing initialization of sh_io_port_base · 11b29ede
      Geert Uytterhoeven authored
      [ Upstream commit 0c64a0dc ]
      
      The Landisk setup code maps the CF IDE area using ioremap_prot(), and
      passes the resulting virtual addresses to the pata_platform driver,
      disguising them as I/O port addresses.  Hence the pata_platform driver
      translates them again using ioport_map().
      As CONFIG_GENERIC_IOMAP=n, and CONFIG_HAS_IOPORT_MAP=y, the
      SuperH-specific mapping code in arch/sh/kernel/ioport.c translates
      I/O port addresses to virtual addresses by adding sh_io_port_base, which
      defaults to -1, thus breaking the assumption of an identity mapping.
      
      Fix this by setting sh_io_port_base to zero.
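
      The arithmetic behind this can be shown with a minimal user-space sketch.
      It assumes only what is stated above: the translation adds a global base
      to the port number, and that base defaults to -1. The toy_* names and the
      0x1000 example address are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Models the translation: port number plus a global base. */
uintptr_t toy_ioport_map(uintptr_t io_port_base, uintptr_t port)
{
	return io_port_base + port;
}

/* Self-check of both settings; 0x1000 is an arbitrary example address. */
void toy_check(void)
{
	/* Default base of -1: the result is off by one. */
	assert(toy_ioport_map((uintptr_t)-1, 0x1000) == 0x0fff);
	/* Base of zero: the already-mapped address passes through unchanged. */
	assert(toy_ioport_map(0, 0x1000) == 0x1000);
}
```

      Because the address handed to the driver is already a virtual address,
      only a base of zero leaves it intact.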
      
      Fixes: 37b7a978 ("sh: machvec IO death.")
      Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: Rich Felker <dalias@libc.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ALSA: echoaudio: Fix potential Oops in snd_echo_resume() · ca95679a
      Dinghao Liu authored
      [ Upstream commit 5a25de6d ]
      
      Freeing chip on error may lead to an Oops the next time
      the system resumes. Fix this by removing all
      snd_echo_free() calls on error.
      
      Fixes: 47b5d028 ("ALSA: Echoaudio - Add suspend support #2")
      Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
      Link: https://lore.kernel.org/r/20200813074632.17022-1-dinghao.liu@zju.edu.cn
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • mfd: dln2: Run event handler loop under spinlock · 2a428481
      Andy Shevchenko authored
      [ Upstream commit 3d858942 ]
      
      The event handler loop must be run with interrupts disabled.
      Otherwise we will have a warning:
      
      [ 1970.785649] irq 31 handler lineevent_irq_handler+0x0/0x20 enabled interrupts
      [ 1970.792739] WARNING: CPU: 0 PID: 0 at kernel/irq/handle.c:159 __handle_irq_event_percpu+0x162/0x170
      [ 1970.860732] RIP: 0010:__handle_irq_event_percpu+0x162/0x170
      ...
      [ 1970.946994] Call Trace:
      [ 1970.949446]  <IRQ>
      [ 1970.951471]  handle_irq_event_percpu+0x2c/0x80
      [ 1970.955921]  handle_irq_event+0x23/0x43
      [ 1970.959766]  handle_simple_irq+0x57/0x70
      [ 1970.963695]  generic_handle_irq+0x42/0x50
      [ 1970.967717]  dln2_rx+0xc1/0x210 [dln2]
      [ 1970.971479]  ? usb_hcd_unmap_urb_for_dma+0xa6/0x1c0
      [ 1970.976362]  __usb_hcd_giveback_urb+0x77/0xe0
      [ 1970.980727]  usb_giveback_urb_bh+0x8e/0xe0
      [ 1970.984837]  tasklet_action_common.isra.0+0x4a/0xe0
      ...
      
      Recently the xHCI driver switched to tasklets in commit 36dc0165
      ("usb: host: xhci: Support running urb giveback in tasklet context").
      
      The handle_irq_event_* functions are expected to be called with interrupts
      disabled and they rightfully complain here because we run in tasklet context
      with interrupts enabled.
      
      Use an event spinlock to protect the event handler from being interrupted.

      Note that there are only two users of this, the GPIO and ADC drivers, and
      both of them use generic_handle_irq(), which triggers the warning above.
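
      The effect of wrapping the loop in an interrupt-disabling lock can be
      modelled in user space. This is only a sketch under stated assumptions:
      a global flag stands in for the CPU interrupt state, the toy lock merely
      saves and clears that flag, and all toy_* names are hypothetical rather
      than taken from the driver.

```c
#include <assert.h>
#include <stdbool.h>

bool toy_irqs_enabled = true;  /* tasklet context: interrupts are on */
int toy_warnings;              /* number of "enabled interrupts" complaints */

/* Stands in for the IRQ core check: complain if interrupts are enabled. */
void toy_generic_handle_irq(void)
{
	if (toy_irqs_enabled)
		toy_warnings++;
}

/* Toy lock: save the current interrupt state, then disable interrupts. */
unsigned long toy_lock_irqsave(void)
{
	unsigned long flags = toy_irqs_enabled;

	toy_irqs_enabled = false;
	return flags;
}

/* Toy unlock: restore the interrupt state saved by toy_lock_irqsave(). */
void toy_unlock_irqrestore(unsigned long flags)
{
	toy_irqs_enabled = flags != 0;
}

/* Dispatch nr_events events; returns the number of warnings raised. */
int toy_run_event_callbacks(int nr_events, bool use_lock)
{
	unsigned long flags = 0;
	int i;

	toy_warnings = 0;
	if (use_lock)
		flags = toy_lock_irqsave();
	for (i = 0; i < nr_events; i++)
		toy_generic_handle_irq();
	if (use_lock)
		toy_unlock_irqrestore(flags);
	return toy_warnings;
}
```

      Without the lock, every dispatched event raises a warning in this model.
      With the lock held around the loop there are none, and the previous
      interrupt state is restored afterwards.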
      
      Fixes: 338a1281 ("mfd: Add support for Diolan DLN-2 devices")
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: Lee Jones <lee.jones@linaro.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>