  1. Dec 13, 2015
    • mm, vmstat: allow WQ concurrency to discover memory reclaim doesn't make any progress · 373ccbe5
      Michal Hocko authored
      Tetsuo Handa has reported that the system might livelock in an OOM
      condition without triggering the OOM killer.
      
      The issue is caused by an internal dependency of direct reclaim on
      vmstat counter updates (via zone_reclaimable), which are performed from
      workqueue context.  If all the current workers get assigned to an
      allocation request, though, they will loop inside the allocator trying
      to reclaim memory, while zone_reclaimable sees stale numbers and
      therefore considers a zone reclaimable even though it has been scanned
      far too much.  The WQ concurrency logic does not consider this situation
      a congested workqueue because it relies on a worker having to sleep in
      such a situation.  This also means that it does not try to spawn new
      workers or invoke the rescuer thread if one is assigned to the queue.
      
      In order to fix this issue we need to do two things.  First, we have to
      let the WQ concurrency code know that we are in trouble, so we have to do
      a short sleep.  To prevent the issues handled by 0e093d99
      ("writeback: do not sleep on the congestion queue if there are no
      congested BDIs or if significant congestion is not being encountered in
      the current zone"), we limit the sleep to worker threads, which are the
      ones of interest anyway.
      
      The second thing to do is to create a dedicated workqueue for vmstat and
      mark it WQ_MEM_RECLAIM to note it participates in the reclaim and to
      have a spare worker thread for it.
      
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Cristopher Lameter <clameter@sgi.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Arkadiusz Miskiewicz <arekm@maven.pl>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: fix swapped Movable and Reclaimable in /proc/pagetypeinfo · 475a2f90
      Vlastimil Babka authored
      Commit 016c13da ("mm, page_alloc: use masks and shifts when
      converting GFP flags to migrate types") swapped MIGRATE_MOVABLE and
      MIGRATE_RECLAIMABLE in the enum definition.  However, migratetype_names
      wasn't updated to reflect that.
      
      As a result, the file /proc/pagetypeinfo shows the counts for Movable as
      Reclaimable and vice versa.
      
      Additionally, commit 0aaa29a5 ("mm, page_alloc: reserve pageblocks
      for high-order atomic allocations on demand") introduced
      MIGRATE_HIGHATOMIC, but did not add a letter for it to
      show_migration_types(), so it doesn't appear in the listing of free
      areas during page allocation failures or OOM kills.
      
      This patch fixes both problems.  The atomic reserves will show with a
      letter 'H' in the free areas listings.
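As a hedged illustration of the first problem, the sketch below keeps each label next to the enum value it describes by using C99 designated initializers, so reordering the enum cannot silently swap labels. The enum order follows the description above; the `_m` names, the 'U'/'M'/'E' letters and the helper print_migratetypes_m() are assumptions for illustration, and the actual patch may simply reorder the existing array entries.

```c
#include <stdio.h>

/* Hypothetical stand-in for the enum order described above (after
 * commit 016c13da): Movable now precedes Reclaimable. */
enum migratetype_m {
	MIGRATE_UNMOVABLE_M,
	MIGRATE_MOVABLE_M,
	MIGRATE_RECLAIMABLE_M,
	MIGRATE_HIGHATOMIC_M,
	MIGRATE_TYPES_M
};

/* Designated initializers bind each label to its enum value, so a later
 * reordering of the enum cannot silently swap the labels. */
static const char *const migratetype_names_m[MIGRATE_TYPES_M] = {
	[MIGRATE_UNMOVABLE_M]   = "Unmovable",
	[MIGRATE_MOVABLE_M]     = "Movable",
	[MIGRATE_RECLAIMABLE_M] = "Reclaimable",
	[MIGRATE_HIGHATOMIC_M]  = "HighAtomic",
};

/* One-letter codes for free-area listings, including 'H' for the
 * high-order atomic reserve (other letters are assumed). */
static const char migratetype_letters_m[MIGRATE_TYPES_M] = {
	[MIGRATE_UNMOVABLE_M]   = 'U',
	[MIGRATE_MOVABLE_M]     = 'M',
	[MIGRATE_RECLAIMABLE_M] = 'E',
	[MIGRATE_HIGHATOMIC_M]  = 'H',
};

/* Print one line per type: index, letter, name. */
static void print_migratetypes_m(void)
{
	for (int t = 0; t < MIGRATE_TYPES_M; t++)
		printf("%d %c %s\n", t, migratetype_letters_m[t],
		       migratetype_names_m[t]);
}
```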
      
      Fixes: 016c13da ("mm, page_alloc: use masks and shifts when converting GFP flags to migrate types")
      Fixes: 0aaa29a5 ("mm, page_alloc: reserve pageblocks for high-order atomic allocations on demand")
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: fix memory.high target · 9516a18a
      Vladimir Davydov authored
      When the memory.high threshold is exceeded, try_charge() schedules a
      task_work to reclaim the excess.  The reclaim target is set to the
      number of pages requested by try_charge().
      
      This is wrong, because try_charge() usually charges more pages than
      requested (batch > nr_pages) in order to refill per-cpu stocks.  As a
      result, a process in a cgroup can easily exceed memory.high
      significantly when doing a lot of charges without returning to
      userspace (e.g. reading a file in big chunks).
      
      Fix this issue by ensuring that, when memory.high is exceeded, a process
      reclaims as many pages as were actually charged (i.e. batch).
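A minimal userspace model of the charge path, assuming a fixed refill batch of 32 pages and hypothetical names throughout. It shows why the reclaim target has to be the batch that was charged rather than the nr_pages that was requested; it is a sketch of the accounting, not the memcg code.

```c
/* Hypothetical refill batch size, in pages. */
#define CHARGE_BATCH_M 32UL

struct memcg_m {
	unsigned long usage;          /* pages charged to the cgroup */
	unsigned long high;           /* memory.high threshold, in pages */
	unsigned long stock;          /* pre-charged pages cached for reuse */
	unsigned long reclaim_target; /* pages to reclaim before returning to userspace */
};

/* Charge nr_pages, pre-charging a whole batch when the stock runs dry. */
static void try_charge_m(struct memcg_m *cg, unsigned long nr_pages)
{
	unsigned long batch;

	if (cg->stock >= nr_pages) {
		/* Served from the stock: no new charge, no new reclaim. */
		cg->stock -= nr_pages;
		return;
	}

	batch = nr_pages > CHARGE_BATCH_M ? nr_pages : CHARGE_BATCH_M;
	cg->usage += batch;
	cg->stock += batch - nr_pages;

	/* The fix: target what was actually charged (batch), not nr_pages. */
	if (cg->usage > cg->high)
		cg->reclaim_target += batch;
}
```

With memory.high at 0, a single one-page charge puts 32 pages on the cgroup and schedules 32 pages of reclaim; targeting nr_pages instead would schedule only 1, leaving the other 31 over the limit.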
      
      Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: hugetlb: fix hugepage memory leak caused by wrong reserve count · a88c7695
      Naoya Horiguchi authored
      When dequeue_huge_page_vma() in alloc_huge_page() fails, we fall back on
      alloc_buddy_huge_page() to directly create a hugepage from the buddy
      allocator.
      
      In that case, however, if alloc_buddy_huge_page() succeeds we don't
      decrement h->resv_huge_pages, which means that a successful
      hugetlb_fault() returns without releasing the reserve count.  As a
      result, a subsequent hugetlb_fault() might fail even though there are
      still free hugepages.
      
      This patch simply adds the missing decrement on that code path.
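A small userspace model of the counter logic, with hypothetical names: dequeue_m() stands in for dequeue_huge_page_vma(), the surplus_left budget stands in for overcommit through alloc_buddy_huge_page(), and the node binding is reduced to a flag. It is a sketch under those assumptions, not the hugetlb code.

```c
#include <stdbool.h>

/* Hypothetical model of the hstate counters involved. */
struct hstate_m {
	long free_huge_pages;   /* pages on the free lists */
	long resv_huge_pages;   /* free pages promised to reservations */
	long surplus_left;      /* overcommit budget for buddy allocations */
};

/* Stand-in for dequeue_huge_page_vma(); node_has_free models whether the
 * caller's mempolicy allows any of the free pages. */
static bool dequeue_m(struct hstate_m *h, bool has_reserve, bool node_has_free)
{
	/* Without a reservation, reserved pages are off limits. */
	if (!has_reserve && h->free_huge_pages - h->resv_huge_pages <= 0)
		return false;
	if (!node_has_free || h->free_huge_pages == 0)
		return false;
	h->free_huge_pages--;
	if (has_reserve)
		h->resv_huge_pages--;
	return true;
}

/* Stand-in for alloc_buddy_huge_page() under overcommit. */
static bool alloc_buddy_m(struct hstate_m *h)
{
	if (h->surplus_left == 0)
		return false;
	h->surplus_left--;
	return true;
}

static bool alloc_huge_page_m(struct hstate_m *h, bool has_reserve,
			      bool node_has_free)
{
	if (dequeue_m(h, has_reserve, node_has_free))
		return true;
	if (!alloc_buddy_m(h))
		return false;
	/* The fix: the fallback path must consume the reservation too. */
	if (has_reserve)
		h->resv_huge_pages--;
	return true;
}
```

In this model, dropping the decrement in the fallback path leaves resv_huge_pages at 1 after the node-1 allocation, so a later unreserved allocation sees free minus reserved equal to 0 and fails even though one page is still free.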
      
      I reproduced this problem when testing the v4.3 kernel in the following situation:
       - the test machine/VM is a NUMA system,
       - hugepage overcommitting is enabled,
       - most of the hugepages are allocated and there's only one free hugepage,
         which is on node 0 (for example),
       - another program, which calls set_mempolicy(MPOL_BIND) to bind itself to
         node 1, tries to allocate a hugepage,
       - the allocation should fail, but the reserve count is still held.
      
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: <stable@vger.kernel.org> [3.16+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. Dec 12, 2015
  3. Dec 11, 2015
    • vgaarb: fix signal handling in vga_get() · 9f5bd308
      Kirill A. Shutemov authored
      There are a few defects in vga_get() related to signal handling:

        - we shouldn't check for pending signals in the TASK_UNINTERRUPTIBLE
          case;

        - if we find a pending signal, we must remove ourselves from the wait
          queue and change the task state back to running;

        - -ERESTARTSYS is more appropriate, I guess.
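The three rules above can be modelled in userspace as a single pass of the wait loop. Everything here is hypothetical: task state and wait-queue membership are plain fields, and ERESTARTSYS_M copies the kernel-internal ERESTARTSYS value (512), which userspace errno.h does not define. It sketches the control flow only, not the vgaarb code.

```c
#include <stdbool.h>

/* Kernel-internal restart code; userspace errno.h does not define it. */
#define ERESTARTSYS_M 512

enum task_state_m { TASK_RUNNING_M, TASK_INTERRUPTIBLE_M, TASK_UNINTERRUPTIBLE_M };

struct waiter_m {
	enum task_state_m state;
	bool on_wait_queue;
};

/* One pass of the wait loop, entered with the waiter already queued.
 * Returns 0 when the caller should sleep, or -ERESTARTSYS_M. */
static int wait_iteration_m(struct waiter_m *w, bool interruptible,
			    bool signal_pending)
{
	w->state = interruptible ? TASK_INTERRUPTIBLE_M : TASK_UNINTERRUPTIBLE_M;

	/* Rule 1: only an interruptible wait looks at pending signals. */
	if (interruptible && signal_pending) {
		/* Rule 2: undo the setup before bailing out. */
		w->state = TASK_RUNNING_M;
		w->on_wait_queue = false;
		/* Rule 3: let the syscall be restarted. */
		return -ERESTARTSYS_M;
	}

	return 0; /* the real loop would call schedule() here */
}
```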
      
      Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: stable@vger.kernel.org
      Reviewed-by: David Herrmann <dh.herrmann@gmail.com>
      Signed-off-by: Dave Airlie <airlied@redhat.com>
    • Merge branch 'drm-fixes-4.4' of git://people.freedesktop.org/~agd5f/linux into drm-fixes · 49307da3
      Dave Airlie authored
      some big endian fixes and one regression fix.
      
      * 'drm-fixes-4.4' of git://people.freedesktop.org/~agd5f/linux:
        radeon: Fix VCE IB test on Big-Endian systems
        radeon: Fix VCE ring test for Big-Endian systems
        radeon/cik: Fix GFX IB test on Big-Endian
        drm/amdgpu: fix the lost duplicates checking
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma · 0bd0f1e6
      Linus Torvalds authored
      Pull rdma fixes from Doug Ledford:
       "Most are minor to important fixes.
      
        There is one performance enhancement that I took on the grounds that
        failing to check if other processes can run before running what's
        intended to be a background, idle-time task is a bug, even though the
        primary effect of the fix is to improve performance (and it was a very
        simple patch)"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
        IB/mlx5: Postpone remove_keys under knowledge of coming preemption
        IB/mlx4: Use vmalloc for WR buffers when needed
        IB/mlx4: Use correct order of variables in log message
        iser-target: Remove explicit mlx4 work-around
        mlx4: Expose correct max_sge_rd limit
        IB/mad: Require CM send method for everything except ClassPortInfo
        IB/cma: Add a missing rcu_read_unlock()
        IB core: Fix ib_sg_to_pages()
        IB/srp: Fix srp_map_sg_fr()
        IB/srp: Fix indirect data buffer rkey endianness
        IB/srp: Initialize dma_length in srp_map_idb
        IB/srp: Fix possible send queue overflow
        IB/srp: Fix a memory leak
        IB/sa: Put netlink request into the request list before sending
        IB/iser: use sector_div instead of do_div
        IB/core: use RCU for uverbs id lookup
        IB/qib: Minor fixes to qib per SFF 8636
        IB/core: Fix user mode post wr corruption
        IB/qib: Fix qib_mr structure
    • Merge tag 'sound-4.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · a80c47da
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "Again less intensive changes in this rc: you can find only a few
        HD-audio fixes (noise fixes for Intel Broxton chip and a few Thinkpad
        models, quirks for Alienware 17 and Packard Bell DOTS) in addition to
        a long-standing rme96 bug fix"
      
      * tag 'sound-4.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: hda/ca0132 - quirk for Alienware 17 2015
        ALSA: hda - Fix noise problems on Thinkpad T440s
        ALSA: hda - Fixing speaker noise on the two latest thinkpad models
        ALSA: hda - Add inverted dmic for Packard Bell DOTS
        ALSA: hda - Fix playback noise with 24/32 bit sample size on BXT
        ALSA: rme96: Fix unexpected volume reset after rate changes
  4. Dec 10, 2015
    • dm btree: fix bufio buffer leaks in dm_btree_del() error path · ed8b45a3
      Joe Thornber authored
      If dm_btree_del()'s call to push_frame() fails, e.g. due to
      btree_node_validator finding invalid metadata, the dm_btree_del() error
      path must unlock all frames (which have active dm-bufio buffers) that
      were pushed onto the del_stack.
      
      Otherwise, dm_bufio_client_destroy() will BUG_ON() because dm-bufio
      buffers have leaked, e.g.:
        device-mapper: bufio: leaked buffer 3, hold count 1, list 0
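A simplified model of the error path, assuming a fixed-depth stack and a `held` counter in place of dm-bufio hold counts; all names are hypothetical. The point is the single exit label that unlocks every frame still on the stack, so a validation failure part-way down cannot leak held buffers.

```c
#include <stdbool.h>

#define MAX_DEPTH_M 16

/* Hypothetical stand-in: a frame pins one metadata block (in dm-bufio
 * terms, one buffer with a hold count). */
struct frame_m { int block; };

struct del_stack_m {
	int top;                          /* index of top frame, -1 if empty */
	struct frame_m frames[MAX_DEPTH_M];
	int held;                         /* buffers currently held */
};

/* Push a frame; 'valid' simulates the node validator's verdict. */
static int push_frame_m(struct del_stack_m *s, int block, bool valid)
{
	if (!valid || s->top + 1 >= MAX_DEPTH_M)
		return -1;
	s->frames[++s->top].block = block;
	s->held++;                        /* the buffer is now locked */
	return 0;
}

/* Release every frame still on the stack. */
static void unlock_all_frames_m(struct del_stack_m *s)
{
	while (s->top >= 0) {
		s->top--;
		s->held--;                /* drop the buffer this frame pinned */
	}
}

/* Descend through n blocks; valid[i] says whether block i passes validation. */
static int btree_del_m(struct del_stack_m *s, const bool *valid, int n)
{
	int r = 0;

	for (int i = 0; i < n; i++) {
		r = push_frame_m(s, i, valid[i]);
		if (r)
			goto out;         /* frames pushed so far are still locked */
	}
out:
	/* The fix: unlock whatever is still on the stack on every exit.
	 * (In this model the success path relies on it too; the real loop
	 * pops frames as it finishes with them.) */
	unlock_all_frames_m(s);
	return r;
}
```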
      
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
    • Merge tag 'vfio-v4.4-rc5' of git://github.com/awilliam/linux-vfio · 6764e5eb
      Linus Torvalds authored
      Pull VFIO fixes from Alex Williamson:
      
       - Various fixes for removing redundancy, const'ifying structs, avoiding
         stack usage, fixing WARN usage (Krzysztof Kozlowski, Julia Lawall,
         Kees Cook, Dan Carpenter)
      
       - Revert No-IOMMU mode as the intended user has not emerged (Alex
         Williamson)
      
      * tag 'vfio-v4.4-rc5' of git://github.com/awilliam/linux-vfio:
        Revert: "vfio: Include No-IOMMU mode"
        vfio: fix a warning message
        vfio: platform: remove needless stack usage
        vfio-pci: constify pci_error_handlers structures
        vfio: Drop owner assignment from platform_driver
    • Merge tag 'devicetree-fixes-for-4.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux · eef121f4
      Linus Torvalds authored
      Pull DT fixes from Rob Herring:
       "I think this should be all for 4.4:
      
         - Fix incorrect warning about overlapping memory regions
      
         - Export of_irq_find_parent again which was made static in 4.4, but
           has users pending for 4.5.
      
         - Fix of_msi_map_rid declaration location
      
         - Fix re-entrancy for of_fdt_unflatten_tree
      
         - Clean-up of phys_addr_t printks"
      
      * tag 'devicetree-fixes-for-4.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
        of/irq: move of_msi_map_rid declaration to the correct ifdef section
        of/irq: Export of_irq_find_parent again
        of/fdt: Add mutex protection for calls to __unflatten_device_tree()
        of/address: fix typo in comment block of of_translate_one()
        of: do not use 0x in front of %pa
        of: Fix comparison of reserved memory regions
    • Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux · abb7e2b3
      Linus Torvalds authored
      Pull clk fixes from Stephen Boyd:
       "One small build fix, a couple do_div() fixes, and a fix for the gpio
        basic clock type are the major changes here.  There's also a couple
        fixes for the TI, sunxi, and scpi clock drivers"
      
      * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
        clk: sunxi: pll2: Fix clock running too fast
        clk: scpi: add missing of_node_put
        clk: qoriq: fix memory leak
        imx/clk-pllv2: fix wrong do_div() usage
        imx/clk-pllv1: fix wrong do_div() usage
        clk: mmp: add linux/clk.h includes
        clk: ti: drop locking code from mux/divider drivers
        clk: ti816x: Add missing dmtimer clkdev entries
        clk: ti: fapll: fix wrong do_div() usage
        clk: ti: clkt_dpll: fix wrong do_div() usage
        clk: gpio: Get parent clk names in of_gpio_clk_setup()
    • Merge tag 'for-linus-4.4-1' of git://git.code.sf.net/p/openipmi/linux-ipmi · 9a0f76fd
      Linus Torvalds authored
      Pull IPMI fix from Corey Minyard:
       "Fix an Oops if an interrupt occurs at startup.  This can happen on
        some hardware"
      
      * tag 'for-linus-4.4-1' of git://git.code.sf.net/p/openipmi/linux-ipmi:
        ipmi: move timer init to before irq is setup
    • ipmi: move timer init to before irq is setup · 27f972d3
      Jan Stancek authored
      We encountered a panic on boot in ipmi_si on a Dell per320 due to an
      uninitialized timer, as follows.
      
      static int smi_start_processing(void       *send_info,
                                      ipmi_smi_t intf)
      {
              /* Try to claim any interrupts. */
              if (new_smi->irq_setup)
                      new_smi->irq_setup(new_smi);
      
       --> IRQ arrives here and irq handler tries to modify uninitialized timer
      
          which triggers BUG_ON(!timer->function) in __mod_timer().
      
       Call Trace:
         <IRQ>
         [<ffffffffa0532617>] start_new_msg+0x47/0x80 [ipmi_si]
         [<ffffffffa053269e>] start_check_enables+0x4e/0x60 [ipmi_si]
         [<ffffffffa0532bd8>] smi_event_handler+0x1e8/0x640 [ipmi_si]
         [<ffffffff810f5584>] ? __rcu_process_callbacks+0x54/0x350
         [<ffffffffa053327c>] si_irq_handler+0x3c/0x60 [ipmi_si]
         [<ffffffff810efaf0>] handle_IRQ_event+0x60/0x170
         [<ffffffff810f245e>] handle_edge_irq+0xde/0x180
         [<ffffffff8100fc59>] handle_irq+0x49/0xa0
         [<ffffffff8154643c>] do_IRQ+0x6c/0xf0
         [<ffffffff8100ba53>] ret_from_intr+0x0/0x11
      
              /* Set up the timer that drives the interface. */
              setup_timer(&new_smi->si_timer, smi_timeout, (long)new_smi);
      
      The following patch fixes the problem.
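A userspace model of the ordering problem, with hypothetical names: mod_timer_m() returns false where the kernel would hit BUG_ON(!timer->function), and the interrupt is assumed to fire the instant it is enabled. It contrasts the old order with the fixed one; it is not the ipmi_si code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct timer_list and struct smi_info. */
struct timer_m {
	void (*function)(void *data);
	void *data;
	bool pending;
};

struct smi_m {
	struct timer_m si_timer;
	bool irq_enabled;
};

static void smi_timeout_m(void *data)
{
	(void)data; /* would poll the interface */
}

/* Returns false where the kernel would hit BUG_ON(!timer->function). */
static bool mod_timer_m(struct timer_m *t)
{
	if (t->function == NULL)
		return false;
	t->pending = true;
	return true;
}

/* Model an interrupt that fires as soon as the IRQ is enabled. */
static bool irq_arrives_m(struct smi_m *smi)
{
	return mod_timer_m(&smi->si_timer);
}

/* Old order: claim the IRQ, then set up the timer. */
static bool start_processing_old_m(struct smi_m *smi)
{
	bool ok;

	smi->irq_enabled = true;
	ok = irq_arrives_m(smi);            /* timer not initialized yet */
	smi->si_timer.function = smi_timeout_m;
	smi->si_timer.data = smi;
	return ok;
}

/* Fixed order: set up the timer before the IRQ can fire. */
static bool start_processing_m(struct smi_m *smi)
{
	smi->si_timer.function = smi_timeout_m;
	smi->si_timer.data = smi;
	smi->irq_enabled = true;
	return irq_arrives_m(smi);
}
```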
      
      To: Openipmi-developer@lists.sourceforge.net
      To: Corey Minyard <minyard@acm.org>
      CC: linux-kernel@vger.kernel.org
      
      Signed-off-by: Jan Stancek <jstancek@redhat.com>
      Signed-off-by: Tony Camuso <tcamuso@redhat.com>
      Signed-off-by: Corey Minyard <cminyard@mvista.com>
      Cc: stable@vger.kernel.org # Applies cleanly to 3.10-, needs small rework before
    • bitops.h: correctly handle rol32 with 0 byte shift · d7e35dfa
      Sasha Levin authored
      ROL on a 32-bit integer with a shift of 32 or more is undefined and the
      result is arch-dependent. Avoid this by correctly handling the trivial
      case of rotating by 0.
      
      The trivial solution of checking if shift is 0 breaks gcc's detection
      of this code as a ROL instruction, which is unacceptable.
      
      This bug was reported and fixed in GCC
      (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57157):
      
      	The standard rotate idiom,
      
      	  (x << n) | (x >> (32 - n))
      
      	is recognized by gcc (for concreteness, I discuss only the case that x
      	is an uint32_t here).
      
      	However, this is portable C only for n in the range 0 < n < 32. For n
      	== 0, we get x >> 32 which gives undefined behaviour according to the
      	C standard (6.5.7, Bitwise shift operators). To portably support n ==
      	0, one has to write the rotate as something like
      
      	  (x << n) | (x >> ((-n) & 31))
      
      	And this is apparently not recognized by gcc.
      
      Note that this is broken on older GCCs and will result in slower ROL.
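A sketch of a rotate without undefined behaviour, built on the portable idiom quoted above. rol32_sketch() is a hypothetical name, not the kernel's rol32(), and masking the left shift count as well is an added assumption so that shifts of 32 or more also stay defined.

```c
#include <stdint.h>

/* Rotate left without undefined behaviour: both shift counts are masked
 * to 0..31, so a shift of 0 never turns into a shift by 32. */
static inline uint32_t rol32_sketch(uint32_t word, unsigned int shift)
{
	return (word << (shift & 31)) | (word >> ((-shift) & 31));
}
```

For shift == 0, `(-shift) & 31` is 0, so both halves evaluate to `word` and the OR returns it unchanged instead of relying on an undefined `word >> 32`.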
      
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • dm space map metadata: fix ref counting bug when bootstrapping a new space map · 50dd842a
      Joe Thornber authored
      When applying block operations (BOPs), do not remove them from the
      uncommitted BOP ring-buffer until after they've been applied, in case
      we recurse.

      Also, perform the BOP_INC operation in dm_sm_metadata_create() and
      sm_metadata_extend() in terms of the uncommitted BOP ring-buffer rather
      than through direct calls to sm_ll_inc().
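A userspace model of the peek, apply, then pop order, assuming a fixed-size ring and hypothetical names (the kernel's ring-buffer API may differ). example_apply_m() plays the role of a recursive lookup that must still see the operation currently being applied.

```c
#include <stdbool.h>

#define RING_SIZE_M 8

/* Hypothetical block-operation record and ring; names are illustrative. */
enum bop_type_m { BOP_INC_M, BOP_DEC_M };

struct bop_m {
	enum bop_type_m type;
	unsigned long block;
};

struct bop_ring_m {
	unsigned begin, end;              /* begin == end means empty */
	struct bop_m ops[RING_SIZE_M];
};

static bool ring_empty_m(const struct bop_ring_m *r)
{
	return r->begin == r->end;
}

static int ring_push_m(struct bop_ring_m *r, struct bop_m op)
{
	unsigned next = (r->end + 1) % RING_SIZE_M;

	if (next == r->begin)
		return -1;                /* full */
	r->ops[r->end] = op;
	r->end = next;
	return 0;
}

/* Copy the oldest op out without removing it. */
static int ring_peek_m(const struct bop_ring_m *r, struct bop_m *out)
{
	if (ring_empty_m(r))
		return -1;
	*out = r->ops[r->begin];
	return 0;
}

static void ring_pop_m(struct bop_ring_m *r)
{
	if (!ring_empty_m(r))
		r->begin = (r->begin + 1) % RING_SIZE_M;
}

/* How many uncommitted ops touch 'block': what a recursive lookup consults. */
static int ring_pending_m(const struct bop_ring_m *r, unsigned long block)
{
	int n = 0;

	for (unsigned i = r->begin; i != r->end; i = (i + 1) % RING_SIZE_M)
		if (r->ops[i].block == block)
			n++;
	return n;
}

/* Apply ops oldest-first; pop each one only after it has been applied. */
static int apply_all_m(struct bop_ring_m *r,
		       int (*apply)(struct bop_ring_m *, const struct bop_m *))
{
	struct bop_m op;

	while (ring_peek_m(r, &op) == 0) {
		int err = apply(r, &op);

		if (err)
			return err;       /* op stays queued */
		ring_pop_m(r);
	}
	return 0;
}

/* Example apply step that records what a recursive lookup would see. */
static int seen_pending_m;

static int example_apply_m(struct bop_ring_m *r, const struct bop_m *op)
{
	seen_pending_m = ring_pending_m(r, op->block);
	return 0;
}
```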
      
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
    • dm thin metadata: fix bug when taking a metadata snapshot · 49e99fc7
      Joe Thornber authored
      When you take a metadata snapshot, the btree roots for the mapping and
      details trees need to have their reference counts incremented so they
      persist for the lifetime of the metadata snap.

      The roots being incremented were those currently written in the
      superblock, which could be out of date if concurrent IO is triggering
      new mappings, breaking of sharing, etc.
      
      Fix this by performing a commit with the metadata lock held while taking
      a metadata snapshot.
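A minimal model of the fix, assuming a single root (the real snapshot covers both the mapping and details trees) and hypothetical names; the metadata lock is represented only by a comment. It shows that committing first makes the superblock root current before its reference count is raised.

```c
#define NR_BLOCKS_M 16

/* Hypothetical model of the pool metadata relevant to the bug. */
struct pool_md_m {
	unsigned live_root;           /* in-core root, moved by concurrent IO */
	unsigned sb_root;             /* root last written to the superblock */
	int refcount[NR_BLOCKS_M];    /* per-block reference counts */
};

/* Write the in-core state out, making the superblock current. */
static void commit_m(struct pool_md_m *pmd)
{
	pmd->sb_root = pmd->live_root;
}

/* The caller is assumed to hold the metadata lock, so live_root cannot
 * move between the commit and the increment. */
static void reserve_metadata_snap_m(struct pool_md_m *pmd)
{
	commit_m(pmd);                    /* the fix: commit first ... */
	pmd->refcount[pmd->sb_root]++;    /* ... then pin the now-current root */
}
```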
      
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 626d114f
      Linus Torvalds authored
      Pull vfs fixes from Al Viro:
       "A couple of fixes, both -stable fodder (9p one all way back to 2.6.32,
        dio - to all branches where "Fix negative return from dio read beyond
        eof" will end up it; it's a fixup to commit marked for -stable)"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        fix the regression from "direct-io: Fix negative return from dio read beyond eof"
        9p: ->evict_inode() should kick out ->i_data, not ->i_mapping
    • Merge tag 'pci-v4.4-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · 978d6a90
      Linus Torvalds authored
      Pull PCI fixes from Bjorn Helgaas:
       "These are more fixes I'd like to have in v4.4.  Several for the Altera
        driver added for v4.4, and one for an MSI domain problem that affects
        several arm64 platforms:
      
        MSI:
         - Only use the generic MSI layer when domain is hierarchical (Marc
           Zyngier)
      
        Altera host bridge driver:
         - Fix loop in tlp_read_packet() (Dan Carpenter)
         - Fix Requester ID for config accesses (Ley Foon Tan)
         - Check TLP completion status (Ley Foon Tan)
         - Fix error when INTx is 4 (Ley Foon Tan)"
      
      * tag 'pci-v4.4-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
        PCI: altera: Fix error when INTx is 4
        PCI: altera: Check TLP completion status
        PCI: altera: Fix Requester ID for config accesses
        PCI: altera: Fix loop in tlp_read_packet()
        PCI/MSI: Only use the generic MSI layer when domain is hierarchical
    • ALSA: hda/ca0132 - quirk for Alienware 17 2015 · 5328e1ea
      Gabriele Martino authored
      The Alienware 17 (2015) has the same card and pin configuration as the
      Alienware 15, so the same quirks must be applied.
      
      Signed-off-by: Gabriele Martino <g.martino@gmx.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
  5. Dec 09, 2015