Skip to content
  1. Mar 24, 2018
    • Petr Machata's avatar
      ip_tunnel: Emit events for post-register MTU changes · f6cc9c05
      Petr Machata authored
      For tunnels created with IFLA_MTU, MTU of the netdevice is set by
      rtnl_create_link() (called from rtnl_newlink()) before the device is
      registered. However without IFLA_MTU that's not done.
      
      rtnl_newlink() proceeds by calling struct rtnl_link_ops.newlink, which
      via ip_tunnel_newlink() calls register_netdevice(), and that emits
      NETDEV_REGISTER. Thus any listeners that inspect the netdevice get the
      MTU of 0.
      
      After ip_tunnel_newlink() corrects the MTU after registering the
      netdevice, but since there's no event, the listeners don't get to know
      about the MTU until something else happens--such as a NETDEV_UP event.
      That's not ideal.
      
      So instead of setting the MTU directly, go through dev_set_mtu(), which
      takes care of distributing the necessary NETDEV_PRECHANGEMTU and
      NETDEV_CHANGEMTU events.
      
      Fixes: 1da177e4
      
       ("Linux-2.6.12-rc2")
      Signed-off-by: default avatarPetr Machata <petrm@mellanox.com>
      Signed-off-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f6cc9c05
  2. Mar 23, 2018
    • Linus Torvalds's avatar
      Merge branch 'akpm' (patches from Andrew) · f36b7534
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "13 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        mm, thp: do not cause memcg oom for thp
        mm/vmscan: wake up flushers for legacy cgroups too
        Revert "mm: page_alloc: skip over regions of invalid pfns where possible"
        mm/shmem: do not wait for lock_page() in shmem_unused_huge_shrink()
        mm/thp: do not wait for lock_page() in deferred_split_scan()
        mm/khugepaged.c: convert VM_BUG_ON() to collapse fail
        x86/mm: implement free pmd/pte page interfaces
        mm/vmalloc: add interfaces to free unmapped page table
        h8300: remove extraneous __BIG_ENDIAN definition
        hugetlbfs: check for pgoff value overflow
        lockdep: fix fs_reclaim warning
        MAINTAINERS: update Mark Fasheh's e-mail
        mm/mempolicy.c: avoid use uninitialized preferred_node
      f36b7534
    • Linus Torvalds's avatar
      Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · 8401c72c
      Linus Torvalds authored
      Pull libnvdimm fixes from Dan Williams:
       "Two regression fixes, two bug fixes for older issues, two fixes for
        new functionality added this cycle that have userspace ABI concerns,
        and a small cleanup. These have appeared in a linux-next release and
        have a build success report from the 0day robot.
      
         * The 4.16 rework of altmap handling led to some configurations
           leaking page table allocations due to freeing from the altmap
           reservation rather than the page allocator.
      
           The impact without the fix is leaked memory and a WARN() message
           when tearing down libnvdimm namespaces. The rework also missed a
           place where error handling code needed to be removed that can lead
           to a crash if devm_memremap_pages() fails.
      
         * acpi_map_pxm_to_node() had a latent bug whereby it could
           misidentify the closest online node to a given proximity domain.
      
         * Block integrity handling was reworked several kernels back to allow
           calling add_disk() after setting up the integrity profile.
      
           The nd_btt and nd_blk drivers are just now catching up to fix
           automatic partition detection at driver load time.
      
         * The new peristence_domain attribute, a platform indicator of
           whether cpu caches are powerfail protected for example, is meant to
           be a single value enum and not a set of flags.
      
           This oversight was caught while reviewing new userspace code in
           libndctl to communicate the attribute.
      
           Fix this new enabling up so that we are not stuck with an unwanted
           userspace ABI"
      
      * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
        libnvdimm, nfit: fix persistence domain reporting
        libnvdimm, region: hide persistence_domain when unknown
        acpi, numa: fix pxm to online numa node associations
        x86, memremap: fix altmap accounting at free
        libnvdimm: remove redundant assignment to pointer 'dev'
        libnvdimm, {btt, blk}: do integrity setup before add_disk()
        kernel/memremap: Remove stale devres_free() call
      8401c72c
    • Linus Torvalds's avatar
      Merge tag 'drm-fixes-for-v4.16-rc7' of git://people.freedesktop.org/~airlied/linux · 9ec7ccc8
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "A bunch of fixes all over the place (core, i915, amdgpu, imx, sun4i,
        ast, tegra, vmwgfx), nothing too serious or worrying at this stage.
      
         - one uapi fix to stop multi-planar images with getfb
      
         - Sun4i error path and clock fixes
      
         - udl driver mmap offset fix
      
         - i915 DP MST and GPU reset fixes
      
         - vmwgfx mutex and black screen fixes
      
         - imx array underflow fix and vblank fix
      
         - amdgpu: display fixes
      
         - exynos devicetree fix
      
         - ast mode fix"
      
      * tag 'drm-fixes-for-v4.16-rc7' of git://people.freedesktop.org/~airlied/linux: (29 commits)
        drm/ast: Fixed 1280x800 Display Issue
        drm: udl: Properly check framebuffer mmap offsets
        drm/i915: Specify which engines to reset following semaphore/event lockups
        drm/vmwgfx: Fix a destoy-while-held mutex problem.
        drm/vmwgfx: Fix black screen and device errors when running without fbdev
        drm: Reject getfb for multi-plane framebuffers
        drm/amd/display: Add one to EDID's audio channel count when passing to DC
        drm/amd/display: We shouldn't set format_default on plane as atomic driver
        drm/amd/display: Fix FMT truncation programming
        drm/amd/display: Allow truncation to 10 bits
        drm/sun4i: hdmi: Fix another error handling path in 'sun4i_hdmi_bind()'
        drm/sun4i: hdmi: Fix an error handling path in 'sun4i_hdmi_bind()'
        drm/i915/dp: Write to SET_POWER dpcd to enable MST hub.
        drm/amd/display: fix dereferencing possible ERR_PTR()
        drm/amd/display: Refine disable VGA
        drm/tegra: Shutdown on driver unbind
        drm/tegra: dsi: Don't disable regulator on ->exit()
        drm/tegra: dc: Detach IOMMU group from domain only once
        dt-bindings: exynos: Document #sound-dai-cells property of the HDMI node
        drm/imx: move arming of the vblank event to atomic_flush
        ...
      9ec7ccc8
    • David Rientjes's avatar
      mm, thp: do not cause memcg oom for thp · 9d3c3354
      David Rientjes authored
      Commit 25160354 ("mm, thp: remove __GFP_NORETRY from khugepaged and
      madvised allocations") changed the page allocator to no longer detect
      thp allocations based on __GFP_NORETRY.
      
      It did not, however, modify the mem cgroup try_charge() path to avoid
      oom kill for either khugepaged collapsing or thp faulting.  It is never
      expected to oom kill a process to allocate a hugepage for thp; reclaim
      is governed by the thp defrag mode and MADV_HUGEPAGE, but allocations
      (and charging) should fallback instead of oom killing processes.
      
      Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1803191409420.124411@chino.kir.corp.google.com
      Fixes: 25160354
      
       ("mm, thp: remove __GFP_NORETRY from khugepaged and madvised allocations")
      Signed-off-by: default avatarDavid Rientjes <rientjes@google.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9d3c3354
    • Andrey Ryabinin's avatar
      mm/vmscan: wake up flushers for legacy cgroups too · 1c610d5f
      Andrey Ryabinin authored
      Commit 726d061f ("mm: vmscan: kick flushers when we encounter dirty
      pages on the LRU") added flusher invocation to shrink_inactive_list()
      when many dirty pages on the LRU are encountered.
      
      However, shrink_inactive_list() doesn't wake up flushers for legacy
      cgroup reclaim, so the next commit bbef9384 ("mm: vmscan: remove old
      flusher wakeup from direct reclaim path") removed the only source of
      flusher's wake up in legacy mem cgroup reclaim path.
      
      This leads to premature OOM if there is too many dirty pages in cgroup:
          # mkdir /sys/fs/cgroup/memory/test
          # echo $$ > /sys/fs/cgroup/memory/test/tasks
          # echo 50M > /sys/fs/cgroup/memory/test/memory.limit_in_bytes
          # dd if=/dev/zero of=tmp_file bs=1M count=100
          Killed
      
          dd invoked oom-killer: gfp_mask=0x14000c0(GFP_KERNEL), nodemask=(null), order=0, oom_score_adj=0
      
          Call Trace:
           dump_stack+0x46/0x65
           dump_header+0x6b/0x2ac
           oom_kill_process+0x21c/0x4a0
           out_of_memory+0x2a5/0x4b0
           mem_cgroup_out_of_memory+0x3b/0x60
           mem_cgroup_oom_synchronize+0x2ed/0x330
           pagefault_out_of_memory+0x24/0x54
           __do_page_fault+0x521/0x540
           page_fault+0x45/0x50
      
          Task in /test killed as a result of limit of /test
          memory: usage 51200kB, limit 51200kB, failcnt 73
          memory+swap: usage 51200kB, limit 9007199254740988kB, failcnt 0
          kmem: usage 296kB, limit 9007199254740988kB, failcnt 0
          Memory cgroup stats for /test: cache:49632KB rss:1056KB rss_huge:0KB shmem:0KB
                  mapped_file:0KB dirty:49500KB writeback:0KB swap:0KB inactive_anon:0KB
      	    active_anon:1168KB inactive_file:24760KB active_file:24960KB unevictable:0KB
          Memory cgroup out of memory: Kill process 3861 (bash) score 88 or sacrifice child
          Killed process 3876 (dd) total-vm:8484kB, anon-rss:1052kB, file-rss:1720kB, shmem-rss:0kB
          oom_reaper: reaped process 3876 (dd), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
      
      Wake up flushers in legacy cgroup reclaim too.
      
      Link: http://lkml.kernel.org/r/20180315164553.17856-1-aryabinin@virtuozzo.com
      Fixes: bbef9384
      
       ("mm: vmscan: remove old flusher wakeup from direct reclaim path")
      Signed-off-by: default avatarAndrey Ryabinin <aryabinin@virtuozzo.com>
      Tested-by: default avatarShakeel Butt <shakeelb@google.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.cz>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      1c610d5f
    • Daniel Vacek's avatar
      Revert "mm: page_alloc: skip over regions of invalid pfns where possible" · f59f1caf
      Daniel Vacek authored
      This reverts commit b92df1de ("mm: page_alloc: skip over regions of
      invalid pfns where possible").  The commit is meant to be a boot init
      speed up skipping the loop in memmap_init_zone() for invalid pfns.
      
      But given some specific memory mapping on x86_64 (or more generally
      theoretically anywhere but on arm with CONFIG_HAVE_ARCH_PFN_VALID) the
      implementation also skips valid pfns which is plain wrong and causes
      'kernel BUG at mm/page_alloc.c:1389!'
      
        crash> log | grep -e BUG -e RIP -e Call.Trace -e move_freepages_block -e rmqueue -e freelist -A1
        kernel BUG at mm/page_alloc.c:1389!
        invalid opcode: 0000 [#1] SMP
        --
        RIP: 0010: move_freepages+0x15e/0x160
        --
        Call Trace:
          move_freepages_block+0x73/0x80
          __rmqueue+0x263/0x460
          get_page_from_freelist+0x7e1/0x9e0
          __alloc_pages_nodemask+0x176/0x420
        --
      
        crash> page_init_bug -v | grep RAM
        <struct resource 0xffff88067fffd2f8>          1000 -        9bfff       System RAM (620.00 KiB)
        <struct resource 0xffff88067fffd3a0>        100000 -     430bffff       System RAM (  1.05 GiB = 1071.75 MiB = 1097472.00 KiB)
        <struct resource 0xffff88067fffd410>      4b0c8000 -     4bf9cfff       System RAM ( 14.83 MiB = 15188.00 KiB)
        <struct resource 0xffff88067fffd480>      4bfac000 -     646b1fff       System RAM (391.02 MiB = 400408.00 KiB)
        <struct resource 0xffff88067fffd560>      7b788000 -     7b7fffff       System RAM (480.00 KiB)
        <struct resource 0xffff88067fffd640>     100000000 -    67fffffff       System RAM ( 22.00 GiB)
      
        crash> page_init_bug | head -6
        <struct resource 0xffff88067fffd560>      7b788000 -     7b7fffff       System RAM (480.00 KiB)
        <struct page 0xffffea0001ede200>   1fffff00000000  0 <struct pglist_data 0xffff88047ffd9000> 1 <struct zone 0xffff88047ffd9800> DMA32          4096    1048575
        <struct page 0xffffea0001ede200>       505736 505344 <struct page 0xffffea0001ed8000> 505855 <struct page 0xffffea0001edffc0>
        <struct page 0xffffea0001ed8000>                0  0 <struct pglist_data 0xffff88047ffd9000> 0 <struct zone 0xffff88047ffd9000> DMA               1       4095
        <struct page 0xffffea0001edffc0>   1fffff00000400  0 <struct pglist_data 0xffff88047ffd9000> 1 <struct zone 0xffff88047ffd9800> DMA32          4096    1048575
        BUG, zones differ!
      
        crash> kmem -p 77fff000 78000000 7b5ff000 7b600000 7b787000 7b788000
              PAGE        PHYSICAL      MAPPING       INDEX CNT FLAGS
        ffffea0001e00000  78000000                0        0  0 0
        ffffea0001ed7fc0  7b5ff000                0        0  0 0
        ffffea0001ed8000  7b600000                0        0  0 0       <<<<
        ffffea0001ede1c0  7b787000                0        0  0 0
        ffffea0001ede200  7b788000                0        0  1 1fffff00000000
      
      Link: http://lkml.kernel.org/r/20180316143855.29838-1-neelx@redhat.com
      Fixes: b92df1de
      
       ("mm: page_alloc: skip over regions of invalid pfns where possible")
      Signed-off-by: default avatarDaniel Vacek <neelx@redhat.com>
      Acked-by: default avatarArd Biesheuvel <ard.biesheuvel@linaro.org>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Paul Burton <paul.burton@imgtec.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f59f1caf
    • Kirill A. Shutemov's avatar
      mm/shmem: do not wait for lock_page() in shmem_unused_huge_shrink() · b3cd54b2
      Kirill A. Shutemov authored
      shmem_unused_huge_shrink() gets called from reclaim path.  Waiting for
      page lock may lead to deadlock there.
      
      There was a bug report that may be attributed to this:
      
        http://lkml.kernel.org/r/alpine.LRH.2.11.1801242349220.30642@mail.ewheeler.net
      
      Replace lock_page() with trylock_page() and skip the page if we failed
      to lock it.  We will get to the page on the next scan.
      
      We can test for the PageTransHuge() outside the page lock as we only
      need protection against splitting the page under us.  Holding pin oni
      the page is enough for this.
      
      Link: http://lkml.kernel.org/r/20180316210830.43738-1-kirill.shutemov@linux.intel.com
      Fixes: 779750d2
      
       ("shmem: split huge pages beyond i_size under memory pressure")
      Signed-off-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reported-by: default avatarEric Wheeler <linux-mm@lists.ewheeler.net>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: <stable@vger.kernel.org>	[4.8+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b3cd54b2
    • Kirill A. Shutemov's avatar
      mm/thp: do not wait for lock_page() in deferred_split_scan() · fa41b900
      Kirill A. Shutemov authored
      deferred_split_scan() gets called from reclaim path.  Waiting for page
      lock may lead to deadlock there.
      
      Replace lock_page() with trylock_page() and skip the page if we failed
      to lock it.  We will get to the page on the next scan.
      
      Link: http://lkml.kernel.org/r/20180315150747.31945-1-kirill.shutemov@linux.intel.com
      Fixes: 9a982250
      
       ("thp: introduce deferred_split_huge_page()")
      Signed-off-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      fa41b900
    • Kirill A. Shutemov's avatar
      mm/khugepaged.c: convert VM_BUG_ON() to collapse fail · fece2029
      Kirill A. Shutemov authored
      khugepaged is not yet able to convert PTE-mapped huge pages back to PMD
      mapped.  We do not collapse such pages.  See check
      khugepaged_scan_pmd().
      
      But if between khugepaged_scan_pmd() and __collapse_huge_page_isolate()
      somebody managed to instantiate THP in the range and then split the PMD
      back to PTEs we would have a problem --
      VM_BUG_ON_PAGE(PageCompound(page)) will get triggered.
      
      It's possible since we drop mmap_sem during collapse to re-take for
      write.
      
      Replace the VM_BUG_ON() with graceful collapse fail.
      
      Link: http://lkml.kernel.org/r/20180315152353.27989-1-kirill.shutemov@linux.intel.com
      Fixes: b1caa957
      
       ("khugepaged: ignore pmd tables with THP mapped with ptes")
      Signed-off-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Laura Abbott <labbott@redhat.com>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      fece2029
    • Toshi Kani's avatar
      x86/mm: implement free pmd/pte page interfaces · 28ee90fe
      Toshi Kani authored
      Implement pud_free_pmd_page() and pmd_free_pte_page() on x86, which
      clear a given pud/pmd entry and free up lower level page table(s).
      
      The address range associated with the pud/pmd entry must have been
      purged by INVLPG.
      
      Link: http://lkml.kernel.org/r/20180314180155.19492-3-toshi.kani@hpe.com
      Fixes: e61ce6ad
      
       ("mm: change ioremap to set up huge I/O mappings")
      Signed-off-by: default avatarToshi Kani <toshi.kani@hpe.com>
      Reported-by: default avatarLei Li <lious.lilei@hisilicon.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      28ee90fe
    • Toshi Kani's avatar
      mm/vmalloc: add interfaces to free unmapped page table · b6bdb751
      Toshi Kani authored
      On architectures with CONFIG_HAVE_ARCH_HUGE_VMAP set, ioremap() may
      create pud/pmd mappings.  A kernel panic was observed on arm64 systems
      with Cortex-A75 in the following steps as described by Hanjun Guo.
      
       1. ioremap a 4K size, valid page table will build,
       2. iounmap it, pte0 will set to 0;
       3. ioremap the same address with 2M size, pgd/pmd is unchanged,
          then set the a new value for pmd;
       4. pte0 is leaked;
       5. CPU may meet exception because the old pmd is still in TLB,
          which will lead to kernel panic.
      
      This panic is not reproducible on x86.  INVLPG, called from iounmap,
      purges all levels of entries associated with purged address on x86.  x86
      still has memory leak.
      
      The patch changes the ioremap path to free unmapped page table(s) since
      doing so in the unmap path has the following issues:
      
       - The iounmap() path is shared with vunmap(). Since vmap() only
         supports pte mappings, making vunmap() to free a pte page is an
         overhead for regular vmap users as they do not need a pte page freed
         up.
      
       - Checking if all entries in a pte page are cleared in the unmap path
         is racy, and serializing this check is expensive.
      
       - The unmap path calls free_vmap_area_noflush() to do lazy TLB purges.
         Clearing a pud/pmd entry before the lazy TLB purges needs extra TLB
         purge.
      
      Add two interfaces, pud_free_pmd_page() and pmd_free_pte_page(), which
      clear a given pud/pmd entry and free up a page for the lower level
      entries.
      
      This patch implements their stub functions on x86 and arm64, which work
      as workaround.
      
      [akpm@linux-foundation.org: fix typo in pmd_free_pte_page() stub]
      Link: http://lkml.kernel.org/r/20180314180155.19492-2-toshi.kani@hpe.com
      Fixes: e61ce6ad
      
       ("mm: change ioremap to set up huge I/O mappings")
      Reported-by: default avatarLei Li <lious.lilei@hisilicon.com>
      Signed-off-by: default avatarToshi Kani <toshi.kani@hpe.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Wang Xuefeng <wxf.wang@hisilicon.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Hanjun Guo <guohanjun@huawei.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Chintan Pandya <cpandya@codeaurora.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b6bdb751
    • Arnd Bergmann's avatar
      h8300: remove extraneous __BIG_ENDIAN definition · 1705f7c5
      Arnd Bergmann authored
      A bugfix I did earlier caused a build regression on h8300, which defines
      the __BIG_ENDIAN macro in a slightly different way than the generic
      code:
      
        arch/h8300/include/asm/byteorder.h:5:0: warning: "__BIG_ENDIAN" redefined
      
      We don't need to define it here, as the same macro is already provided
      by the linux/byteorder/big_endian.h, and that version does not conflict.
      
      While this is a v4.16 regression, my earlier patch also got backported
      to the 4.14 and 4.15 stable kernels, so we need the fixup there as well.
      
      Link: http://lkml.kernel.org/r/20180313120752.2645129-1-arnd@arndb.de
      Fixes: 101110f6
      
       ("Kbuild: always define endianess in kconfig.h")
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      1705f7c5
    • Mike Kravetz's avatar
      hugetlbfs: check for pgoff value overflow · 63489f8e
      Mike Kravetz authored
      A vma with vm_pgoff large enough to overflow a loff_t type when
      converted to a byte offset can be passed via the remap_file_pages system
      call.  The hugetlbfs mmap routine uses the byte offset to calculate
      reservations and file size.
      
      A sequence such as:
      
        mmap(0x20a00000, 0x600000, 0, 0x66033, -1, 0);
        remap_file_pages(0x20a00000, 0x600000, 0, 0x20000000000000, 0);
      
      will result in the following when task exits/file closed,
      
        kernel BUG at mm/hugetlb.c:749!
        Call Trace:
          hugetlbfs_evict_inode+0x2f/0x40
          evict+0xcb/0x190
          __dentry_kill+0xcb/0x150
          __fput+0x164/0x1e0
          task_work_run+0x84/0xa0
          exit_to_usermode_loop+0x7d/0x80
          do_syscall_64+0x18b/0x190
          entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      
      The overflowed pgoff value causes hugetlbfs to try to set up a mapping
      with a negative range (end < start) that leaves invalid state which
      causes the BUG.
      
      The previous overflow fix to this code was incomplete and did not take
      the remap_file_pages system call into account.
      
      [mike.kravetz@oracle.com: v3]
        Link: http://lkml.kernel.org/r/20180309002726.7248-1-mike.kravetz@oracle.com
      [akpm@linux-foundation.org: include mmdebug.h]
      [akpm@linux-foundation.org: fix -ve left shift count on sh]
      Link: http://lkml.kernel.org/r/20180308210502.15952-1-mike.kravetz@oracle.com
      Fixes: 045c7a3f
      
       ("hugetlbfs: fix offset overflow in hugetlbfs mmap")
      Signed-off-by: default avatarMike Kravetz <mike.kravetz@oracle.com>
      Reported-by: default avatarNic Losby <blurbdust@gmail.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Yisheng Xie <xieyisheng1@huawei.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      63489f8e
    • Tetsuo Handa's avatar
      lockdep: fix fs_reclaim warning · 2e517d68
      Tetsuo Handa authored
      Dave Jones reported fs_reclaim lockdep warnings.
      
        ============================================
        WARNING: possible recursive locking detected
        4.15.0-rc9-backup-debug+ #1 Not tainted
        --------------------------------------------
        sshd/24800 is trying to acquire lock:
         (fs_reclaim){+.+.}, at: [<0000000084f438c2>] fs_reclaim_acquire.part.102+0x5/0x30
      
        but task is already holding lock:
         (fs_reclaim){+.+.}, at: [<0000000084f438c2>] fs_reclaim_acquire.part.102+0x5/0x30
      
        other info that might help us debug this:
         Possible unsafe locking scenario:
      
               CPU0
               ----
          lock(fs_reclaim);
          lock(fs_reclaim);
      
         *** DEADLOCK ***
      
         May be due to missing lock nesting notation
      
        2 locks held by sshd/24800:
         #0:  (sk_lock-AF_INET6){+.+.}, at: [<000000001a069652>] tcp_sendmsg+0x19/0x40
         #1:  (fs_reclaim){+.+.}, at: [<0000000084f438c2>] fs_reclaim_acquire.part.102+0x5/0x30
      
        stack backtrace:
        CPU: 3 PID: 24800 Comm: sshd Not tainted 4.15.0-rc9-backup-debug+ #1
        Call Trace:
         dump_stack+0xbc/0x13f
         __lock_acquire+0xa09/0x2040
         lock_acquire+0x12e/0x350
         fs_reclaim_acquire.part.102+0x29/0x30
         kmem_cache_alloc+0x3d/0x2c0
         alloc_extent_state+0xa7/0x410
         __clear_extent_bit+0x3ea/0x570
         try_release_extent_mapping+0x21a/0x260
         __btrfs_releasepage+0xb0/0x1c0
         btrfs_releasepage+0x161/0x170
         try_to_release_page+0x162/0x1c0
         shrink_page_list+0x1d5a/0x2fb0
         shrink_inactive_list+0x451/0x940
         shrink_node_memcg.constprop.88+0x4c9/0x5e0
         shrink_node+0x12d/0x260
         try_to_free_pages+0x418/0xaf0
         __alloc_pages_slowpath+0x976/0x1790
         __alloc_pages_nodemask+0x52c/0x5c0
         new_slab+0x374/0x3f0
         ___slab_alloc.constprop.81+0x47e/0x5a0
         __slab_alloc.constprop.80+0x32/0x60
         __kmalloc_track_caller+0x267/0x310
         __kmalloc_reserve.isra.40+0x29/0x80
         __alloc_skb+0xee/0x390
         sk_stream_alloc_skb+0xb8/0x340
         tcp_sendmsg_locked+0x8e6/0x1d30
         tcp_sendmsg+0x27/0x40
         inet_sendmsg+0xd0/0x310
         sock_write_iter+0x17a/0x240
         __vfs_write+0x2ab/0x380
         vfs_write+0xfb/0x260
         SyS_write+0xb6/0x140
         do_syscall_64+0x1e5/0xc05
         entry_SYSCALL64_slow_path+0x25/0x25
      
      This warning is caused by commit d92a8cfc ("locking/lockdep:
      Rework FS_RECLAIM annotation") which replaced the use of
      lockdep_{set,clear}_current_reclaim_state() in __perform_reclaim()
      and lockdep_trace_alloc() in slab_pre_alloc_hook() with
      fs_reclaim_acquire()/ fs_reclaim_release().
      
      Since __kmalloc_reserve() from __alloc_skb() adds __GFP_NOMEMALLOC |
      __GFP_NOWARN to gfp_mask, and all reclaim path simply propagates
      __GFP_NOMEMALLOC, fs_reclaim_acquire() in slab_pre_alloc_hook() is
      trying to grab the 'fake' lock again when __perform_reclaim() already
      grabbed the 'fake' lock.
      
      The
      
        /* this guy won't enter reclaim */
        if ((current->flags & PF_MEMALLOC) && !(gfp_mask & __GFP_NOMEMALLOC))
                return false;
      
      test which causes slab_pre_alloc_hook() to try to grab the 'fake' lock
      was added by commit cf40bd16 ("lockdep: annotate reclaim context
      (__GFP_NOFS)").  But that test is outdated because PF_MEMALLOC thread
      won't enter reclaim regardless of __GFP_NOMEMALLOC after commit
      341ce06f ("page allocator: calculate the alloc_flags for allocation
      only once") added the PF_MEMALLOC safeguard (
      
        /* Avoid recursion of direct reclaim */
        if (p->flags & PF_MEMALLOC)
                goto nopage;
      
      in __alloc_pages_slowpath()).
      
      Thus, let's fix outdated test by removing __GFP_NOMEMALLOC test and
      allow __need_fs_reclaim() to return false.
      
      Link: http://lkml.kernel.org/r/201802280650.FJC73911.FOSOMLJVFFQtHO@I-love.SAKURA.ne.jp
      Fixes: d92a8cfc
      
       ("locking/lockdep: Rework FS_RECLAIM annotation")
      Signed-off-by: default avatarTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Reported-by: default avatarDave Jones <davej@codemonkey.org.uk>
      Tested-by: default avatarDave Jones <davej@codemonkey.org.uk>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Nick Piggin <npiggin@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Nikolay Borisov <nborisov@suse.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: <stable@vger.kernel.org>	[4.14+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2e517d68
    • Mark Fasheh's avatar
      MAINTAINERS: update Mark Fasheh's e-mail · 296cefee
      Mark Fasheh authored
      I'd like to use my personal e-mail for Ocfs2 requests and review.
      
      Link: http://lkml.kernel.org/r/20180311231356.9385-1-mfasheh@versity.com
      
      
      Signed-off-by: default avatarMark Fasheh <mark@fasheh.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      296cefee
    • Yisheng Xie's avatar
      mm/mempolicy.c: avoid use uninitialized preferred_node · 8970a63e
      Yisheng Xie authored
      Alexander reported a use of uninitialized memory in __mpol_equal(),
      which is caused by incorrect use of preferred_node.
      
      When mempolicy in mode MPOL_PREFERRED with flags MPOL_F_LOCAL, it uses
      numa_node_id() instead of preferred_node, however, __mpol_equal() uses
      preferred_node without checking whether it is MPOL_F_LOCAL or not.
      
      [akpm@linux-foundation.org: slight comment tweak]
      Link: http://lkml.kernel.org/r/4ebee1c2-57f6-bcb8-0e2d-1833d1ee0bb7@huawei.com
      Fixes: fc36b8d3
      
       ("mempolicy: use MPOL_F_LOCAL to Indicate Preferred Local Policy")
      Signed-off-by: default avatarYisheng Xie <xieyisheng1@huawei.com>
      Reported-by: default avatarAlexander Potapenko <glider@google.com>
      Tested-by: default avatarAlexander Potapenko <glider@google.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Dmitriy Vyukov <dvyukov@google.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      8970a63e
    • Y.C. Chen's avatar
      drm/ast: Fixed 1280x800 Display Issue · 5a9f698f
      Y.C. Chen authored
      
      
      The original ast driver cannot display properly if the resolution is 1280x800 and the pixel clock is 83.5MHz.
      Here is the update to fix it.
      
      Signed-off-by: default avatarY.C. Chen <yc_chen@aspeedtech.com>
      Signed-off-by: default avatarDave Airlie <airlied@redhat.com>
      5a9f698f
    • Linus Torvalds's avatar
      Merge tag 'acpi-4.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · e7d7743f
      Linus Torvalds authored
      Pull ACPI fixes from Rafael Wysocki:
       "These revert one recent commit that added incorrect battery quirks for
        some Asus systems and fix an off-by-one error in the watchdog driver
        based on the ACPI WDAT table.
      
        Specifics:
      
         - Revert the recent change adding battery quirks for Asus GL502VSK
           and UX305LA as these quirks turn out to be inadequate and possibly
           premature (Daniel Drake).
      
         - Fix an off-by-one error in the resource allocation part of the
           watchdog driver based on the ACPI WDAT table (Takashi Iwai)"
      
      * tag 'acpi-4.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        ACPI / watchdog: Fix off-by-one error at resource assignment
        Revert "ACPI / battery: Add quirk for Asus GL502VSK and UX305LA"
      e7d7743f
    • Linus Torvalds's avatar
      Merge tag 'modules-for-v4.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux · 394c73d3
      Linus Torvalds authored
      Pull modules fix from Jessica Yu:
       "Propagate error in modules_open() to avoid possible later NULL
        dereference if seq_open() had failed"
      
      * tag 'modules-for-v4.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
        module: propagate error in modules_open()
      394c73d3
    • Rafael J. Wysocki's avatar
      Merge branch 'acpi-wdat' · 594fdbaa
      Rafael J. Wysocki authored
      * acpi-wdat:
        ACPI / watchdog: Fix off-by-one error at resource assignment
      594fdbaa
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · c4f4d2f9
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Always validate XFRM esn replay attribute, from Florian Westphal.
      
       2) Fix RCU read lock imbalance in xfrm_get_tos(), from Xin Long.
      
       3) Don't try to get firmware dump if not loaded in iwlwifi, from Shaul
          Triebitz.
      
       4) Fix BPF helpers to deal with SCTP GSO SKBs properly, from Daniel
          Axtens.
      
       5) Fix some interrupt handling issues in e1000e driver, from Benjamin
          Poitier.
      
       6) Use strlcpy() in several ethtool get_strings methods, from Florian
          Fainelli.
      
       7) Fix rhlist dup insertion, from Paul Blakey.
      
       8) Fix SKB leak in netem packet scheduler, from Alexey Kodanev.
      
       9) Fix driver unload crash when link is up in smsc911x, from Jeremy
          Linton.
      
      10) Purge out invalid socket types in l2tp_tunnel_create(), from Eric
          Dumazet.
      
      11) Need to purge the write queue when TCP connections are aborted,
          otherwise userspace using MSG_ZEROCOPY can't close the fd. From
          Soheil Hassas Yeganeh.
      
      12) Fix double free in error path of team driver, from Arkadi
          Sharshevsky.
      
      13) Filter fixes for hv_netvsc driver, from Stephen Hemminger.
      
      14) Fix non-linear packet access in ipv6 ndisc code, from Lorenzo
          Bianconi.
      
      15) Properly filter out unsupported feature flags in macvlan driver,
          from Shannon Nelson.
      
      16) Don't request loading the diag module for a protocol if the protocol
          itself is not even registered. From Xin Long.
      
      17) If datagram connect fails in ipv6, make sure the socket state is
          consistent afterwards. From Paolo Abeni.
      
      18) Use after free in qed driver, from Dan Carpenter.
      
      19) If received ipv4 PMTU is less than the min pmtu, lock the mtu in the
          entry. From Sabrina Dubroca.
      
      20) Fix sleep in atomic in tg3 driver, from Jonathan Toppins.
      
      21) Fix vlan in vlan untagging in some situations, from Toshiaki Makita.
      
      22) Fix double SKB free in genlmsg_mcast(). From Nicolas Dichtel.
      
      23) Fix NULL derefs in error paths of tcf_*_init(), from Davide Caratti.
      
      24) Unbalanced PM runtime calls in FEC driver, from Florian Fainelli.
      
      25) Memory leak in gemini driver, from Igor Pylypiv.
      
      26) IDR leaks in error paths of tcf_*_init() functions, from Davide
          Caratti.
      
      27) Need to use GFP_ATOMIC in seg6_build_state(), from David Lebrun.
      
      28) Missing dev_put() in error path of macsec_newlink(), from Dan
          Carpenter.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (201 commits)
        macsec: missing dev_put() on error in macsec_newlink()
        net: dsa: Fix functional dsa-loop dependency on FIXED_PHY
        hv_netvsc: common detach logic
        hv_netvsc: change GPAD teardown order on older versions
        hv_netvsc: use RCU to fix concurrent rx and queue changes
        hv_netvsc: disable NAPI before channel close
        net/ipv6: Handle onlink flag with multipath routes
        ppp: avoid loop in xmit recursion detection code
        ipv6: sr: fix NULL pointer dereference when setting encap source address
        ipv6: sr: fix scheduling in RCU when creating seg6 lwtunnel state
        net: aquantia: driver version bump
        net: aquantia: Implement pci shutdown callback
        net: aquantia: Allow live mac address changes
        net: aquantia: Add tx clean budget and valid budget handling logic
        net: aquantia: Change inefficient wait loop on fw data reads
        net: aquantia: Fix a regression with reset on old firmware
        net: aquantia: Fix hardware reset when SPI may rarely hangup
        s390/qeth: on channel error, reject further cmd requests
        s390/qeth: lock read device while queueing next buffer
        s390/qeth: when thread completes, wake up all waiters
        ...
      c4f4d2f9
    • Linus Torvalds's avatar
      Merge tag 'mmc-v4.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc · 9ce20788
      Linus Torvalds authored
      Pull MMC fixes from Ulf Hansson:
       "A couple of MMC fixes intended for v4.16-rc7:
      
        MMC host:
      
         - dw_mmc: Fix the suspend/resume issue for Exynos5433
      
         - dw_mmc: Fix the DTO/CTO timeout overflow calculation for 32-bit
           systems
      
         - dw_mmc: Make PIO mode work when failing with idmac when
           dw_mci_reset occurs
      
         - sdhci-acpi: Re-allow IRQ 0 to fix broken probe
      
        MMC core:
      
         - Update EXT_CSD caches to correctly switch partition for ioctl calls
      
         - Fix tracepoint print of blk_addr and blksz
      
         - Disable HPI on broken Micron (Numonyx) eMMC cards"
      
      * tag 'mmc-v4.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
        mmc: sdhci-acpi: Fix IRQ 0
        mmc: dw_mmc: fix falling from idmac to PIO mode when dw_mci_reset occurs
        mmc: core: Fix tracepoint print of blk_addr and blksz
        mmc: core: Disable HPI for certain Micron (Numonyx) eMMC cards
        mmc: dw_mmc: exynos: fix the suspend/resume issue for exynos5433
        mmc: block: fix updating ext_csd caches on ioctl call
        mmc: dw_mmc: Fix the DTO/CTO timeout overflow calculation for 32-bit systems
      9ce20788
    • Dave Airlie's avatar
      Merge tag 'drm-misc-fixes-2018-03-22' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes · b7b3f669
      Dave Airlie authored
      Main change is a patch to reject getfb call for multiplanar framebuffers,
      then we have a couple of error path fixes on the sun4i driver. Still on that
      driver there is a clk fix and finally a mmap offset fix on the udl driver.
      
      * tag 'drm-misc-fixes-2018-03-22' of git://anongit.freedesktop.org/drm/drm-misc:
        drm: udl: Properly check framebuffer mmap offsets
        drm: Reject getfb for multi-plane framebuffers
        drm/sun4i: hdmi: Fix another error handling path in 'sun4i_hdmi_bind()'
        drm/sun4i: hdmi: Fix an error handling path in 'sun4i_hdmi_bind()'
        drm/sun4i: Fix an error handling path in 'sun4i_drv_bind()'
        drm/sun4i: Fix exclusivity of the TCON clocks
      b7b3f669
    • Dave Airlie's avatar
      Merge tag 'drm-intel-fixes-2018-03-21' of... · 8c2d689e
      Dave Airlie authored
      Merge tag 'drm-intel-fixes-2018-03-21' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
      
      One fix for DP MST and one fix for GPU reset on hang check.
      
      * tag 'drm-intel-fixes-2018-03-21' of git://anongit.freedesktop.org/drm/drm-intel:
        drm/i915: Specify which engines to reset following semaphore/event lockups
        drm/i915/dp: Write to SET_POWER dpcd to enable MST hub.
      8c2d689e
    • Dave Airlie's avatar
      Merge branch 'vmwgfx-fixes-4.16' of git://people.freedesktop.org/~thomash/linux into drm-fixes · 096c49ec
      Dave Airlie authored
      Two vmwgfx fixes for 4.16. Both cc'd stable.
      
      * 'vmwgfx-fixes-4.16' of git://people.freedesktop.org/~thomash/linux:
        drm/vmwgfx: Fix a destoy-while-held mutex problem.
        drm/vmwgfx: Fix black screen and device errors when running without fbdev
      096c49ec
    • Dave Airlie's avatar
      Merge tag 'imx-drm-fixes-2018-03-22' of git://git.pengutronix.de/git/pza/linux into drm-fixes · cec1b948
      Dave Airlie authored
      drm/imx: fixes for early vblank event issue, array underflow error
      
      - fix an array underflow error by reordering the range check before the array
        subscript in ipu-prg.
      - make some local functions static in ipuv3-plane.
      - add a missng header for ipu_planes_assign_pre in ipuv3-plane.
      - move arming of the vblank event from atomic_begin to atomic_flush, to avoid
        signalling atomic commit completion to userspace before plane atomic_update
        has finished, due to a race condition that is likely to be hit on i.MX6QP on
        PRE enabled channels.
      
      * tag 'imx-drm-fixes-2018-03-22' of git://git.pengutronix.de/git/pza/linux:
        drm/imx: move arming of the vblank event to atomic_flush
        drm/imx: ipuv3-plane: Include "imx-drm.h" header file
        drm/imx: ipuv3-plane: Make functions static when possible
        gpu: ipu-v3: prg: avoid possible array underflow
      cec1b948
    • Dan Carpenter's avatar
      macsec: missing dev_put() on error in macsec_newlink() · 5dcd8400
      Dan Carpenter authored
      We moved the dev_hold(real_dev); call earlier in the function but forgot
      to update the error paths.
      
      Fixes: 0759e552
      
       ("macsec: fix negative refcnt on parent link")
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5dcd8400
    • David S. Miller's avatar
      Merge tag 'mac80211-for-davem-2018-03-21' of... · e0645d9b
      David S. Miller authored
      Merge tag 'mac80211-for-davem-2018-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
      
      
      
      Johannes Berg says:
      
      ====================
      Two more fixes (in three patches):
       * ath9k_htc doesn't like QoS NDP frames, use regular ones
       * hwsim: set up wmediumd for radios created later
      ====================
      
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e0645d9b
    • Florian Fainelli's avatar
      net: dsa: Fix functional dsa-loop dependency on FIXED_PHY · 40013ff2
      Florian Fainelli authored
      We have a functional dependency on the FIXED_PHY MDIO bus because we register
      fixed PHY devices "the old way" which only works if the code that does this has
      had a chance to run before the fixed MDIO bus is probed. Make sure we account
      for that and have dsa_loop_bdinfo.o be either built-in or modular depending on
      whether CONFIG_FIXED_PHY reflects that too.
      
      Fixes: 98cd1552
      
       ("net: dsa: Mock-up driver")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      40013ff2
    • David S. Miller's avatar
      Merge branch 'hv_netvsc-fix-races-during-shutdown-and-changes' · 53e82697
      David S. Miller authored
      
      
      Stephen Hemminger says:
      
      ====================
      hv_netvsc: fix races during shutdown and changes
      
      This set of patches fixes issues identified by Vitaly Kuznetsov and
      Mohammed Gamal related to state changes in Hyper-v network driver.
      
      A lot of the issues are because setting up the netvsc device requires
      a second step (in work queue) to get all the sub-channels running.
      ====================
      
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      53e82697
    • Stephen Hemminger's avatar
      hv_netvsc: common detach logic · 7b2ee50c
      Stephen Hemminger authored
      Make common function for detaching internals of device
      during changes to MTU and RSS. Make sure no more packets
      are transmitted and all packets have been received before
      doing device teardown.
      
      Change the wait logic to be common and use usleep_range().
      
      Changes transmit enabling logic so that transmit queues are disabled
      during the period when lower device is being changed. And enabled
      only after sub channels are setup. This avoids issue where it could
      be that a packet was being sent while subchannel was not initialized.
      
      Fixes: 8195b139
      
       ("hv_netvsc: fix deadlock on hotplug")
      Signed-off-by: default avatarStephen Hemminger <sthemmin@microsoft.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7b2ee50c
    • Stephen Hemminger's avatar
      hv_netvsc: change GPAD teardown order on older versions · 0ef58b0a
      Stephen Hemminger authored
      
      
      On older versions of Windows, the host ignores messages after
      vmbus channel is closed.
      
      Workaround this by doing what Windows does and send the teardown
      before close on older versions of NVSP protocol.
      
      Reported-by: default avatarMohammed Gamal <mgamal@redhat.com>
      Fixes: 0cf73780
      
       ("hv_netvsc: netvsc_teardown_gpadl() split")
      Signed-off-by: default avatarStephen Hemminger <sthemmin@microsoft.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0ef58b0a
    • Stephen Hemminger's avatar
      hv_netvsc: use RCU to fix concurrent rx and queue changes · 02400fce
      Stephen Hemminger authored
      The receive processing may continue to happen while the
      internal network device state is in RCU grace period.
      The internal RNDIS structure is associated with the
      internal netvsc_device structure; both have the same
      RCU lifetime.
      
      Defer freeing all associated parts until after grace
      period.
      
      Fixes: 0cf73780
      
       ("hv_netvsc: netvsc_teardown_gpadl() split")
      Signed-off-by: default avatarStephen Hemminger <sthemmin@microsoft.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      02400fce
    • Stephen Hemminger's avatar
      hv_netvsc: disable NAPI before channel close · 8348e046
      Stephen Hemminger authored
      This makes sure that no CPU is still process packets when
      the channel is closed.
      
      Fixes: 76bb5db5
      
       ("netvsc: fix use after free on module removal")
      Signed-off-by: default avatarStephen Hemminger <sthemmin@microsoft.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8348e046
    • David Ahern's avatar
      net/ipv6: Handle onlink flag with multipath routes · 68e2ffde
      David Ahern authored
      For multipath routes the ONLINK flag can be specified per nexthop in
      rtnh_flags or globally in rtm_flags. Update ip6_route_multipath_add
      to consider the ONLINK setting coming from rtnh_flags. Each loop over
      nexthops the config for the sibling route is initialized to the global
      config and then per nexthop settings overlayed. The flag is 'or'ed into
      fib6_config to handle the ONLINK flag coming from either rtm_flags or
      rtnh_flags.
      
      Fixes: fc1e64e1
      
       ("net/ipv6: Add support for onlink flag")
      Signed-off-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      68e2ffde
    • Guillaume Nault's avatar
      ppp: avoid loop in xmit recursion detection code · 6d066734
      Guillaume Nault authored
      We already detect situations where a PPP channel sends packets back to
      its upper PPP device. While this is enough to avoid deadlocking on xmit
      locks, this doesn't prevent packets from looping between the channel
      and the unit.
      
      The problem is that ppp_start_xmit() enqueues packets in ppp->file.xq
      before checking for xmit recursion. Therefore, __ppp_xmit_process()
      might dequeue a packet from ppp->file.xq and send it on the channel
      which, in turn, loops it back on the unit. Then ppp_start_xmit()
      queues the packet back to ppp->file.xq and __ppp_xmit_process() picks
      it up and sends it again through the channel. Therefore, the packet
      will loop between __ppp_xmit_process() and ppp_start_xmit() until some
      other part of the xmit path drops it.
      
      For L2TP, we rapidly fill the skb's headroom and pppol2tp_xmit() drops
      the packet after a few iterations. But PPTP reallocates the headroom
      if necessary, letting the loop run and exhaust the machine resources
      (as reported in https://bugzilla.kernel.org/show_bug.cgi?id=199109
      
      ).
      
      Fix this by letting __ppp_xmit_process() enqueue the skb to
      ppp->file.xq, so that we can check for recursion before adding it to
      the queue. Now ppp_xmit_process() can drop the packet when recursion is
      detected.
      
      __ppp_channel_push() is a bit special. It calls __ppp_xmit_process()
      without having any actual packet to send. This is used by
      ppp_output_wakeup() to re-enable transmission on the parent unit (for
      implementations like ppp_async.c, where the .start_xmit() function
      might not consume the skb, leaving it in ppp->xmit_pending and
      disabling transmission).
      Therefore, __ppp_xmit_process() needs to handle the case where skb is
      NULL, dequeuing as many packets as possible from ppp->file.xq.
      
      Reported-by: default avatarxu heng <xuheng333@zoho.com>
      Fixes: 55454a56
      
       ("ppp: avoid dealock on recursive xmit")
      Signed-off-by: default avatarGuillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6d066734
    • David Lebrun's avatar
      ipv6: sr: fix NULL pointer dereference when setting encap source address · 8936ef76
      David Lebrun authored
      When using seg6 in encap mode, we call ipv6_dev_get_saddr() to set the
      source address of the outer IPv6 header, in case none was specified.
      Using skb->dev can lead to BUG() when it is in an inconsistent state.
      This patch uses the net_device attached to the skb's dst instead.
      
      [940807.667429] BUG: unable to handle kernel NULL pointer dereference at 000000000000047c
      [940807.762427] IP: ipv6_dev_get_saddr+0x8b/0x1d0
      [940807.815725] PGD 0 P4D 0
      [940807.847173] Oops: 0000 [#1] SMP PTI
      [940807.890073] Modules linked in:
      [940807.927765] CPU: 6 PID: 0 Comm: swapper/6 Tainted: G        W        4.16.0-rc1-seg6bpf+ #2
      [940808.028988] Hardware name: HP ProLiant DL120 G6/ProLiant DL120 G6, BIOS O26    09/06/2010
      [940808.128128] RIP: 0010:ipv6_dev_get_saddr+0x8b/0x1d0
      [940808.187667] RSP: 0018:ffff88043fd836b0 EFLAGS: 00010206
      [940808.251366] RAX: 0000000000000005 RBX: ffff88042cb1c860 RCX: 00000000000000fe
      [940808.338025] RDX: 00000000000002c0 RSI: ffff88042cb1c860 RDI: 0000000000004500
      [940808.424683] RBP: ffff88043fd83740 R08: 0000000000000000 R09: ffffffffffffffff
      [940808.511342] R10: 0000000000000040 R11: 0000000000000000 R12: ffff88042cb1c850
      [940808.598012] R13: ffffffff8208e380 R14: ffff88042ac8da00 R15: 0000000000000002
      [940808.684675] FS:  0000000000000000(0000) GS:ffff88043fd80000(0000) knlGS:0000000000000000
      [940808.783036] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [940808.852975] CR2: 000000000000047c CR3: 00000004255fe000 CR4: 00000000000006e0
      [940808.939634] Call Trace:
      [940808.970041]  <IRQ>
      [940808.995250]  ? ip6t_do_table+0x265/0x640
      [940809.043341]  seg6_do_srh_encap+0x28f/0x300
      [940809.093516]  ? seg6_do_srh+0x1a0/0x210
      [940809.139528]  seg6_do_srh+0x1a0/0x210
      [940809.183462]  seg6_output+0x28/0x1e0
      [940809.226358]  lwtunnel_output+0x3f/0x70
      [940809.272370]  ip6_xmit+0x2b8/0x530
      [940809.313185]  ? ac6_proc_exit+0x20/0x20
      [940809.359197]  inet6_csk_xmit+0x7d/0xc0
      [940809.404173]  tcp_transmit_skb+0x548/0x9a0
      [940809.453304]  __tcp_retransmit_skb+0x1a8/0x7a0
      [940809.506603]  ? ip6_default_advmss+0x40/0x40
      [940809.557824]  ? tcp_current_mss+0x24/0x90
      [940809.605925]  tcp_retransmit_skb+0xd/0x80
      [940809.654016]  tcp_xmit_retransmit_queue.part.17+0xf9/0x210
      [940809.719797]  tcp_ack+0xa47/0x1110
      [940809.760612]  tcp_rcv_established+0x13c/0x570
      [940809.812865]  tcp_v6_do_rcv+0x151/0x3d0
      [940809.858879]  tcp_v6_rcv+0xa5c/0xb10
      [940809.901770]  ? seg6_output+0xdd/0x1e0
      [940809.946745]  ip6_input_finish+0xbb/0x460
      [940809.994837]  ip6_input+0x74/0x80
      [940810.034612]  ? ip6_rcv_finish+0xb0/0xb0
      [940810.081663]  ipv6_rcv+0x31c/0x4c0
      ...
      
      Fixes: 6c8702c6
      
       ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
      Reported-by: default avatarTom Herbert <tom@quantonium.net>
      Signed-off-by: default avatarDavid Lebrun <dlebrun@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8936ef76
    • David Lebrun's avatar
      ipv6: sr: fix scheduling in RCU when creating seg6 lwtunnel state · 191f86ca
      David Lebrun authored
      The seg6_build_state() function is called with RCU read lock held,
      so we cannot use GFP_KERNEL. This patch uses GFP_ATOMIC instead.
      
      [   92.770271] =============================
      [   92.770628] WARNING: suspicious RCU usage
      [   92.770921] 4.16.0-rc4+ #12 Not tainted
      [   92.771277] -----------------------------
      [   92.771585] ./include/linux/rcupdate.h:302 Illegal context switch in RCU read-side critical section!
      [   92.772279]
      [   92.772279] other info that might help us debug this:
      [   92.772279]
      [   92.773067]
      [   92.773067] rcu_scheduler_active = 2, debug_locks = 1
      [   92.773514] 2 locks held by ip/2413:
      [   92.773765]  #0:  (rtnl_mutex){+.+.}, at: [<00000000e5461720>] rtnetlink_rcv_msg+0x441/0x4d0
      [   92.774377]  #1:  (rcu_read_lock){....}, at: [<00000000df4f161e>] lwtunnel_build_state+0x59/0x210
      [   92.775065]
      [   92.775065] stack backtrace:
      [   92.775371] CPU: 0 PID: 2413 Comm: ip Not tainted 4.16.0-rc4+ #12
      [   92.775791] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc27 04/01/2014
      [   92.776608] Call Trace:
      [   92.776852]  dump_stack+0x7d/0xbc
      [   92.777130]  __schedule+0x133/0xf00
      [   92.777393]  ? unwind_get_return_address_ptr+0x50/0x50
      [   92.777783]  ? __sched_text_start+0x8/0x8
      [   92.778073]  ? rcu_is_watching+0x19/0x30
      [   92.778383]  ? kernel_text_address+0x49/0x60
      [   92.778800]  ? __kernel_text_address+0x9/0x30
      [   92.779241]  ? unwind_get_return_address+0x29/0x40
      [   92.779727]  ? pcpu_alloc+0x102/0x8f0
      [   92.780101]  _cond_resched+0x23/0x50
      [   92.780459]  __mutex_lock+0xbd/0xad0
      [   92.780818]  ? pcpu_alloc+0x102/0x8f0
      [   92.781194]  ? seg6_build_state+0x11d/0x240
      [   92.781611]  ? save_stack+0x9b/0xb0
      [   92.781965]  ? __ww_mutex_wakeup_for_backoff+0xf0/0xf0
      [   92.782480]  ? seg6_build_state+0x11d/0x240
      [   92.782925]  ? lwtunnel_build_state+0x1bd/0x210
      [   92.783393]  ? ip6_route_info_create+0x687/0x1640
      [   92.783846]  ? ip6_route_add+0x74/0x110
      [   92.784236]  ? inet6_rtm_newroute+0x8a/0xd0
      
      Fixes: 6c8702c6
      
       ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
      Signed-off-by: default avatarDavid Lebrun <dlebrun@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      191f86ca
    • David S. Miller's avatar
      Merge branch 'aquantia-fixes' · 9b894cdd
      David S. Miller authored
      
      
      Igor Russkikh says:
      
      ====================
      Aquantia atlantic hot fixes 03-2018
      
      This is a set of atlantic driver hot fixes for various areas:
      
      Some issues with hardware reset covered,
      Fixed napi_poll flood happening on some traffic conditions,
      Allow system to change MAC address on live device,
      Add pci shutdown handler.
      
      patch v2:
      - reverse christmas tree
      - remove driver private parameter, replacing it with define.
      ====================
      
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9b894cdd