Skip to content
  1. Jan 29, 2018
    • Linus Torvalds's avatar
      Linux 4.15 · d8a5b805
      Linus Torvalds authored
      v4.15
      d8a5b805
    • Linus Torvalds's avatar
      Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 24b1cccf
      Linus Torvalds authored
      Pull x86 retpoline fixlet from Thomas Gleixner:
       "Remove the ESP/RSP thunks for retpoline as they cannot ever work.
      
        Get rid of them before they show up in a release"
      
      * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/retpoline: Remove the esp/rsp thunk
      24b1cccf
    • Linus Torvalds's avatar
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 32c6cdf7
      Linus Torvalds authored
      Pull x86 fixes from Thomas Gleixner:
       "A set of small fixes for 4.15:
      
         - Fix vmapped stack synchronization on systems with 4-level paging
           and a large amount of memory caused by a missing 5-level folding
           which made the pgd synchronization logic to fail and causing double
           faults.
      
         - Add a missing sanity check in the vmalloc_fault() logic on 5-level
           paging systems.
      
         - Bring back protection against accessing a freed initrd in the
           microcode loader which was lost by a wrong merge conflict
           resolution.
      
         - Extend the Broadwell micro code loading sanity check.
      
         - Add a missing ENDPROC annotation in ftrace assembly code which
           makes ORC unhappy.
      
         - Prevent loading the AMD power module on !AMD platforms. The load
           itself is uncritical, but an unload attempt results in a kernel
           crash.
      
         - Update Peter Anvins role in the MAINTAINERS file"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/ftrace: Add one more ENDPROC annotation
        x86: Mark hpa as a "Designated Reviewer" for the time being
        x86/mm/64: Tighten up vmalloc_fault() sanity checks on 5-level kernels
        x86/mm/64: Fix vmapped stack syncing on very-large-memory 4-level systems
        x86/microcode: Fix again accessing initrd after having been freed
        x86/microcode/intel: Extend BDW late-loading further with LLC size check
        perf/x86/amd/power: Do not load AMD power module on !AMD platforms
      32c6cdf7
    • Linus Torvalds's avatar
      Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 07b0137c
      Linus Torvalds authored
      Pull timer fix from Thomas Gleixner:
       "A single fix for a ~10 years old problem which causes high resolution
        timers to stop after a CPU unplug/plug cycle due to a stale flag in
        the per CPU hrtimer base struct.
      
        Paul McKenney was hunting this for about a year, but the heisenbug
        nature made it resistant against debug attempts for quite some time"
      
      * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        hrtimer: Reset hrtimer cpu base proper on CPU hotplug
      07b0137c
    • Linus Torvalds's avatar
      Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 62444192
      Linus Torvalds authored
      Pull scheduler fix from Thomas Gleixner:
       "A single bug fix to prevent a subtle deadlock in the scheduler core
        code vs cpu hotplug"
      
      * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/core: Fix cpu.max vs. cpuhotplug deadlock
      62444192
    • Linus Torvalds's avatar
      Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 39e38362
      Linus Torvalds authored
      Pull perf fixes from Thomas Gleixner:
       "Four patches which all address lock inversions and deadlocks in the
        perf core code and the Intel debug store"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/x86: Fix perf,x86,cpuhp deadlock
        perf/core: Fix ctx::mutex deadlock
        perf/core: Fix another perf,trace,cpuhp lock inversion
        perf/core: Fix lock inversion between perf,trace,cpuhp
      39e38362
    • Linus Torvalds's avatar
      Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 8c76e31a
      Linus Torvalds authored
      Pull locking fixes from Thomas Gleixner:
       "Two final locking fixes for 4.15:
      
         - Repair the OWNER_DIED logic in the futex code which got wreckaged
           with the recent fix for a subtle race condition.
      
         - Prevent the hard lockup detector from triggering when dumping all
           held locks in the system"
      
      * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        locking/lockdep: Avoid triggering hardlockup from debug_show_all_locks()
        futex: Fix OWNER_DEAD fixup
      8c76e31a
  2. Jan 28, 2018
  3. Jan 27, 2018
    • Thomas Gleixner's avatar
      hrtimer: Reset hrtimer cpu base proper on CPU hotplug · d5421ea4
      Thomas Gleixner authored
      The hrtimer interrupt code contains a hang detection and mitigation
      mechanism, which prevents that a long delayed hrtimer interrupt causes a
      continous retriggering of interrupts which prevent the system from making
      progress. If a hang is detected then the timer hardware is programmed with
      a certain delay into the future and a flag is set in the hrtimer cpu base
      which prevents newly enqueued timers from reprogramming the timer hardware
      prior to the chosen delay. The subsequent hrtimer interrupt after the delay
      clears the flag and resumes normal operation.
      
      If such a hang happens in the last hrtimer interrupt before a CPU is
      unplugged then the hang_detected flag is set and stays that way when the
      CPU is plugged in again. At that point the timer hardware is not armed and
      it cannot be armed because the hang_detected flag is still active, so
      nothing clears that flag. As a consequence the CPU does not receive hrtimer
      interrupts and no timers expire on that CPU which results in RCU stalls and
      other malfunctions.
      
      Clear the flag along with some other less critical members of the hrtimer
      cpu base to ensure starting from a clean state when a CPU is plugged in.
      
      Thanks to Paul, Sebastian and Anna-Maria for their help to get down to the
      root cause of that hard to reproduce heisenbug. Once understood it's
      trivial and certainly justifies a brown paperbag.
      
      Fixes: 41d2e494
      
       ("hrtimer: Tune hrtimer_interrupt hang logic")
      Reported-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Sewior <bigeasy@linutronix.de>
      Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801261447590.2067@nanos
      d5421ea4
    • H. Peter Anvin's avatar
      x86: Mark hpa as a "Designated Reviewer" for the time being · 8a95b74d
      H. Peter Anvin authored
      
      
      Due to some unfortunate events, I have not been directly involved in
      the x86 kernel patch flow for a while now.  I have also not been able
      to ramp back up by now like I had hoped to, and after reviewing what I
      will need to work on both internally at Intel and elsewhere in the near
      term, it is clear that I am not going to be able to ramp back up until
      late 2018 at the very earliest.
      
      It is not acceptable to not recognize that this load is currently
      taken by Ingo and Thomas without my direct participation, so I mark
      myself as R: (designated reviewer) rather than M: (maintainer) until
      further notice.  This is in fact recognizing the de facto situation
      for the past few years.
      
      I have obviously no intention of going away, and I will do everything
      within my power to improve Linux on x86 and x86 for Linux.  This,
      however, puts credit where it is due and reflects a change of focus.
      
      This patch also removes stale entries for portions of the x86
      architecture which have not been maintained separately from arch/x86
      for a long time.  If there is a reason to re-introduce them then that
      can happen later.
      
      Signed-off-by: default avatarH. Peter Anvin <h.peter.anvin@intel.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Bruce Schlobohm <bruce.schlobohm@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20180125195934.5253-1-hpa@zytor.com
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      8a95b74d
    • Linus Torvalds's avatar
      Merge tag 'riscv-for-linus-4.15-maintainers' of... · c4e0ca7f
      Linus Torvalds authored
      Merge tag 'riscv-for-linus-4.15-maintainers' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux
      
      Pull RISC-V update from Palmer Dabbelt:
       "RISC-V: We have a new mailing list and git repo!
      
        Sorry to send something essentially as late as possible (Friday after
        an rc9), but we managed to get a mailing list for the RISC-V Linux
        port. We've been using patches@groups.riscv.org for a while, but that
        list has some problems (it's Google Groups and it's shared over all
        RISC-V software projects). The new infaread.org list is much better.
        We just got it on Wednesday but I used it a bit on Thursday to shake
        out all the configuration problems and it appears to be in working
        order.
      
        When I updated the mailing list I noticed that the MAINTAINERS file
        was pointing to our github repo, but now that we have a kernel.org
        repo I'd like to point to that instead so I changed that as well.
        We'll be centralizing all RISC-V Linux related development here as
        that seems to be the saner way to go about it.
      
        I can understand if it's too late to get this into 4.15, but given
        that it's not a code change I was hoping it'd still be OK. It would be
        nice to have the new mailing list and git repo in the release tarballs
        so when people start to find bugs they'll get to the right place"
      
      * tag 'riscv-for-linus-4.15-maintainers' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux:
        Update the RISC-V MAINTAINERS file
      c4e0ca7f
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · ba804bb4
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) The per-network-namespace loopback device, and thus its namespace,
          can have its teardown deferred for a long time if a kernel created
          TCP socket closes and the namespace is exiting meanwhile. The kernel
          keeps trying to finish the close sequence until it times out (which
          takes quite some time).
      
          Fix this by forcing the socket closed in this situation, from Dan
          Streetman.
      
       2) Fix regression where we're trying to invoke the update_pmtu method
          on route types (in this case metadata tunnel routes) that don't
          implement the dst_ops method. Fix from Nicolas Dichtel.
      
       3) Fix long standing memory corruption issues in r8169 driver by
          performing the chip statistics DMA programming more correctly. From
          Francois Romieu.
      
       4) Handle local broadcast sends over VRF routes properly, from David
          Ahern.
      
       5) Don't refire the DCCP CCID2 timer endlessly, otherwise the socket
          can never be released. From Alexey Kodanev.
      
       6) Set poll flags properly in VSOCK protocol layer, from Stefan
          Hajnoczi.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
        VSOCK: set POLLOUT | POLLWRNORM for TCP_CLOSING
        dccp: don't restart ccid2_hc_tx_rto_expire() if sk in closed state
        net: vrf: Add support for sends to local broadcast address
        r8169: fix memory corruption on retrieval of hardware statistics.
        net: don't call update_pmtu unconditionally
        net: tcp: close sock if net namespace is exiting
      ba804bb4
    • Linus Torvalds's avatar
      Merge tag 'drm-fixes-for-v4.15-rc10-2' of git://people.freedesktop.org/~airlied/linux · db218549
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "A fairly urgent nouveau regression fix for broken irqs across
        suspend/resume came in. This was broken before but a patch in 4.15 has
        made it much more obviously broken and now s/r fails a lot more often.
      
        The fix removes freeing the irq across s/r which never should have
        been done anyways.
      
        Also two vc4 fixes for a NULL deference and some misrendering /
        flickering on screen"
      
      * tag 'drm-fixes-for-v4.15-rc10-2' of git://people.freedesktop.org/~airlied/linux:
        drm/nouveau: Move irq setup/teardown to pci ctor/dtor
        drm/vc4: Fix NULL pointer dereference in vc4_save_hang_state()
        drm/vc4: Flush the caches before the bin jobs, as well.
      db218549
    • Stefan Hajnoczi's avatar
      VSOCK: set POLLOUT | POLLWRNORM for TCP_CLOSING · ba3169fc
      Stefan Hajnoczi authored
      
      
      select(2) with wfds but no rfds must return when the socket is shut down
      by the peer.  This way userspace notices socket activity and gets -EPIPE
      from the next write(2).
      
      Currently select(2) does not return for virtio-vsock when a SEND+RCV
      shutdown packet is received.  This is because vsock_poll() only sets
      POLLOUT | POLLWRNORM for TCP_CLOSE, not the TCP_CLOSING state that the
      socket is in when the shutdown is received.
      
      Signed-off-by: default avatarStefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ba3169fc
    • Alexey Kodanev's avatar
      dccp: don't restart ccid2_hc_tx_rto_expire() if sk in closed state · dd5684ec
      Alexey Kodanev authored
      ccid2_hc_tx_rto_expire() timer callback always restarts the timer
      again and can run indefinitely (unless it is stopped outside), and after
      commit 120e9dab ("dccp: defer ccid_hc_tx_delete() at dismantle time"),
      which moved ccid_hc_tx_delete() (also includes sk_stop_timer()) from
      dccp_destroy_sock() to sk_destruct(), this started to happen quite often.
      The timer prevents releasing the socket, as a result, sk_destruct() won't
      be called.
      
      Found with LTP/dccp_ipsec tests running on the bonding device,
      which later couldn't be unloaded after the tests were completed:
      
        unregister_netdevice: waiting for bond0 to become free. Usage count = 148
      
      Fixes: 2a91aa39
      
       ("[DCCP] CCID2: Initial CCID2 (TCP-Like) implementation")
      Signed-off-by: default avatarAlexey Kodanev <alexey.kodanev@oracle.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      dd5684ec
    • Palmer Dabbelt's avatar
      Update the RISC-V MAINTAINERS file · 6572cc2b
      Palmer Dabbelt authored
      
      
      Now that we're upstream in Linux we've been able to make some
      infrastructure changes so our port works a bit more like other ports.
      Specifically:
      
      * We now have a mailing list specific to the RISC-V Linux port, hosted
        at lists.infreadead.org.
      * We now have a kernel.org git tree where work on our port is
        coordinated.
      
      This patch changes the RISC-V maintainers entry to reflect these new
      bits of infrastructure.
      
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarPalmer Dabbelt <palmer@sifive.com>
      6572cc2b
  4. Jan 26, 2018
    • Andy Lutomirski's avatar
      x86/mm/64: Tighten up vmalloc_fault() sanity checks on 5-level kernels · 36b3a772
      Andy Lutomirski authored
      On a 5-level kernel, if a non-init mm has a top-level entry, it needs to
      match init_mm's, but the vmalloc_fault() code skipped over the BUG_ON()
      that would have checked it.
      
      While we're at it, get rid of the rather confusing 4-level folded "pgd"
      logic.
      
      Cleans-up: b50858ce
      
       ("x86/mm/vmalloc: Add 5-level paging support")
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Neil Berrington <neil.berrington@datacore.com>
      Link: https://lkml.kernel.org/r/2ae598f8c279b0a29baf75df207e6f2fdddc0a1b.1516914529.git.luto@kernel.org
      36b3a772
    • Andy Lutomirski's avatar
      x86/mm/64: Fix vmapped stack syncing on very-large-memory 4-level systems · 5beda7d5
      Andy Lutomirski authored
      Neil Berrington reported a double-fault on a VM with 768GB of RAM that uses
      large amounts of vmalloc space with PTI enabled.
      
      The cause is that load_new_mm_cr3() was never fixed to take the 5-level pgd
      folding code into account, so, on a 4-level kernel, the pgd synchronization
      logic compiles away to exactly nothing.
      
      Interestingly, the problem doesn't trigger with nopti.  I assume this is
      because the kernel is mapped with global pages if we boot with nopti.  The
      sequence of operations when we create a new task is that we first load its
      mm while still running on the old stack (which crashes if the old stack is
      unmapped in the new mm unless the TLB saves us), then we call
      prepare_switch_to(), and then we switch to the new stack.
      prepare_switch_to() pokes the new stack directly, which will populate the
      mapping through vmalloc_fault().  I assume that we're getting lucky on
      non-PTI systems -- the old stack's TLB entry stays alive long enough to
      make it all the way through prepare_switch_to() and switch_to() so that we
      make it to a valid stack.
      
      Fixes: b50858ce
      
       ("x86/mm/vmalloc: Add 5-level paging support")
      Reported-and-tested-by: default avatarNeil Berrington <neil.berrington@datacore.com>
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: stable@vger.kernel.org
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Link: https://lkml.kernel.org/r/346541c56caed61abbe693d7d2742b4a380c5001.1516914529.git.luto@kernel.org
      5beda7d5
    • Dave Airlie's avatar
      Merge branch 'linux-4.15' of git://github.com/skeggsb/linux into drm-fixes · baa35cc3
      Dave Airlie authored
      Single irq regression fix
      * 'linux-4.15' of git://github.com/skeggsb/linux:
        drm/nouveau: Move irq setup/teardown to pci ctor/dtor
      baa35cc3
    • David Ahern's avatar
      net: vrf: Add support for sends to local broadcast address · 1e19c4d6
      David Ahern authored
      
      
      Sukumar reported that sends to the local broadcast address
      (255.255.255.255) are broken. Check for the address in vrf driver
      and do not redirect to the VRF device - similar to multicast
      packets.
      
      With this change sockets can use SO_BINDTODEVICE to specify an
      egress interface and receive responses. Note: the egress interface
      can not be a VRF device but needs to be the enslaved device.
      
      https://bugzilla.kernel.org/show_bug.cgi?id=198521
      
      Reported-by: default avatarSukumar Gopalakrishnan <sukumarg1973@gmail.com>
      Signed-off-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1e19c4d6
    • Francois Romieu's avatar
      r8169: fix memory corruption on retrieval of hardware statistics. · a78e9366
      Francois Romieu authored
      
      
      Hardware statistics retrieval hurts in tight invocation loops.
      
      Avoid extraneous write and enforce strict ordering of writes targeted to
      the tally counters dump area address registers.
      
      Signed-off-by: default avatarFrancois Romieu <romieu@fr.zoreil.com>
      Tested-by: default avatarOliver Freyermuth <o.freyermuth@googlemail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a78e9366
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · 993ca206
      Linus Torvalds authored
      Pull input fixes from Dmitry Torokhov:
       "The main item is that we try to better handle the newer trackpoints on
        Lenovo devices that are now being produced by Elan/ALPS/NXP and only
        implement a small subset of the original IBM trackpoint controls"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
        Revert "Input: synaptics_rmi4 - use devm_device_add_group() for attributes in F01"
        Input: trackpoint - only expose supported controls for Elan, ALPS and NXP
        Input: trackpoint - force 3 buttons if 0 button is reported
        Input: xpad - add support for PDP Xbox One controllers
        Input: stmfts,s6sy671 - add SPDX identifier
      993ca206
    • Martin Brandenburg's avatar
      orangefs: fix deadlock; do not write i_size in read_iter · 6793f1c4
      Martin Brandenburg authored
      
      
      After do_readv_writev, the inode cache is invalidated anyway, so i_size
      will never be read.  It will be fetched from the server which will also
      know about updates from other machines.
      
      Fixes deadlock on 32-bit SMP.
      
      See https://marc.info/?l=linux-fsdevel&m=151268557427760&w=2
      
      Signed-off-by: default avatarMartin Brandenburg <martin@omnibond.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Mike Marshall <hubcap@omnibond.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      6793f1c4
    • Lyude Paul's avatar
      drm/nouveau: Move irq setup/teardown to pci ctor/dtor · 0fd189a9
      Lyude Paul authored
      For a while we've been having issues with seemingly random interrupts
      coming from nvidia cards when resuming them. Originally the fix for this
      was thought to be just re-arming the MSI interrupt registers right after
      re-allocating our IRQs, however it seems a lot of what we do is both
      wrong and not even nessecary.
      
      This was made apparent by what appeared to be a regression in the
      mainline kernel that started introducing suspend/resume issues for
      nouveau:
      
              a0c9259d
      
       (irq/matrix: Spread interrupts on allocation)
      
      After this commit was introduced, we started getting interrupts from the
      GPU before we actually re-allocated our own IRQ (see references below)
      and assigned the IRQ handler. Investigating this turned out that the
      problem was not with the commit, but the fact that nouveau even
      free/allocates it's irqs before and after suspend/resume.
      
      For starters: drivers in the linux kernel haven't had to handle
      freeing/re-allocating their IRQs during suspend/resume cycles for quite
      a while now. Nouveau seems to be one of the few drivers left that still
      does this, despite the fact there's no reason we actually need to since
      disabling interrupts from the device side should be enough, as the
      kernel is already smart enough to know to disable host-side interrupts
      for us before going into suspend. Since we were tearing down our IRQs by
      hand however, that means there was a short period during resume where
      interrupts could be received before we re-allocated our IRQ which would
      lead to us getting an unhandled IRQ. Since we never handle said IRQ and
      re-arm the interrupt registers, this would cause us to miss all of the
      interrupts from the GPU and cause our init process to start timing out
      on anything requiring interrupts.
      
      So, since this whole setup/teardown every suspend/resume cycle is
      useless anyway, move irq setup/teardown into the pci subdev's ctor/dtor
      functions instead so they're only called at driver load and driver
      unload. This should fix most of the issues with pending interrupts on
      resume, along with getting suspend/resume for nouveau to work again.
      
      As well, this probably means we can also just remove the msi rearm call
      inside nvkm_pci_init(). But since our main focus here is to fix
      suspend/resume before 4.15, we'll save that for a later patch.
      
      Signed-off-by: default avatarLyude Paul <lyude@redhat.com>
      Cc: Karol Herbst <kherbst@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarBen Skeggs <bskeggs@redhat.com>
      0fd189a9
    • Nicolas Dichtel's avatar
      net: don't call update_pmtu unconditionally · f15ca723
      Nicolas Dichtel authored
      Some dst_ops (e.g. md_dst_ops)) doesn't set this handler. It may result to:
      "BUG: unable to handle kernel NULL pointer dereference at           (null)"
      
      Let's add a helper to check if update_pmtu is available before calling it.
      
      Fixes: 52a589d5 ("geneve: update skb dst pmtu on tx path")
      Fixes: a93bf0ff
      
       ("vxlan: update skb dst pmtu on tx path")
      CC: Roman Kapl <code@rkapl.cz>
      CC: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f15ca723
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 6e20630e
      Linus Torvalds authored
      Pull KVM fixes from Radim Krčmář:
       "Fix races and a potential use after free in the s390 cmma migration
        code"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: s390: add proper locking for CMMA migration bitmap
      6e20630e
    • Linus Torvalds's avatar
      Merge tag 'for-4.15-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 525273fb
      Linus Torvalds authored
      Pull btrfs fix from David Sterba:
       "It's been reported recently that readdir can list stale entries under
        some conditions. Fix it."
      
      * tag 'for-4.15-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        Btrfs: fix stale entries in readdir
      525273fb
  5. Jan 25, 2018
    • Dan Streetman's avatar
      net: tcp: close sock if net namespace is exiting · 4ee806d5
      Dan Streetman authored
      
      
      When a tcp socket is closed, if it detects that its net namespace is
      exiting, close immediately and do not wait for FIN sequence.
      
      For normal sockets, a reference is taken to their net namespace, so it will
      never exit while the socket is open.  However, kernel sockets do not take a
      reference to their net namespace, so it may begin exiting while the kernel
      socket is still open.  In this case if the kernel socket is a tcp socket,
      it will stay open trying to complete its close sequence.  The sock's dst(s)
      hold a reference to their interface, which are all transferred to the
      namespace's loopback interface when the real interfaces are taken down.
      When the namespace tries to take down its loopback interface, it hangs
      waiting for all references to the loopback interface to release, which
      results in messages like:
      
      unregister_netdevice: waiting for lo to become free. Usage count = 1
      
      These messages continue until the socket finally times out and closes.
      Since the net namespace cleanup holds the net_mutex while calling its
      registered pernet callbacks, any new net namespace initialization is
      blocked until the current net namespace finishes exiting.
      
      After this change, the tcp socket notices the exiting net namespace, and
      closes immediately, releasing its dst(s) and their reference to the
      loopback interface, which lets the net namespace continue exiting.
      
      Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=97811
      Signed-off-by: default avatarDan Streetman <ddstreet@canonical.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4ee806d5
    • Peter Zijlstra's avatar
      perf/x86: Fix perf,x86,cpuhp deadlock · efe951d3
      Peter Zijlstra authored
      
      
      More lockdep gifts, a 5-way lockup race:
      
      	perf_event_create_kernel_counter()
      	  perf_event_alloc()
      	    perf_try_init_event()
      	      x86_pmu_event_init()
      		__x86_pmu_event_init()
      		  x86_reserve_hardware()
       #0		    mutex_lock(&pmc_reserve_mutex);
      		    reserve_ds_buffer()
       #1		      get_online_cpus()
      
      	perf_event_release_kernel()
      	  _free_event()
      	    hw_perf_event_destroy()
      	      x86_release_hardware()
       #0		mutex_lock(&pmc_reserve_mutex)
      		release_ds_buffer()
       #1		  get_online_cpus()
      
       #1	do_cpu_up()
      	  perf_event_init_cpu()
       #2	    mutex_lock(&pmus_lock)
       #3	    mutex_lock(&ctx->mutex)
      
      	sys_perf_event_open()
      	  mutex_lock_double()
       #3	    mutex_lock(ctx->mutex)
       #4	    mutex_lock_nested(ctx->mutex, 1);
      
      	perf_try_init_event()
       #4	  mutex_lock_nested(ctx->mutex, 1)
      	  x86_pmu_event_init()
      	    intel_pmu_hw_config()
      	      x86_add_exclusive()
       #0		mutex_lock(&pmc_reserve_mutex)
      
      Fix it by using ordering constructs instead of locking.
      
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      efe951d3
    • Peter Zijlstra's avatar
      perf/core: Fix ctx::mutex deadlock · 0c7296ca
      Peter Zijlstra authored
      
      
      Lockdep noticed the following 3-way lockup scenario:
      
      	sys_perf_event_open()
      	  perf_event_alloc()
      	    perf_try_init_event()
       #0	      ctx = perf_event_ctx_lock_nested(1)
      	      perf_swevent_init()
      		swevent_hlist_get()
       #1		  mutex_lock(&pmus_lock)
      
      	perf_event_init_cpu()
       #1	  mutex_lock(&pmus_lock)
       #2	  mutex_lock(&ctx->mutex)
      
      	sys_perf_event_open()
      	  mutex_lock_double()
       #2	   mutex_lock()
       #0	   mutex_lock_nested()
      
      And while we need that perf_event_ctx_lock_nested() for HW PMUs such
      that they can iterate the sibling list, trying to match it to the
      available counters, the software PMUs need do no such thing. Exclude
      them.
      
      In particular the swevent triggers the above invertion, while the
      tpevent PMU triggers a more elaborate one through their event_mutex.
      
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      0c7296ca
    • Peter Zijlstra's avatar
      perf/core: Fix another perf,trace,cpuhp lock inversion · 43fa87f7
      Peter Zijlstra authored
      
      
      Lockdep noticed the following 3-way lockup race:
      
              perf_trace_init()
       #0       mutex_lock(&event_mutex)
                perf_trace_event_init()
                  perf_trace_event_reg()
                    tp_event->class->reg() := tracepoint_probe_register
       #1              mutex_lock(&tracepoints_mutex)
                        trace_point_add_func()
       #2                  static_key_enable()
      
       #2	do_cpu_up()
      	  perf_event_init_cpu()
       #3	    mutex_lock(&pmus_lock)
       #4	    mutex_lock(&ctx->mutex)
      
      	perf_ioctl()
       #4	  ctx = perf_event_ctx_lock()
      	  _perf_iotcl()
      	    ftrace_profile_set_filter()
       #0	      mutex_lock(&event_mutex)
      
      Fudge it for now by noting that the tracepoint state does not depend
      on the event <-> context relation. Ugly though :/
      
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      43fa87f7
    • Peter Zijlstra's avatar
      perf/core: Fix lock inversion between perf,trace,cpuhp · 82d94856
      Peter Zijlstra authored
      
      
      Lockdep gifted us with noticing the following 4-way lockup scenario:
      
              perf_trace_init()
       #0       mutex_lock(&event_mutex)
                perf_trace_event_init()
                  perf_trace_event_reg()
                    tp_event->class->reg() := tracepoint_probe_register
       #1             mutex_lock(&tracepoints_mutex)
                        trace_point_add_func()
       #2                 static_key_enable()
      
       #2     do_cpu_up()
                perf_event_init_cpu()
       #3         mutex_lock(&pmus_lock)
       #4         mutex_lock(&ctx->mutex)
      
              perf_event_task_disable()
                mutex_lock(&current->perf_event_mutex)
       #4       ctx = perf_event_ctx_lock()
       #5       perf_event_for_each_child()
      
              do_exit()
                task_work_run()
                  __fput()
                    perf_release()
                      perf_event_release_kernel()
       #4               mutex_lock(&ctx->mutex)
       #5               mutex_lock(&event->child_mutex)
                        free_event()
                          _free_event()
                            event->destroy() := perf_trace_destroy
       #0                     mutex_lock(&event_mutex);
      
      Fix that by moving the free_event() out from under the locks.
      
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      82d94856
    • Dave Airlie's avatar
      Merge tag 'drm-misc-fixes-2018-01-24' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes · 7e3f8e91
      Dave Airlie authored
      Two vc4 fixes that were applied in the last day.
      One fixes a NULL dereference, and the other fixes
      a flickering bug.
      
      Cc: Eric Anholt <eric@anholt.net>
      Cc: Boris Brezillon <boris.brezillon@free-electrons.com>
      
      * tag 'drm-misc-fixes-2018-01-24' of git://anongit.freedesktop.org/drm/drm-misc:
        drm/vc4: Fix NULL pointer dereference in vc4_save_hang_state()
        drm/vc4: Flush the caches before the bin jobs, as well.
      7e3f8e91
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 5b7d2796
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Avoid negative netdev refcount in error flow of xfrm state add, from
          Aviad Yehezkel.
      
       2) Fix tcpdump decoding of IPSEC decap'd frames by filling in the
          ethernet header protocol field in xfrm{4,6}_mode_tunnel_input().
          From Yossi Kuperman.
      
       3) Fix a syzbot triggered skb_under_panic in pppoe having to do with
          failing to allocate an appropriate amount of headroom. From
          Guillaume Nault.
      
       4) Fix memory leak in vmxnet3 driver, from Neil Horman.
      
       5) Cure out-of-bounds packet memory access in em_nbyte EMATCH module,
          from Wolfgang Bumiller.
      
       6) Restrict what kinds of sockets can be bound to the KCM multiplexer
          and also disallow when another layer has attached to the socket and
          made use of sk_user_data. From Tom Herbert.
      
       7) Fix use before init of IOTLB in vhost code, from Jason Wang.
      
       8) Correct STACR register write bit definition in IBM emac driver, from
          Ivan Mikhaylov.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
        net/ibm/emac: wrong bit is used for STA control register write
        net/ibm/emac: add 8192 rx/tx fifo size
        vhost: do not try to access device IOTLB when not initialized
        vhost: use mutex_lock_nested() in vhost_dev_lock_vqs()
        i40e: flower: check if TC offload is enabled on a netdev
        qed: Free reserved MR tid
        qed: Remove reserveration of dpi for kernel
        kcm: Check if sk_user_data already set in kcm_attach
        kcm: Only allow TCP sockets to be attached to a KCM mux
        net: sched: fix TCF_LAYER_LINK case in tcf_get_base_ptr
        net: sched: em_nbyte: don't add the data offset twice
        mlxsw: spectrum_router: Don't log an error on missing neighbor
        vmxnet3: repair memory leak
        ipv6: Fix getsockopt() for sockets with default IPV6_AUTOFLOWLABEL
        pppoe: take ->needed_headroom of lower device into account on xmit
        xfrm: fix boolean assignment in xfrm_get_type_offload
        xfrm: Fix eth_hdr(skb)->h_proto to reflect inner IP version
        xfrm: fix error flow in case of add state fails
        xfrm: Add SA to hardware at the end of xfrm_state_construct()
      5b7d2796
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc · f1654953
      Linus Torvalds authored
      Pull sparc bugfix from David Miller:
       "Sparc Makefile typo fix"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
        sparc64: fix typo in CONFIG_CRYPTO_DES_SPARC64 => CONFIG_CRYPTO_CAMELLIA_SPARC64
      f1654953
    • Ivan Mikhaylov's avatar
      net/ibm/emac: wrong bit is used for STA control register write · 624ca9c3
      Ivan Mikhaylov authored
      
      
      STA control register has areas of mode and opcodes for opeations. 18 bit is
      using for mode selection, where 0 is old MIO/MDIO access method and 1 is
      indirect access mode. 19-20 bits are using for setting up read/write
      operation(STA opcodes). In current state 'read' is set into old MIO/MDIO mode
      with 19 bit and write operation is set into 18 bit which is mode selection,
      not a write operation. To correlate write with read we set it into 20 bit.
      All those bit operations are MSB 0 based.
      
      Signed-off-by: default avatarIvan Mikhaylov <ivan@de.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      624ca9c3
    • Ivan Mikhaylov's avatar
      net/ibm/emac: add 8192 rx/tx fifo size · 45d6e545
      Ivan Mikhaylov authored
      
      
      emac4syn chips has availability to use 8192 rx/tx fifo buffer sizes,
      in current state if we set it up in dts 8192 as example, we will get
      only 2048 which may impact on network speed.
      
      Signed-off-by: default avatarIvan Mikhaylov <ivan@de.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      45d6e545
    • Nick Dyer's avatar
      Revert "Input: synaptics_rmi4 - use devm_device_add_group() for attributes in F01" · 060403f3
      Nick Dyer authored
      Since the sysfs attribute hangs off the RMI bus, which doesn't go away during
      firmware flash, it needs to be explicitly removed, otherwise we would try and
      register the same attribute twice.
      
      This reverts commit 36a44af5
      
      .
      
      Signed-off-by: default avatarNick Dyer <nick@shmanahar.org>
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      060403f3
    • Jason Wang's avatar
      vhost: do not try to access device IOTLB when not initialized · 6f3180af
      Jason Wang authored
      The code will try to access dev->iotlb when processing
      VHOST_IOTLB_INVALIDATE even if it was not initialized which may lead
      to NULL pointer dereference. Fixes this by check dev->iotlb before.
      
      Fixes: 6b1e6cc7
      
       ("vhost: new device IOTLB API")
      Signed-off-by: default avatarJason Wang <jasowang@redhat.com>
      Acked-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6f3180af
    • Jason Wang's avatar
      vhost: use mutex_lock_nested() in vhost_dev_lock_vqs() · e9cb4239
      Jason Wang authored
      We used to call mutex_lock() in vhost_dev_lock_vqs() which tries to
      hold mutexes of all virtqueues. This may confuse lockdep to report a
      possible deadlock because of trying to hold locks belong to same
      class. Switch to use mutex_lock_nested() to avoid false positive.
      
      Fixes: 6b1e6cc7
      
       ("vhost: new device IOTLB API")
      Reported-by: default avatar <syzbot+dbb7c1161485e61b0241@syzkaller.appspotmail.com>
      Signed-off-by: default avatarJason Wang <jasowang@redhat.com>
      Acked-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e9cb4239