  1. Jan 18, 2018
    • irq/matrix: Spread interrupts on allocation · a0c9259d
      Thomas Gleixner authored
      Keith reported an issue with vector space exhaustion on a server machine
      which is caused by the i40e driver allocating 168 MSI interrupts when the
      driver is initialized, even when most of these interrupts are not used at
      all.
      
      The x86 vector allocation code tries to avoid the immediate allocation with
      the reservation mode, but the card uses MSI and does not support MSI entry
      masking, which prevents reservation mode and requires immediate vector
      allocation.
      
      The matrix allocator is a bit naive and prefers the first CPU in the
      cpumask which describes the possible target CPUs for an allocation. That
      results in allocating all 168 vectors on CPU0 which later causes vector
      space exhaustion when the NVMe driver tries to allocate managed interrupts
      on each CPU for the per CPU queues.
      
      Avoid this by finding the CPU which has the lowest vector allocation count
      to spread out the non-managed interrupts across the possible target CPUs.
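
      The selection rule described above can be sketched in plain C. This is
      only an illustration: struct cpu_vectors and pick_least_loaded_cpu()
      are hypothetical names, and the kernel's matrix allocator keeps its
      bookkeeping in different structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-CPU bookkeeping; the kernel's real structures differ. */
struct cpu_vectors {
    unsigned int allocated;   /* vectors currently handed out on this CPU */
    int          allowed;     /* non-zero if the CPU is in the target mask */
};

/*
 * Return the index of the allowed CPU with the fewest allocated vectors,
 * or -1 if no CPU is allowed. Ties go to the lowest index.
 */
static int pick_least_loaded_cpu(const struct cpu_vectors *cpus, size_t ncpus)
{
    int best = -1;

    assert(cpus != NULL || ncpus == 0);

    for (size_t i = 0; i < ncpus; i++) {
        if (!cpus[i].allowed)
            continue;
        if (best < 0 || cpus[i].allocated < cpus[best].allocated)
            best = (int)i;
    }
    return best;
}
```

      With CPU0 already holding 168 vectors, the next allocation in this
      sketch lands on whichever allowed CPU holds the fewest, instead of
      piling onto the first CPU in the mask.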
      
      Fixes: 2f75d9e1 ("genirq: Implement bitmap matrix allocator")
      Reported-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Keith Busch <keith.busch@intel.com>
      Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801171557330.1777@nanos
    • Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 1d966eb4
      Linus Torvalds authored
      Pull x86 fixes from Ingo Molnar:
       "Misc fixes:
      
         - A rather involved set of memory hardware encryption fixes to
           support the early loading of microcode files via the initrd. These
           are larger than what we normally take at such a late -rc stage, but
           there are two mitigating factors: 1) much of the changes are
           limited to the SME code itself 2) being able to early load
           microcode has increased importance in the post-Meltdown/Spectre
           era.
      
         - An IRQ vector allocator fix
      
         - An Intel RDT driver use-after-free fix
      
         - An APIC driver bug fix/revert to make certain older systems boot
           again
      
         - A pkeys ABI fix
      
         - TSC calibration fixes
      
         - A kdump fix"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/apic/vector: Fix off by one in error path
        x86/intel_rdt/cqm: Prevent use after free
        x86/mm: Encrypt the initrd earlier for BSP microcode update
        x86/mm: Prepare sme_encrypt_kernel() for PAGE aligned encryption
        x86/mm: Centralize PMD flags in sme_encrypt_kernel()
        x86/mm: Use a struct to reduce parameters for SME PGD mapping
        x86/mm: Clean up register saving in the __enc_copy() assembly code
        x86/idt: Mark IDT tables __initconst
        Revert "x86/apic: Remove init_bsp_APIC()"
        x86/mm/pkeys: Fix fill_sig_info_pkey
        x86/tsc: Print tsc_khz, when it differs from cpu_khz
        x86/tsc: Fix erroneous TSC rate on Skylake Xeon
        x86/tsc: Future-proof native_calibrate_tsc()
        kdump: Write the correct address of mem_section into vmcoreinfo
    • Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 9a4ba2ab
      Linus Torvalds authored
      Pull scheduler fix from Ingo Molnar:
       "A delayacct statistics correctness fix"
      
      * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        delayacct: Account blkio completion on the correct task
    • Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 7dfda84d
      Linus Torvalds authored
      Pull x86 perf fix from Ingo Molnar:
       "An Intel RAPL events fix"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/x86/rapl: Fix Haswell and Broadwell server RAPL event
    • Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · b8c22594
      Linus Torvalds authored
      Pull locking fixes from Ingo Molnar:
       "Two futex fixes: an input parameter robustness fix, and futex race
        fixes"
      
      * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        futex: Prevent overflow by strengthen input validation
        futex: Avoid violating the 10th rule of futex
    • Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 88dc7fca
      Linus Torvalds authored
      Pull x86 pti bits and fixes from Thomas Gleixner:
       "This last update contains:
      
         - An objtool fix to prevent a segfault with the gold linker by
           changing the invocation order. That's not just for gold, it's a
           general robustness improvement.
      
         - An improved error message for objtool which spares some hair-tearing.
      
         - Make KASAN fail loudly if there is not enough memory instead of
           oopsing at some random place later
      
         - RSB fill on context switch to prevent RSB underflow and speculation
           through other units.
      
         - Make the retpoline/RSB functionality work reliably for both Intel
           and AMD
      
         - Add retpoline to the module version magic so mismatch can be
           detected
      
         - A small (non-fix) update for cpufeatures which prevents cpu feature
           clashing for the upcoming extra mitigation bits to ease
           backporting"
      
      * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/t...
    • Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · dd43f346
      Linus Torvalds authored
      Pull timer fix from Thomas Gleixner:
       "A one-liner fix which prevents deferrable timers becoming stale when
        the system does not switch into NOHZ mode"
      
      * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        timers: Unconditionally check deferrable base
  2. Jan 17, 2018
    • x86/apic/vector: Fix off by one in error path · 45d55e7b
      Thomas Gleixner authored
      Keith reported the following warning:
      
      WARNING: CPU: 28 PID: 1420 at kernel/irq/matrix.c:222 irq_matrix_remove_managed+0x10f/0x120
        x86_vector_free_irqs+0xa1/0x180
        x86_vector_alloc_irqs+0x1e4/0x3a0
        msi_domain_alloc+0x62/0x130
      
      If the vector allocation fails, the error handling code also tries to
      free the vector that failed to allocate, which triggers the imbalance
      warning above.
      
      Adjust the error path to handle this correctly.
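
      A minimal sketch of this class of off-by-one, using a toy slot array
      rather than the real x86 vector code. alloc_one(), free_one(),
      alloc_range() and fail_at are all hypothetical names introduced for
      illustration. The point is that unwinding must stop one short of the
      slot whose allocation failed.

```c
#include <assert.h>

#define NVEC 8

static int in_use[NVEC];   /* 1 while a slot is allocated (toy state) */
static int fail_at = -1;   /* slot whose allocation should fail, for demonstration */

static int alloc_one(int i)
{
    if (i == fail_at)
        return -1;
    in_use[i] = 1;
    return 0;
}

static void free_one(int i)
{
    /* Freeing a slot that was never allocated is the imbalance. */
    assert(in_use[i]);
    in_use[i] = 0;
}

/* Allocate slots [0, n); on failure release only slots [0, i). */
static int alloc_range(int n)
{
    assert(n >= 0 && n <= NVEC);

    for (int i = 0; i < n; i++) {
        if (alloc_one(i) < 0) {
            /* Slot i was never allocated, so unwinding starts at i - 1. */
            while (--i >= 0)
                free_one(i);
            return -1;
        }
    }
    return 0;
}
```

      An unwind loop that also called free_one() on the failing index would
      trip the assertion, which plays the role of the WARNING in the report.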
      
      Fixes: b5dc8e6c ("x86/irq: Use hierarchical irqdomain to manage CPU interrupt vectors")
      Reported-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Keith Busch <keith.busch@intel.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801161217300.1823@nanos
    • x86/intel_rdt/cqm: Prevent use after free · d4792441
      Thomas Gleixner authored
      intel_rdt_offline_cpu() -> domain_remove_cpu() frees memory first and then
      proceeds to access it.
      
       BUG: KASAN: use-after-free in find_first_bit+0x1f/0x80
       Read of size 8 at addr ffff883ff7c1e780 by task cpuhp/31/195
       find_first_bit+0x1f/0x80
       has_busy_rmid+0x47/0x70
       intel_rdt_offline_cpu+0x4b4/0x510
      
       Freed by task 195:
       kfree+0x94/0x1a0
       intel_rdt_offline_cpu+0x17d/0x510
      
      Do the teardown first and then free memory.
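
      The ordering rule can be shown with a small user-space sketch. struct
      domain, teardown() and domain_remove() are hypothetical stand-ins, not
      the intel_rdt code: every access to the object finishes before its
      memory is released.

```c
#include <assert.h>
#include <stdlib.h>

struct domain {
    unsigned long *busy_map;   /* hypothetical bitmap consulted during teardown */
    int            timer_armed;
};

/* Teardown step that still reads d->busy_map. */
static int teardown(struct domain *d)
{
    int was_busy = d->busy_map[0] != 0;

    d->timer_armed = 0;
    return was_busy;
}

/* Correct order: finish every access, then release the memory. */
static int domain_remove(struct domain *d)
{
    int was_busy;

    assert(d != NULL && d->busy_map != NULL);

    was_busy = teardown(d);
    free(d->busy_map);
    d->busy_map = NULL;   /* a later stray access now fails fast */
    free(d);
    return was_busy;
}
```

      Swapping the free() calls ahead of teardown() would reproduce the
      read-after-free pattern that KASAN flagged.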
      
      Fixes: 24247aee ("x86/intel_rdt/cqm: Improve limbo list processing")
      Reported-by: Joseph Salisbury <joseph.salisbury@canonical.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ravi Shankar <ravi.v.shankar@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: "Roderick W. Smith" <rod.smith@canonical.com>
      Cc: 1733662@bugs.launchpad.net
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801161957510.2366@nanos
    • module: Add retpoline tag to VERMAGIC · 6cfb521a
      Andi Kleen authored
      
      
      Add a marker for retpoline to the module VERMAGIC. This catches the case
      when a module compiled without RETPOLINE gets loaded into a retpoline
      kernel, making it insecure.

      It doesn't handle the case when retpoline has been runtime disabled.  Even
      in this case the match of the retpoline compile status will be enforced.
      This implies that even with retpoline disabled at run time, all modules
      loaded need to be recompiled.
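
      A simplified sketch of how a build-time tag can end up in a version
      string that is then compared at load time. DEMO_RETPOLINE,
      DEMO_VERMAGIC and same_vermagic() are hypothetical names, and the
      plain strcmp() is an assumption made for illustration; the kernel's
      own comparison is more involved.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical build-time switch standing in for the kernel config option. */
#ifdef DEMO_RETPOLINE
#define DEMO_VERMAGIC_RETPOLINE "retpoline "
#else
#define DEMO_VERMAGIC_RETPOLINE ""
#endif

/* Adjacent string literals are concatenated by the compiler. */
#define DEMO_VERMAGIC "4.15.0 SMP mod_unload " DEMO_VERMAGIC_RETPOLINE

/* Return non-zero when kernel and module were built with the same tags. */
static int same_vermagic(const char *kernel_magic, const char *module_magic)
{
    assert(kernel_magic != NULL && module_magic != NULL);
    return strcmp(kernel_magic, module_magic) == 0;
}
```

      Because the tag is fixed at compile time, the comparison cannot see
      whether retpoline was later disabled at run time, which matches the
      limitation the commit message describes.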
      
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Acked-by: David Woodhouse <dwmw@amazon.co.uk>
      Cc: rusty@rustcorp.com.au
      Cc: arjan.van.de.ven@intel.com
      Cc: jeyu@kernel.org
      Cc: torvalds@linux-foundation.org
      Link: https://lkml.kernel.org/r/20180116205228.4890-1-andi@firstfloor.org
    • x86/cpufeature: Move processor tracing out of scattered features · 4fdec203
      Paolo Bonzini authored
      
      
      Processor tracing is already enumerated in word 9 (CPUID[7,0].EBX),
      so do not duplicate it in the scattered features word.
      
      Besides being more tidy, this will be useful for KVM when it presents
      processor tracing to the guests.  KVM selects host features that are
      supported by both the host kernel (depending on command line options,
      CPU errata, or whatever) and KVM.  Whenever a full feature word exists,
      KVM's code is written in the expectation that the CPUID bit number
      matches the X86_FEATURE_* bit number, but this is not the case for
      X86_FEATURE_INTEL_PT.
      
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luwei Kang <luwei.kang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kvm@vger.kernel.org
      Link: http://lkml.kernel.org/r/1516117345-34561-1-git-send-email-pbonzini@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma · 8cbab92d
      Linus Torvalds authored
      Pull rdma fixes from Doug Ledford:
       "We had a few more items creep up over the last week. Given we are in
        -rc8, these are obviously limited to bugs that have a big downside and
        for which we are certain of the fix.
      
        The first is a straight up oops bug that all you have to do is read
        the code to see it's a guaranteed 100% oops bug.
      
        The second is a use-after-free issue. We get away lucky if the queue
        we are shutting down is empty, but if it isn't, we can end up oopsing.
        We really need to drain the queue before destroying it.
      
        The final one is an issue with bad user input causing us to access our
        port array out of bounds. While fixing the array out of bounds issue,
        it was noticed that the original code did the same thing twice (the
        call to rdma_ah_set_port_num()), so its removal is not balanced by a
        readd elsewhere, it was already where it needed to be in addition to
        where it didn't need to be.
      
        Summary:
      
         - Oops fix in hfi1 driver
      
         - use-after-free issue in iser-target
      
         - use of user supplied array index without proper checking"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
        RDMA/mlx5: Fix out-of-bound access while querying AH
        IB/hfi1: Prevent a NULL dereference
        iser-target: Fix possible use-after-free in connection establishment error
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · b45a53be
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Two read past end of buffer fixes in AF_KEY, from Eric Biggers.
      
       2) Memory leak in key_notify_policy(), from Steffen Klassert.
      
       3) Fix overflow with bpf arrays, from Daniel Borkmann.
      
       4) Fix RDMA regression with mlx5 due to mlx5 no longer using
          pci_irq_get_affinity(), from Saeed Mahameed.
      
       5) Missing RCU read locking in nl80211_send_iface() when it calls
          ieee80211_bss_get_ie(), from Dominik Brodowski.
      
       6) cfg80211 should check dev_set_name()'s return value, from Johannes
          Berg.
      
       7) Missing module license tag in 9p protocol, from Stephen Hemminger.
      
       8) Fix crash due to too small MTU in udp ipv6 sendmsg, from Mike
          Maloney.
      
       9) Fix endless loop in netlink extack code, from David Ahern.
      
      10) TLS socket layer sets inverted error codes, resulting in an endless
          loop. From Robert Hering.
      
      11) Revert openvswitch erspan tunnel support, it's mis-designed and we
          need to kill it before it goes into a real release. From William Tu.
      
      12) Fix lan78xx failures in full speed USB mode, from Yuiko Oshino.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (54 commits)
        net, sched: fix panic when updating miniq {b,q}stats
        qed: Fix potential use-after-free in qed_spq_post()
        nfp: use the correct index for link speed table
        lan78xx: Fix failure in USB Full Speed
        sctp: do not allow the v4 socket to bind a v4mapped v6 address
        sctp: return error if the asoc has been peeled off in sctp_wait_for_sndbuf
        sctp: reinit stream if stream outcnt has been change by sinit in sendmsg
        ibmvnic: Fix pending MAC address changes
        netlink: extack: avoid parenthesized string constant warning
        ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY
        net: Allow neigh contructor functions ability to modify the primary_key
        sh_eth: fix dumping ARSTR
        Revert "openvswitch: Add erspan tunnel support."
        net/tls: Fix inverted error codes to avoid endless loop
        ipv6: ip6_make_skb() needs to clear cork.base.dst
        sctp: avoid compiler warning on implicit fallthru
        net: ipv4: Make "ip route get" match iif lo rules again.
        netlink: extack needs to be reset each time through loop
        tipc: fix a memory leak in tipc_nl_node_get_link()
        ipv6: fix udpv6 sendmsg crash caused by too small MTU
        ...
    • Merge tag 'sound-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 41aa5e5d
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "A few small last-minute fixes that should sneak into 4.15:
      
         - remove a spurious WARN_ON() triggered by syzkaller
      
         - fix for ioctl races in ALSA sequencer
      
         - two trivial HD-audio fixup entries"
      
      * tag 'sound-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: seq: Make ioctls race-free
        ALSA: pcm: Remove yet superfluous WARN_ON()
        ALSA: hda - Apply the existing quirk to iMac 14,1
        ALSA: hda - Apply headphone noise quirk for another Dell XPS 13 variant
    • Merge tag 'trace-v4.15-rc4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · 921d4f67
      Linus Torvalds authored
      Pull tracing fixes from Steven Rostedt:
      
       - Bring back context level recursive protection in ring buffer.
      
         The simpler counter protection failed, due to a path when tracing
         with trace_clock_global() as it could not be reentrant and depended
         on the ring buffer recursive protection to keep that from happening.
      
       - Prevent branch profiling when FORTIFY_SOURCE is enabled.
      
         It causes 50 - 60 MB in warning messages. Branch profiling should
         never be run on production systems, so there's no reason that it
         needs to be enabled with FORTIFY_SOURCE.
      
      * tag 'trace-v4.15-rc4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        tracing: Prevent PROFILE_ALL_BRANCHES when FORTIFY_SOURCE=y
        ring-buffer: Bring back context level recursive checks
    • net, sched: fix panic when updating miniq {b,q}stats · 81d947e2
      Daniel Borkmann authored
      While working on fixing another bug, I ran into the following panic
      on arm64 by simply attaching clsact qdisc, adding a filter and running
      traffic on ingress to it:
      
        [...]
        [  178.188591] Unable to handle kernel read from unreadable memory at virtual address 810fb501f000
        [  178.197314] Mem abort info:
        [  178.200121]   ESR = 0x96000004
        [  178.203168]   Exception class = DABT (current EL), IL = 32 bits
        [  178.209095]   SET = 0, FnV = 0
        [  178.212157]   EA = 0, S1PTW = 0
        [  178.215288] Data abort info:
        [  178.218175]   ISV = 0, ISS = 0x00000004
        [  178.222019]   CM = 0, WnR = 0
        [  178.224997] user pgtable: 4k pages, 48-bit VAs, pgd = 0000000023cb3f33
        [  178.231531] [0000810fb501f000] *pgd=0000000000000000
        [  178.236508] Internal error: Oops: 96000004 [#1] SMP
        [...]
        [  178.311855] CPU: 73 PID: 2497 Comm: ping Tainted: G        W        4.15.0-rc7+ #5
        [  178.319413] Hardware name: FOXCONN R2-1221R-A4/C2U4N_MB, BIOS G31FB18A 03/31/2017
        [  178.326887] pstate: 60400005 (nZCv daif +PAN -UAO)
        [  178.331685] pc : __netif_receive_skb_core+0x49c/0xac8
        [  178.336728] lr : __netif_receive_skb+0x28/0x78
        [  178.341161] sp : ffff00002344b750
        [  178.344465] x29: ffff00002344b750 x28: ffff810fbdfd0580
        [  178.349769] x27: 0000000000000000 x26: ffff000009378000
        [...]
        [  178.418715] x1 : 0000000000000054 x0 : 0000000000000000
        [  178.424020] Process ping (pid: 2497, stack limit = 0x000000009f0a3ff4)
        [  178.430537] Call trace:
        [  178.432976]  __netif_receive_skb_core+0x49c/0xac8
        [  178.437670]  __netif_receive_skb+0x28/0x78
        [  178.441757]  process_backlog+0x9c/0x160
        [  178.445584]  net_rx_action+0x2f8/0x3f0
        [...]
      
      Reason is that sch_ingress and sch_clsact are doing mini_qdisc_pair_init()
      which sets up miniq pointers to cpu_{b,q}stats from the underlying qdisc.
      Problem is that this cannot work since they are actually set up right after
      the qdisc ->init() callback in qdisc_create(), so first packet going into
      sch_handle_ingress() tries to call mini_qdisc_bstats_cpu_update() and we
      therefore panic.
      
      In order to fix this, allocation of {b,q}stats needs to happen before we
      call into ->init(). In net-next, there's already such an option through commit
      d59f5ffa ("net: sched: a dflt qdisc may be used with per cpu stats").
      However, the bug needs to be fixed in net still for 4.15. Thus, include
      these bits to reduce any merge churn and reuse the static_flags field to
      set TCQ_F_CPUSTATS, and remove the allocation from qdisc_create() since
      there is no other user left. Prashant Bhole ran into the same issue but
      for net-next, thus adding him below as well as co-author. Same issue was
      also reported by Sandipan Das when using bcc.
      
      Fixes: 46209401 ("net: core: introduce mini_Qdisc and eliminate usage of tp->q for clsact fastpath")
      Reference: https://lists.iovisor.org/pipermail/iovisor-dev/2018-January/001190.html
      Reported-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
      Co-authored-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
      Co-authored-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qed: Fix potential use-after-free in qed_spq_post() · 70eeff66
      Roland Dreier authored
      
      
      We need to check if p_ent->comp_mode is QED_SPQ_MODE_EBLOCK before
      calling qed_spq_add_entry().  The test is fine if the mode is EBLOCK,
      but if it isn't then qed_spq_add_entry() might kfree(p_ent).
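
      The pattern is to sample the field while the object is known to be
      alive and to use only the saved value afterwards. The sketch below
      uses hypothetical stand-ins (struct entry, add_entry(), post()) and is
      not the qed driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

enum comp_mode { MODE_EBLOCK, MODE_CB };

struct entry {
    enum comp_mode comp_mode;
};

/* Stand-in for a queueing call that may free the entry in non-blocking modes. */
static void add_entry(struct entry *e)
{
    if (e->comp_mode != MODE_EBLOCK)
        free(e);
}

/* Returns true if the caller must block and therefore still owns the entry. */
static bool post(struct entry *e)
{
    bool eblock;

    assert(e != NULL);

    /* Sample the mode while the entry is guaranteed to be alive. */
    eblock = (e->comp_mode == MODE_EBLOCK);

    add_entry(e);   /* e may be dangling from here on unless eblock is true */
    return eblock;
}
```

      Testing e->comp_mode after add_entry() returns would read freed memory
      whenever the mode is not EBLOCK.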
      
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: use the correct index for link speed table · 0d9c9f0f
      Jakub Kicinski authored
      The sts variable holds the link state as well as the link speed.  We
      should be using ls to index into ls_to_ethtool.
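
      A sketch of the distinction between the raw status word and the
      extracted speed field. The bit layout, the table contents and the
      names STS_*, ls_to_mbps and link_speed_mbps() are assumptions made for
      illustration and do not reflect the actual NFP register format.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register layout: bit 0 = link up, bits 8..11 = speed code. */
#define STS_LINK_UP      0x1u
#define STS_SPEED_SHIFT  8
#define STS_SPEED_MASK   0xfu

/* Hypothetical speed table in Mb/s; index 0 means "unknown". */
static const unsigned int ls_to_mbps[] = { 0, 1000, 10000, 25000, 40000 };

static unsigned int link_speed_mbps(uint32_t sts)
{
    uint32_t ls = (sts >> STS_SPEED_SHIFT) & STS_SPEED_MASK;

    if (!(sts & STS_LINK_UP))
        return 0;
    /* Index with the extracted field, never with the raw register. */
    if (ls >= sizeof(ls_to_mbps) / sizeof(ls_to_mbps[0]))
        return 0;
    return ls_to_mbps[ls];
}
```

      Indexing the table with sts instead of ls would mix the state bits
      into the index and read past the end of the table for most values.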
      
      Fixes: 265aeb51 ("nfp: add support for .get_link_ksettings()")
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • lan78xx: Fix failure in USB Full Speed · a5b1379a
      Yuiko Oshino authored
      Initialize the previously uninitialized tx_qlen to an appropriate value
      when USB Full Speed is used.
      
      Fixes: 55d7de9d ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
      Signed-off-by: Yuiko Oshino <yuiko.oshino@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'mac80211-for-davem-2018-01-15' of... · 161f72ed
      David S. Miller authored
      
      Merge tag 'mac80211-for-davem-2018-01-15' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
      
      Johannes Berg says:
      
      ====================
      More fixes:
       * hwsim:
          - properly flush deletion works at module unload
          - validate # of channels passed from userspace
       * cfg80211:
          - fix RCU locking regression
          - initialize on-stack channel data for nl80211 event
          - check dev_set_name() return value
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: do not allow the v4 socket to bind a v4mapped v6 address · c5006b8a
      Xin Long authored
      The check in sctp_sockaddr_af is not robust enough to forbid binding a
      v4mapped v6 addr on a v4 socket.
      
      Worse, the v4 socket's bind_verify would not convert this v4mapped v6
      addr to a v4 addr. syzbot even reported a crash caused by a v4 socket
      that had bound a v6 addr.

      Fix it by doing the common sa.sa_family check first, then the AF_INET
      check for v4mapped v6 addrs.
      
      Fixes: 7dab83de ("sctp: Support ipv6only AF_INET6 sockets.")
      Reported-by: <syzbot+7b7b518b1228d2743963@syzkaller.appspotmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: return error if the asoc has been peeled off in sctp_wait_for_sndbuf · a0ff6600
      Xin Long authored
      After commit cea0cc80 ("sctp: use the right sk after waking up from
      wait_buf sleep"), sctp_wait_for_sndbuf may switch to locking another sk
      if the asoc has been peeled off.

      However, the asoc's new sk could already be closed elsewhere, because
      the sendmsg context of the old sk cannot prevent the new sk from closing.
      If the last refcnt of the new sk is held by this asoc, then putting this
      asoc later on frees the new sk while it is still under its own lock.

      Revert that commit, but fix the old issue by returning an error under
      the old sk's lock.
      
      Fixes: cea0cc80 ("sctp: use the right sk after waking up from wait_buf sleep")
      Reported-by: <syzbot+ac6ea7baa4432811eb50@syzkaller.appspotmail.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: reinit stream if stream outcnt has been change by sinit in sendmsg · 625637bf
      Xin Long authored
      After introducing the sctp_stream structure, sctp uses stream->outcnt as
      the number of out streams instead of c.sinit_num_ostreams.

      However, when users pass sinit in cmsg, sctp_sendmsg only updates
      c.sinit_num_ostreams. At that moment, stream->outcnt still holds the
      previous value. Unless it is updated as well, the sinit_num_ostreams
      from sinit has no effect.

      Fix it by updating stream->outcnt and reinitializing the stream if the
      stream outcnt has been changed by sinit in sendmsg.
      
      Fixes: a8386317 ("sctp: prepare asoc stream for stream reconf")
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: Fix pending MAC address changes · 3d166130
      Thomas Falcon authored
      Due to architecture limitations, the IBM VNIC client driver is unable
      to perform MAC address changes unless the device has "logged in" to
      its backing device. Currently, pending MAC changes are handled before
      login, resulting in an error and failure to change the MAC address.
      Moving that chunk to the end of the ibmvnic_login function, when we are
      sure that it was successful, fixes that.
      
      The MAC address can be changed when the device is up or down, so
      only check if the device is in a "PROBED" state before setting the
      MAC address.
      
      Fixes: c26eba03 ("ibmvnic: Update reset infrastructure to support tunable parameters")
      Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
      Reviewed-by: John Allen <jallen@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. Jan 16, 2018
    • delayacct: Account blkio completion on the correct task · c96f5471
      Josh Snyder authored
      Before commit:
      
        e33a9bba ("sched/core: move IO scheduling accounting from io_schedule_timeout() into scheduler")
      
      delayacct_blkio_end() was called after context-switching into the task which
      completed I/O.
      
      This resulted in double counting: the task would account a delay both waiting
      for I/O and for time spent in the runqueue.
      
      With e33a9bba, delayacct_blkio_end() is called by try_to_wake_up().
      In ttwu, we have not yet context-switched. This is more correct, in that
      the delay accounting ends when the I/O is complete.
      
      But delayacct_blkio_end() relies on 'get_current()', and we have not yet
      context-switched into the task whose I/O completed. This results in the
      wrong task having its delay accounting statistics updated.
      
      Instead of doing that, pass the task_struct being woken to delayacct_blkio_end(),
      so that it can update the statistics of the correct task.
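
      A toy model of the fix, with hypothetical names (struct task,
      current_task, blkio_end(), wake_up_task()): the accounting helper
      takes the woken task as an argument instead of consulting a global
      "current" pointer, which at wake-up time still refers to the waker.

```c
#include <assert.h>
#include <stddef.h>

struct task {
    unsigned long blkio_delay;   /* accumulated I/O wait, arbitrary units */
    unsigned long blkio_start;   /* timestamp when the wait began */
};

/* Stand-in for the kernel's notion of the running task. */
static struct task *current_task;

/* Explicit-argument form: the caller names the task whose wait ended. */
static void blkio_end(struct task *p, unsigned long now)
{
    assert(p != NULL);
    p->blkio_delay += now - p->blkio_start;
}

/* Waker-side path: runs in the waker's context, not the sleeper's. */
static void wake_up_task(struct task *sleeper, unsigned long now)
{
    blkio_end(sleeper, now);   /* not blkio_end(current_task, now) */
}
```

      Calling blkio_end(current_task, now) here would charge the delay to
      the waker, which is the misattribution the commit describes.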
      
      Signed-off-by: Josh Snyder <joshs@netflix.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Acked-by: Balbir Singh <bsingharora@gmail.com>
      Cc: <stable@vger.kernel.org>
      Cc: Brendan Gregg <bgregg@netflix.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-block@vger.kernel.org
      Fixes: e33a9bba ("sched/core: move IO scheduling accounting from io_schedule_timeout() into scheduler")
      Link: http://lkml.kernel.org/r/1513613712-571-1-git-send-email-joshs@netflix.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/mm: Encrypt the initrd earlier for BSP microcode update · 107cd253
      Tom Lendacky authored
      
      
      Currently the BSP microcode update code examines the initrd very early
      in the boot process.  If SME is active, the initrd is treated as being
      encrypted but it has not been encrypted (in place) yet.  Update the
      early boot code that encrypts the kernel to also encrypt the initrd so
      that early BSP microcode updates work.
      
      Tested-by: Gabriel Craciunescu <nix.or.die@gmail.com>
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20180110192634.6026.10452.stgit@tlendack-t1.amdoffice.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/mm: Prepare sme_encrypt_kernel() for PAGE aligned encryption · cc5f01e2
      Tom Lendacky authored
      
      
      In preparation for encrypting more than just the kernel, the encryption
      support in sme_encrypt_kernel() needs to support 4KB page aligned
      encryption instead of just 2MB large page aligned encryption.
      
      Update the routines that populate the PGD to support non-2MB aligned
      addresses.  This is done by creating PTE page tables for the start
      and end portion of the address range that fall outside of the 2MB
      alignment.  This results in, at most, two extra pages to hold the
      PTE entries for each mapping of a range.
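
The split described above can be sketched as follows. This is a minimal illustration, not the kernel's code: `struct range_split`, `split_range()`, and the two size macros are hypothetical names chosen for this example. It assumes a half-open range `[start, end)` and shows which parts would need 4KB PTE mappings and which part can use 2MB PMD entries.

```c
#include <stdint.h>

#define PAGE_SIZE_4K 0x1000ULL    /* hypothetical name for the 4KB page size */
#define PMD_SIZE_2M  0x200000ULL  /* hypothetical name for the 2MB large page size */

/* Hypothetical result type: three sub-ranges of [start, end). */
struct range_split {
    uint64_t head_start, head_end;  /* unaligned start, mapped with 4KB PTEs */
    uint64_t mid_start, mid_end;    /* 2MB aligned middle, mapped with PMD entries */
    uint64_t tail_start, tail_end;  /* unaligned end, mapped with 4KB PTEs */
};

static uint64_t align_up(uint64_t v, uint64_t a)   { return (v + a - 1) & ~(a - 1); }
static uint64_t align_down(uint64_t v, uint64_t a) { return v & ~(a - 1); }

/* Split a 4KB aligned range into a PTE head, a PMD middle, and a PTE tail. */
static struct range_split split_range(uint64_t start, uint64_t end)
{
    struct range_split s;
    uint64_t mid_start = align_up(start, PMD_SIZE_2M);
    uint64_t mid_end = align_down(end, PMD_SIZE_2M);

    /* The range lies inside a single 2MB region: everything goes in the head. */
    if (mid_start > mid_end) {
        mid_start = end;
        mid_end = end;
    }

    s.head_start = start;
    s.head_end = mid_start;
    s.mid_start = mid_start;
    s.mid_end = mid_end;
    s.tail_start = mid_end;
    s.tail_end = end;
    return s;
}
```

Only the head and the tail need PTE page tables, which matches the "at most two extra pages" bound stated in the commit message.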
      
      Tested-by: Gabriel Craciunescu <nix.or.die@gmail.com>
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20180110192626.6026.75387.stgit@tlendack-t1.amdoffice.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      cc5f01e2
    • Tom Lendacky's avatar
      x86/mm: Centralize PMD flags in sme_encrypt_kernel() · 2b5d00b6
      Tom Lendacky authored
      
      
      In preparation for encrypting more than just the kernel during early
      boot processing, centralize the use of the PMD flag settings based
      on the type of mapping desired.  When 4KB aligned encryption is added,
      this will allow either PTE flags or large page PMD flags to be used
      without requiring the caller to adjust.
      
      Tested-by: Gabriel Craciunescu <nix.or.die@gmail.com>
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20180110192615.6026.14767.stgit@tlendack-t1.amdoffice.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      2b5d00b6
    • Tom Lendacky's avatar
      x86/mm: Use a struct to reduce parameters for SME PGD mapping · bacf6b49
      Tom Lendacky authored
      
      
      In preparation for follow-on patches, combine the PGD mapping parameters
      into a struct to reduce the number of function arguments and allow for
      direct updating of the next pagetable mapping area pointer.
      
      Tested-by: Gabriel Craciunescu <nix.or.die@gmail.com>
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20180110192605.6026.96206.stgit@tlendack-t1.amdoffice.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      bacf6b49
    • Tom Lendacky's avatar
      x86/mm: Clean up register saving in the __enc_copy() assembly code · 13038801
      Tom Lendacky authored
      
      
      Clean up the use of PUSH and POP and when registers are saved in the
      __enc_copy() assembly function in order to improve the readability of the code.
      
      Move parameter register saving into general purpose registers earlier
      in the code and move all the pushes to the beginning of the function
      with corresponding pops at the end.
      
      We do this to prepare fixes.
      
      Tested-by: Gabriel Craciunescu <nix.or.die@gmail.com>
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20180110192556.6026.74187.stgit@tlendack-t1.amdoffice.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      13038801
    • Josh Poimboeuf's avatar
      objtool: Improve error message for bad file argument · 385d11b1
      Josh Poimboeuf authored
      
      
      If a nonexistent file is supplied to objtool, it complains with a
      non-helpful error:
      
        open: No such file or directory
      
      Improve it to:
      
        objtool: Can't open 'foo': No such file or directory
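
A message of the improved shape can be produced by naming both the tool and the file and appending the `strerror()` text. The sketch below is an illustration only, not objtool's source: `open_or_explain()` is a hypothetical helper, and it writes the message into a caller-supplied buffer instead of printing it so that the result can be inspected.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: open 'path' read-only.  On failure, describe the
 * error in 'msg', including the tool name and the offending file name.
 */
static int open_or_explain(const char *path, char *msg, size_t msg_len)
{
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        snprintf(msg, msg_len, "objtool: Can't open '%s': %s",
                 path, strerror(errno));
    return fd;
}
```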
      
      Reported-by: Markus <M4rkusXXL@web.de>
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/406a3d00a21225eee2819844048e17f68523ccf6.1516025651.git.jpoimboe@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      385d11b1
    • Josh Poimboeuf's avatar
      objtool: Fix seg fault with gold linker · 2a0098d7
      Josh Poimboeuf authored
      
      
      Objtool segfaults when the gold linker is used with
      CONFIG_MODVERSIONS=y and CONFIG_UNWINDER_ORC=y.
      
      With CONFIG_MODVERSIONS=y, the .o file gets passed to the linker before
      being passed to objtool.  The gold linker seems to strip unused ELF
      symbols by default, which confuses objtool and causes the seg fault when
      it's trying to generate ORC metadata.
      
      Objtool should really be running immediately after GCC anyway, without a
      linker call in between.  Change the makefile ordering so that objtool is
      called before the linker.
      
      Reported-and-tested-by: Markus <M4rkusXXL@web.de>
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: ee9f8fce ("x86/unwind: Add the ORC unwinder")
      Link: http://lkml.kernel.org/r/355f04da33581f4a3bf82e5b512973624a1e23a2.1516025651.git.jpoimboe@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      2a0098d7
    • Leon Romanovsky's avatar
      RDMA/mlx5: Fix out-of-bound access while querying AH · ae59c3f0
      Leon Romanovsky authored
      rdma_ah_find_type() accesses the port array based on an index
      controlled by userspace. The existing bounds check comes after the first
      use of the index, so userspace can trigger an out-of-bounds access, as
      shown by the KASAN report below.
      
      ==================================================================
      BUG: KASAN: slab-out-of-bounds in to_rdma_ah_attr+0xa8/0x3b0
      Read of size 4 at addr ffff880019ae2268 by task ibv_rc_pingpong/409
      
      CPU: 0 PID: 409 Comm: ibv_rc_pingpong Not tainted 4.15.0-rc2-00031-gb60a3faf5b83-dirty #3
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
      Call Trace:
       dump_stack+0xe9/0x18f
       print_address_description+0xa2/0x350
       kasan_report+0x3a5/0x400
       to_rdma_ah_attr+0xa8/0x3b0
       mlx5_ib_query_qp+0xd35/0x1330
       ib_query_qp+0x8a/0xb0
       ib_uverbs_query_qp+0x237/0x7f0
       ib_uverbs_write+0x617/0xd80
       __vfs_write+0xf7/0x500
       vfs_write+0x149/0x310
       SyS_write+0xca/0x190
       entry_SYSCALL_64_fastpath+0x18/0x85
      RIP: 0033:0x7fe9c7a275a0
      RSP: 002b:00007ffee5498738 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 00007fe9c7ce4b00 RCX: 00007fe9c7a275a0
      RDX: 0000000000000018 RSI: 00007ffee5498800 RDI: 0000000000000003
      RBP: 000055d0c8d3f010 R08: 00007ffee5498800 R09: 0000000000000018
      R10: 00000000000000ba R11: 0000000000000246 R12: 0000000000008000
      R13: 0000000000004fb0 R14: 000055d0c8d3f050 R15: 00007ffee5498560
      
      Allocated by task 1:
       __kmalloc+0x3f9/0x430
       alloc_mad_private+0x25/0x50
       ib_mad_post_receive_mads+0x204/0xa60
       ib_mad_init_device+0xa59/0x1020
       ib_register_device+0x83a/0xbc0
       mlx5_ib_add+0x50e/0x5c0
       mlx5_add_device+0x142/0x410
       mlx5_register_interface+0x18f/0x210
       mlx5_ib_init+0x56/0x63
       do_one_initcall+0x15b/0x270
       kernel_init_freeable+0x2d8/0x3d0
       kernel_init+0x14/0x190
       ret_from_fork+0x24/0x30
      
      Freed by task 0:
      (stack is not available)
      
      The buggy address belongs to the object at ffff880019ae2000
       which belongs to the cache kmalloc-512 of size 512
      The buggy address is located 104 bytes to the right of
       512-byte region [ffff880019ae2000, ffff880019ae2200)
      The buggy address belongs to the page:
      page:000000005d674e18 count:1 mapcount:0 mapping:          (null) index:0x0 compound_mapcount: 0
      flags: 0x4000000000008100(slab|head)
      raw: 4000000000008100 0000000000000000 0000000000000000 00000001000c000c
      raw: dead000000000100 dead000000000200 ffff88001a402000 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff880019ae2100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
       ffff880019ae2180: 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc
      >ffff880019ae2200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                                                                ^
       ffff880019ae2280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
       ffff880019ae2300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      ==================================================================
      Disabling lock debugging due to kernel taint
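
The general pattern behind this class of fix is to validate a caller-controlled index before its first use. The following is a minimal sketch of that pattern, not the mlx5 or IB core code: `NUM_PORTS`, `port_type`, and `lookup_port_type()` are hypothetical, and the 1-based port numbering is an assumption made for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_PORTS 2  /* hypothetical table size */

/* Hypothetical per-port data indexed by (port_num - 1). */
static const int port_type[NUM_PORTS] = { 10, 20 };

/*
 * Returns 0 and stores the port type on success, or -1 if the untrusted,
 * 1-based port_num is out of range.  The range check happens before any
 * array access, so a bad index never reaches the table.
 */
static int lookup_port_type(uint8_t port_num, int *type_out)
{
    if (port_num == 0 || port_num > NUM_PORTS)
        return -1;

    *type_out = port_type[port_num - 1];
    return 0;
}
```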
      
      Cc: <stable@vger.kernel.org>
      Fixes: 44c58487 ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types")
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      ae59c3f0
    • Johannes Berg's avatar
      netlink: extack: avoid parenthesized string constant warning · 6311b7ce
      Johannes Berg authored
      
      
      NL_SET_ERR_MSG() and NL_SET_ERR_MSG_ATTR() lead to the following warning
      in newer versions of gcc:
        warning: array initialized from parenthesized string constant
      
      Just remove the parentheses; they're not needed in this context
      anyway, since there can be no operator precedence issues or similar.
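
As an illustration of the pattern, here is a simplified macro that initializes a static character array from its string argument without wrapping the argument in parentheses. This is a sketch only: `SET_ERR_MSG()` and `example_error()` are hypothetical stand-ins, not the kernel's `NL_SET_ERR_MSG()` definition.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical, simplified error-message macro.  Writing "= (msg)" here
 * would initialize the array from a parenthesized string constant, which
 * is the construct the warning is about; "= msg" is sufficient because an
 * initializer cannot be affected by operator precedence.
 */
#define SET_ERR_MSG(dst, msg) do {              \
        static const char err_msg_[] = msg;     \
        (dst) = err_msg_;                       \
} while (0)

/* Example use: return a pointer to a statically stored message. */
static const char *example_error(void)
{
    const char *err = NULL;

    SET_ERR_MSG(err, "invalid attribute");
    return err;
}
```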
      
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6311b7ce
    • David S. Miller's avatar
      Merge branch 'ipv4-Make-neigh-lookup-keys-for-loopback-point-to-point-devices-be-INADDR_ANY' · db9ca5ca
      David S. Miller authored
      Jim Westfall says:
      
      ====================
      ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY
      
      This used to be the behavior in older kernels but became broken in
      a263b309 (ipv4: Make neigh lookups directly in output packet path)
      and was later removed, because it was broken, in 0bb4087c (ipv4: Fix
      neigh lookup keying over loopback/point-to-point devices).
      
      Without this, there is an ARP entry for every remote IP address that
      the device talks to.  On a fairly active device this can make the ARP
      table huge and/or force large numbers of entries to be added and purged
      to stay within the table size thresholds.
      
      $ ip -4 neigh show nud noarp | grep tun | wc -l
      55850
      
      $ lnstat -k arp_cache:entries,arp_cache:allocs,arp_cache:destroys -c 10
      arp_cach|arp_cach|arp_cach|
       entries|  allocs|destroys|
         81493|620166816|620126069|
        101867|   10186|       0|
        113854|    5993|       0|
        118773|    2459|       0|
         27937|   18579|   63998|
         39256|    5659|       0|
         56231|    8487|       0|
         65602|    4685|       0|
         79697|    7047|       0|
         90733|    5517|       0|
      
      v2:
       - fixes coding style issues
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
      db9ca5ca
    • Jim Westfall's avatar
      ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY · cd9ff4de
      Jim Westfall authored
      Map all lookup neigh keys to INADDR_ANY for loopback/point-to-point devices
      to avoid making an entry for every remote ip the device needs to talk to.
      
      This used to be the old behavior but became broken in a263b309
      (ipv4: Make neigh lookups directly in output packet path) and was later
      removed in 0bb4087c (ipv4: Fix neigh lookup keying over
      loopback/point-to-point devices) because it was broken.
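
The key mapping described above can be sketched as follows. This is an illustration under stated assumptions, not the kernel's code: the flag constants and `neigh_key()` are hypothetical, and the only fact relied on is that INADDR_ANY is the all-zero address 0.0.0.0.

```c
#include <stdint.h>

#define DEV_LOOPBACK    0x1u  /* hypothetical device flag */
#define DEV_POINTOPOINT 0x2u  /* hypothetical device flag */
#define ADDR_ANY        0u    /* INADDR_ANY, i.e. 0.0.0.0 */

/*
 * Choose the neighbour lookup key for a next hop.  Loopback and
 * point-to-point devices have a single possible peer, so every next hop
 * maps to one shared key instead of one entry per remote address.
 */
static uint32_t neigh_key(unsigned int dev_flags, uint32_t next_hop)
{
    if (dev_flags & (DEV_LOOPBACK | DEV_POINTOPOINT))
        return ADDR_ANY;
    return next_hop;
}
```

With this mapping, a tunnel device talking to thousands of remote addresses would share a single neighbour entry, which is the effect the commit message describes.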
      
      Signed-off-by: Jim Westfall <jwestfall@surrealistic.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cd9ff4de
    • Jim Westfall's avatar
      net: Allow neigh constructor functions ability to modify the primary_key · 096b9854
      Jim Westfall authored
      
      
      Use n->primary_key instead of pkey to account for the possibility that a neigh
      constructor function may have modified the primary_key value.
      
      Signed-off-by: Jim Westfall <jwestfall@surrealistic.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      096b9854
    • Sergei Shtylyov's avatar
      sh_eth: fix dumping ARSTR · 17d0fb0c
      Sergei Shtylyov authored
      ARSTR is always located at the start of the TSU register region, thus
      using add_reg() instead of add_tsu_reg() in __sh_eth_get_regs() to dump it
      causes EDMR or EDSR (depending on the register layout) to be dumped instead
      of ARSTR.  Use the correct condition/macro there...
      
      Fixes: 6b4b4fea ("sh_eth: Implement ethtool register dump operations")
      Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      17d0fb0c
    • William Tu's avatar
      Revert "openvswitch: Add erspan tunnel support." · 95a33208
      William Tu authored
      This reverts commit ceaa001a.
      
      The OVS_TUNNEL_KEY_ATTR_ERSPAN_OPTS attr should be designed
      as a nested attribute to support all of the ERSPAN v1 and v2 fields.
      The current attr is a be32 supporting only one field.  Thus, this
      patch reverts it and a later patch will redo it using a nested attr.
      
      Signed-off-by: William Tu <u9012063@gmail.com>
      Cc: Jiri Benc <jbenc@redhat.com>
      Cc: Pravin Shelar <pshelar@ovn.org>
      Acked-by: Jiri Benc <jbenc@redhat.com>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      95a33208
    • r.hering@avm.de's avatar
      net/tls: Fix inverted error codes to avoid endless loop · 30be8f8d
      r.hering@avm.de authored
      
      
      sendfile() calls can hang endlessly when using kernel TLS if a socket error
      occurs.  Socket error codes must be inverted (negated) by kernel TLS before
      being returned, because they are stored with a positive sign.  If returned
      without inversion, they are interpreted as the number of bytes sent, causing
      the splice mechanism behind sendfile() to loop endlessly.
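
The sign problem can be shown with a small sketch. This is not the kernel TLS code: `struct fake_sock`, `send_buggy()`, and `send_fixed()` are hypothetical, and the only assumption is the one stated in the commit message, namely that the pending socket error is stored as a positive errno value.

```c
#include <errno.h>

/* Hypothetical socket state: a pending error stored as a positive errno. */
struct fake_sock {
    int pending_err;
};

/*
 * Buggy variant: returns the stored error unchanged.  A caller that treats
 * positive return values as "bytes sent" sees progress and keeps retrying.
 */
static long send_buggy(const struct fake_sock *sk, long len)
{
    if (sk->pending_err)
        return sk->pending_err;
    return len;
}

/* Fixed variant: negate the stored error so the caller sees a failure. */
static long send_fixed(const struct fake_sock *sk, long len)
{
    if (sk->pending_err)
        return -sk->pending_err;
    return len;
}
```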
      
      Signed-off-by: Robert Hering <r.hering@avm.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      30be8f8d