  1. Mar 27, 2020
    • afs: Fix unpinned address list during probing · 9efcc4a1
      David Howells authored
      When it's probing all of a fileserver's interfaces to find which one is
      best to use, afs_do_probe_fileserver() takes a lock on the server record
      and notes the pointer to the address list.
      
      It doesn't, however, pin the address list, so as soon as it drops the
      lock, there's nothing to stop the address list from being freed under
      us.
      
      Fix this by taking a ref on the address list inside the locked section
      and dropping it at the end of the function.
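
A minimal user-space sketch of the pattern described above, not the kernel code itself: the reference is taken while the lock is still held and dropped at the end of the function. The types `addr_list` and `server`, the helper `put_addr_list()`, and the mutex-protected counter are all assumptions made for illustration; the kernel has its own locking and refcount primitives.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the address list and the server record. */
struct addr_list {
	int refcount;		/* protected by server->lock in this sketch */
	int nr_addrs;
};

struct server {
	pthread_mutex_t lock;
	struct addr_list *addresses;
};

/* Drop one reference; free the list when the last reference goes away. */
static void put_addr_list(struct server *s, struct addr_list *al)
{
	int last;

	pthread_mutex_lock(&s->lock);
	last = (--al->refcount == 0);
	pthread_mutex_unlock(&s->lock);
	if (last)
		free(al);
}

/* Note the list pointer AND pin it before the lock is released. */
static int probe_server(struct server *s)
{
	struct addr_list *al;
	int nr;

	pthread_mutex_lock(&s->lock);
	al = s->addresses;
	al->refcount++;		/* the pin: taken inside the locked section */
	pthread_mutex_unlock(&s->lock);

	nr = al->nr_addrs;	/* safe to use: we hold our own reference */

	put_addr_list(s, al);	/* dropped at the end of the function */
	return nr;
}
```

Without the increment, another thread could replace `s->addresses` and drop the last reference as soon as the lock is released, leaving `al` dangling.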
      
      Fixes: 3bf0fb6f ("afs: Probe multiple fileservers simultaneously")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'ceph-for-5.6-rc8' of git://github.com/ceph/ceph-client · 60268940
      Linus Torvalds authored
      Pull ceph fixes from Ilya Dryomov:
       "A patch for a rather old regression in fullness handling and two
        memory leak fixes, marked for stable"
      
      * tag 'ceph-for-5.6-rc8' of git://github.com/ceph/ceph-client:
        ceph: fix memory leak in ceph_cleanup_snapid_map()
        libceph: fix alloc_msg_with_page_vector() memory leaks
        ceph: check POOL_FLAG_FULL/NEARFULL in addition to OSDMAP_FULL/NEARFULL
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · a53071bd
      Linus Torvalds authored
      Pull KVM fixes from Paolo Bonzini:
       "x86 bug fixes"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: X86: Narrow down the IPI fastpath to single target IPI
        KVM: LAPIC: Also cancel preemption timer when disarm LAPIC timer
        KVM: VMX: don't allow memory operands for inline asm that modifies SP
        KVM: LAPIC: Mark hrtimer for period or oneshot mode to expire in hard interrupt context
        KVM: SVM: Issue WBINVD after deactivating an SEV guest
        KVM: SVM: document KVM_MEM_ENCRYPT_OP, let userspace detect if SEV is available
        KVM: x86: remove bogus user-triggerable WARN_ON
    • MAINTAINERS: fix bad file pattern · 23cb8490
      Linus Torvalds authored
      
      
      Testing 'parse-maintainers' due to the previous commit shows a bad file
      pattern for the "TI VPE/CAL DRIVERS" entry in the MAINTAINERS file.
      
      There's also a lot of mis-ordered entries, but I'm still a bit nervous
      about the inevitable and annoying merge problems it would probably cause
      to fix them up.
      
      The MAINTAINERS file is one of my least favorite files due to being huge
      and centralized, but fixing it is also horribly painful for that reason.
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • parse-maintainers: Do not sort section content by default · 5cdbec10
      Joe Perches authored
      
      
      Add an --order switch to control section reordering.
      Default for --order is off.
      
      Change the default ordering to a slightly more sensible:
      
      M:  Person acting as a maintainer
      R:  Person acting as a patch reviewer
      L:  Mailing list where patches should be sent
      S:  Maintenance status
      W:  URI for general information
      Q:  URI for patchwork tracking
      B:  URI for bug tracking/submission
      C:  URI for chat
      P:  URI or file for subsystem specific coding styles
      T:  SCM tree type and location
      F:  File and directory pattern
      X:  File and directory exclusion pattern
      N:  File glob
      K:  Keyword - patch content regex
      
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma · 9420e8ad
      Linus Torvalds authored
      Pull rdma fixes from Jason Gunthorpe:
       "A small set of late-rc patches, mostly fixes for various crashers,
        some syzkaller fixes and a mlx5 HW limitation:
      
         - Several MAINTAINERS updates
      
         - Memory leak regression in ODP
      
         - Several fixes for syzkaller related crashes. Google recently taught
           syzkaller to create the software RDMA devices
      
         - Crash fixes for HFI1
      
         - Several fixes for mlx5 crashes
      
         - Prevent unprivileged access to an unsafe mlx5 HW resource"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
        RDMA/mlx5: Block delay drop to unprivileged users
        RDMA/mlx5: Fix access to wrong pointer while performing flush due to error
        RDMA/core: Ensure security pkey modify is not lost
        MAINTAINERS: Clean RXE section and add Zhu as RXE maintainer
        IB/hfi1: Ensure pq is not left on waitlist
        IB/rdmavt: Free kernel completion queue when done
        RDMA/mad: Do not crash if the rdma device does not have a umad interface
        RDMA/core: Fix missing error check on dev_set_name()
        RDMA/nl: Do not permit empty devices names during RDMA_NLDEV_CMD_NEWLINK/SET
        RDMA/mlx5: Fix the number of hwcounters of a dynamic counter
        MAINTAINERS: Update maintainers for HISILICON ROCE DRIVER
        RDMA/odp: Fix leaking the tgid for implicit ODP
  2. Mar 26, 2020
    • KVM: X86: Narrow down the IPI fastpath to single target IPI · e1be9ac8
      Wanpeng Li authored
      
      
      The original single target IPI fastpath patch forgot to filter the
      ICR destination shorthand field. Multicast IPIs are not suitable for
      this fastpath, since waking up multiple sleeping vCPUs extends the
      interrupt-disabled time; this is especially bad when the host is
      over-subscribed and the VM has a larger number of vCPUs. Let's narrow
      it down to single target IPIs.
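
A rough sketch of what such a filter can look like, assuming the fastpath decision is made from the low word of the ICR. The bit positions follow the x86 APIC register layout (destination shorthand in bits 19:18, destination mode in bit 11, delivery mode in bits 10:8); the macro and function names are made up for this example, and combining the shorthand test with the other two checks is an assumption rather than a copy of the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* ICR fields per the x86 APIC layout; the macro names are illustrative. */
#define ICR_SHORTHAND_MASK	0x000c0000u	/* bits 19:18, 0 = no shorthand */
#define ICR_DEST_MODE_MASK	0x00000800u	/* bit 11, 0 = physical */
#define ICR_DELIVERY_MASK	0x00000700u	/* bits 10:8, 0 = fixed */

/*
 * Only a fixed-mode, physically addressed IPI without a destination
 * shorthand names exactly one target; self/all/all-but-self shorthands
 * are rejected so that they take the normal (slow) path.
 */
static bool ipi_is_single_target(uint64_t icr)
{
	return (icr & ICR_SHORTHAND_MASK) == 0 &&
	       (icr & ICR_DEST_MODE_MASK) == 0 &&
	       (icr & ICR_DELIVERY_MASK) == 0;
}
```

For instance, an ICR value of `0xfd` (vector 0xfd, no shorthand) passes the filter, while `0x000c00fd` (all-excluding-self shorthand) does not.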
      
      Two VMs with 76 vCPUs each, one running 'ebizzy -M' and the other
      running cyclictest on all vCPUs: with this patch, the average
      cyclictest score improves by more than 5%. (pv tlb, pv ipi and pv
      sched yield were disabled during testing to avoid interference.)
      
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Message-Id: <1585189202-1708-3-git-send-email-wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 1b649e0b
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Fix deadlock in bpf_send_signal() from Yonghong Song.
      
       2) Fix off by one in kTLS offload of mlx5, from Tariq Toukan.
      
       3) Add missing locking in iwlwifi mvm code, from Avraham Stern.
      
       4) Fix MSG_WAITALL handling in rxrpc, from David Howells.
      
       5) Need to hold RTNL mutex in tcindex_partial_destroy_work(), from Cong
          Wang.
      
       6) Fix producer race condition in AF_PACKET, from Willem de Bruijn.
      
       7) cls_route removes the wrong filter during change operations, from
          Cong Wang.
      
       8) Reject unrecognized request flags in ethtool netlink code, from
          Michal Kubecek.
      
       9) Need to keep MAC in reset until PHY is up in bcmgenet driver, from
          Doug Berger.
      
      10) Don't leak ct zone template in act_ct during replace, from Paul
          Blakey.
      
      11) Fix flushing of offloaded netfilter flowtable flows, also from Paul
          Blakey.
      
      12) Fix throughput drop during tx backpressure in cxgb4, from Rahul
          Lakkireddy.
      
      13) Don't let a non-NULL skb->dev leave the TCP stack, from Eric
          Dumazet.
      
      14) TCP_QUEUE_SEQ socket option has to update tp->copied_seq as well,
          also from Eric Dumazet.
      
      15) Restrict macsec to ethernet devices, from Willem de Bruijn.
      
      16) Fix reference leak in some ethtool *_SET handlers, from Michal
          Kubecek.
      
      17) Fix accidental disabling of MSI for some r8169 chips, from Heiner
          Kallweit.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (138 commits)
        net: Fix CONFIG_NET_CLS_ACT=n and CONFIG_NFT_FWD_NETDEV={y, m} build
        net: ena: Add PCI shutdown handler to allow safe kexec
        selftests/net/forwarding: define libs as TEST_PROGS_EXTENDED
        selftests/net: add missing tests to Makefile
        r8169: re-enable MSI on RTL8168c
        net: phy: mdio-bcm-unimac: Fix clock handling
        cxgb4/ptp: pass the sign of offset delta in FW CMD
        net: dsa: tag_8021q: replace dsa_8021q_remove_header with __skb_vlan_pop
        net: cbs: Fix software cbs to consider packet sending time
        net/mlx5e: Do not recover from a non-fatal syndrome
        net/mlx5e: Fix ICOSQ recovery flow with Striding RQ
        net/mlx5e: Fix missing reset of SW metadata in Striding RQ reset
        net/mlx5e: Enhance ICOSQ WQE info fields
        net/mlx5_core: Set IB capability mask1 to fix ib_srpt connection failure
        selftests: netfilter: add nfqueue test case
        netfilter: nft_fwd_netdev: allow to redirect to ifb via ingress
        netfilter: nft_fwd_netdev: validate family and chain type
        netfilter: nft_set_rbtree: Detect partial overlaps on insertion
        netfilter: nft_set_rbtree: Introduce and use nft_rbtree_interval_start()
        netfilter: nft_set_pipapo: Separate partial and complete overlap cases on insertion
        ...
    • Merge tag 'gpio-v5.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio · 1dfb642b
      Linus Torvalds authored
      Pull GPIO fixes from Linus Walleij:
      
       - One core quirk by myself to fix the .irq_disable() semantics when the
         gpiolib core takes over this callback.
      
       - The rest is an elaborate series of four patches fixing Intel laptop
         ACPI wakeup quirks.
      
      * tag 'gpio-v5.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
        gpiolib: acpi: Add quirk to ignore EC wakeups on HP x2 10 CHT + AXP288 model
        gpiolib: acpi: Add quirk to ignore EC wakeups on HP x2 10 BYT + AXP288 model
        gpiolib: acpi: Rework honor_wakeup option into an ignore_wake option
        gpiolib: acpi: Correct comment for HP x2 10 honor_wakeup quirk
        gpiolib: Fix irq_disable() semantics
    • Merge tag 'wireless-drivers-2020-03-25' of... · 2910594f
      David S. Miller authored
      
      Merge tag 'wireless-drivers-2020-03-25' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers
      
      Kalle Valo says:
      
      ====================
      wireless-drivers fixes for v5.6
      
      Fourth, and last, set of fixes for v5.6. Just two important fixes to
      iwlwifi regressions.
      
      iwlwifi
      
      * fix GEO_TX_POWER_LIMIT command on certain devices which caused
        firmware to crash during initialisation
      
      * add back device ids for three devices which were accidentally
        removed
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Fix CONFIG_NET_CLS_ACT=n and CONFIG_NFT_FWD_NETDEV={y, m} build · 2c64605b
      Pablo Neira Ayuso authored
      net/netfilter/nft_fwd_netdev.c: In function ‘nft_fwd_netdev_eval’:
          net/netfilter/nft_fwd_netdev.c:32:10: error: ‘struct sk_buff’ has no member named ‘tc_redirected’
            pkt->skb->tc_redirected = 1;
                    ^~
          net/netfilter/nft_fwd_netdev.c:33:10: error: ‘struct sk_buff’ has no member named ‘tc_from_ingress’
            pkt->skb->tc_from_ingress = 1;
                    ^~
      
      To avoid a direct dependency with tc actions from netfilter, wrap the
      redirect bits around CONFIG_NET_REDIRECT and move helpers to
      include/linux/skbuff.h. Turn on this toggle from the ifb driver, the
      only existing client of these bits in the tree.
      
      This patch adds skb_set_redirected(), which sets the redirected bit
      on the skbuff, records whether the packet was redirected from ingress,
      and resets the timestamp (the timestamp reset was originally missing
      in the netfilter bugfix).
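
To make the helper's three effects concrete, here is a small stand-alone model. `struct pkt_buf` and `pkt_set_redirected()` are hypothetical stand-ins for `struct sk_buff` and the new helper; the field names and the choice to reset the timestamp only for ingress redirects are assumptions of this sketch, not a quote of the kernel source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for struct sk_buff. */
struct pkt_buf {
	int64_t tstamp;			/* receive/transmit timestamp */
	unsigned int redirected:1;	/* packet went through a redirect */
	unsigned int from_ingress:1;	/* ...and it came from ingress */
};

/* One helper does all three updates so callers cannot forget one. */
static inline void pkt_set_redirected(struct pkt_buf *p, bool from_ingress)
{
	p->redirected = 1;
	p->from_ingress = from_ingress;
	/* Assumption: a stale timestamp only matters for ingress redirects. */
	if (p->from_ingress)
		p->tstamp = 0;
}
```

Keeping the bits behind a helper is what lets both the tc and the netfilter paths share them without one depending on the other's config option.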
      
      Fixes: bcfabee1 ("netfilter: nft_fwd_netdev: allow to redirect to ifb via ingress")
      Reported-by: <noreply@ellerman.id.au>
      Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ena: Add PCI shutdown handler to allow safe kexec · 428c4913
      Guilherme G. Piccoli authored
      
      
      Currently ENA only provides the PCI remove() handler, used during rmmod
      for example. This is not called on shutdown/kexec path; we are potentially
      creating a failure scenario on kexec:
      
      (a) Kexec is triggered, no shutdown() / remove() handler is called for ENA;
      instead pci_device_shutdown() clears the master bit of the PCI device,
      stopping all DMA transactions;
      
      (b) Kexec reboot happens and the device gets enabled again, likely having
      its FW with that DMA transaction buffered; then it may trigger the (now
      invalid) memory operation in the new kernel, corrupting kernel memory area.
      
      This patch aims to prevent this, by implementing a shutdown() handler
      quite similar to the remove() one - the difference being the handling
      of the netdev, which is unregistered on remove(), but following the
      convention observed in other drivers, it's only detached on shutdown().
      
      This prevents an odd issue in AWS Nitro instances, in which after the 2nd
      kexec the next one will fail with an initrd corruption, caused by a wild
      DMA write to invalid kernel memory. The lspci output for the adapter
      present in my instance is:
      
      00:05.0 Ethernet controller [0200]: Amazon.com, Inc. Elastic Network
      Adapter (ENA) [1d0f:ec20]
      
      Suggested-by: Gavin Shan <gshan@redhat.com>
      Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
      Acked-by: Sameeh Jubran <sameehj@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests/net/forwarding: define libs as TEST_PROGS_EXTENDED · c085dbfb
      Hangbin Liu authored
      The lib files should not be defined as TEST_PROGS, or we will run them
      in run_kselftest.sh.
      
      Also remove ethtool_lib.sh exec permission.
      
      Fixes: 81573b18 ("selftests/net/forwarding: add Makefile to install tests")
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests/net: add missing tests to Makefile · 919a23e9
      Hangbin Liu authored
      
      
      Some tests were found to be missing from the Makefile by running:
      for file in $(ls *.sh); do grep -q $file Makefile || echo $file; done
      
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'zonefs-5.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs · e2cf67f6
      Linus Torvalds authored
      Pull zonefs fix from Damien Le Moal:
       "A single fix from me to correctly handle the size of read-only zone
        files"
      
      * tag 'zonefs-5.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
        zonfs: Fix handling of read-only zones
  3. Mar 25, 2020
    • RDMA/mlx5: Block delay drop to unprivileged users · ba80013f
      Maor Gottlieb authored
      It has been discovered that this feature can globally block the RX port,
      so it should be allowed for highly privileged users only.
      
      Fixes: 03404e8a ("IB/mlx5: Add support to dropless RQ")
      Link: https://lore.kernel.org/r/20200322124906.1173790-1-leon@kernel.org
      Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • zonfs: Fix handling of read-only zones · ccf4ad7d
      Damien Le Moal authored
      
      
      The write pointer of zones in the read-only condition is defined as
      invalid by the SCSI ZBC and ATA ZAC specifications. It is thus not
      possible to determine the correct size of a read-only zone file on
      mount. Fix this by handling read-only zones in the same manner as
      offline zones by disabling all accesses to the zone (read and write)
      and initializing the inode size of the read-only zone to 0.
      
      For zones found to be in the read-only condition at runtime, only
      disable write access to the zone and keep the size of the zone file to
      its last updated value to allow the user to recover previously written
      data.
      
      Also fix zonefs documentation file to reflect this change.
      
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · 6f000f98
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for net:
      
      1) A new selftest for nf_queue, from Florian Westphal. This test
         covers two recent fixes: 07f8e4d0 ("tcp: also NULL skb->dev
         when copy was needed") and b738a185 ("tcp: ensure skb->dev is
         NULL before leaving TCP stack").
      
      2) The fwd action breaks with ifb. For safety in next extensions,
         make sure the fwd action only runs from ingress until it is extended
         to be used from a different hook.
      
      3) The pipapo set type now reports EEXIST in case of subrange overlaps.
         Update the rbtree set to validate range overlaps; so far this
         validation is done only from userspace. From Stefano Brivio.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'mlx5-fixes-2020-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 7e566df6
      David S. Miller authored
      
      
      Saeed Mahameed says:
      
      ====================
      Mellanox, mlx5 fixes 2020-03-24
      
      This series introduces some fixes to mlx5 driver.
      
      From Aya, Fixes to the RX error recovery flows
      From Leon, Fix IB capability mask
      
      Please pull and let me know if there is any problem.
      
      For -stable v5.5
       ('net/mlx5_core: Set IB capability mask1 to fix ib_srpt connection failure')
      
      For -stable v5.4
       ('net/mlx5e: Fix ICOSQ recovery flow with Striding RQ')
       ('net/mlx5e: Do not recover from a non-fatal syndrome')
       ('net/mlx5e: Fix missing reset of SW metadata in Striding RQ reset')
       ('net/mlx5e: Enhance ICOSQ WQE info fields')
      
      The above patch ('net/mlx5e: Enhance ICOSQ WQE info fields')
      will fail to apply cleanly on v5.4 due to a trivial contextual conflict,
      but it is an important fix, do I need to do something about it or just
      assume Greg will know how to handle this ?
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • r8169: re-enable MSI on RTL8168c · f13bc681
      Heiner Kallweit authored
      The original change fixed an issue on RTL8168b by mimicking the vendor
      driver behavior to disable MSI on chip versions before RTL8168d.
      This however now caused an issue on a system with RTL8168c, see [0].
      Therefore leave MSI disabled on RTL8168b, but re-enable it on RTL8168c.
      
      [0] https://bugzilla.redhat.com/show_bug.cgi?id=1792839
      
      Fixes: 003bd5b4 ("r8169: don't use MSI before RTL8168d")
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: mdio-bcm-unimac: Fix clock handling · c312c781
      Andre Przywara authored
      The DT binding for this PHY describes an *optional* clock property.
      Due to a bug in the error handling logic, we are actually ignoring this
      clock *all* of the time so far.
      
      Fix this by using devm_clk_get_optional() to handle this clock properly.
      
      Fixes: b78ac6ec ("net: phy: mdio-bcm-unimac: Allow configuring MDIO clock divider")
      Signed-off-by: Andre Przywara <andre.przywara@arm.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Acked-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cxgb4/ptp: pass the sign of offset delta in FW CMD · 50e0d28d
      Raju Rangoju authored
      
      
      cxgb4_ptp_fineadjtime() doesn't pass the signedness of offset delta
      in FW_PTP_CMD. Fix it by passing correct sign.
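
As an illustration of the kind of encoding involved, the sketch below splits a signed offset into a sign flag plus a magnitude. The command layout (`struct ptp_adj_cmd`) and the helper name are hypothetical; the real firmware command fields are not shown in this commit message.

```c
#include <stdint.h>

/* Hypothetical command layout: a sign flag plus an unsigned magnitude. */
struct ptp_adj_cmd {
	uint8_t sign;		/* 1 = negative offset, 0 = positive */
	uint64_t magnitude;	/* absolute value of the offset */
};

static struct ptp_adj_cmd encode_offset(int64_t delta)
{
	struct ptp_adj_cmd cmd;

	cmd.sign = delta < 0;
	/* Negate in unsigned arithmetic so INT64_MIN does not overflow. */
	cmd.magnitude = delta < 0 ? 0 - (uint64_t)delta : (uint64_t)delta;
	return cmd;
}
```

If only the magnitude were sent, a request to step the clock back by 500 would be indistinguishable from one to step it forward by 500.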
      
      Signed-off-by: Raju Rangoju <rajur@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: tag_8021q: replace dsa_8021q_remove_header with __skb_vlan_pop · e80f40cb
      Vladimir Oltean authored
      Not only did this wheel not need reinventing, but there is also
      an issue with it: It doesn't remove the VLAN header in a way that
      preserves the L2 payload checksum when that is being provided by the DSA
      master hw.  It should recalculate checksum both for the push, before
      removing the header, and for the pull afterwards. But the current
      implementation is quite dizzying, with pulls followed immediately
      afterwards by pushes, the memmove is done before the push, etc.  This
      makes a DSA master with RX checksumming offload print stack traces
      with the infamous 'hw csum failure' message.
      
      So remove the dsa_8021q_remove_header function and replace it with
      something that actually works with inet checksumming.
      
      Fixes: d4619336 ("net: dsa: tag_8021q: Create helper function for removing VLAN header")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: cbs: Fix software cbs to consider packet sending time · 961d0e5b
      Zh-yuan Ye authored
      Currently the software CBS does not consider the packet sending time
      when depleting the credits. It caused the throughput to be
      Idleslope[kbps] * (Port transmit rate[kbps] / |Sendslope[kbps]|) where
      Idleslope * (Port transmit rate / (Idleslope + |Sendslope|)) = Idleslope
      is expected. In order to fix the issue above, this patch takes the time
      when the packet sending completes into account by moving the anchor time
      variable "last" ahead to the send completion time upon transmission and
      adding wait when the next dequeue request comes before the send
      completion time of the previous packet.
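
The two throughput expressions above can be checked with a little arithmetic. The sketch assumes the relation implied by the second formula, namely Idleslope + |Sendslope| = Port transmit rate, and requires idleslope < port_rate; the function names are made up for this example.

```c
#include <stdint.h>

/* Throughput formula from before the fix; rates in kbps. */
static int64_t cbs_rate_before_fix(int64_t idleslope, int64_t port_rate)
{
	int64_t sendslope = idleslope - port_rate;	/* negative by construction */
	int64_t abs_sendslope = -sendslope;

	return idleslope * port_rate / abs_sendslope;
}

/* Expected throughput: the same product over idleslope + |sendslope|. */
static int64_t cbs_rate_expected(int64_t idleslope, int64_t port_rate)
{
	int64_t abs_sendslope = port_rate - idleslope;

	return idleslope * port_rate / (idleslope + abs_sendslope);
}
```

With a 1,000,000 kbps port and an idleslope of 200,000 kbps, the first formula gives 250,000 kbps while the second gives the configured 200,000 kbps, i.e. the shaper overshoots by 25% in this example.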
      
      changelog:
      V2->V3:
       - remove unnecessary whitespace cleanup
       - add the checks if port_rate is 0 before division
      
      V1->V2:
       - combine variable "send_completed" into "last"
       - add the comment for estimate of the packet sending
      
      Fixes: 585d763a ("net/sched: Introduce Credit Based Shaper (CBS) qdisc")
      Signed-off-by: Zh-yuan Ye <ye.zh-yuan@socionext.com>
      Reviewed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • RDMA/mlx5: Fix access to wrong pointer while performing flush due to error · 950bf4f1
      Leon Romanovsky authored
      The main difference between send and receive SW completions is related to
      separate treatment of WQ queue. For receive completions, the initial index
      to be flushed is stored in "tail", while for send completions, it is in
      deleted "last_poll".
      
        CPU: 54 PID: 53405 Comm: kworker/u161:0 Kdump: loaded Tainted: G           OE    --------- -t - 4.18.0-147.el8.ppc64le #1
        Workqueue: ib-comp-unb-wq ib_cq_poll_work [ib_core]
        NIP:  c000003c7c00a000 LR: c00800000e586af4 CTR: c000003c7c00a000
        REGS: c0000036cc9db940 TRAP: 0400   Tainted: G           OE    --------- -t -  (4.18.0-147.el8.ppc64le)
        MSR:  9000000010009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24004488  XER: 20040000
        CFAR: c00800000e586af0 IRQMASK: 0
        GPR00: c00800000e586ab4 c0000036cc9dbbc0 c00800000e5f1a00 c0000037d8433800
        GPR04: c000003895a26800 c0000037293f2000 0000000000000201 0000000000000011
        GPR08: c000003895a26c80 c000003c7c00a000 0000000000000000 c00800000ed30438
        GPR12: c000003c7c00a000 c000003fff684b80 c00000000017c388 c00000396ec4be40
        GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
        GPR20: c00000000151e498 0000000000000010 c000003895a26848 0000000000000010
        GPR24: 0000000000000010 0000000000010000 c000003895a26800 0000000000000000
        GPR28: 0000000000000010 c0000037d8433800 c000003895a26c80 c000003895a26800
        NIP [c000003c7c00a000] 0xc000003c7c00a000
        LR [c00800000e586af4] __ib_process_cq+0xec/0x1b0 [ib_core]
        Call Trace:
        [c0000036cc9dbbc0] [c00800000e586ab4] __ib_process_cq+0xac/0x1b0 [ib_core] (unreliable)
        [c0000036cc9dbc40] [c00800000e586c88] ib_cq_poll_work+0x40/0xb0 [ib_core]
        [c0000036cc9dbc70] [c000000000171f44] process_one_work+0x2f4/0x5c0
        [c0000036cc9dbd10] [c000000000172a0c] worker_thread+0xcc/0x760
        [c0000036cc9dbdc0] [c00000000017c52c] kthread+0x1ac/0x1c0
        [c0000036cc9dbe30] [c00000000000b75c] ret_from_kernel_thread+0x5c/0x80
      
      Fixes: 8e3b6883 ("RDMA/mlx5: Delete unreachable handle_atomic code by simplifying SW completion")
      Link: https://lore.kernel.org/r/20200318091640.44069-1-leon@kernel.org
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • RDMA/core: Ensure security pkey modify is not lost · 2d47fbac
      Mike Marciniszyn authored
      The following modify sequence (loosely based on ipoib) will lose a pkey
      modification:
      
      - Modify (pkey index, port)
      - Modify (new pkey index, NO port)
      
      After the first modify, the qp_pps list will have saved the pkey and the
      unit on the main list.
      
      During the second modify, get_new_pps() will fetch the port from qp_pps
      and read the new pkey index from qp_attr->pkey_index.  The state will
      still be zero, or IB_PORT_PKEY_NOT_VALID. Because of the invalid state,
      the new values will never replace the one in the qp pps list, losing the
      new pkey.
      
      This happens because the following if statements will never correct the
      state because the first term will be false. If the code had been executed,
      it would incorrectly overwrite valid values.
      
        if ((qp_attr_mask & IB_QP_PKEY_INDEX) && (qp_attr_mask & IB_QP_PORT))
      	  new_pps->main.state = IB_PORT_PKEY_VALID;
      
        if (!(qp_attr_mask & (IB_QP_PKEY_INDEX | IB_QP_PORT)) && qp_pps) {
      	  new_pps->main.port_num = qp_pps->main.port_num;
      	  new_pps->main.pkey_index = qp_pps->main.pkey_index;
      	  if (qp_pps->main.state != IB_PORT_PKEY_NOT_VALID)
      		  new_pps->main.state = IB_PORT_PKEY_VALID;
        }
      
      Fix by joining the two if statements with an or test to see if qp_pps is
      non-NULL and in the correct state.
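
One way to picture the joined condition is the stand-alone model below. Every name in it (`pkey_entry`, `modify_attr`, `next_entry()`, the `MASK_*` flags) is a hypothetical stand-in for the structures quoted above, and the per-field fallback to the saved values is an assumption of this sketch rather than a quote of the patch.

```c
#include <stdbool.h>
#include <stddef.h>

enum pkey_state { PKEY_NOT_VALID = 0, PKEY_VALID };

#define MASK_PKEY_INDEX	0x1u	/* the modify supplies a pkey index */
#define MASK_PORT	0x2u	/* the modify supplies a port */

struct pkey_entry {
	enum pkey_state state;
	unsigned int port_num;
	unsigned int pkey_index;
};

struct modify_attr {
	unsigned int port_num;
	unsigned int pkey_index;
};

/* Build the new entry from the modify request and the saved entry (may be NULL). */
static struct pkey_entry next_entry(const struct pkey_entry *saved,
				    const struct modify_attr *attr,
				    unsigned int mask)
{
	struct pkey_entry e = { PKEY_NOT_VALID, 0, 0 };

	/* Each field comes from the request if present, else from the saved entry. */
	if (mask & MASK_PORT)
		e.port_num = attr->port_num;
	else if (saved)
		e.port_num = saved->port_num;

	if (mask & MASK_PKEY_INDEX)
		e.pkey_index = attr->pkey_index;
	else if (saved)
		e.pkey_index = saved->pkey_index;

	/* Joined test: both supplied now, OR a saved entry that is already valid. */
	if (((mask & MASK_PKEY_INDEX) && (mask & MASK_PORT)) ||
	    (saved && saved->state != PKEY_NOT_VALID))
		e.state = PKEY_VALID;

	return e;
}
```

Replaying the failing sequence, a first modify with (pkey index, port) followed by a second with only a new pkey index, now yields a valid entry that keeps the saved port and carries the new index.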
      
      Fixes: 1dd01788 ("RDMA/core: Fix protection fault in get_pkey_idx_qp_list")
      Link: https://lore.kernel.org/r/20200313124704.14982.55907.stgit@awfm-01.aw.intel.com
      Reviewed-by: Kaike Wan <kaike.wan@intel.com>
      Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • MAINTAINERS: Clean RXE section and add Zhu as RXE maintainer · 1fa70778
      Leon Romanovsky authored
      
      
      Zhu Yanjun contributed many patches to RXE and expressed genuine interest
      in improving RXE even more. Let's add him as a maintainer.
      
      Link: https://lore.kernel.org/r/20200312083658.29603-1-leon@kernel.org
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Acked-by: Moni Shoua <monis@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • net/mlx5e: Do not recover from a non-fatal syndrome · 187a9830
      Aya Levin authored
      For non-fatal syndromes like LOCAL_LENGTH_ERR, recovery shouldn't be
      triggered. In these scenarios, the RQ is not actually in ERR state.
      This misleads the recovery flow which assumes that the RQ is really in
      error state and no more completions arrive, causing crashes on bad page
      state.
      
      Fixes: 8276ea13 ("net/mlx5e: Report and recover from CQE with error on RQ")
      Signed-off-by: Aya Levin <ayal@mellanox.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Fix ICOSQ recovery flow with Striding RQ · e239c6d6
      Aya Levin authored
      In striding RQ mode, the buffers of an RX WQE are first
      prepared and posted to the HW using UMR WQEs via the ICOSQ.
      We maintain the state of these in-progress WQEs in the RQ
      SW struct.
      
      In the flow of ICOSQ recovery, the corresponding RQ is not
      in error state, hence:
      
      - The buffers of the in-progress WQEs must be released
        and the RQ metadata should reflect it.
      - Existing RX WQEs in the RQ should not be affected.
      
      For this, wrap the dealloc of the in-progress WQEs in
      a function, and use it in the ICOSQ recovery flow
      instead of mlx5e_free_rx_descs().
      
      Fixes: be5323c8 ("net/mlx5e: Report and recover from CQE error on ICOSQ")
      Signed-off-by: Aya Levin <ayal@mellanox.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      e239c6d6
    • Aya Levin's avatar
      net/mlx5e: Fix missing reset of SW metadata in Striding RQ reset · 39369fd5
      Aya Levin authored
      When resetting the RQ (moving RQ state from RST to RDY), the driver
      resets the WQ's SW metadata.
      In striding RQ mode, we maintain a field that reflects the actual
      expected WQ head (including in progress WQEs posted to the ICOSQ).
      It was mistakenly not reset together with the WQ. Fix this here.
      
      Fixes: 8276ea13 ("net/mlx5e: Report and recover from CQE with error on RQ")
      Signed-off-by: Aya Levin <ayal@mellanox.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      39369fd5
    • Aya Levin's avatar
      net/mlx5e: Enhance ICOSQ WQE info fields · 1de0306c
      Aya Levin authored
      Add number of WQEBBs (WQE's Basic Block) to WQE info struct. Set the
      number of WQEBBs on WQE post, and increment the consumer counter (cc)
      on completion.
      
      In case of error completions, the cc was mistakenly not incremented,
      keeping a gap between cc and pc (producer counter). This failed the
      recovery flow on the ICOSQ from a CQE error, which timed out waiting for
      the cc and pc to meet.
      
      Fixes: be5323c8 ("net/mlx5e: Report and recover from CQE error on ICOSQ")
      Signed-off-by: Aya Levin <ayal@mellanox.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      1de0306c
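The counter bookkeeping described in the commit above can be illustrated with a small userspace model. This is only a sketch in Python, not the mlx5e driver code: the class names, the `num_wqebbs` field name, and the `drained()` helper are hypothetical, and the real driver uses fixed-width wrapping counters that are omitted here. It shows why the consumer counter has to advance by the recorded WQEBB count on error completions too, otherwise cc and pc never meet.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WqeInfo:
    opcode: str
    num_wqebbs: int  # hypothetical name: basic blocks occupied by this WQE


@dataclass
class Icosq:
    pc: int = 0  # producer counter, advanced when a WQE is posted
    cc: int = 0  # consumer counter, advanced when a completion is handled
    in_flight: List[WqeInfo] = field(default_factory=list)

    def post(self, opcode: str, num_wqebbs: int) -> None:
        # Record the WQE size at post time so completion handling can use it.
        self.in_flight.append(WqeInfo(opcode, num_wqebbs))
        self.pc += num_wqebbs

    def complete(self, error: bool = False) -> WqeInfo:
        wi = self.in_flight.pop(0)
        # Advance cc for every completion, including error completions.
        # Skipping this on error would leave a permanent gap between cc and pc.
        self.cc += wi.num_wqebbs
        return wi

    def drained(self) -> bool:
        # A recovery flow that waits for the queue to empty checks cc == pc.
        return self.cc == self.pc


q = Icosq()
q.post("UMR", 4)
q.post("NOP", 1)
q.complete(error=True)
q.complete()
print(q.drained())
```

In this model, a recovery flow waiting on `drained()` only terminates because the error completion also consumed its four basic blocks.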
    • Leon Romanovsky's avatar
      net/mlx5_core: Set IB capability mask1 to fix ib_srpt connection failure · 306f354c
      Leon Romanovsky authored
      The cap_mask1 field is not protected by field_select and is not listed
      among the RW fields, but it must be written to properly initialize ports
      in IB virtualization mode.
      
      Link: https://lore.kernel.org/linux-rdma/88bab94d2fd72f3145835b4518bc63dda587add6.camel@redhat.com
      Fixes: ab118da4 ("net/mlx5: Don't write read-only fields in MODIFY_HCA_VPORT_CONTEXT command")
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      306f354c
    • Florian Westphal's avatar
      selftests: netfilter: add nfqueue test case · a64d558d
      Florian Westphal authored
      Add a test case to check nf queue infrastructure.
      Could be extended in the future to also cover serialization of
      conntrack, uid and secctx attributes in nfqueue.
      
      For now, this checks that 'queue bypass' works, that a queue rule with
      no bypass option blocks traffic and that userspace receives the expected
      number of packets.
      For this we add two queues and hook all of
      prerouting/input/forward/output/postrouting.
      
      Packets get queued twice with a dummy base chain in between:
      This passes with current nf tree, but reverting
      commit 946c0d8e ("netfilter: nf_queue: fix reinject verdict handling")
      makes this trip (it processes 30 instead of expected 20 packets).
      
      v2: update config file with queue and other options missing/needed for
      other tests.
      v3: also test with tcp, this reveals a problem with commit
      28f8bfd1 ("netfilter: Support iif matches in POSTROUTING"), due to
      skb->dev pointing at another skb in the retransmit rbtree (skb->dev
      aliases to rbnode child).
      
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      a64d558d
    • Pablo Neira Ayuso's avatar
      netfilter: nft_fwd_netdev: allow to redirect to ifb via ingress · bcfabee1
      Pablo Neira Ayuso authored
      Set skb->tc_redirected to 1, otherwise the ifb driver drops the packet.
      Set skb->tc_from_ingress to 1 to reinject the packet back to the ingress
      path after leaving the ifb egress path.
      
      This patch unconditionally sets these two skb fields that are
      meaningful to the ifb driver. The existing forward action is guaranteed
      to run from the ingress path.
      
      Fixes: 39e6dea2 ("netfilter: nf_tables: add forward expression to the netdev family")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      bcfabee1
    • Pablo Neira Ayuso's avatar
      netfilter: nft_fwd_netdev: validate family and chain type · 76a109fa
      Pablo Neira Ayuso authored
      Make sure the forward action is only used from ingress.
      
      Fixes: 39e6dea2 ("netfilter: nf_tables: add forward expression to the netdev family")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      76a109fa
    • Stefano Brivio's avatar
      netfilter: nft_set_rbtree: Detect partial overlaps on insertion · 7c84d414
      Stefano Brivio authored
      
      
      ...and return -ENOTEMPTY to the front-end in this case, instead of
      proceeding. Currently, nft takes care of checking for these cases
      and not sending them to the kernel, but if we drop the set_overlap()
      call in nft we can end up in situations like:
      
       # nft add table t
       # nft add set t s '{ type inet_service ; flags interval ; }'
       # nft add element t s '{ 1 - 5 }'
       # nft add element t s '{ 6 - 10 }'
       # nft add element t s '{ 4 - 7 }'
       # nft list set t s
       table ip t {
       	set s {
       		type inet_service
       		flags interval
       		elements = { 1-3, 4-5, 6-7 }
       	}
       }
      
      The primary purpose of this change is to make the behaviour
      consistent with nft_set_pipapo, but it also helps avoid
      inconsistent behaviour if userspace sends overlapping elements for
      any reason.
      
      v2: When we meet the same key data in the tree, as start element while
          inserting an end element, or as end element while inserting a start
          element, actually check that the existing element is active, before
          resetting the overlap flag (Pablo Neira Ayuso)
      
      Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      7c84d414
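The insertion check described in the commit above can be sketched in userspace. This is a simplified Python model, not the nft_set_rbtree implementation: it keeps intervals in a plain list rather than a red-black tree, ignores element activity and generations, and treats any non-identical intersection as a partial overlap. The function name `insert_interval` is hypothetical, and returning -EEXIST for an identical interval is an assumption borrowed from the neighbouring commits in this series.

```python
import errno


def insert_interval(existing, start, end):
    """Try to add the inclusive interval [start, end] to `existing`.

    Returns 0 on success, -EEXIST if the identical interval is already
    present, and -ENOTEMPTY if the new interval intersects a different one.
    """
    for s, e in existing:
        if (s, e) == (start, end):
            return -errno.EEXIST
        # Two inclusive intervals intersect when each starts before the
        # other one ends.
        if start <= e and s <= end:
            return -errno.ENOTEMPTY
    existing.append((start, end))
    existing.sort()
    return 0


# Replays the sequence from the commit message: 1-5, 6-10, then 4-7.
elements = []
for lo, hi in [(1, 5), (6, 10), (4, 7)]:
    print((lo, hi), insert_interval(elements, lo, hi))
print(elements)
```

With a check of this kind, the third insertion is rejected instead of silently splitting the existing ranges, so the set keeps only 1-5 and 6-10.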
    • Stefano Brivio's avatar
      netfilter: nft_set_rbtree: Introduce and use nft_rbtree_interval_start() · 6f7c9caf
      Stefano Brivio authored
      
      
      Replace negations of nft_rbtree_interval_end() with a new helper,
      nft_rbtree_interval_start(), wherever this helps to visualise the
      problem at hand, that is, for all the occurrences except for the
      comparison against given flags in __nft_rbtree_get().
      
      This gets especially useful in the next patch.
      
      Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      6f7c9caf
    • Stefano Brivio's avatar
      netfilter: nft_set_pipapo: Separate partial and complete overlap cases on insertion · 0eb4b5ee
      Stefano Brivio authored
      
      
      ...and return -ENOTEMPTY to the front-end on collision, -EEXIST if
      an identical element already exists. Together with the previous patch,
      element collision will now be returned to the user as -EEXIST.
      
      Reported-by: Phil Sutter <phil@nwl.cc>
      Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      0eb4b5ee
    • Pablo Neira Ayuso's avatar
      netfilter: nf_tables: Allow set back-ends to report partial overlaps on insertion · 8c2d45b2
      Pablo Neira Ayuso authored
      
      
      Currently, the -EEXIST return code of ->insert() callbacks is ambiguous: it
      might indicate that a given element (including intervals) already exists as
      such, or that the new element would clash with existing ones.
      
      If identical elements already exist, the front-end ignores this without
      returning an error, in case NLM_F_EXCL is not set. However, if the new
      element can't be inserted due to an overlap, we should report this to
      the user.
      
      To this end, allow set back-ends to return -ENOTEMPTY on collision with
      existing elements, translate that to -EEXIST, and return that to userspace,
      no matter if NLM_F_EXCL was set.
      
      Reported-by: Phil Sutter <phil@nwl.cc>
      Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      8c2d45b2
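The return-code handling described in the commit above can be summarised as a small decision function. This is a hedged Python sketch of the described behaviour, not the nf_tables front-end code: the function name `frontend_result` and its parameters are hypothetical. The `NLM_F_EXCL` value of 0x200 matches the definition in linux/netlink.h.

```python
import errno

NLM_F_EXCL = 0x200  # netlink flag: fail if the object already exists


def frontend_result(backend_ret, nlmsg_flags):
    """Map a set back-end's insert() result to what userspace sees."""
    if backend_ret == -errno.EEXIST:
        # Identical element already present: an error only if the caller
        # asked for exclusive creation, otherwise silently accepted.
        return -errno.EEXIST if nlmsg_flags & NLM_F_EXCL else 0
    if backend_ret == -errno.ENOTEMPTY:
        # Partial overlap: always reported, translated to -EEXIST.
        return -errno.EEXIST
    return backend_ret


print(frontend_result(-errno.EEXIST, 0))
print(frontend_result(-errno.ENOTEMPTY, 0))
```

Keeping -ENOTEMPTY internal to the kernel means userspace still sees the familiar -EEXIST, while the front-end can tell a harmless duplicate from a real clash.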
    • Linus Torvalds's avatar
      Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 76ccd234
      Linus Torvalds authored
      Pull perf tooling fixes from Ingo Molnar:
       "A handful of tooling fixes all across the map, no kernel changes"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        tools headers uapi: Update linux/in.h copy
        perf probe: Do not depend on dwfl_module_addrsym()
        perf probe: Fix to delete multiple probe event
        perf parse-events: Fix reading of invalid memory in event parsing
        perf python: Fix clang detection when using CC=clang-version
        perf map: Fix off by one in strncpy() size argument
        tools: Let O= makes handle a relative path with -C option
      76ccd234