Skip to content
  1. Nov 15, 2020
    • David Woodhouse's avatar
      sched/wait: Add add_wait_queue_priority() · c4d51a52
      David Woodhouse authored
      
      
      This allows an exclusive wait_queue_entry to be added at the head of the
      queue, instead of the tail as normal. Thus, it gets to consume events
      first without allowing non-exclusive waiters to be woken at all.
      
      The (first) intended use is for KVM IRQFD, which currently has
      inconsistent behaviour depending on whether posted interrupts are
      available or not. If they are, KVM will bypass the eventfd completely
      and deliver interrupts directly to the appropriate vCPU. If not, events
      are delivered through the eventfd and userspace will receive them when
      polling on the eventfd.
      
      By using add_wait_queue_priority(), KVM will be able to consistently
      consume events within the kernel without accidentally exposing them
      to userspace when they're supposed to be bypassed. This, in turn, means
      that userspace doesn't have to jump through hoops to avoid listening
      on the erroneously noisy eventfd and injecting duplicate interrupts.
      
      Signed-off-by: default avatarDavid Woodhouse <dwmw@amazon.co.uk>
      Message-Id: <20201027143944.648769-2-dwmw2@infradead.org>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      c4d51a52
    • Yadong Qi's avatar
      KVM: x86: emulate wait-for-SIPI and SIPI-VMExit · bf0cd88c
      Yadong Qi authored
      
      
      Background: We have a lightweight HV, it needs INIT-VMExit and
      SIPI-VMExit to wake-up APs for guests since it do not monitor
      the Local APIC. But currently virtual wait-for-SIPI(WFS) state
      is not supported in nVMX, so when running on top of KVM, the L1
      HV cannot receive the INIT-VMExit and SIPI-VMExit which cause
      the L2 guest cannot wake up the APs.
      
      According to Intel SDM Chapter 25.2 Other Causes of VM Exits,
      SIPIs cause VM exits when a logical processor is in
      wait-for-SIPI state.
      
      In this patch:
          1. introduce SIPI exit reason,
          2. introduce wait-for-SIPI state for nVMX,
          3. advertise wait-for-SIPI support to guest.
      
      When L1 hypervisor is not monitoring Local APIC, L0 need to emulate
      INIT-VMExit and SIPI-VMExit to L1 to emulate INIT-SIPI-SIPI for
      L2. L2 LAPIC write would be traped by L0 Hypervisor(KVM), L0 should
      emulate the INIT/SIPI vmexit to L1 hypervisor to set proper state
      for L2's vcpu state.
      
      Handle procdure:
      Source vCPU:
          L2 write LAPIC.ICR(INIT).
          L0 trap LAPIC.ICR write(INIT): inject a latched INIT event to target
             vCPU.
      Target vCPU:
          L0 emulate an INIT VMExit to L1 if is guest mode.
          L1 set guest VMCS, guest_activity_state=WAIT_SIPI, vmresume.
          L0 set vcpu.mp_state to INIT_RECEIVED if (vmcs12.guest_activity_state
             == WAIT_SIPI).
      
      Source vCPU:
          L2 write LAPIC.ICR(SIPI).
          L0 trap LAPIC.ICR write(INIT): inject a latched SIPI event to traget
             vCPU.
      Target vCPU:
          L0 emulate an SIPI VMExit to L1 if (vcpu.mp_state == INIT_RECEIVED).
          L1 set CS:IP, guest_activity_state=ACTIVE, vmresume.
          L0 resume to L2.
          L2 start-up.
      
      Signed-off-by: default avatarYadong Qi <yadong.qi@intel.com>
      Message-Id: <20200922052343.84388-1-yadong.qi@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Message-Id: <20201106065122.403183-1-yadong.qi@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      bf0cd88c
    • Paolo Bonzini's avatar
      KVM: x86: fix apic_accept_events vs check_nested_events · 1c96dcce
      Paolo Bonzini authored
      
      
      vmx_apic_init_signal_blocked is buggy in that it returns true
      even in VMX non-root mode.  In non-root mode, however, INITs
      are not latched, they just cause a vmexit.  Previously,
      KVM was waiting for them to be processed when kvm_apic_accept_events
      and in the meanwhile it ate the SIPIs that the processor received.
      
      However, in order to implement the wait-for-SIPI activity state,
      KVM will have to process KVM_APIC_SIPI in vmx_check_nested_events,
      and it will not be possible anymore to disregard SIPIs in non-root
      mode as the code is currently doing.
      
      By calling kvm_x86_ops.nested_ops->check_events, we can force a vmexit
      (with the side-effect of latching INITs) before incorrectly injecting
      an INIT or SIPI in a guest, and therefore vmx_apic_init_signal_blocked
      can do the right thing.
      
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      1c96dcce
    • Sean Christopherson's avatar
      KVM: selftests: Verify supported CR4 bits can be set before KVM_SET_CPUID2 · 7a873e45
      Sean Christopherson authored
      
      
      Extend the KVM_SET_SREGS test to verify that all supported CR4 bits, as
      enumerated by KVM, can be set before KVM_SET_CPUID2, i.e. without first
      defining the vCPU model.  KVM is supposed to skip guest CPUID checks
      when host userspace is stuffing guest state.
      
      Check the inverse as well, i.e. that KVM rejects KVM_SET_REGS if CR4
      has one or more unsupported bits set.
      
      Signed-off-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20201007014417.29276-7-sean.j.christopherson@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      7a873e45
    • Sean Christopherson's avatar
      KVM: x86: Return bool instead of int for CR4 and SREGS validity checks · ee69c92b
      Sean Christopherson authored
      
      
      Rework the common CR4 and SREGS checks to return a bool instead of an
      int, i.e. true/false instead of 0/-EINVAL, and add "is" to the name to
      clarify the polarity of the return value (which is effectively inverted
      by this change).
      
      No functional changed intended.
      
      Signed-off-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20201007014417.29276-6-sean.j.christopherson@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      ee69c92b
    • Sean Christopherson's avatar
      KVM: x86: Move vendor CR4 validity check to dedicated kvm_x86_ops hook · c2fe3cd4
      Sean Christopherson authored
      Split out VMX's checks on CR4.VMXE to a dedicated hook, .is_valid_cr4(),
      and invoke the new hook from kvm_valid_cr4().  This fixes an issue where
      KVM_SET_SREGS would return success while failing to actually set CR4.
      
      Fixing the issue by explicitly checking kvm_x86_ops.set_cr4()'s return
      in __set_sregs() is not a viable option as KVM has already stuffed a
      variety of vCPU state.
      
      Note, kvm_valid_cr4() and is_valid_cr4() have different return types and
      inverted semantics.  This will be remedied in a future patch.
      
      Fixes: 5e1746d6
      
       ("KVM: nVMX: Allow setting the VMXE bit in CR4")
      Signed-off-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20201007014417.29276-5-sean.j.christopherson@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      c2fe3cd4
    • Sean Christopherson's avatar
      KVM: SVM: Drop VMXE check from svm_set_cr4() · 311a0659
      Sean Christopherson authored
      
      
      Drop svm_set_cr4()'s explicit check CR4.VMXE now that common x86 handles
      the check by incorporating VMXE into the CR4 reserved bits, via
      kvm_cpu_caps.  SVM obviously does not set X86_FEATURE_VMX.
      
      No functional change intended.
      
      Signed-off-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20201007014417.29276-4-sean.j.christopherson@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      311a0659
    • Sean Christopherson's avatar
      KVM: VMX: Drop explicit 'nested' check from vmx_set_cr4() · a447e38a
      Sean Christopherson authored
      
      
      Drop vmx_set_cr4()'s explicit check on the 'nested' module param now
      that common x86 handles the check by incorporating VMXE into the CR4
      reserved bits, via kvm_cpu_caps.  X86_FEATURE_VMX is set in kvm_cpu_caps
      (by vmx_set_cpu_caps()), if and only if 'nested' is true.
      
      No functional change intended.
      
      Signed-off-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20201007014417.29276-3-sean.j.christopherson@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      a447e38a
    • Sean Christopherson's avatar
      KVM: VMX: Drop guest CPUID check for VMXE in vmx_set_cr4() · d3a9e414
      Sean Christopherson authored
      Drop vmx_set_cr4()'s somewhat hidden guest_cpuid_has() check on VMXE now
      that common x86 handles the check by incorporating VMXE into the CR4
      reserved bits, i.e. in cr4_guest_rsvd_bits.  This fixes a bug where KVM
      incorrectly rejects KVM_SET_SREGS with CR4.VMXE=1 if it's executed
      before KVM_SET_CPUID{,2}.
      
      Fixes: 5e1746d6
      
       ("KVM: nVMX: Allow setting the VMXE bit in CR4")
      Reported-by: default avatarStas Sergeev <stsp@users.sourceforge.net>
      Signed-off-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20201007014417.29276-2-sean.j.christopherson@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      d3a9e414
    • Paolo Bonzini's avatar
      kvm: mmu: fix is_tdp_mmu_check when the TDP MMU is not in use · c887c9b9
      Paolo Bonzini authored
      
      
      In some cases where shadow paging is in use, the root page will
      be either mmu->pae_root or vcpu->arch.mmu->lm_root.  Then it will
      not have an associated struct kvm_mmu_page, because it is allocated
      with alloc_page instead of kvm_mmu_alloc_page.
      
      Just return false quickly from is_tdp_mmu_root if the TDP MMU is
      not in use, which also includes the case where shadow paging is
      enabled.
      
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      c887c9b9
  2. Nov 13, 2020
    • Babu Moger's avatar
      KVM: SVM: Update cr3_lm_rsvd_bits for AMD SEV guests · 96308b06
      Babu Moger authored
      
      
      For AMD SEV guests, update the cr3_lm_rsvd_bits to mask
      the memory encryption bit in reserved bits.
      
      Signed-off-by: default avatarBabu Moger <babu.moger@amd.com>
      Message-Id: <160521948301.32054.5783800787423231162.stgit@bmoger-ubuntu>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      96308b06
    • Babu Moger's avatar
      KVM: x86: Introduce cr3_lm_rsvd_bits in kvm_vcpu_arch · 0107973a
      Babu Moger authored
      SEV guests fail to boot on a system that supports the PCID feature.
      
      While emulating the RSM instruction, KVM reads the guest CR3
      and calls kvm_set_cr3(). If the vCPU is in the long mode,
      kvm_set_cr3() does a sanity check for the CR3 value. In this case,
      it validates whether the value has any reserved bits set. The
      reserved bit range is 63:cpuid_maxphysaddr(). When AMD memory
      encryption is enabled, the memory encryption bit is set in the CR3
      value. The memory encryption bit may fall within the KVM reserved
      bit range, causing the KVM emulation failure.
      
      Introduce a new field cr3_lm_rsvd_bits in kvm_vcpu_arch which will
      cache the reserved bits in the CR3 value. This will be initialized
      to rsvd_bits(cpuid_maxphyaddr(vcpu), 63).
      
      If the architecture has any special bits(like AMD SEV encryption bit)
      that needs to be masked from the reserved bits, should be cleared
      in vendor specific kvm_x86_ops.vcpu_after_set_cpuid handler.
      
      Fixes: a780a3ea
      
       ("KVM: X86: Fix reserved bits check for MOV to CR3")
      Signed-off-by: default avatarBabu Moger <babu.moger@amd.com>
      Message-Id: <160521947657.32054.3264016688005356563.stgit@bmoger-ubuntu>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      0107973a
    • David Edmondson's avatar
      KVM: x86: clflushopt should be treated as a no-op by emulation · 51b958e5
      David Edmondson authored
      The instruction emulator ignores clflush instructions, yet fails to
      support clflushopt. Treat both similarly.
      
      Fixes: 13e457e0
      
       ("KVM: x86: Emulator does not decode clflush well")
      Signed-off-by: default avatarDavid Edmondson <david.edmondson@oracle.com>
      Message-Id: <20201103120400.240882-1-david.edmondson@oracle.com>
      Reviewed-by: default avatarJoao Martins <joao.m.martins@oracle.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      51b958e5
    • Paolo Bonzini's avatar
      Merge tag 'kvmarm-fixes-5.10-3' of... · 2c38234c
      Paolo Bonzini authored
      Merge tag 'kvmarm-fixes-5.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
      
      KVM/arm64 fixes for v5.10, take #3
      
      - Allow userspace to downgrade ID_AA64PFR0_EL1.CSV2
      - Inject UNDEF on SCXTNUM_ELx access
      2c38234c
    • Linus Torvalds's avatar
      Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt · 585e5b17
      Linus Torvalds authored
      Pull fscrypt fix from Eric Biggers:
       "Fix a regression where new files weren't using inline encryption when
        they should be"
      
      * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
        fscrypt: fix inline encryption not used on new files
      585e5b17
    • Linus Torvalds's avatar
      Merge tag 'gfs2-v5.10-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 · 20ca21df
      Linus Torvalds authored
      Pull gfs2 fixes from Andreas Gruenbacher:
       "Fix jdata data corruption and glock reference leak"
      
      * tag 'gfs2-v5.10-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
        gfs2: Fix case in which ail writes are done to jdata holes
        Revert "gfs2: Ignore journal log writes for jdata holes"
        gfs2: fix possible reference leak in gfs2_check_blk_type
      20ca21df
    • Linus Torvalds's avatar
      Merge tag 'net-5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · db7c9535
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Current release - regressions:
      
         - arm64: dts: fsl-ls1028a-kontron-sl28: specify in-band mode for
           ENETC
      
        Current release - bugs in new features:
      
         - mptcp: provide rmem[0] limit offset to fix oops
      
        Previous release - regressions:
      
         - IPv6: Set SIT tunnel hard_header_len to zero to fix path MTU
           calculations
      
         - lan743x: correctly handle chips with internal PHY
      
         - bpf: Don't rely on GCC __attribute__((optimize)) to disable GCSE
      
         - mlx5e: Fix VXLAN port table synchronization after function reload
      
        Previous release - always broken:
      
         - bpf: Zero-fill re-used per-cpu map element
      
         - fix out-of-order UDP packets when forwarding with UDP GSO fraglists
           turned on:
             - fix UDP header access on Fast/frag0 UDP GRO
             - fix IP header access and skb lookup on Fast/frag0 UDP GRO
      
         - ethtool: netlink: add missing netdev_features_change() call
      
         - net: Update window_clamp if SOCK_RCVBUF is set
      
         - igc: Fix returning wrong statistics
      
         - ch_ktls: fix multiple leaks and corner cases in Chelsio TLS offload
      
         - tunnels: Fix off-by-one in lower MTU bounds for ICMP/ICMPv6 replies
      
         - r8169: disable hw csum for short packets on all chip versions
      
         - vrf: Fix fast path output packet handling with async Netfilter
           rules"
      
      * tag 'net-5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (65 commits)
        lan743x: fix use of uninitialized variable
        net: udp: fix IP header access and skb lookup on Fast/frag0 UDP GRO
        net: udp: fix UDP header access on Fast/frag0 UDP GRO
        devlink: Avoid overwriting port attributes of registered port
        vrf: Fix fast path output packet handling with async Netfilter rules
        cosa: Add missing kfree in error path of cosa_write
        net: switch to the kernel.org patchwork instance
        ch_ktls: stop the txq if reaches threshold
        ch_ktls: tcb update fails sometimes
        ch_ktls/cxgb4: handle partial tag alone SKBs
        ch_ktls: don't free skb before sending FIN
        ch_ktls: packet handling prior to start marker
        ch_ktls: Correction in middle record handling
        ch_ktls: missing handling of header alone
        ch_ktls: Correction in trimmed_len calculation
        cxgb4/ch_ktls: creating skbs causes panic
        ch_ktls: Update cheksum information
        ch_ktls: Correction in finding correct length
        cxgb4/ch_ktls: decrypted bit is not enough
        net/x25: Fix null-ptr-deref in x25_connect
        ...
      db7c9535
    • Linus Torvalds's avatar
      Merge tag 'nfs-for-5.10-2' of git://git.linux-nfs.org/projects/anna/linux-nfs · 200f9d21
      Linus Torvalds authored
      Pull NFS client bugfixes from Anna Schumaker:
       "Stable fixes:
        - Fix failure to unregister shrinker
      
        Other fixes:
        - Fix unnecessary locking to clear up some contention
        - Fix listxattr receive buffer size
        - Fix default mount options for nfsroot"
      
      * tag 'nfs-for-5.10-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
        NFS: Remove unnecessary inode lock in nfs_fsync_dir()
        NFS: Remove unnecessary inode locking in nfs_llseek_dir()
        NFS: Fix listxattr receive buffer size
        NFSv4.2: fix failure to unregister shrinker
        nfsroot: Default mount option should ask for built-in NFS version
      200f9d21
    • Marc Zyngier's avatar
      KVM: arm64: Handle SCXTNUM_ELx traps · ed4ffaf4
      Marc Zyngier authored
      
      
      As the kernel never sets HCR_EL2.EnSCXT, accesses to SCXTNUM_ELx
      will trap to EL2. Let's handle that as gracefully as possible
      by injecting an UNDEF exception into the guest. This is consistent
      with the guest's view of ID_AA64PFR0_EL1.CSV2 being at most 1.
      
      Signed-off-by: default avatarMarc Zyngier <maz@kernel.org>
      Acked-by: default avatarWill Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20201110141308.451654-4-maz@kernel.org
      ed4ffaf4
    • Marc Zyngier's avatar
      KVM: arm64: Unify trap handlers injecting an UNDEF · 338b1793
      Marc Zyngier authored
      
      
      A large number of system register trap handlers only inject an
      UNDEF exeption, and yet each class of sysreg seems to provide its
      own, identical function.
      
      Let's unify them all, saving us introducing yet another one later.
      
      Signed-off-by: default avatarMarc Zyngier <maz@kernel.org>
      Acked-by: default avatarWill Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20201110141308.451654-3-maz@kernel.org
      338b1793
    • Marc Zyngier's avatar
      KVM: arm64: Allow setting of ID_AA64PFR0_EL1.CSV2 from userspace · 23711a5e
      Marc Zyngier authored
      We now expose ID_AA64PFR0_EL1.CSV2=1 to guests running on hosts
      that are immune to Spectre-v2, but that don't have this field set,
      most likely because they predate the specification.
      
      However, this prevents the migration of guests that have started on
      a host the doesn't fake this CSV2 setting to one that does, as KVM
      rejects the write to ID_AA64PFR0_EL2 on the grounds that it isn't
      what is already there.
      
      In order to fix this, allow userspace to set this field as long as
      this doesn't result in a promising more than what is already there
      (setting CSV2 to 0 is acceptable, but setting it to 1 when it is
      already set to 0 isn't).
      
      Fixes: e1026237
      
       ("KVM: arm64: Set CSV2 for guests on hardware unaffected by Spectre-v2")
      Reported-by: default avatarPeng Liang <liangpeng10@huawei.com>
      Signed-off-by: default avatarMarc Zyngier <maz@kernel.org>
      Acked-by: default avatarWill Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20201110141308.451654-2-maz@kernel.org
      23711a5e
    • Marc Zyngier's avatar
      Merge tag 'v5.10-rc1' into kvmarm-master/next · 4f6b838c
      Marc Zyngier authored
      
      
      Linux 5.10-rc1
      
      Signed-off-by: default avatarMarc Zyngier <maz@kernel.org>
      4f6b838c
    • Linus Torvalds's avatar
      Merge tag 'acpi-5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · af5043c8
      Linus Torvalds authored
      Pull ACPI fixes from Rafael Wysocki:
       "These are mostly docmentation fixes and janitorial changes plus some
        new device IDs and a new quirk.
      
        Specifics:
      
         - Fix documentation regarding GPIO properties (Andy Shevchenko)
      
         - Fix spelling mistakes in ACPI documentation (Flavio Suligoi)
      
         - Fix white space inconsistencies in ACPI code (Maximilian Luz)
      
         - Fix string formatting in the ACPI Generic Event Device (GED) driver
           (Nick Desaulniers)
      
         - Add Intel Alder Lake device IDs to the ACPI drivers used by the
           Dynamic Platform and Thermal Framework (Srinivas Pandruvada)
      
         - Add lid-related DMI quirk for Medion Akoya E2228T to the ACPI
           button driver (Hans de Goede)"
      
      * tag 'acpi-5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        ACPI: DPTF: Support Alder Lake
        Documentation: ACPI: fix spelling mistakes
        ACPI: button: Add DMI quirk for Medion Akoya E2228T
        ACPI: GED: fix -Wformat
        ACPI: Fix whitespace inconsistencies
        ACPI: scan: Fix acpi_dma_configure_id() kerneldoc name
        Documentation: firmware-guide: gpio-properties: Clarify initial output state
        Documentation: firmware-guide: gpio-properties: active_low only for GpioIo()
        Documentation: firmware-guide: gpio-properties: Fix factual mistakes
      af5043c8
    • Linus Torvalds's avatar
      Merge tag 'pm-5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · fcfb6791
      Linus Torvalds authored
      Pull power management fixes from Rafael Wysocki:
       "Make the intel_pstate driver behave as expected when it operates in
        the passive mode with HWP enabled and the 'powersave' governor on top
        of it"
      
      * tag 'pm-5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        cpufreq: intel_pstate: Take CPUFREQ_GOV_STRICT_TARGET into account
        cpufreq: Add strict_target to struct cpufreq_policy
        cpufreq: Introduce CPUFREQ_GOV_STRICT_TARGET
        cpufreq: Introduce governor flags
      fcfb6791
    • Sven Van Asbroeck's avatar
      lan743x: fix use of uninitialized variable · edbc2111
      Sven Van Asbroeck authored
      When no devicetree is present, the driver will use an
      uninitialized variable.
      
      Fix by initializing this variable.
      
      Fixes: 902a66e0
      
       ("lan743x: correctly handle chips with internal PHY")
      Reported-by: default avatarkernel test robot <lkp@intel.com>
      Signed-off-by: default avatarSven Van Asbroeck <thesven73@gmail.com>
      Link: https://lore.kernel.org/r/20201112152513.1941-1-TheSven73@gmail.com
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      edbc2111
    • Jakub Kicinski's avatar
      Merge branch 'net-udp-fix-fast-frag0-udp-gro' · 5861c8cb
      Jakub Kicinski authored
      
      
      Alexander Lobakin says:
      
      ====================
      net: udp: fix Fast/frag0 UDP GRO
      
      While testing UDP GSO fraglists forwarding through driver that uses
      Fast GRO (via napi_gro_frags()), I was observing lots of out-of-order
      iperf packets:
      
      [ ID] Interval           Transfer     Bitrate         Jitter
      [SUM]  0.0-40.0 sec  12106 datagrams received out-of-order
      
      Simple switch to napi_gro_receive() or any other method without frag0
      shortcut completely resolved them.
      
      I've found two incorrect header accesses in GRO receive callback(s):
       - udp_hdr() (instead of udp_gro_udphdr()) that always points to junk
         in "fast" mode and could probably do this in "regular".
         This was the actual bug that caused all out-of-order delivers;
       - udp{4,6}_lib_lookup_skb() -> ip{,v6}_hdr() (instead of
         skb_gro_network_header()) that potentionally might return odd
         pointers in both modes.
      
      Each patch addresses one of these two issues.
      
      This doesn't cover a support for nested tunnels as it's out of the
      subject and requires more invasive changes. It will be handled
      separately in net-next series.
      
      Credits:
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Willem de Bruijn <willemb@google.com>
      
      Since v4 [0]:
       - split the fix into two logical ones (Willem);
       - replace ternaries with plain ifs to beautify the code (Jakub);
       - drop p->data part to reintroduce it later in abovementioned set.
      
      Since v3 [1]:
       - restore the original {,__}udp{4,6}_lib_lookup_skb() and use
         private versions of them inside GRO code (Willem).
      
      Since v2 [2]:
       - dropped redundant check introduced in v2 as it's performed right
         before (thanks to Eric);
       - udp_hdr() switched to data + off for skbs from list (also Eric);
       - fixed possible malfunction of {,__}udp{4,6}_lib_lookup_skb() with
         Fast/frag0 due to ip{,v6}_hdr() usage (Willem).
      
      Since v1 [3]:
       - added a NULL pointer check for "uh" as suggested by Willem.
      
      [0] https://lore.kernel.org/netdev/Ha2hou5eJPcblo4abjAqxZRzIl1RaLs2Hy0oOAgFs@cp4-web-036.plabs.ch
      [1] https://lore.kernel.org/netdev/MgZce9htmEtCtHg7pmWxXXfdhmQ6AHrnltXC41zOoo@cp7-web-042.plabs.ch
      [2] https://lore.kernel.org/netdev/0eaG8xtbtKY1dEKCTKUBubGiC9QawGgB3tVZtNqVdY@cp4-web-030.plabs.ch
      [3] https://lore.kernel.org/netdev/YazU6GEzBdpyZMDMwJirxDX7B4sualpDG68ADZYvJI@cp4-web-034.plabs.ch
      ====================
      
      Link: https://lore.kernel.org/r/hjGOh0iCOYyo1FPiZh6TMXcx3YCgNs1T1eGKLrDz8@cp4-web-037.plabs.ch
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      5861c8cb
    • Alexander Lobakin's avatar
      net: udp: fix IP header access and skb lookup on Fast/frag0 UDP GRO · 55e72988
      Alexander Lobakin authored
      udp{4,6}_lib_lookup_skb() use ip{,v6}_hdr() to get IP header of the
      packet. While it's probably OK for non-frag0 paths, this helpers
      will also point to junk on Fast/frag0 GRO when all headers are
      located in frags. As a result, sk/skb lookup may fail or give wrong
      results. To support both GRO modes, skb_gro_network_header() might
      be used. To not modify original functions, add private versions of
      udp{4,6}_lib_lookup_skb() only to perform correct sk lookups on GRO.
      
      Present since the introduction of "application-level" UDP GRO
      in 4.7-rc1.
      
      Misc: replace totally unneeded ternaries with plain ifs.
      
      Fixes: a6024562
      
       ("udp: Add GRO functions to UDP socket")
      Suggested-by: default avatarWillem de Bruijn <willemb@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarAlexander Lobakin <alobakin@pm.me>
      Acked-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      55e72988
    • Alexander Lobakin's avatar
      net: udp: fix UDP header access on Fast/frag0 UDP GRO · 4b1a8628
      Alexander Lobakin authored
      UDP GRO uses udp_hdr(skb) in its .gro_receive() callback. While it's
      probably OK for non-frag0 paths (when all headers or even the entire
      frame are already in skb head), this inline points to junk when
      using Fast GRO (napi_gro_frags() or napi_gro_receive() with only
      Ethernet header in skb head and all the rest in the frags) and breaks
      GRO packet compilation and the packet flow itself.
      To support both modes, skb_gro_header_fast() + skb_gro_header_slow()
      are typically used. UDP even has an inline helper that makes use of
      them, udp_gro_udphdr(). Use that instead of troublemaking udp_hdr()
      to get rid of the out-of-order delivers.
      
      Present since the introduction of plain UDP GRO in 5.0-rc1.
      
      Fixes: e20cf8d3
      
       ("udp: implement GRO for plain UDP sockets.")
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarAlexander Lobakin <alobakin@pm.me>
      Acked-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      4b1a8628
    • Bob Peterson's avatar
      gfs2: Fix case in which ail writes are done to jdata holes · 4e79e3f0
      Bob Peterson authored
      Patch b2a846db ("gfs2: Ignore journal log writes for jdata holes")
      tried (unsuccessfully) to fix a case in which writes were done to jdata
      blocks, the blocks are sent to the ail list, then a punch_hole or truncate
      operation caused the blocks to be freed. In other words, the ail items
      are for jdata holes. Before b2a846db
      
      , the jdata hole caused function
      gfs2_block_map to return -EIO, which was eventually interpreted as an
      IO error to the journal, and then withdraw.
      
      This patch changes function gfs2_get_block_noalloc, which is only used
      for jdata writes, so it returns -ENODATA rather than -EIO, and when
      -ENODATA is returned to gfs2_ail1_start_one, the error is ignored.
      We can safely ignore it because gfs2_ail1_start_one is only called
      when the jdata pages have already been written and truncated, so the
      ail1 content no longer applies.
      
      Signed-off-by: default avatarBob Peterson <rpeterso@redhat.com>
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      4e79e3f0
    • Bob Peterson's avatar
      Revert "gfs2: Ignore journal log writes for jdata holes" · d3039c06
      Bob Peterson authored
      This reverts commit b2a846db
      
      .
      
      That commit changed the behavior of function gfs2_block_map to return
      -ENODATA in cases where a hole (IOMAP_HOLE) is encountered and create is
      false.  While that fixed the intended problem for jdata, it also broke
      other callers of gfs2_block_map such as some jdata block reads.  Before
      the patch, an encountered hole would be skipped and the buffer seen as
      unmapped by the caller.  The patch changed the behavior to return
      -ENODATA, which is interpreted as an error by the caller.
      
      The -ENODATA return code should be restricted to the specific case where
      jdata holes are encountered during ail1 writes.  That will be done in a
      later patch.
      
      Signed-off-by: default avatarBob Peterson <rpeterso@redhat.com>
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      d3039c06
    • Jakub Kicinski's avatar
      Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · 8a5c2906
      Jakub Kicinski authored
      
      
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2020-11-10
      
      This series contains updates to i40e and igc drivers and the MAINTAINERS
      file.
      
      Slawomir fixes updating VF MAC addresses to fix various issues related
      to reporting and setting of these addresses for i40e.
      
      Dan Carpenter fixes a possible used before being initialized issue for
      i40e.
      
      Vinicius fixes reporting of netdev stats for igc.
      
      Tony updates repositories for Intel Ethernet Drivers.
      
      * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
        MAINTAINERS: Update repositories for Intel Ethernet Drivers
        igc: Fix returning wrong statistics
        i40e, xsk: uninitialized variable in i40e_clean_rx_irq_zc()
        i40e: Fix MAC address setting for a VF via Host/VM
      ====================
      
      Link: https://lore.kernel.org/r/20201111001955.533210-1-anthony.l.nguyen@intel.com
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      8a5c2906
    • Parav Pandit's avatar
      devlink: Avoid overwriting port attributes of registered port · 9f73bd1c
      Parav Pandit authored
      Cited commit in fixes tag overwrites the port attributes for the
      registered port.
      
      Avoid such error by checking registered flag before setting attributes.
      
      Fixes: 71ad8d55
      
       ("devlink: Replace devlink_port_attrs_set parameters with a struct")
      Signed-off-by: default avatarParav Pandit <parav@nvidia.com>
      Reviewed-by: default avatarJiri Pirko <jiri@nvidia.com>
      Link: https://lore.kernel.org/r/20201111034744.35554-1-parav@nvidia.com
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      9f73bd1c
  3. Nov 12, 2020