  1. Mar 24, 2021
    • Linux 4.9.263 · 5023febc
      Greg Kroah-Hartman authored
      
      
      Tested-by: Jon Hunter <jonathanh@nvidia.com>
      Tested-by: Florian Fainelli <f.fainelli@gmail.com>
      Tested-by: Guenter Roeck <linux@roeck-us.net>
      Tested-by: Jason Self <jason@bluehome.net>
      Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
      Link: https://lore.kernel.org/r/20210322121920.399826335@linuxfoundation.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • genirq: Disable interrupts for force threaded handlers · 528d3b76
      Thomas Gleixner authored
      commit 81e2073c upstream.
      
      With interrupt force threading, all device interrupt handlers are invoked
      from kernel threads. Unlike in hard interrupt context, the invocation only
      disables bottom halves, not interrupts. This was an oversight back then,
      because any code like this will have an issue:
      
      thread(irq_A)
        irq_handler(A)
          spin_lock(&foo->lock);
      
      interrupt(irq_B)
        irq_handler(B)
          spin_lock(&foo->lock);
      
      This has been triggered with networking (NAPI vs. hrtimers) and console
      drivers where printk() happens from an interrupt which interrupted the
      force threaded handler.
      
      Now people have noticed and started either to change the spin_lock() in
      the handler to spin_lock_irqsave(), which affects performance, or to add
      IRQF_NOTHREAD to the interrupt request, which in turn breaks RT.
      
      Fix the root cause rather than the symptom: disable interrupts before
      invoking the force threaded handler. This preserves the regular semantics
      and the usefulness of interrupt force threading as a general debugging
      tool.
      
      For non-RT kernels this does not change much, except that interrupts are
      delayed while the threaded handler executes, until it returns. With respect
      to scheduling and softirq processing there is no difference.
      
      For RT kernels there is no issue.
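A minimal userspace model of the change, assuming stand-in helpers (the `model_*` functions, `forced_thread_invoke()` and `sample_handler()` are hypothetical, not kernel APIs): bottom halves are always disabled around the handler, and on non-RT kernels interrupts are disabled as well, so the handler runs in the same state it would see in hard interrupt context.

```c
#include <stdbool.h>

/* Hypothetical model of per-CPU state; not kernel APIs. */
static bool irqs_off;      /* models the CPU interrupt-enable flag */
static int bh_disable_cnt; /* models the bottom-half disable count */

static void model_local_bh_disable(void)  { bh_disable_cnt++; }
static void model_local_bh_enable(void)   { bh_disable_cnt--; }
static void model_local_irq_disable(void) { irqs_off = true; }
static void model_local_irq_enable(void)  { irqs_off = false; }

typedef int (*thread_fn_t)(void *dev);

/*
 * Sketch of a forced-threaded invocation after the fix: bottom halves
 * and, on non-RT kernels, interrupts are disabled around the handler.
 */
static int forced_thread_invoke(thread_fn_t fn, void *dev, bool preempt_rt)
{
	int ret;

	model_local_bh_disable();
	if (!preempt_rt)
		model_local_irq_disable();
	ret = fn(dev);
	if (!preempt_rt)
		model_local_irq_enable();
	model_local_bh_enable();
	return ret;
}

/* Example handler: reports whether it ran with interrupts disabled. */
static int sample_handler(void *dev)
{
	(void)dev;
	return irqs_off ? 1 : 0;
}
```

On a non-RT configuration `sample_handler()` observes interrupts disabled, and the state is restored once the handler returns.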
      
      Fixes: 8d32a307 ("genirq: Provide forced interrupt threading")
      Reported-by: Johan Hovold <johan@kernel.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Johan Hovold <johan@kernel.org>
      Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Link: https://lore.kernel.org/r/20210317143859.513307808@linutronix.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: fix potential error in ext4_do_update_inode · 542e59b0
      Shijie Luo authored
      commit 7d8bd3c7 upstream.
      
      If set_large_file = 1 and an error occurs in ext4_handle_dirty_metadata(),
      the error code will be overridden. Go to out_brelse to avoid this
      situation.
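A reduced sketch of the error-override pattern, assuming hypothetical helpers `fake_dirty_metadata()` and `fake_update_sb()` in place of the real ext4 calls: without the early `goto out_brelse`, the set_large_file branch would overwrite a failure code with 0.

```c
#include <errno.h>

/* Hypothetical stand-ins; each returns 0 or a negative errno. */
static int fake_dirty_metadata(int fail) { return fail ? -EIO : 0; }
static int fake_update_sb(void) { return 0; }

static int update_inode_sketch(int set_large_file, int fail_dirty)
{
	int err;

	err = fake_dirty_metadata(fail_dirty);
	if (err)
		goto out_brelse;	/* leave before err can be overwritten */

	if (set_large_file)
		err = fake_update_sb();	/* without the goto, this would clobber -EIO */

out_brelse:
	/* the real function releases the buffer head here */
	return err;
}
```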
      
      Signed-off-by: Shijie Luo <luoshijie1@huawei.com>
      Link: https://lore.kernel.org/r/20210312065051.36314-1-luoshijie1@huawei.com
      Cc: stable@kernel.org
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: find old entry again if failed to rename whiteout · 5acfb54a
      zhangyi (F) authored
      commit b7ff91fd upstream.
      
      If we fail to add the new entry on rename whiteout, we cannot reset the
      old->de entry directly, because old->de could have moved from under us
      during make_indexed_dir(). So find the old entry again before resetting
      it; otherwise the filesystem may be corrupted as shown below.
      
        /dev/sda: Entry '00000001' in ??? (12) has deleted/unused inode 15. CLEARED.
        /dev/sda: Unattached inode 75
        /dev/sda: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
      
      Fixes: 6b4b8e6b ("ext4: fix bug for rename with RENAME_WHITEOUT")
      Cc: stable@vger.kernel.org
      Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
      Link: https://lore.kernel.org/r/20210303131703.330415-1-yi.zhang@huawei.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86: Introduce TS_COMPAT_RESTART to fix get_nr_restart_syscall() · 5b871095
      Oleg Nesterov authored
      commit 8c150ba2 upstream.
      
      The comment in get_nr_restart_syscall() says:
      
      	 * The problem is that we can get here when ptrace pokes
      	 * syscall-like values into regs even if we're not in a syscall
      	 * at all.
      
      Yes, but if not in a syscall then the
      
      	status & (TS_COMPAT|TS_I386_REGS_POKED)
      
      check below can't really help:
      
      	- TS_COMPAT can't be set
      
      	- TS_I386_REGS_POKED is only set if regs->orig_ax was changed by
      	  32bit debugger; and even in this case get_nr_restart_syscall()
      	  is only correct if the tracee is 32bit too.
      
      Suppose that a 64bit debugger plays with a 32bit tracee and
      
      	* Tracee calls sleep(2)	// TS_COMPAT is set
      	* User interrupts the tracee by CTRL-C after 1 sec and does
      	  "(gdb) call func()"
      	* gdb saves the regs by PTRACE_GETREGS
      	* does PTRACE_SETREGS to set %rip='func' and %orig_rax=-1
      	* PTRACE_CONT		// TS_COMPAT is cleared
      	* func() hits int3.
      	* Debugger catches SIGTRAP.
      	* Restore original regs by PTRACE_SETREGS.
      	* PTRACE_CONT
      
      get_nr_restart_syscall() wrongly returns __NR_restart_syscall == 219, so
      the tracee calls ia32_sys_call_table[219] == sys_madvise.
      
      Add the sticky TS_COMPAT_RESTART flag, which survives the return to user
      mode. It is going to be removed again in the next step by storing the
      information in the restart block. As a further cleanup, it might also be
      possible to remove TS_I386_REGS_POKED with that.
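A userspace model of the sticky flag, assuming illustrative bit values, hypothetical `*_model` helpers, and that the flag is recorded when the restart block is armed inside the syscall. Because it is recorded there, it is still set after TS_COMPAT has been cleared on the way back to user mode, and the restart syscall number can be chosen from it. 219 and 0 are `__NR_restart_syscall` for x86-64 and i386 respectively.

```c
#define TS_COMPAT		0x0002u	/* illustrative value */
#define TS_COMPAT_RESTART	0x0008u	/* illustrative value */

#define NR_RESTART_SYSCALL_64	219	/* x86-64 __NR_restart_syscall */
#define NR_RESTART_SYSCALL_32	0	/* i386 __NR_restart_syscall */

struct ti_model { unsigned int status; };

/* Runs while still inside the syscall, where TS_COMPAT is meaningful. */
static void set_restart_data_model(struct ti_model *ti)
{
	if (ti->status & TS_COMPAT)
		ti->status |= TS_COMPAT_RESTART;
	else
		ti->status &= ~TS_COMPAT_RESTART;
}

/* TS_COMPAT does not survive the return to user mode... */
static void return_to_user_model(struct ti_model *ti)
{
	ti->status &= ~TS_COMPAT;
}

/* ...but the sticky flag does, so the choice stays correct later. */
static int nr_restart_syscall_model(const struct ti_model *ti)
{
	if (ti->status & TS_COMPAT_RESTART)
		return NR_RESTART_SYSCALL_32;
	return NR_RESTART_SYSCALL_64;
}
```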
      
      Test-case:
      
        $ cvs -d :pserver:anoncvs:anoncvs@sourceware.org:/cvs/systemtap co ptrace-tests
        $ gcc -o erestartsys-trap-debuggee ptrace-tests/tests/erestartsys-trap-debuggee.c --m32
        $ gcc -o erestartsys-trap-debugger ptrace-tests/tests/erestartsys-trap-debugger.c -lutil
        $ ./erestartsys-trap-debugger
        Unexpected: retval 1, errno 22
        erestartsys-trap-debugger: ptrace-tests/tests/erestartsys-trap-debugger.c:421
      
      Fixes: 609c19a3 ("x86/ptrace: Stop setting TS_COMPAT in ptrace code")
      Reported-by: Jan Kratochvil <jan.kratochvil@redhat.com>
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20210201174709.GA17895@redhat.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86: Move TS_COMPAT back to asm/thread_info.h · 4a3b8246
      Oleg Nesterov authored
      commit 66c1b6d7 upstream.
      
      Move TS_COMPAT back to asm/thread_info.h, close to TS_I386_REGS_POKED.
      
      It was moved to asm/processor.h by b9d989c7 ("x86/asm: Move the
      thread_info::status field to thread_struct"), then later 37a8f7c3
      ("x86/asm: Move 'status' from thread_struct to thread_info") moved the
      'status' field back but TS_COMPAT was forgotten.
      
      Preparatory patch to fix the COMPAT case for get_nr_restart_syscall().
      
      Fixes: 609c19a3 ("x86/ptrace: Stop setting TS_COMPAT in ptrace code")
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20210201174649.GA17880@redhat.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • kernel, fs: Introduce and use set_restart_fn() and arch_set_restart_data() · 376a76aa
      Oleg Nesterov authored
      commit 5abbe51a upstream.
      
      Preparation for fixing get_nr_restart_syscall() on X86 for COMPAT.
      
      Add a new helper which sets restart_block->fn and calls a dummy
      arch_set_restart_data() helper.
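A sketch of the helper's shape, modeled in userspace with hypothetical `*_model` names: it stores the restart function and then calls the architecture hook, which is a no-op by default. Returning -ERESTART_RESTARTBLOCK, so that callers can write `return set_restart_fn(...)`, is an assumption of this sketch; 516 is the kernel-internal value of ERESTART_RESTARTBLOCK.

```c
#define ERESTART_RESTARTBLOCK 516	/* kernel-internal errno value */

struct restart_block_model {
	long (*fn)(struct restart_block_model *);
	long arg;	/* stand-in for the per-syscall argument union */
};

/* Generic default: nothing to record. x86 later overrides this. */
static void arch_set_restart_data_model(struct restart_block_model *r)
{
	(void)r;
}

static long set_restart_fn_model(struct restart_block_model *r,
				 long (*fn)(struct restart_block_model *))
{
	r->fn = fn;
	arch_set_restart_data_model(r);
	return -ERESTART_RESTARTBLOCK;
}

/* Example restart function: resumes using the saved argument. */
static long resume_sleep(struct restart_block_model *r)
{
	return r->arg;
}
```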
      
      Fixes: 609c19a3 ("x86/ptrace: Stop setting TS_COMPAT in ptrace code")
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20210201174641.GA17871@redhat.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/ioapic: Ignore IRQ2 again · 3d9fcc25
      Thomas Gleixner authored
      commit a501b048 upstream.
      
      Vitaly ran into an issue with hotplugging CPU0 on an Amazon instance where
      the matrix allocator claimed to be out of vectors. He analyzed it down to
      the point that IRQ2, the PIC cascade interrupt, which is never supposed to
      be routed to the IO/APIC, ended up having an interrupt vector assigned,
      which got moved during unplug of CPU0.
      
      The underlying issue is that IRQ2, for various reasons (see commit
      af174783 ("x86: I/O APIC: Never configure IRQ2") for details), is treated
      as a reserved system vector by the vector core code and is not accounted as
      a regular vector. The Amazon BIOS has a routing entry of pin2 to IRQ2,
      which causes the IO/APIC setup to claim that interrupt, and the claim is
      granted by the vector domain because there is no sanity check. As a
      consequence, the allocation counter of CPU0 underflows, which causes a
      subsequent unplug to fail with:
      
        [ ... ] CPU 0 has 4294967295 vectors, 589 available. Cannot disable CPU
      
      There is another sanity check missing in the matrix allocator, but the
      underlying root cause is that the IO/APIC code lost the IRQ2 ignore logic
      during the conversion to irqdomains.
      
      For almost 6 years nobody complained about this wreckage, which might
      indicate that this requirement could be lifted. But on any system which
      actually has a PIC, IRQ2 is unusable by design, so any routing entry has no
      effect and the interrupt cannot be connected to a device anyway.
      
      For that reason, and out of history-biased paranoia, restore the IRQ2
      ignore logic and treat IRQ2 as nonexistent despite a routing entry claiming
      otherwise.
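A hypothetical sketch of the restored ignore rule: a firmware routing entry that targets IRQ2 (PIC_CASCADE_IR) is skipped instead of being given a vector, so it can no longer skew the per-CPU vector accounting. The `route_entry` table and helper names are illustrative and do not mirror the real IO/APIC code.

```c
#include <stdbool.h>
#include <stddef.h>

#define PIC_CASCADE_IR 2	/* legacy IRQ2: the 8259 cascade line */

struct route_entry { int pin; int irq; };

/* IRQ2 is never routed through the IO/APIC, whatever firmware claims. */
static bool route_is_usable(const struct route_entry *e)
{
	return e->irq != PIC_CASCADE_IR;
}

/* Count the entries that would actually be allocated a vector. */
static size_t count_vectors_needed(const struct route_entry *tbl, size_t n)
{
	size_t i, needed = 0;

	for (i = 0; i < n; i++)
		if (route_is_usable(&tbl[i]))
			needed++;
	return needed;
}
```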
      
      Fixes: d32932d0 ("x86/irq: Convert IOAPIC to use hierarchical irqdomain interfaces")
      Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20210318192819.636943062@linutronix.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • perf/x86/intel: Fix a crash caused by zero PEBS status · 6c2ab223
      Kan Liang authored
      commit d88d05a9 upstream.
      
      A repeatable crash can be triggered by the perf_fuzzer on some Haswell
      systems:
      https://lore.kernel.org/lkml/7170d3be-c17f-1ded-52aa-cc6d9ae999f4@maine.edu/
      
      For some old CPUs (HSW and earlier), the PEBS status in a PEBS record
      may be mistakenly set to 0. To minimize the impact of the defect,
      commit 01330d72 ("perf/x86: Allow zero PEBS status with only single
      active event") was introduced to try to avoid dropping the PEBS record
      in some cases. It adds a check in intel_pmu_drain_pebs_nhm() and updates
      the local pebs_status accordingly. However, it doesn't correct the PEBS
      status in the PEBS record itself, which may trigger the crash, especially
      for large PEBS.
      
      It's possible that all the PEBS records in a large PEBS have PEBS
      status 0. If so, the first get_next_pebs_record_by_bit() call in
      __intel_pmu_pebs_event() returns NULL, so 'at' is NULL. Since it's a
      large PEBS, the 'count' parameter must be > 1, and the second
      get_next_pebs_record_by_bit() call will crash.
      
      Besides the local pebs_status, correct the PEBS status in the PEBS
      record as well.
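A userspace sketch of why fixing the record itself matters, using hypothetical types and names: `next_record_by_bit()` stands in for get_next_pebs_record_by_bit(), and `fixup_zero_status()` writes the single enabled event's bit back into the record, so later by-bit searches find it instead of returning NULL.

```c
#include <stddef.h>
#include <stdint.h>

struct pebs_record_model { uint64_t status; };

/* Return the first record in [from, end) whose status has 'bit' set. */
static struct pebs_record_model *
next_record_by_bit(struct pebs_record_model *from,
		   struct pebs_record_model *end, int bit)
{
	for (; from < end; from++)
		if (from->status & (1ULL << bit))
			return from;
	return NULL;
}

/*
 * With exactly one PEBS event enabled, a zero status can only belong to
 * that event, so correct the record, not just a local copy.
 */
static void fixup_zero_status(struct pebs_record_model *p,
			      uint64_t pebs_enabled)
{
	int single = pebs_enabled && !(pebs_enabled & (pebs_enabled - 1));

	if (!p->status && single)
		p->status = pebs_enabled;
}
```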
      
      Fixes: 01330d72 ("perf/x86: Allow zero PEBS status with only single active event")
      Reported-by: Vince Weaver <vincent.weaver@maine.edu>
      Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/1615555298-140216-1-git-send-email-kan.liang@linux.intel.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • PCI: rpadlpar: Fix potential drc_name corruption in store functions · ef8dc3d3
      Tyrel Datwyler authored
      commit cc7a0bb0 upstream.
      
      Both add_slot_store() and remove_slot_store() try to fix up the
      drc_name copied from the store buffer by placing a NUL terminator at
      nbyte + 1 or in place of a '\n' if present. However, the static buffer
      that we copy the drc_name data into is not zeroed and can contain
      anything past the n-th byte.
      
      This is problematic if a '\n' byte appears in that buffer after nbytes
      and the string copied into the store buffer was not NUL terminated to
      start with, as the strchr() search for a '\n' byte will incorrectly mark
      it as the end of the drc_name string, resulting in a drc_name string
      that contains garbage data after the n-th byte.
      
      Additionally it will cause us to overwrite that '\n' byte on the stack
      with NUL, potentially corrupting data on the stack.
      
      The following debugging shows an example of the drmgr utility writing
      "PHB 4543" to the add_slot sysfs attribute, but add_slot_store()
      logging a corrupted string value.
      
        drmgr: drmgr: -c phb -a -s PHB 4543 -d 1
        add_slot_store: drc_name = PHB 4543°|<82>!, rc = -19
      
      Fix this by using strscpy() instead of memcpy() to ensure the string
      is NUL terminated when copied into the static drc_name buffer.
      Further, since the string is now NUL terminated the code only needs to
      change '\n' to '\0' when present.
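A userspace sketch of the fixed copy, assuming `strscpy_sketch()` approximates the kernel's strscpy() (copy at most size - 1 bytes, always NUL-terminate, return the length or -E2BIG on truncation) and `copy_drc_name()` is a hypothetical reduction of the two store functions. Stale bytes in the destination, including a stray '\n', can no longer be mistaken for part of the name.

```c
#include <errno.h>
#include <string.h>

/* Userspace approximation of strscpy(); not the kernel implementation. */
static long strscpy_sketch(char *dst, const char *src, size_t size)
{
	size_t len;

	if (size == 0)
		return -E2BIG;
	for (len = 0; len < size && src[len] != '\0'; len++)
		;
	if (len == size) {		/* no NUL within size bytes: truncate */
		memcpy(dst, src, size - 1);
		dst[size - 1] = '\0';
		return -E2BIG;
	}
	memcpy(dst, src, len + 1);	/* includes the terminating NUL */
	return (long)len;
}

/* buf holds nbytes of input that need not be NUL-terminated. */
static void copy_drc_name(char *drc_name, const char *buf, size_t nbytes)
{
	char *end;

	strscpy_sketch(drc_name, buf, nbytes + 1);
	end = strchr(drc_name, '\n');	/* only search the copied string */
	if (end)
		*end = '\0';
}
```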
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
      [mpe: Reformat change log and add mention of possible stack corruption]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210315214821.452959-1-tyreld@linux.ibm.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • iio: adis16400: Fix an error code in adis16400_initial_setup() · cd124fcd
      Dan Carpenter authored
      commit a71266e4 upstream.
      
      This is to silence a new Smatch warning:
      
          drivers/iio/imu/adis16400.c:492 adis16400_initial_setup()
          warn: sscanf doesn't return error codes
      
      If the condition "if (st->variant->flags & ADIS16400_HAS_SLOW_MODE) {"
      is false, then we return 1 instead of 0 and probe will fail.
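A reduced illustration of the bug, with hypothetical function names (`setup_buggy()`, `setup_fixed()`, `configure_slow_mode()`): sscanf() returns the number of converted items, so keeping its result in `ret` leaks a 1 out of the function whenever the slow-mode branch does not overwrite it.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for the slow-mode configuration; 0 on success. */
static int configure_slow_mode(void) { return 0; }

static int setup_buggy(const char *name, int has_slow_mode, unsigned int *id)
{
	int ret = sscanf(name, "adis%u", id);	/* 1 on success */

	if (ret != 1)
		return -EINVAL;
	if (has_slow_mode)
		ret = configure_slow_mode();
	return ret;				/* still 1 when !has_slow_mode */
}

static int setup_fixed(const char *name, int has_slow_mode, unsigned int *id)
{
	int ret = 0;

	if (sscanf(name, "adis%u", id) != 1)	/* compare, don't keep */
		return -EINVAL;
	if (has_slow_mode)
		ret = configure_slow_mode();
	return ret;
}
```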
      
      Fixes: 72a868b3 ("iio: imu: check sscanf return value")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: <Stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/YCwgFb3JVG6qrlQ+@mwanda
      Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • usb: gadget: configfs: Fix KASAN use-after-free · 394ca034
      Jim Lin authored
      commit 98f153a1 upstream.
      
      When the gadget is disconnected, the running sequence is like this:
      . composite_disconnect
      . Call trace:
        usb_string_copy+0xd0/0x128
        gadget_config_name_configuration_store+0x4
        gadget_config_name_attr_store+0x40/0x50
        configfs_write_file+0x198/0x1f4
        vfs_write+0x100/0x220
        SyS_write+0x58/0xa8
      . configfs_composite_unbind
      . configfs_composite_bind
      
      In configfs_composite_bind, it has
      "cn->strings.s = cn->configuration;"
      
      When usb_string_copy() is invoked, it allocates memory, copies the input
      string, releases the previously pointed-to memory, and uses the newly
      allocated memory.
      
      When the gadget is connected, the host sends down a request to get information.
      Call trace:
        usb_gadget_get_string+0xec/0x168
        lookup_string+0x64/0x98
        composite_setup+0xa34/0x1ee8
      
      If the gadget is disconnected and connected quickly, in the failing case
      the cn->configuration memory has been released by kfree() in
      usb_string_copy(), but configfs_composite_bind() has not yet run to assign
      the newly allocated "cn->configuration" pointer to "cn->strings.s".

      When strlen(s->s) in usb_gadget_get_string() is executed, the dangling
      memory is accessed and a "BUG: KASAN: use-after-free" error occurs.
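The log above does not spell out the fix. A minimal sketch of one way to close the window, assuming the copy routine reuses a single USB_MAX_STRING_LEN + 1 buffer instead of freeing and reallocating it, so an alias such as cn->strings.s never dangles. `usb_string_copy_sketch()` is hypothetical userspace code, not the gadget driver's function.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define USB_MAX_STRING_LEN 126

/*
 * The first call allocates a maximum-size buffer; later calls overwrite
 * it in place, so any pointer that aliases *s_copy stays valid.
 */
static int usb_string_copy_sketch(const char *s, char **s_copy)
{
	size_t len = strlen(s);
	char *str = *s_copy;

	if (len > USB_MAX_STRING_LEN)
		return -EOVERFLOW;
	if (!str) {
		str = malloc(USB_MAX_STRING_LEN + 1);
		if (!str)
			return -ENOMEM;
	}
	memcpy(str, s, len + 1);
	if (len > 0 && str[len - 1] == '\n')
		str[len - 1] = '\0';	/* strip a trailing newline from the write */
	*s_copy = str;
	return 0;
}
```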
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Jim Lin <jilin@nvidia.com>
      Signed-off-by: Macpaul Lin <macpaul.lin@mediatek.com>
      Link: https://lore.kernel.org/r/1615444961-13376-1-git-send-email-macpaul.lin@mediatek.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • USB: replace hardcode maximum usb string length by definition · fe0c1502
      Macpaul Lin authored
      commit 81c74628 upstream.
      
      Replace hardcoded maximum USB string length (126 bytes) by definition
      "USB_MAX_STRING_LEN".
      
      Signed-off-by: Macpaul Lin <macpaul.lin@mediatek.com>
      Acked-by: Alan Stern <stern@rowland.harvard.edu>
      Link: https://lore.kernel.org/r/1592471618-29428-1-git-send-email-macpaul.lin@mediatek.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • scsi: lpfc: Fix some error codes in debugfs · debd6657
      Dan Carpenter authored
      commit 19f1bc7e upstream.
      
      If copy_from_user() or kstrtoull() fail then the correct behavior is to
      return a negative error code.
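A userspace sketch of the corrected error paths, with hypothetical names: `fake_copy_from_user()` follows the copy_from_user() convention of returning the number of bytes not copied, and strtoull() is a looser stand-in for kstrtoull() (it accepts trailing characters). Failures return -EFAULT or -EINVAL rather than 0, which a caller would see as a zero-byte write instead of an error.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Returns the number of bytes NOT copied, 0 on success. */
static size_t fake_copy_from_user(char *dst, const char *src, size_t n)
{
	if (!src)
		return n;
	memcpy(dst, src, n);
	return 0;
}

static long debugfs_write_sketch(const char *ubuf, size_t size,
				 unsigned long long *val)
{
	char kbuf[32];
	char *end;

	if (size >= sizeof(kbuf))
		size = sizeof(kbuf) - 1;
	if (fake_copy_from_user(kbuf, ubuf, size))
		return -EFAULT;		/* not 0 */
	kbuf[size] = '\0';

	errno = 0;
	*val = strtoull(kbuf, &end, 0);
	if (errno || end == kbuf)
		return -EINVAL;		/* not 0 */
	return (long)size;
}
```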
      
      Link: https://lore.kernel.org/r/YEsbU/UxYypVrC7/@mwanda
      Fixes: f9bb2da1 ("[SCSI] lpfc 8.3.27: T10 additions for SLI4")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net/qrtr: fix __netdev_alloc_skb call · 18905249
      Pavel Skripkin authored
      commit 093b036a upstream.
      
      syzbot found a WARNING in __alloc_pages_nodemask() [1] when order >= MAX_ORDER.
      It was caused by a huge length value passed from userspace to qrtr_tun_write_iter(),
      which tries to allocate an skb. Since the value comes from an untrusted source,
      there is no need to raise a warning in __alloc_pages_nodemask().
      
      [1] WARNING in __alloc_pages_nodemask+0x5f8/0x730 mm/page_alloc.c:5014
      Call Trace:
       __alloc_pages include/linux/gfp.h:511 [inline]
       __alloc_pages_node include/linux/gfp.h:524 [inline]
       alloc_pages_node include/linux/gfp.h:538 [inline]
       kmalloc_large_node+0x60/0x110 mm/slub.c:3999
       __kmalloc_node_track_caller+0x319/0x3f0 mm/slub.c:4496
       __kmalloc_reserve net/core/skbuff.c:150 [inline]
       __alloc_skb+0x4e4/0x5a0 net/core/skbuff.c:210
       __netdev_alloc_skb+0x70/0x400 net/core/skbuff.c:446
       netdev_alloc_skb include/linux/skbuff.h:2832 [inline]
       qrtr_endpoint_post+0x84/0x11b0 net/qrtr/qrtr.c:442
       qrtr_tun_write_iter+0x11f/0x1a0 net/qrtr/tun.c:98
       call_write_iter include/linux/fs.h:1901 [inline]
       new_sync_write+0x426/0x650 fs/read_write.c:518
       vfs_write+0x791/0xa30 fs/read_write.c:605
       ksys_write+0x12d/0x250 fs/read_write.c:658
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Reported-by: <syzbot+80dccaee7c6630fa9dcf@syzkaller.appspotmail.com>
      Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
      Acked-by: Alexander Lobakin <alobakin@pm.me>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • sunrpc: fix refcount leak for rpc auth modules · b932f081
      Daniel Kobras authored
      commit f1442d63 upstream.
      
      If an auth module's accept op returns SVC_CLOSE, svc_process_common()
      enters a call path that does not call svc_authorise() before leaving the
      function, and thus leaks a reference on the auth module's refcount. Hence,
      make sure calls to svc_authenticate() and svc_authorise() are paired for
      all call paths, to make sure rpc auth modules can be unloaded.
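A hypothetical model of the pairing rule: every path after the authenticate step, including the close path, funnels through the authorise step that drops the reference, so the module refcount returns to zero and the module can be unloaded. The names are illustrative, not the sunrpc API.

```c
/* Hypothetical model of an auth module's reference count. */
struct auth_module { int refcount; };

enum auth_result { AUTH_OK, AUTH_DROP, AUTH_CLOSE };

static void authenticate(struct auth_module *m) { m->refcount++; } /* takes a ref */
static void authorise(struct auth_module *m)    { m->refcount--; } /* drops it */

/* No early return between the two calls, so they are always paired. */
static int process_request(struct auth_module *m, enum auth_result r)
{
	int ret;

	authenticate(m);
	switch (r) {
	case AUTH_OK:
		ret = 1;	/* send a reply */
		break;
	case AUTH_CLOSE:
		ret = -1;	/* close the connection */
		break;
	default:
		ret = 0;	/* drop the request */
		break;
	}
	authorise(m);
	return ret;
}
```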
      
      Signed-off-by: Daniel Kobras <kobras@puzzle-itc.de>
      Fixes: 4d712ef1 ("svcauth_gss: Close connection when dropping an incoming message")
      Link: https://lore.kernel.org/linux-nfs/3F1B347F-B809-478F-A1E9-0BE98E22B0F0@oracle.com/T/#t
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • svcrdma: disable timeouts on rdma backchannel · edf3c31a
      Timo Rothenpieler authored
      commit 6820bf77 upstream.
      
      This brings it in line with the regular TCP backchannel, which also has
      all those timeouts disabled.

      This prevents the backchannel from timing out, which left some async
      operations, such as server-side copy, stuck indefinitely on the client side.
      
      Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
      Fixes: 5d252f90 ("svcrdma: Add class for RDMA backwards direction transport")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • NFSD: Repair misuse of sv_lock in 5.10.16-rt30. · 47de8475
      Joe Korty authored
      commit c7de87ff upstream.
      
      [ This problem is in mainline, but only rt has the chops to be
      able to detect it. ]
      
      Lockdep reports a circular lock dependency between serv->sv_lock and
      softirq_ctrl.lock on system shutdown, when using a kernel built with
      CONFIG_PREEMPT_RT=y and an NFS mount exists.
      
      This is due to the definition of spin_lock_bh on rt:
      
      	local_bh_disable();
      	rt_spin_lock(lock);
      
      which forces a softirq_ctrl.lock -> serv->sv_lock dependency. This is
      not a problem as long as _every_ lock of serv->sv_lock is a:
      
      	spin_lock_bh(&serv->sv_lock);
      
      but there is one of the form:
      
      	spin_lock(&serv->sv_lock);
      
      This is what is causing the circular dependency splat. The spin_lock()
      grabs the lock without first grabbing softirq_ctrl.lock via
      local_bh_disable(). If, later on in the critical region, someone does a
      local_bh_disable(), a serv->sv_lock -> softirq_ctrl.lock dependency gets
      established. Deadlock.
      
      Fix is to make serv->sv_lock be locked with spin_lock_bh everywhere, no
      exceptions.
      
      [  OK  ] Stopped target NFS client services.
               Stopping Logout off all iSCSI sessions on shutdown...
               Stopping NFS server and services...
      [  109.442380]
      [  109.442385] ======================================================
      [  109.442386] WARNING: possible circular locking dependency detected
      [  109.442387] 5.10.16-rt30 #1 Not tainted
      [  109.442389] ------------------------------------------------------
      [  109.442390] nfsd/1032 is trying to acquire lock:
      [  109.442392] ffff994237617f60 ((softirq_ctrl.lock).lock){+.+.}-{2:2}, at: __local_bh_disable_ip+0xd9/0x270
      [  109.442405]
      [  109.442405] but task is already holding lock:
      [  109.442406] ffff994245cb00b0 (&serv->sv_lock){+.+.}-{0:0}, at: svc_close_list+0x1f/0x90
      [  109.442415]
      [  109.442415] which lock already depends on the new lock.
      [  109.442415]
      [  109.442416]
      [  109.442416] the existing dependency chain (in reverse order) is:
      [  109.442417]
      [  109.442417] -> #1 (&serv->sv_lock){+.+.}-{0:0}:
      [  109.442421]        rt_spin_lock+0x2b/0xc0
      [  109.442428]        svc_add_new_perm_xprt+0x42/0xa0
      [  109.442430]        svc_addsock+0x135/0x220
      [  109.442434]        write_ports+0x4b3/0x620
      [  109.442438]        nfsctl_transaction_write+0x45/0x80
      [  109.442440]        vfs_write+0xff/0x420
      [  109.442444]        ksys_write+0x4f/0xc0
      [  109.442446]        do_syscall_64+0x33/0x40
      [  109.442450]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [  109.442454]
      [  109.442454] -> #0 ((softirq_ctrl.lock).lock){+.+.}-{2:2}:
      [  109.442457]        __lock_acquire+0x1264/0x20b0
      [  109.442463]        lock_acquire+0xc2/0x400
      [  109.442466]        rt_spin_lock+0x2b/0xc0
      [  109.442469]        __local_bh_disable_ip+0xd9/0x270
      [  109.442471]        svc_xprt_do_enqueue+0xc0/0x4d0
      [  109.442474]        svc_close_list+0x60/0x90
      [  109.442476]        svc_close_net+0x49/0x1a0
      [  109.442478]        svc_shutdown_net+0x12/0x40
      [  109.442480]        nfsd_destroy+0xc5/0x180
      [  109.442482]        nfsd+0x1bc/0x270
      [  109.442483]        kthread+0x194/0x1b0
      [  109.442487]        ret_from_fork+0x22/0x30
      [  109.442492]
      [  109.442492] other info that might help us debug this:
      [  109.442492]
      [  109.442493]  Possible unsafe locking scenario:
      [  109.442493]
      [  109.442493]        CPU0                    CPU1
      [  109.442494]        ----                    ----
      [  109.442495]   lock(&serv->sv_lock);
      [  109.442496]                                lock((softirq_ctrl.lock).lock);
      [  109.442498]                                lock(&serv->sv_lock);
      [  109.442499]   lock((softirq_ctrl.lock).lock);
      [  109.442501]
      [  109.442501]  *** DEADLOCK ***
      [  109.442501]
      [  109.442501] 3 locks held by nfsd/1032:
      [  109.442503]  #0: ffffffff93b49258 (nfsd_mutex){+.+.}-{3:3}, at: nfsd+0x19a/0x270
      [  109.442508]  #1: ffff994245cb00b0 (&serv->sv_lock){+.+.}-{0:0}, at: svc_close_list+0x1f/0x90
      [  109.442512]  #2: ffffffff93a81b20 (rcu_read_lock){....}-{1:2}, at: rt_spin_lock+0x5/0xc0
      [  109.442518]
      [  109.442518] stack backtrace:
      [  109.442519] CPU: 0 PID: 1032 Comm: nfsd Not tainted 5.10.16-rt30 #1
      [  109.442522] Hardware name: Supermicro X9DRL-3F/iF/X9DRL-3F/iF, BIOS 3.2 09/22/2015
      [  109.442524] Call Trace:
      [  109.442527]  dump_stack+0x77/0x97
      [  109.442533]  check_noncircular+0xdc/0xf0
      [  109.442546]  __lock_acquire+0x1264/0x20b0
      [  109.442553]  lock_acquire+0xc2/0x400
      [  109.442564]  rt_spin_lock+0x2b/0xc0
      [  109.442570]  __local_bh_disable_ip+0xd9/0x270
      [  109.442573]  svc_xprt_do_enqueue+0xc0/0x4d0
      [  109.442577]  svc_close_list+0x60/0x90
      [  109.442581]  svc_close_net+0x49/0x1a0
      [  109.442585]  svc_shutdown_net+0x12/0x40
      [  109.442588]  nfsd_destroy+0xc5/0x180
      [  109.442590]  nfsd+0x1bc/0x270
      [  109.442595]  kthread+0x194/0x1b0
      [  109.442600]  ret_from_fork+0x22/0x30
      [  109.518225] nfsd: last server has exited, flushing export cache
      [  OK  ] Stopped NFSv4 ID-name mapping service.
      [  OK  ] Stopped GSSAPI Proxy Daemon.
      [  OK  ] Stopped NFS Mount Daemon.
      [  OK  ] Stopped NFS status monitor for NFSv2/3 locking..
      
      Fixes: 719f8bcc ("svcrpc: fix xpt_list traversal locking on shutdown")
      Signed-off-by: Joe Korty <joe.korty@concurrent-rt.com>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • nvmet: don't check iosqes,iocqes for discovery controllers · 8daf2ab0
      Sagi Grimberg authored
      commit d218a8a3 upstream.
      
      From the base spec, Figure 78:
      
        "Controller Configuration, these fields are defined as parameters to
         configure an "I/O Controller (IOC)" and not to configure a "Discovery
         Controller (DC)".

         ...
         If the controller does not support I/O queues, then this field shall
         be read-only with a value of 0h"
      
      Only perform this check for I/O controllers.
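A sketch of the conditional check, assuming the CC field layout from the NVMe specification (IOSQES in bits 19:16, IOCQES in bits 23:20) and the NVM command set entry sizes of 2^6 and 2^4 bytes. The helper names are hypothetical and do not mirror the nvmet code.

```c
#include <stdbool.h>
#include <stdint.h>

#define CC_IOSQES_SHIFT	16	/* CC.IOSQES, bits 19:16 */
#define CC_IOCQES_SHIFT	20	/* CC.IOCQES, bits 23:20 */
#define NVM_IOSQES	6	/* 64-byte submission queue entries */
#define NVM_IOCQES	4	/* 16-byte completion queue entries */

static unsigned int cc_iosqes(uint32_t cc) { return (cc >> CC_IOSQES_SHIFT) & 0xf; }
static unsigned int cc_iocqes(uint32_t cc) { return (cc >> CC_IOCQES_SHIFT) & 0xf; }

/*
 * A discovery controller has no I/O queues, so its IOSQES/IOCQES fields
 * are read-only 0h and must not be validated.
 */
static bool cc_queue_sizes_ok(uint32_t cc, bool is_discovery)
{
	if (is_discovery)
		return true;
	return cc_iosqes(cc) == NVM_IOSQES && cc_iocqes(cc) == NVM_IOCQES;
}
```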
      
      Fixes: a07b4970 ("nvmet: add a generic NVMe target")
      Reported-by: Belanger, Martin <Martin.Belanger@dell.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • btrfs: fix race when cloning extent buffer during rewind of an old root · ca403b79
      Filipe Manana authored
      commit dbcc7d57 upstream.
      
      While resolving backreferences, as part of a logical ino ioctl call or
      fiemap, we can end up hitting a BUG_ON() when replaying tree mod log
      operations of a root, triggering a stack trace like the following:
      
        ------------[ cut here ]------------
        kernel BUG at fs/btrfs/ctree.c:1210!
        invalid opcode: 0000 [#1] SMP KASAN PTI
        CPU: 1 PID: 19054 Comm: crawl_335 Tainted: G        W         5.11.0-2d11c0084b02-misc-next+ #89
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
        RIP: 0010:__tree_mod_log_rewind+0x3b1/0x3c0
        Code: 05 48 8d 74 10 (...)
        RSP: 0018:ffffc90001eb70b8 EFLAGS: 00010297
        RAX: 0000000000000000 RBX: ffff88812344e400 RCX: ffffffffb28933b6
        RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff88812344e42c
        RBP: ffffc90001eb7108 R08: 1ffff11020b60a20 R09: ffffed1020b60a20
        R10: ffff888105b050f9 R11: ffffed1020b60a1f R12: 00000000000000ee
        R13: ffff8880195520c0 R14: ffff8881bc958500 R15: ffff88812344e42c
        FS:  00007fd1955e8700(0000) GS:ffff8881f5600000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00007efdb7928718 CR3: 000000010103a006 CR4: 0000000000170ee0
        Call Trace:
         btrfs_search_old_slot+0x265/0x10d0
         ? lock_acquired+0xbb/0x600
         ? btrfs_search_slot+0x1090/0x1090
         ? free_extent_buffer.part.61+0xd7/0x140
         ? free_extent_buffer+0x13/0x20
         resolve_indirect_refs+0x3e9/0xfc0
         ? lock_downgrade+0x3d0/0x3d0
         ? __kasan_check_read+0x11/0x20
         ? add_prelim_ref.part.11+0x150/0x150
         ? lock_downgrade+0x3d0/0x3d0
         ? __kasan_check_read+0x11/0x20
         ? lock_acquired+0xbb/0x600
         ? __kasan_check_write+0x14/0x20
         ? do_raw_spin_unlock+0xa8/0x140
         ? rb_insert_color+0x30/0x360
         ? prelim_ref_insert+0x12d/0x430
         find_parent_nodes+0x5c3/0x1830
         ? resolve_indirect_refs+0xfc0/0xfc0
         ? lock_release+0xc8/0x620
         ? fs_reclaim_acquire+0x67/0xf0
         ? lock_acquire+0xc7/0x510
         ? lock_downgrade+0x3d0/0x3d0
         ? lockdep_hardirqs_on_prepare+0x160/0x210
         ? lock_release+0xc8/0x620
         ? fs_reclaim_acquire+0x67/0xf0
         ? lock_acquire+0xc7/0x510
         ? poison_range+0x38/0x40
         ? unpoison_range+0x14/0x40
         ? trace_hardirqs_on+0x55/0x120
         btrfs_find_all_roots_safe+0x142/0x1e0
         ? find_parent_nodes+0x1830/0x1830
         ? btrfs_inode_flags_to_xflags+0x50/0x50
         iterate_extent_inodes+0x20e/0x580
         ? tree_backref_for_extent+0x230/0x230
         ? lock_downgrade+0x3d0/0x3d0
         ? read_extent_buffer+0xdd/0x110
         ? lock_downgrade+0x3d0/0x3d0
         ? __kasan_check_read+0x11/0x20
         ? lock_acquired+0xbb/0x600
         ? __kasan_check_write+0x14/0x20
         ? _raw_spin_unlock+0x22/0x30
         ? __kasan_check_write+0x14/0x20
         iterate_inodes_from_logical+0x129/0x170
         ? iterate_inodes_from_logical+0x129/0x170
         ? btrfs_inode_flags_to_xflags+0x50/0x50
         ? iterate_extent_inodes+0x580/0x580
         ? __vmalloc_node+0x92/0xb0
         ? init_data_container+0x34/0xb0
         ? init_data_container+0x34/0xb0
         ? kvmalloc_node+0x60/0x80
         btrfs_ioctl_logical_to_ino+0x158/0x230
         btrfs_ioctl+0x205e/0x4040
         ? __might_sleep+0x71/0xe0
         ? btrfs_ioctl_get_supported_features+0x30/0x30
         ? getrusage+0x4b6/0x9c0
         ? __kasan_check_read+0x11/0x20
         ? lock_release+0xc8/0x620
         ? __might_fault+0x64/0xd0
         ? lock_acquire+0xc7/0x510
         ? lock_downgrade+0x3d0/0x3d0
         ? lockdep_hardirqs_on_prepare+0x210/0x210
         ? lockdep_hardirqs_on_prepare+0x210/0x210
         ? __kasan_check_read+0x11/0x20
         ? do_vfs_ioctl+0xfc/0x9d0
         ? ioctl_file_clone+0xe0/0xe0
         ? lock_downgrade+0x3d0/0x3d0
         ? lockdep_hardirqs_on_prepare+0x210/0x210
         ? __kasan_check_read+0x11/0x20
         ? lock_release+0xc8/0x620
         ? __task_pid_nr_ns+0xd3/0x250
         ? lock_acquire+0xc7/0x510
         ? __fget_files+0x160/0x230
         ? __fget_light+0xf2/0x110
         __x64_sys_ioctl+0xc3/0x100
         do_syscall_64+0x37/0x80
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7fd1976e2427
        Code: 00 00 90 48 8b 05 (...)
        RSP: 002b:00007fd1955e5cf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
        RAX: ffffffffffffffda RBX: 00007fd1955e5f40 RCX: 00007fd1976e2427
        RDX: 00007fd1955e5f48 RSI: 00000000c038943b RDI: 0000000000000004
        RBP: 0000000001000000 R08: 0000000000000000 R09: 00007fd1955e6120
        R10: 0000557835366b00 R11: 0000000000000246 R12: 0000000000000004
        R13: 00007fd1955e5f48 R14: 00007fd1955e5f40 R15: 00007fd1955e5ef8
        Modules linked in:
        ---[ end trace ec8931a1c36e57be ]---
      
        (gdb) l *(__tree_mod_log_rewind+0x3b1)
        0xffffffff81893521 is in __tree_mod_log_rewind (fs/btrfs/ctree.c:1210).
        1205                     * the modification. as we're going backwards, we do the
        1206                     * opposite of each operation here.
        1207                     */
        1208                    switch (tm->op) {
        1209                    case MOD_LOG_KEY_REMOVE_WHILE_FREEING:
        1210                            BUG_ON(tm->slot < n);
        1211                            fallthrough;
        1212                    case MOD_LOG_KEY_REMOVE_WHILE_MOVING:
        1213                    case MOD_LOG_KEY_REMOVE:
        1214                            btrfs_set_node_key(eb, &tm->key, tm->slot);
      
      Here's what happens to hit that BUG_ON():
      
      1) We have one tree mod log user (through fiemap or the logical ino ioctl),
         with a sequence number of 1, so we have fs_info->tree_mod_seq == 1;
      
      2) Another task is at ctree.c:balance_level() and we have eb X currently as
         the root of the tree, and we promote its single child, eb Y, as the new
         root.
      
         Then, at ctree.c:balance_level(), we call:
      
            tree_mod_log_insert_root(eb X, eb Y, 1);
      
      3) At tree_mod_log_insert_root() we create tree mod log elements for each
         slot of eb X, of operation type MOD_LOG_KEY_REMOVE_WHILE_FREEING, each
         with a ->logical pointing to ebX->start. These are placed in an array
         named tm_list.
         Let's assume there are N elements (N pointers in eb X);
      
      4) Then, still at tree_mod_log_insert_root(), we create a tree mod log
         element of operation type MOD_LOG_ROOT_REPLACE, ->logical set to
         ebY->start, ->old_root.logical set to ebX->start, ->old_root.level set
         to the level of eb X and ->generation set to the generation of eb X;
      
      5) Then tree_mod_log_insert_root() calls tree_mod_log_free_eb() with
         tm_list as argument. After that, tree_mod_log_free_eb() calls
         __tree_mod_log_insert() for each member of tm_list in reverse order,
         from highest slot in eb X, slot N - 1, to slot 0 of eb X;
      
      6) __tree_mod_log_insert() sets the sequence number of each given tree mod
         log operation - it increments fs_info->tree_mod_seq and sets
         fs_info->tree_mod_seq as the sequence number of the given tree mod log
         operation.
      
         This means that for the tm_list created at tree_mod_log_insert_root(),
         the element corresponding to slot 0 of eb X has the highest sequence
         number (1 + N), and the element corresponding to the last slot has the
         lowest sequence number (2);
      
      7) Then, after inserting tm_list's elements into the tree mod log rbtree,
         the MOD_LOG_ROOT_REPLACE element is inserted, which gets the highest
         sequence number, which is N + 2;
      
      8) Back to ctree.c:balance_level(), we free eb X by calling
         btrfs_free_tree_block() on it. Because eb X was created in the current
         transaction, has no other references and writeback did not happen for
         it, we add it back to the free space cache/tree;
      
      9) Later some other task T allocates the metadata extent from eb X, since
         it is marked as free space in the space cache/tree, and uses it as a
         node for some other btree;
      
      10) The tree mod log user task calls btrfs_search_old_slot(), which calls
          get_old_root(), and finally that calls __tree_mod_log_oldest_root()
          with time_seq == 1 and eb_root == eb Y;
      
      11) First iteration of the while loop finds the tree mod log element with
          sequence number N + 2, for the logical address of eb Y and of type
          MOD_LOG_ROOT_REPLACE;
      
      12) Because the operation type is MOD_LOG_ROOT_REPLACE, we don't break out
          of the loop, and set root_logical to point to tm->old_root.logical
          which corresponds to the logical address of eb X;
      
      13) On the next iteration of the while loop, the call to
          tree_mod_log_search_oldest() returns the smallest tree mod log element
          for the logical address of eb X, which has a sequence number of 2, an
          operation type of MOD_LOG_KEY_REMOVE_WHILE_FREEING and corresponds to
          the old slot N - 1 of eb X (eb X had N items in it before being freed);
      
      14) We then break out of the while loop and return the tree mod log operation
          of type MOD_LOG_ROOT_REPLACE (eb Y), and not the one for slot N - 1 of
          eb X, to get_old_root();
      
      15) At get_old_root(), we process the MOD_LOG_ROOT_REPLACE operation
          and set "logical" to the logical address of eb X, which was the old
          root. We then call tree_mod_log_search() passing it the logical
          address of eb X and time_seq == 1;
      
      16) Then before calling tree_mod_log_search(), task T adds a key to eb X,
          which results in adding a tree mod log operation of type
          MOD_LOG_KEY_ADD to the tree mod log - this is done at
          ctree.c:insert_ptr() - but after adding the tree mod log operation
          and before updating the number of items in eb X from 0 to 1...
      
      17) The task at get_old_root() calls tree_mod_log_search() and gets the
          tree mod log operation of type MOD_LOG_KEY_ADD just added by task T.
          Then it enters the following if branch:
      
          if (old_root && tm && tm->op != MOD_LOG_KEY_REMOVE_WHILE_FREEING) {
             (...)
          } (...)
      
          It calls read_tree_block() for eb X, which gets a reference on eb X but
          does not lock it - task T has it locked.
          Then it clones eb X while it has nritems set to 0 in its header, before
          task T sets nritems to 1 in eb X's header. From here on we use the
          clone of eb X, which no other task has access to;
      
      18) Then we call __tree_mod_log_rewind(), passing it the MOD_LOG_KEY_ADD
          mod log operation we just got from tree_mod_log_search() in the
          previous step and the cloned version of eb X;
      
      19) At __tree_mod_log_rewind(), we set the local variable "n" to the number
          of items set in eb X's clone, which is 0. Then we enter the while loop,
          and in its first iteration we process the MOD_LOG_KEY_ADD operation,
          which just decrements "n" from 0 to (u32)-1, since "n" is declared with
          a type of u32. At the end of this iteration we call rb_next() to find the
          next tree mod log operation for eb X, that gives us the mod log operation
          of type MOD_LOG_KEY_REMOVE_WHILE_FREEING, for slot 0, with a sequence
          number of N + 1 (steps 3 to 6);
      
      20) Then we go back to the top of the while loop and trigger the following
          BUG_ON():
      
              (...)
              switch (tm->op) {
              case MOD_LOG_KEY_REMOVE_WHILE_FREEING:
                       BUG_ON(tm->slot < n);
                       fallthrough;
              (...)
      
          This fires because "n" has a value of (u32)-1 (4294967295) and
          tm->slot is 0, so tm->slot < n is true.
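      A small userspace C model can check the ordering from steps 5 to 7 and
      the unsigned wrap-around from steps 19 and 20. It is only a sketch:
      struct mod_op, assign_seqs() and bug_on_would_fire() are hypothetical
      helpers, not btrfs code. They model only two things: how the sequence
      numbers are handed out, and how a u32 counter of 0 behaves when it is
      decremented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a tree mod log element: only slot and sequence. */
struct mod_op {
	uint32_t slot;
	uint64_t seq;
};

/*
 * Model of steps 5-7: the elements for slots N-1 .. 0 are inserted in
 * reverse slot order, each one taking ++tree_mod_seq. The ROOT_REPLACE
 * element is inserted last and takes the next number, which is returned.
 * ops[] is indexed by slot.
 */
uint64_t assign_seqs(struct mod_op *ops, uint32_t nr, uint64_t *tree_mod_seq)
{
	for (uint32_t i = nr; i-- > 0; ) {
		ops[i].slot = i;
		ops[i].seq = ++(*tree_mod_seq);
	}
	return ++(*tree_mod_seq); /* MOD_LOG_ROOT_REPLACE */
}

/*
 * Model of steps 19-20: "n" starts as the clone's nritems, undoing the
 * MOD_LOG_KEY_ADD decrements it, and then the BUG_ON(tm->slot < n)
 * condition is evaluated for the REMOVE_WHILE_FREEING element of "slot".
 */
bool bug_on_would_fire(uint32_t clone_nritems, uint32_t slot)
{
	uint32_t n = clone_nritems;

	n--;              /* unsigned: 0 wraps around to UINT32_MAX */
	return slot < n;  /* the condition checked by BUG_ON() */
}
```

      With N == 4 and tree_mod_seq starting at 1, slot 3 gets sequence number
      2, slot 0 gets 5 (N + 1), and the MOD_LOG_ROOT_REPLACE element gets 6
      (N + 2). bug_on_would_fire(0, 0) returns true. A clone that already saw
      nritems == 1 gives false instead.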
      
      Fix this by taking a read lock on the extent buffer before cloning it at
      ctree.c:get_old_root(). This should be done regardless of the extent
      buffer having been freed and reused, as a concurrent task might be
      modifying it (while holding a write lock on it).
      
      Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
      Link: https://lore.kernel.org/linux-btrfs/20210227155037.GN28049@hungrycats.org/
      Fixes: 834328a8 ("Btrfs: tree mod log's old roots could still be part of the tree")
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Jacob Keller
      ixgbe: prevent ptp_rx_hang from running when in FILTER_ALL mode · 13f759dc
      Jacob Keller authored
      commit 6704a3ab upstream.
      
      On hardware which supports timestamping all packets, the timestamps are
      recorded in the packet buffer, and the driver no longer uses or reads
      the registers. This makes the logic for checking and clearing Rx
      timestamp hangs meaningless.
      
      If we run the ixgbe_ptp_rx_hang() function in this case, then the driver
      will continuously spam the log output with "Clearing Rx timestamp hang".
      These messages are spurious, and confusing to end users.
      
      The original code in commit a9763f3c ("ixgbe: Update PTP to support
      X550EM_x devices", 2015-12-03) did have a flag PTP_RX_TIMESTAMP_IN_REGISTER
      which was intended to be used to avoid the Rx timestamp hang check;
      however, it did not actually check the flag before calling the function.
      
      Do so now in order to stop the checks and prevent the spurious log
      messages.
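      The change amounts to guarding the watchdog's call with the flag. Below is
      a minimal sketch that assumes a single flags word.
      RX_TIMESTAMP_IN_REGISTER, struct adapter_model and watchdog_ptp_step()
      are hypothetical names based on the description above, not the driver's
      identifiers.

```c
#include <assert.h>

/* Hypothetical flag bit: set when Rx timestamps are read from registers. */
#define RX_TIMESTAMP_IN_REGISTER (1u << 0)

struct adapter_model {
	unsigned int flags;          /* hypothetical adapter flags word */
	unsigned int rx_hang_checks; /* how many times the hang check ran */
};

/* Stand-in for ixgbe_ptp_rx_hang(): here it only counts invocations. */
static void rx_hang_check(struct adapter_model *a)
{
	a->rx_hang_checks++;
}

/*
 * Watchdog step: only check for Rx timestamp hangs when timestamps are
 * latched in registers. With timestamps in the packet buffer
 * (FILTER_ALL mode) the flag is clear and the check is skipped.
 */
void watchdog_ptp_step(struct adapter_model *a)
{
	if (a->flags & RX_TIMESTAMP_IN_REGISTER)
		rx_hang_check(a);
}
```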
      
      Fixes: a9763f3c ("ixgbe: Update PTP to support X550EM_x devices", 2015-12-03)
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Wen Yang <wenyang@linux.alibaba.com>
    • Jacob Keller
      ixgbe: check for Tx timestamp timeouts during watchdog · c31cd5fd
      Jacob Keller authored
      commit 622a2ef5 upstream.
      
      The ixgbe driver has logic to handle only one Tx timestamp at a time,
      using a state bit lock to avoid multiple requests at once.
      
      It may be possible, if incredibly unlikely, that a Tx timestamp event is
      requested but never completes. Since we use an interrupt scheme to
      determine when the Tx timestamp occurred, we would never clear the state
      bit in this case.
      
      Add an ixgbe_ptp_tx_hang() function similar to the already existing
      ixgbe_ptp_rx_hang() function. This function runs in the watchdog routine
      and makes sure we eventually recover from this case instead of
      permanently disabling Tx timestamps.
      
      Note: there is no currently known way to cause this without hacking the
      driver code to force it.
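      The watchdog recovery can be sketched as a timeout on the single
      outstanding request. Everything here is hypothetical, including the tick
      unit, the 15-tick limit and all of the names. The sketch models only the
      idea described above: the state bit allows one request at a time, the
      normal completion path clears it, and the watchdog clears it when a
      request has been pending for too long.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical timeout, in the same ticks as "now" (a driver would use jiffies). */
#define TX_TS_TIMEOUT_TICKS 15u

struct tx_ts_state {
	bool in_progress;      /* models the state-bit lock */
	unsigned long start;   /* tick at which the request was made */
	unsigned int timeouts; /* number of stuck requests cleared by the watchdog */
};

/* Try to start a Tx timestamp request; only one may be outstanding. */
bool tx_ts_request(struct tx_ts_state *s, unsigned long now)
{
	if (s->in_progress)
		return false;
	s->in_progress = true;
	s->start = now;
	return true;
}

/* Normal completion path (the interrupt, in the driver). */
void tx_ts_complete(struct tx_ts_state *s)
{
	s->in_progress = false;
}

/* Watchdog: give up on a request that has been outstanding too long. */
void tx_ts_watchdog(struct tx_ts_state *s, unsigned long now)
{
	/* Unsigned subtraction stays correct across counter wrap-around. */
	if (s->in_progress && now - s->start > TX_TS_TIMEOUT_TICKS) {
		s->in_progress = false;
		s->timeouts++;
	}
}
```

      A request made at tick 100 survives a watchdog run at tick 110 and is
      cleared at tick 116. After that, a new request can be made again, so
      timestamping is not disabled permanently.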
      
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Florian Fainelli
      net: dsa: b53: Support setting learning on port · c7f1c92a
      Florian Fainelli authored
      commit f9b3827e upstream.
      
      Add support for setting the learning attribute on a port, and make sure
      that the standalone ports start up with learning disabled.
      
      We can remove the code in bcm_sf2 that configured the ports' learning
      attribute because we want the standalone ports to have learning disabled
      by default, and port 7 cannot be bridged, so its learning attribute will
      not change past its initial configuration.
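      Setting the learning attribute can be modelled as a read-modify-write of
      a per-port bitmask. This sketch rests on assumptions: struct
      switch_model, port_set_learning(), ports_setup_standalone() and the
      register layout (bit N set means learning is disabled on port N) are
      hypothetical and not taken from the b53 driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-port "disable learning" register: bit N set = learning off on port N. */
struct switch_model {
	uint16_t dis_learning;
};

/* Read-modify-write of the bit for one port (port must be < 16). */
void port_set_learning(struct switch_model *sw, unsigned int port, bool learning)
{
	uint16_t reg = sw->dis_learning;           /* read */

	if (learning)
		reg &= (uint16_t)~(1u << port);    /* clear "disabled" bit */
	else
		reg |= (uint16_t)(1u << port);     /* set "disabled" bit */

	sw->dis_learning = reg;                    /* write back */
}

/* Standalone ports start up with learning disabled. */
void ports_setup_standalone(struct switch_model *sw, unsigned int nports)
{
	for (unsigned int port = 0; port < nports; port++)
		port_set_learning(sw, port, false);
}
```

      Calling ports_setup_standalone(&sw, 9) leaves dis_learning at 0x1FF.
      Re-enabling learning on port 3, for example when it joins a bridge,
      clears bit 3 and gives 0x1F7.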
      
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Jan Kara
      ext4: check journal inode extents more carefully · 58ef3832
      Jan Kara authored
      commit ce9f24cc upstream.
      
      Currently, system zones just track ranges of blocks that are "important"
      fs metadata (bitmaps, group descriptors, journal blocks, etc.). This,
      however, complicates how the extent tree (or indirect blocks) can be
      checked for inodes that actually track such metadata - currently the
      journal inode, but arguably we should be treating quota files or the
      resize inode similarly. We cannot run __ext4_ext_check() on such metadata
      inodes when loading their extents, as that would immediately trigger the
      validity checks, and so we just hack around that and special-case the
      journal inode. This, however, leads to a situation where a journal inode
      whose extent tree has a depth of at least one can have an invalid extent
      tree that goes unnoticed until ext4_cache_extents() crashes.
      
      To overcome this limitation, track the inode number each system zone
      belongs to (0 is used for zones not belonging to any inode). We can then
      verify that the inode number matches the expected one when verifying the
      extent tree and thus avoid the false errors. With this, there's no need to
      special-case the journal inode during extent tree checking anymore, so
      remove it.
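      The inode-aware check can be sketched with a plain array of zones (the
      kernel keeps system zones in an rbtree). This is a simplified,
      hypothetical model: struct sys_zone and inode_blocks_valid() are
      illustrative names. A block range is accepted only if every system zone
      it overlaps belongs to the same inode. Valid inode numbers are never 0,
      so zones with ino 0 reject every inode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified system zone: a block range plus the inode that owns it
 * (0 for metadata that belongs to no inode, e.g. bitmaps). */
struct sys_zone {
	uint64_t start_blk;
	uint32_t count;
	uint32_t ino;
};

/* Return true if [start_blk, start_blk + count) may be mapped by inode
 * "ino": it must not overlap any system zone owned by a different inode. */
bool inode_blocks_valid(const struct sys_zone *zones, size_t nr,
			uint32_t ino, uint64_t start_blk, uint32_t count)
{
	for (size_t i = 0; i < nr; i++) {
		const struct sys_zone *z = &zones[i];

		/* Disjoint: the range ends before the zone or starts after it. */
		if (start_blk + count <= z->start_blk ||
		    start_blk >= z->start_blk + z->count)
			continue;
		if (z->ino != ino)
			return false;
	}
	return true;
}
```

      With this model, the journal inode's own extents pass the same check
      that rejects them for any other inode. This is why the special case is
      no longer needed.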
      
      Fixes: 0a944e8a ("ext4: don't perform block validity checks on the journal inode")
      Reported-by: Wolfgang Frisch <wolfgang.frisch@suse.com>
      Reviewed-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20200728130437.7804-4-jack@suse.cz
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Jan Kara
      ext4: don't allow overlapping system zones · f175d637
      Jan Kara authored
      commit bf9a379d upstream.
      
      Currently, add_system_zone() just silently merges two added system zones
      that overlap. However, the overlap should not happen, and it generally
      suggests that some unrelated metadata overlap, which indicates the fs is
      corrupted. We should have caught such problems earlier (e.g. in
      ext4_check_descriptors()), but add this check as another line of defense.
      In a later patch we also use this for stricter checking of the journal
      inode extent tree.
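      Rejecting overlaps instead of merging them can be sketched with a sorted
      array of zones. This is a hypothetical model: struct zone_set and
      add_zone() are illustrative names. The kernel uses an rbtree, and this
      sketch does not merge adjacent zones either. A new zone is accepted only
      if it starts at or after the end of the previous zone and ends at or
      before the start of the next one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_ZONES 16

/* Simplified zone list, kept sorted by start block. */
struct zone {
	uint64_t start;
	uint64_t count;
};

struct zone_set {
	struct zone z[MAX_ZONES];
	size_t nr;
};

/* Insert [start, start + count) and refuse any overlap instead of merging.
 * Returns 0 on success, -1 on overlap (treated as corruption), -2 if full. */
int add_zone(struct zone_set *s, uint64_t start, uint64_t count)
{
	size_t pos = 0;

	/* Find the first zone that starts after the new one. */
	while (pos < s->nr && s->z[pos].start <= start)
		pos++;

	/* The previous zone must end at or before the new start. */
	if (pos > 0 && s->z[pos - 1].start + s->z[pos - 1].count > start)
		return -1;
	/* The new zone must end at or before the next zone's start. */
	if (pos < s->nr && start + count > s->z[pos].start)
		return -1;
	if (s->nr == MAX_ZONES)
		return -2;

	/* Shift the tail up by one and insert in sorted position. */
	memmove(&s->z[pos + 1], &s->z[pos], (s->nr - pos) * sizeof(s->z[0]));
	s->z[pos].start = start;
	s->z[pos].count = count;
	s->nr++;
	return 0;
}
```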
      
      Reviewed-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20200728130437.7804-3-jack@suse.cz
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Jan Kara
      ext4: handle error of ext4_setup_system_zone() on remount · d01b5fc0
      Jan Kara authored
      commit d176b1f6 upstream.
      
      ext4_setup_system_zone() can fail. Handle the failure in ext4_remount().
      
      Reviewed-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20200728130437.7804-2-jack@suse.cz
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  2. Mar 17, 2021