  1. Sep 03, 2021
    • Riccardo Mancini
      perf env: Fix memory leak of bpf_prog_info_linear member · 0d8e39bb
      Riccardo Mancini authored
      commit 67069a1f upstream.
      
      ASan reported a memory leak caused by info_linear not being deallocated.
      
      The info_linear was allocated in perf_event__synthesize_one_bpf_prog().
      
      This patch adds the corresponding free() when bpf_prog_info_node
      is freed in perf_env__purge_bpf().
      
        $ sudo ./perf record -- sleep 5
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.025 MB perf.data (8 samples) ]
      
        =================================================================
        ==297735==ERROR: LeakSanitizer: detected memory leaks
      
        Direct leak of 7688 byte(s) in 19 object(s) allocated from:
            #0 0x4f420f in malloc (/home/user/linux/tools/perf/perf+0x4f420f)
            #1 0xc06a74 in bpf_program__get_prog_info_linear /home/user/linux/tools/lib/bpf/libbpf.c:11113:16
            #2 0xb426fe in perf_event__synthesize_one_bpf_prog /home/user/linux/tools/perf/util/bpf-event.c:191:16
            #3 0xb42008 in perf_event__synthesize_bpf_events /home/user/linux/tools/perf/util/bpf-event.c:410:9
            #4 0x594596 in record__synthesize /home/user/linux/tools/perf/builtin-record.c:1490:8
            #5 0x58c9ac in __cmd_record /home/user/linux/tools/perf/builtin-record.c:1798:8
            #6 0x58990b in cmd_record /home/user/linux/tools/perf/builtin-record.c:2901:8
            #7 0x7b2a20 in run_builtin /home/user/linux/tools/perf/perf.c:313:11
            #8 0x7b12ff in handle_internal_command /home/user/linux/tools/perf/perf.c:365:8
            #9 0x7b2583 in run_argv /home/user/linux/tools/perf/perf.c:409:2
            #10 0x7b0d79 in main /home/user/linux/tools/perf/perf.c:539:3
            #11 0x7fa357ef6b74 in __libc_start_main /usr/src/debug/glibc-2.33-8.fc34.x86_64/csu/../csu/libc-start.c:332:16
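The ownership rule behind this fix can be sketched in plain C. This is a minimal model, not the perf code: `struct prog_node`, `prog_node_new()` and `purge_nodes()` are hypothetical stand-ins for `bpf_prog_info_node`, its `info_linear` member and `perf_env__purge_bpf()`, and a singly linked list replaces the rbtree the real code walks. A node that owns a separately allocated buffer must release that buffer before the node itself.

```c
#include <stdlib.h>

/* Hypothetical stand-in for bpf_prog_info_node: the node owns a
 * separately malloc()ed buffer, like the real info_linear member. */
struct prog_node {
	void *info_linear;
	struct prog_node *next;
};

/* Allocate a node together with the buffer it owns. */
struct prog_node *prog_node_new(size_t info_size, struct prog_node *next)
{
	struct prog_node *node = malloc(sizeof(*node));

	if (!node)
		return NULL;
	node->info_linear = malloc(info_size);
	if (!node->info_linear) {
		free(node);
		return NULL;
	}
	node->next = next;
	return node;
}

/* Free every node in the list. Releasing info_linear before the node
 * is the step whose absence LeakSanitizer reported above.
 * Returns the number of nodes freed. */
size_t purge_nodes(struct prog_node *head)
{
	size_t count = 0;

	while (head) {
		struct prog_node *next = head->next;

		free(head->info_linear); /* the owned member first... */
		free(head);              /* ...then the node itself */
		head = next;
		count++;
	}
	return count;
}
```

Running a program built from this model under ASan with the first `free()` removed would produce a direct-leak report of the same shape as the one above.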
      
      Signed-off-by: Riccardo Mancini <rickyman7@gmail.com>
      Acked-by: Ian Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andrii Nakryiko <andrii@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: KP Singh <kpsingh@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Yonghong Song <yhs@fb.com>
      Link: http://lore.kernel.org/lkml/20210602224024.300485-1-rickyman7@gmail.com
      
      
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Hanjun Guo <guohanjun@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Guo Ren
      riscv: Fixup patch_text panic in ftrace · 133d7f93
      Guo Ren authored
      commit 5ad84adf upstream.
      
      Just like on arm64, we can't trace functions in the patch_text path.
      
      Here is the bug log:
      
      [   45.234334] Unable to handle kernel paging request at virtual address ffffffd38ae80900
      [   45.242313] Oops [#1]
      [   45.244600] Modules linked in:
      [   45.247678] CPU: 0 PID: 11 Comm: migration/0 Not tainted 5.9.0-00025-g9b7db83-dirty #215
      [   45.255797] epc: ffffffe00021689a ra : ffffffe00021718e sp : ffffffe01afabb58
      [   45.262955]  gp : ffffffe00136afa0 tp : ffffffe01af94d00 t0 : 0000000000000002
      [   45.270200]  t1 : 0000000000000000 t2 : 0000000000000001 s0 : ffffffe01afabc08
      [   45.277443]  s1 : ffffffe0013718a8 a0 : 0000000000000000 a1 : ffffffe01afabba8
      [   45.284686]  a2 : 0000000000000000 a3 : 0000000000000000 a4 : c4c16ad38ae80900
      [   45.291929]  a5 : 0000000000000000 a6 : 0000000000000000 a7 : 0000000052464e43
      [   45.299173]  s2 : 0000000000000001 s3 : ffffffe000206a60 s4 : ffffffe000206a60
      [   45.306415]  s5 : 00000000000009ec s6 : ffffffe0013718a8 s7 : c4c16ad38ae80900
      [   45.313658]  s8 : 0000000000000004 s9 : 0000000000000001 s10: 0000000000000001
      [   45.320902]  s11: 0000000000000003 t3 : 0000000000000001 t4 : ffffffffd192fe79
      [   45.328144]  t5 : ffffffffb8f80000 t6 : 0000000000040000
      [   45.333472] status: 0000000200000100 badaddr: ffffffd38ae80900 cause: 000000000000000f
      [   45.341514] ---[ end trace d95102172248fdcf ]---
      [   45.346176] note: migration/0[11] exited with preempt_count 1
      
      (gdb) x /2i $pc
      => 0xffffffe00021689a <__do_proc_dointvec+196>: sd      zero,0(s7)
         0xffffffe00021689e <__do_proc_dointvec+200>: li      s11,0
      
      (gdb) bt
      0  __do_proc_dointvec (tbl_data=0x0, table=0xffffffe01afabba8,
      write=0, buffer=0x0, lenp=0x7bf897061f9a0800, ppos=0x4, conv=0x0,
      data=0x52464e43) at kernel/sysctl.c:581
      1  0xffffffe00021718e in do_proc_dointvec (data=<optimized out>,
      conv=<optimized out>, ppos=<optimized out>, lenp=<optimized out>,
      buffer=<optimized out>, write=<optimized out>, table=<optimized out>)
      at kernel/sysctl.c:964
      2  proc_dointvec_minmax (ppos=<optimized out>, lenp=<optimized out>,
      buffer=<optimized out>, write=<optimized out>, table=<optimized out>)
      at kernel/sysctl.c:964
      3  proc_do_static_key (table=<optimized out>, write=1, buffer=0x0,
      lenp=0x0, ppos=0x7bf897061f9a0800) at kernel/sysctl.c:1643
      4  0xffffffe000206792 in ftrace_make_call (rec=<optimized out>,
      addr=<optimized out>) at arch/riscv/kernel/ftrace.c:109
      5  0xffffffe0002c9c04 in __ftrace_replace_code
      (rec=0xffffffe01ae40c30, enable=3) at kernel/trace/ftrace.c:2503
      6  0xffffffe0002ca0b2 in ftrace_replace_code (mod_flags=<optimized
      out>) at kernel/trace/ftrace.c:2530
      7  0xffffffe0002ca26a in ftrace_modify_all_code (command=5) at
      kernel/trace/ftrace.c:2677
      8  0xffffffe0002ca30e in __ftrace_modify_code (data=<optimized out>)
      at kernel/trace/ftrace.c:2703
      9  0xffffffe0002c13b0 in multi_cpu_stop (data=0x0) at kernel/stop_machine.c:224
      10 0xffffffe0002c0fde in cpu_stopper_thread (cpu=<optimized out>) at
      kernel/stop_machine.c:491
      11 0xffffffe0002343de in smpboot_thread_fn (data=0x0) at kernel/smpboot.c:165
      12 0xffffffe00022f8b4 in kthread (_create=0xffffffe01af0c040) at
      kernel/kthread.c:292
      13 0xffffffe000201fac in handle_exception () at arch/riscv/kernel/entry.S:236
      
         0xffffffe00020678a <+114>:   auipc   ra,0xffffe
         0xffffffe00020678e <+118>:   jalr    -118(ra) # 0xffffffe000204714 <patch_text_nosync>
         0xffffffe000206792 <+122>:   snez    a0,a0
      
      (gdb) disassemble patch_text_nosync
      Dump of assembler code for function patch_text_nosync:
         0xffffffe000204714 <+0>:     addi    sp,sp,-32
         0xffffffe000204716 <+2>:     sd      s0,16(sp)
         0xffffffe000204718 <+4>:     sd      ra,24(sp)
         0xffffffe00020471a <+6>:     addi    s0,sp,32
         0xffffffe00020471c <+8>:     auipc   ra,0x0
         0xffffffe000204720 <+12>:    jalr    -384(ra) # 0xffffffe00020459c <patch_insn_write>
         0xffffffe000204724 <+16>:    beqz    a0,0xffffffe00020472e <patch_text_nosync+26>
         0xffffffe000204726 <+18>:    ld      ra,24(sp)
         0xffffffe000204728 <+20>:    ld      s0,16(sp)
         0xffffffe00020472a <+22>:    addi    sp,sp,32
         0xffffffe00020472c <+24>:    ret
         0xffffffe00020472e <+26>:    sd      a0,-24(s0)
         0xffffffe000204732 <+30>:    auipc   ra,0x4
         0xffffffe000204736 <+34>:    jalr    -1464(ra) # 0xffffffe00020817a <flush_icache_all>
         0xffffffe00020473a <+38>:    ld      a0,-24(s0)
         0xffffffe00020473e <+42>:    ld      ra,24(sp)
         0xffffffe000204740 <+44>:    ld      s0,16(sp)
         0xffffffe000204742 <+46>:    addi    sp,sp,32
         0xffffffe000204744 <+48>:    ret
      
      (gdb) disassemble flush_icache_all-4
      Dump of assembler code for function flush_icache_all:
         0xffffffe00020817a <+0>:     addi    sp,sp,-8
         0xffffffe00020817c <+2>:     sd      ra,0(sp)
         0xffffffe00020817e <+4>:     auipc   ra,0xfffff
         0xffffffe000208182 <+8>:     jalr    -1822(ra) # 0xffffffe000206a60 <ftrace_caller>
         0xffffffe000208186 <+12>:    ld      ra,0(sp)
         0xffffffe000208188 <+14>:    addi    sp,sp,8
         0xffffffe00020818a <+0>:     addi    sp,sp,-16
         0xffffffe00020818c <+2>:     sd      s0,0(sp)
         0xffffffe00020818e <+4>:     sd      ra,8(sp)
         0xffffffe000208190 <+6>:     addi    s0,sp,16
         0xffffffe000208192 <+8>:     li      a0,0
         0xffffffe000208194 <+10>:    auipc   ra,0xfffff
         0xffffffe000208198 <+14>:    jalr    -410(ra) # 0xffffffe000206ffa <sbi_remote_fence_i>
         0xffffffe00020819c <+18>:    ld      s0,0(sp)
         0xffffffe00020819e <+20>:    ld      ra,8(sp)
         0xffffffe0002081a0 <+22>:    addi    sp,sp,16
         0xffffffe0002081a2 <+24>:    ret
      
      (gdb) frame 5
      (rec=0xffffffe01ae40c30, enable=3) at kernel/trace/ftrace.c:2503
      2503                    return ftrace_make_call(rec, ftrace_addr);
      (gdb) p /x rec->ip
      $2 = 0xffffffe00020817a -> flush_icache_all !
      
      When we patched flush_icache_all's patchable entry to call ftrace_caller:
       - Insert ftrace_caller into the flush_icache_all prologue.
       - Call flush_icache_all to sync the I/D caches, but flush_icache_all is
      the very function we have only half modified.
      
      Link: https://lore.kernel.org/linux-riscv/CAJF2gTT=oDWesWe0JVWvTpGi60-gpbNhYLdFWN_5EbyeqoEDdw@mail.gmail.com/T/#t
      
      
      Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
      Reviewed-by: Atish Patra <atish.patra@wdc.com>
      Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
      Signed-off-by: Dimitri John Ledkov <dimitri.ledkov@canonical.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Guo Ren
      riscv: Fixup wrong ftrace remove cflag · 7e208724
      Guo Ren authored
      commit 67d94577 upstream.
      
      We must use $(CC_FLAGS_FTRACE) instead of using -pg directly; otherwise
      it causes an -fpatchable-function-entry error.
      
      Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
      Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
      Signed-off-by: Dimitri John Ledkov <dimitri.ledkov@canonical.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Pauli Virtanen
      Bluetooth: btusb: check conditions before enabling USB ALT 3 for WBS · b42fde92
      Pauli Virtanen authored
      commit 55981d35 upstream.
      
      Some USB BT adapters don't satisfy the MTU requirement mentioned in
      commit e848dbd3 ("Bluetooth: btusb: Add support USB ALT 3 for WBS")
      and have an ALT 3 setting that produces no/garbled audio. Some adapters
      with larger MTU were also reported to have problems with ALT 3.
      
      Add a flag and check it and MTU before selecting ALT 3, falling back to
      ALT 1. Enable the flag for Realtek, restoring the previous behavior for
      non-Realtek devices.
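The selection rule can be sketched as a small C function. This is an illustrative model, not the btusb code: `struct bt_adapter`, its `alt3_ok` flag and `pick_wbs_alt()` are hypothetical names, and the 72-byte threshold is taken from the adapter test results listed in this commit message rather than from the driver source.

```c
#include <stdbool.h>

/* Hypothetical summary of what the driver knows about an adapter. */
struct bt_adapter {
	bool alt3_ok;      /* hypothetical per-vendor flag, set for Realtek */
	unsigned int mtu;  /* the MTU the commit message compares with 72 */
};

/* Threshold taken from the tested adapters (mtu >= 72: ALT 3 works). */
#define WBS_ALT3_MIN_MTU 72

/* Return the USB alternate setting to use for wideband speech (WBS):
 * ALT 3 only when the adapter is flagged as known-good AND its MTU is
 * large enough; otherwise fall back to ALT 1. */
int pick_wbs_alt(const struct bt_adapter *a)
{
	if (a->alt3_ok && a->mtu >= WBS_ALT3_MIN_MTU)
		return 3;
	return 1;
}
```

Requiring both conditions means an adapter with a large MTU but no flag (like the Intel 8087:0a2b report) still falls back to ALT 1.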
      
      Tested with USB adapters (mtu<72, no/garbled sound with ALT3, ALT1
      works) BCM20702A1 0b05:17cb, CSR8510A10 0a12:0001, and (mtu>=72, ALT3
      works) RTL8761BU 0bda:8771, Intel AX200 8087:0029 (after disabling
      ALT6). Also got reports for (mtu>=72, ALT 3 reported to produce bad
      audio) Intel 8087:0a2b.
      
      Signed-off-by: Pauli Virtanen <pav@iki.fi>
      Fixes: e848dbd3 ("Bluetooth: btusb: Add support USB ALT 3 for WBS")
      Tested-by: Michał Kępień <kernel@kempniu.pl>
      Tested-by: Jonathan Lampérth <jon@h4n.dev>
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Linus Torvalds
      vt_kdsetmode: extend console locking · 60d69cb4
      Linus Torvalds authored
      commit 2287a51b upstream.
      
      As per the long-suffering comment.
      
      Reported-by: Minh Yuan <yuanmingbuaa@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jiri Slaby <jirislaby@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Xin Long
      tipc: call tipc_wait_for_connect only when dlen is not 0 · 0a178a01
      Xin Long authored
      commit 7387a72c upstream.
      
      __tipc_sendmsg() is called to send a SYN packet by either tipc_sendmsg()
      or tipc_connect(). The difference is that tipc_connect() calls
      tipc_wait_for_connect() after __tipc_sendmsg() to wait until the
      connection is established, so there's no need to wait in
      __tipc_sendmsg() in that case.

      This patch fixes it by calling tipc_wait_for_connect() in
      __tipc_sendmsg() only when dlen is not 0, which means it was called by
      tipc_sendmsg() rather than tipc_connect().
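The control flow can be modelled in user space to show why the old code waited twice on the tipc_connect() path. The functions below (`model_sendmsg()`, `model_connect()`, `sendmsg_syn()`) and the wait counter are hypothetical; they only mirror the call structure described above, with `dlen` standing in for the payload length.

```c
#include <stddef.h>

/* Counts how many times the model "waits for the connection". */
static int waits;

static void wait_for_connect(void)
{
	waits++;
}

/* Model of __tipc_sendmsg() sending a SYN. After the fix it waits only
 * when the SYN carries data (dlen != 0), i.e. when reached via sendmsg. */
static void sendmsg_syn(size_t dlen)
{
	/* ... build and transmit the SYN ... */
	if (dlen)
		wait_for_connect();
}

/* Model of tipc_sendmsg(): implicit connect with a payload. */
void model_sendmsg(size_t dlen)
{
	sendmsg_syn(dlen);
}

/* Model of tipc_connect(): sends a SYN without data, then waits itself. */
void model_connect(void)
{
	sendmsg_syn(0);
	wait_for_connect();
}

int model_waits(void)
{
	return waits;
}
```

With the `if (dlen)` guard removed, `model_connect()` would wait twice, which corresponds to the redundant wait the patch removes.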
      
      Note this also fixes the failure in tipcutils/test/ptts/:
      
        # ./tipcTS &
        # ./tipcTC 9
        (hang)
      
      Fixes: 36239dab6da7 ("tipc: fix implicit-connect for SYN+")
      Reported-by: Shuang Li <shuali@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Frieder Schrempf
      mtd: spinand: Fix incorrect parameters for on-die ECC · ded6da21
      Frieder Schrempf authored
      The new generic NAND ECC framework stores the configuration and
      requirements in separate places since commit 93ef92f6 ("mtd: nand: Use
      the new generic ECC object"). In 5.10.x, the SPI NAND layer still uses
      only the requirements to track the ECC properties. This mismatch leads to
      values of zero being used for ECC strength and step_size in the SPI NAND
      layer wherever nanddev_get_ecc_conf() is used, and therefore breaks the
      SPI NAND on-die ECC support in 5.10.x.
      
      By using nanddev_get_ecc_requirements() instead of nanddev_get_ecc_conf()
      for SPI NAND, we make sure that the correct parameters for the detected
      chip are used. In later versions (5.11.x) this is fixed anyway with the
      implementation of the SPI NAND on-die ECC engine.
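The mismatch can be modelled with two copies of the ECC properties. The types and accessors below are hypothetical simplifications named after `nanddev_get_ecc_conf()` and `nanddev_get_ecc_requirements()`; they are not the kernel definitions. Reading the never-filled configuration yields zeros, while the requirements carry the chip's actual values.

```c
/* Simplified ECC properties: correctable bits per step and step size. */
struct ecc_props {
	unsigned int strength;
	unsigned int step_size;
};

/* Hypothetical NAND device: the generic framework keeps the chip's
 * requirements and the engine configuration in separate fields. */
struct nand_dev_model {
	struct ecc_props requirements; /* filled from the detected chip */
	struct ecc_props conf;         /* left zeroed by 5.10.x SPI NAND */
};

/* Mirrors the role of nanddev_get_ecc_conf(): returns the (empty) conf. */
const struct ecc_props *model_get_ecc_conf(const struct nand_dev_model *d)
{
	return &d->conf;
}

/* Mirrors the role of nanddev_get_ecc_requirements(). */
const struct ecc_props *
model_get_ecc_requirements(const struct nand_dev_model *d)
{
	return &d->requirements;
}
```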
      
      Cc: stable@vger.kernel.org # 5.10.x
      Reported-by: voice INTER connect GmbH <developer@voiceinterconnect.de>
      Signed-off-by: Frieder Schrempf <frieder.schrempf@kontron.de>
      Acked-by: Miquel Raynal <miquel.raynal@bootlin.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Linus Torvalds
      pipe: do FASYNC notifications for every pipe IO, not just state changes · 3b2018f9
      Linus Torvalds authored
      commit fe67f4dd upstream.
      
      It turns out that the SIGIO/FASYNC situation is almost exactly the same
      as the EPOLLET case was: user space really wants to be notified after
      every operation.
      
      Now, in a perfect world it should be sufficient to only notify user
      space on "state transitions" when the IO state changes (ie when a pipe
      goes from unreadable to readable, or from unwritable to writable).  User
      space should then do as much as possible - fully emptying the buffer or
      what not - and we'll notify it again the next time the state changes.
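On the user-space side, "do as much as possible" usually means: on each notification, read a non-blocking pipe until read() fails with EAGAIN, so nothing is left behind even if several writes arrived before the handler ran. This is a generic POSIX sketch, not the stress-ng code; `drain_pipe()` and `set_nonblocking()` are illustrative helpers.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Drain everything currently buffered in a non-blocking pipe read end.
 * Returns the number of bytes consumed, or -1 on an unexpected error. */
long drain_pipe(int rfd)
{
	char buf[4096];
	long total = 0;

	for (;;) {
		ssize_t n = read(rfd, buf, sizeof(buf));

		if (n > 0) {
			total += n;
			continue;
		}
		if (n == 0)             /* all writers closed the pipe */
			return total;
		if (errno == EINTR)     /* interrupted: just retry */
			continue;
		if (errno == EAGAIN || errno == EWOULDBLOCK)
			return total;   /* pipe is empty right now */
		return -1;
	}
}

/* Put a descriptor into non-blocking mode; returns 0 on success. */
int set_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}
```

If this runs inside a SIGIO handler, the handler should also save and restore errno, since read() may overwrite it.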
      
      But as with EPOLLET, we have at least one case (stress-ng) where the
      kernel sent SIGIO due to the pipe being marked for asynchronous
      notification, but the user space signal handler then didn't actually
      necessarily read it all before returning (it read more than what was
      written, but since there could be multiple writes, it could leave data
      pending).
      
      The user space code then expected to get another SIGIO for subsequent
      writes - even though the pipe had been readable the whole time - and
      would only then read more.
      
      This is arguably a user space bug - and Colin King already fixed the
      stress-ng code in question - but the kernel regression rules are clear:
      it doesn't matter if kernel people think that user space did something
      silly and wrong.  What matters is that it used to work.
      
      So if user space depends on specific historical kernel behavior, it's a
      regression when that behavior changes.  It's on us: we were silly to
      have that non-optimal historical behavior, and our old kernel behavior
      was what user space was tested against.
      
      Because of how the FASYNC notification was tied to wakeup behavior, this
      was first broken by commits f467a6a6 and 1b6b26ae ("pipe: fix
      and clarify pipe read/write wakeup logic"), but at the time it seems
      nobody noticed.  Probably because the stress-ng problem case ends up
      being timing-dependent too.
      
      It was then unwittingly fixed by commit 3a34b13a ("pipe: make pipe
      writes always wake up readers"), only to be broken again by commit
      3b844826 ("pipe: avoid unnecessary EPOLLET wakeups under normal
      loads").
      
      And at that point the kernel test robot noticed the performance
      regression in the stress-ng.sigio.ops_per_sec case.  So the "Fixes" tag
      below is somewhat ad hoc, but it matches when the issue was noticed.
      
      Fix it for good (knock wood) by simply making the kill_fasync() case
      separate from the wakeup case.  FASYNC is quite rare, and we clearly
      shouldn't even try to use the "avoid unnecessary wakeups" logic for it.
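The shape of the fix can be shown with a toy model. `struct pipe_model`, `model_write()` and its counters are hypothetical and not taken from fs/pipe.c; the model only shows reader wakeups staying tied to the empty-to-non-empty transition while the FASYNC notification (kill_fasync() in the kernel) is issued after every write.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pipe state plus counters for the two kinds of notification. */
struct pipe_model {
	size_t buffered;       /* bytes currently in the pipe */
	bool fasync;           /* a reader asked for SIGIO (O_ASYNC) */
	unsigned int wakeups;  /* reader wait-queue wakeups */
	unsigned int sigios;   /* FASYNC notifications sent */
};

void model_write(struct pipe_model *p, size_t len)
{
	bool was_empty = p->buffered == 0;

	p->buffered += len;

	/* Wakeups keep the "only on state transition" optimisation. */
	if (was_empty)
		p->wakeups++;

	/* FASYNC is decoupled from wakeups: notify after every write. */
	if (p->fasync)
		p->sigios++;
}
```

Three writes into a pipe nobody reads from yield one wakeup but three SIGIO notifications, which is the historical behaviour stress-ng relied on.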
      
      Link: https://lore.kernel.org/lkml/20210824151337.GC27667@xsang-OptiPlex-9020/
      Fixes: 3b844826 ("pipe: avoid unnecessary EPOLLET wakeups under normal loads")
      Reported-by: kernel test robot <oliver.sang@intel.com>
      Tested-by: Oliver Sang <oliver.sang@intel.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Linus Torvalds
      pipe: avoid unnecessary EPOLLET wakeups under normal loads · e91da23c
      Linus Torvalds authored
      commit 3b844826 upstream.
      
      I had forgotten just how sensitive hackbench is to extra pipe wakeups,
      and commit 3a34b13a ("pipe: make pipe writes always wake up
      readers") ended up causing a quite noticeable regression on larger
      machines.
      
      Now, hackbench isn't necessarily a hugely meaningful benchmark, and it's
      not clear that this matters in real life all that much, but as Mel
      points out, it's used often enough when comparing kernels and so the
      performance regression shows up like a sore thumb.
      
      It's easy enough to fix at least for the common cases where pipes are
      used purely for data transfer, and you never have any exciting poll
      usage at all.  So set a special 'poll_usage' flag when there is polling
      activity, and make the ugly "EPOLLET has crazy legacy expectations"
      semantics explicit to only that case.
      
      I would love to limit it to just the broken EPOLLET case, but the pipe
      code can't see the difference between epoll and regular select/poll, so
      any non-read/write waiting will trigger the extra wakeup behavior.  That
      is sufficient for at least the hackbench case.
      
      Apart from making the odd extra wakeup cases more explicitly about
      EPOLLET, this also makes the extra wakeup be at the _end_ of the pipe
      write, not at the first write chunk.  That is actually much saner
      semantics (as much as you can call any of the legacy edge-triggered
      expectations for EPOLLET "sane") since it means that you know the wakeup
      will happen once the write is done, rather than possibly in the middle
      of one.
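A sketch of the resulting write-side policy, as a hypothetical model rather than fs/pipe.c: only the `poll_usage` name comes from the text above. Readers are woken once, after the whole write, if the pipe was empty when the write started or if anyone has ever polled the pipe. The model deliberately ignores the wakeups the real code must do when a large write fills the pipe.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pipe state for the EPOLLET policy. */
struct epoll_pipe_model {
	size_t buffered;
	bool poll_usage;       /* set once anyone polls/selects/epolls the pipe */
	unsigned int wakeups;
};

/* One write() call, copied in n chunks. The wakeup decision is made
 * once, after the last chunk, never in the middle of the write. */
void model_write_chunks(struct epoll_pipe_model *p,
			const size_t *chunks, size_t n)
{
	bool was_empty = p->buffered == 0;

	for (size_t i = 0; i < n; i++)
		p->buffered += chunks[i];   /* no wakeup mid-write */

	if (was_empty || p->poll_usage)
		p->wakeups++;
}
```

A pure data-transfer pipe that already holds data (the hackbench case) gets no extra wakeup, while a polled pipe keeps the legacy "wake on every write" expectation.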
      
      [ For stable people: I'm putting a "Fixes" tag on this, but I leave it
        up to you to decide whether you actually want to backport it or not.
        It likely has no impact outside of synthetic benchmarks  - Linus ]
      
      Link: https://lore.kernel.org/lkml/20210802024945.GA8372@xsang-OptiPlex-9020/
      Fixes: 3a34b13a ("pipe: make pipe writes always wake up readers")
      Reported-by: kernel test robot <oliver.sang@intel.com>
      Tested-by: Sandeep Patil <sspatil@android.com>
      Tested-by: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Filipe Manana
      btrfs: fix race between marking inode needs to be logged and log syncing · d845f89d
      Filipe Manana authored
      commit bc0939fc upstream.
      
      We have a race between marking that an inode needs to be logged, either
      at btrfs_set_inode_last_trans() or at btrfs_page_mkwrite(), and
      btrfs_sync_log(). The following steps describe how the race happens.
      
      1) We are at transaction N;
      
      2) Inode I was previously fsynced in the current transaction so it has:
      
          inode->logged_trans set to N;
      
      3) The inode's root currently has:
      
         root->log_transid set to 1
         root->last_log_commit set to 0
      
         Which means only one log transaction was committed so far, log
         transaction 0. When a log tree is created we set ->log_transid and
         ->last_log_commit of its parent root to 0 (at btrfs_add_log_tree());
      
      4) One more range of pages is dirtied in inode I;
      
      5) Some task A starts an fsync against some other inode J (same root), and
         so it joins log transaction 1.
      
         Before task A calls btrfs_sync_log()...
      
      6) Task B starts an fsync against inode I, which currently has the full
         sync flag set, so it starts delalloc and waits for the ordered extent
         to complete before calling btrfs_inode_in_log() at btrfs_sync_file();
      
      7) During ordered extent completion we have btrfs_update_inode() called
         against inode I, which in turn calls btrfs_set_inode_last_trans(),
         which does the following:
      
           spin_lock(&inode->lock);
           inode->last_trans = trans->transaction->transid;
           inode->last_sub_trans = inode->root->log_transid;
           inode->last_log_commit = inode->root->last_log_commit;
           spin_unlock(&inode->lock);
      
         So ->last_trans is set to N and ->last_sub_trans set to 1.
         But before setting ->last_log_commit...
      
      8) Task A is at btrfs_sync_log():
      
         - it increments root->log_transid to 2
         - starts writeback for all log tree extent buffers
         - waits for the writeback to complete
         - writes the super blocks
         - updates root->last_log_commit to 1
      
         That is a lot of slow steps between updating root->log_transid and
         root->last_log_commit;
      
      9) The task doing the ordered extent completion, currently at
         btrfs_set_inode_last_trans(), then finally runs:
      
           inode->last_log_commit = inode->root->last_log_commit;
           spin_unlock(&inode->lock);
      
         Which results in inode->last_log_commit being set to 1.
         The ordered extent completes;
      
      10) Task B is resumed, and it calls btrfs_inode_in_log() which returns
          true because we have all the following conditions met:
      
          inode->logged_trans == N which matches fs_info->generation &&
          inode->last_sub_trans (1) <= inode->last_log_commit (1) &&
          inode->last_sub_trans (1) <= root->last_log_commit (1) &&
          list inode->extent_tree.modified_extents is empty
      
          And as a consequence we return without logging the inode, so the
          existing logged version of the inode does not point to the extent
          that was written after the previous fsync.
      
      It should be impossible in practice for one task to make so much
      progress in btrfs_sync_log() while another task is at
      btrfs_set_inode_last_trans(), right after it reads root->log_transid and
      before it reads root->last_log_commit. Even if kernel preemption is
      enabled, we know the task at btrfs_set_inode_last_trans() cannot be
      preempted because it is holding the inode's spinlock.
      
      However there is another place where we do the same without holding the
      spinlock, which is in the memory mapped write path at:
      
        vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
        {
           (...)
           BTRFS_I(inode)->last_trans = fs_info->generation;
           BTRFS_I(inode)->last_sub_trans = BTRFS_I(inode)->root->log_transid;
           BTRFS_I(inode)->last_log_commit = BTRFS_I(inode)->root->last_log_commit;
           (...)
      
      So with preemption happening after setting ->last_sub_trans and before
      setting ->last_log_commit, it is less of a stretch to have another task
      make enough progress at btrfs_sync_log() such that the task doing the
      memory mapped write ends up with ->last_sub_trans and ->last_log_commit
      set to the same value. It is still a big stretch to get there, as the
      task doing btrfs_sync_log() has to start writeback, wait for its
      completion and write the super blocks.
      
      So fix this in two different ways:
      
      1) For btrfs_set_inode_last_trans(), simply set ->last_log_commit to the
         value of ->last_sub_trans minus 1;
      
      2) For btrfs_page_mkwrite() only set the inode's ->last_sub_trans, just
         like we do for buffered and direct writes at btrfs_file_write_iter(),
         which is all we need to make sure multiple writes and fsyncs to an
         inode in the same transaction never result in an fsync missing that
         the inode changed and needs to be logged. Turn this into a helper
         function and use it both at btrfs_page_mkwrite() and at
         btrfs_file_write_iter() - this also fixes the problem that at
         btrfs_page_mkwrite() we were setting those fields without the
         protection of the inode's spinlock.
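The effect of fix 1) can be checked against the interleaving in steps 7) to 10) with a small model. The structs and functions below are hypothetical simplifications of the fields quoted above (last_sub_trans, last_log_commit, root->last_log_commit); the logged_trans and modified_extents checks of the real btrfs_inode_in_log() are left out.

```c
#include <stdbool.h>

struct root_model {
	int log_transid;
	int last_log_commit;
};

struct inode_model {
	int last_sub_trans;
	int last_log_commit;
};

/* Simplified btrfs_inode_in_log(): the inode counts as already logged
 * when its last change is not newer than the last committed log. */
bool model_inode_in_log(const struct inode_model *in,
			const struct root_model *r)
{
	return in->last_sub_trans <= in->last_log_commit &&
	       in->last_sub_trans <= r->last_log_commit;
}

/* Old behaviour: last_log_commit is sampled from the root, possibly
 * after btrfs_sync_log() has already advanced it (step 9). */
void model_set_last_trans_old(struct inode_model *in, int sub_trans,
			      int root_commit_when_read)
{
	in->last_sub_trans = sub_trans;
	in->last_log_commit = root_commit_when_read;
}

/* Fixed behaviour: derive last_log_commit from last_sub_trans, so it
 * can never catch up with the sampled log transaction. */
void model_set_last_trans_new(struct inode_model *in, int sub_trans)
{
	in->last_sub_trans = sub_trans;
	in->last_log_commit = sub_trans - 1;
}
```

With the root at log_transid 2 and last_log_commit 1 (after step 8), the old assignment makes the fsync in step 10 conclude the inode is already logged, while the new one forces it to be logged.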
      
      This is an extremely unlikely race to happen in practice.
      
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Gerd Rausch
      net/rds: dma_map_sg is entitled to merge entries · 6f38d95f
      Gerd Rausch authored
      [ Upstream commit fb4b1373 ]
      
      Function "dma_map_sg" is entitled to merge adjacent entries
      and return a value smaller than what was passed as "nents".
      
      Subsequently "ib_map_mr_sg" needs to work with this value ("sg_dma_len")
      rather than the original "nents" parameter ("sg_len").
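The rule can be illustrated with a toy mapping function. `toy_map_sg()` and `struct sg_entry` are hypothetical: the function merges physically contiguous neighbours in place, as an IOMMU-backed dma_map_sg() is allowed to, and returns the new count. Real scatterlists keep the DMA address and length in separate fields (read via sg_dma_address() and sg_dma_len()), which this sketch collapses into one.

```c
#include <stddef.h>

struct sg_entry {
	unsigned long addr;
	unsigned long len;
};

/* Hypothetical stand-in for dma_map_sg(): coalesces an entry into the
 * previous one when it starts exactly where the previous one ends, and
 * returns the number of mapped entries, which may be smaller than nents. */
int toy_map_sg(struct sg_entry *sg, int nents)
{
	int out = 0;

	if (nents <= 0)
		return 0;
	for (int i = 1; i < nents; i++) {
		if (sg[out].addr + sg[out].len == sg[i].addr)
			sg[out].len += sg[i].len;   /* merge into previous */
		else
			sg[++out] = sg[i];          /* start a new entry */
	}
	return out + 1;
}

/* Consumer: must iterate over the mapped count returned above, not the
 * original nents; entries past that count are stale leftovers. */
unsigned long total_mapped_len(const struct sg_entry *sg, int mapped)
{
	unsigned long total = 0;

	for (int i = 0; i < mapped; i++)
		total += sg[i].len;
	return total;
}
```

Passing the original `nents` (3 below) instead of the returned count double-counts the stale last entry, the same class of mistake that made RDS hand the wrong length to ib_map_mr_sg().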
      
      This old RDS bug was exposed and reliably causes kernel panics
      (using RDMA operations "rds-stress -D") on x86_64 starting with
      commit c588072b ("iommu/vt-d: Convert intel iommu driver to the iommu ops").
      
      Simply put: Linux 5.11 and later.
      
      Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
      Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Link: https://lore.kernel.org/r/60efc69f-1f35-529d-a7ef-da0549cad143@oracle.com
      
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Ben Skeggs
      drm/nouveau/kms/nv50: workaround EFI GOP window channel format differences · b882dda2
      Ben Skeggs authored
      [ Upstream commit e78b1b54 ]
      
      Should fix some initial modeset failures on (at least) Ampere boards.
      
      Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
      Reviewed-by: Lyude Paul <lyude@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Ben Skeggs
      drm/nouveau/disp: power down unused DP links during init · 7f422cda
      Ben Skeggs authored
      [ Upstream commit 6eaa1f3c ]
      
      When booted with multiple displays attached, the EFI GOP driver on (at
      least) Ampere, can leave DP links powered up that aren't being used to
      display anything.  This confuses our tracking of SOR routing, with the
      likely result being a failed modeset and display engine hang.
      
      Fix this by (ab?)using the DisableLT IED script to power-down the link,
      restoring HW to a state the driver expects.
      
      Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
      Reviewed-by: Lyude Paul <lyude@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Mark Yacoub
      drm: Copy drm_wait_vblank to user before returning · 6fd6e205
      Mark Yacoub authored
      [ Upstream commit fa0b1ef5 ]
      
      [Why]
      Userspace should get back a copy of drm_wait_vblank that's been modified
      even when drm_wait_vblank_ioctl returns a failure.
      
      Rationale:
      drm_wait_vblank_ioctl modifies the request and expects the user to read
      it back. When the type is RELATIVE, it changes it to ABSOLUTE and updates
      the sequence to current_vblank_count + sequence, turning the relative
      sequence into an absolute one.
      drmWaitVBlank (in libdrm) relies on this: it expects the request to have
      been converted to ABSOLUTE and the sequence to have been updated
      accordingly.

      The change is in compat_drm_wait_vblank, which is called by
      drm_compat_ioctl. Copying the data back regardless of the return value
      puts it on par with drm_ioctl, which always copies the data back before
      returning.
      
      [How]
      Return from the function after everything has been copied to user.
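A user-space sketch of the [How] part: the rewritten request is copied back to the caller on both the success and the failure path. The types, `model_wait_vblank()`, `model_compat_wait_vblank()` and the `memcpy()` calls standing in for copy_from_user()/copy_to_user() are hypothetical; only the RELATIVE-to-ABSOLUTE rewrite is taken from the rationale above.

```c
#include <errno.h>
#include <string.h>

enum vbl_type { VBL_ABSOLUTE, VBL_RELATIVE };

struct wait_vblank_req {
	enum vbl_type type;
	unsigned int sequence;
};

/* Hypothetical ioctl body: rewrites a RELATIVE request into an ABSOLUTE
 * one, then may fail (e.g. interrupted by a signal). */
static int model_wait_vblank(struct wait_vblank_req *req,
			     unsigned int current_count, int fail)
{
	if (req->type == VBL_RELATIVE) {
		req->sequence += current_count;
		req->type = VBL_ABSOLUTE;
	}
	return fail ? -EINTR : 0;
}

/* Compat wrapper: copy the request in, run the ioctl, and copy the
 * (possibly rewritten) request back before returning, whatever ret is. */
int model_compat_wait_vblank(struct wait_vblank_req *user_req,
			     unsigned int current_count, int fail)
{
	struct wait_vblank_req kreq;
	int ret;

	memcpy(&kreq, user_req, sizeof(kreq));   /* "copy_from_user" */
	ret = model_wait_vblank(&kreq, current_count, fail);
	memcpy(user_req, &kreq, sizeof(kreq));   /* "copy_to_user", even on error */
	return ret;
}
```

After an interrupted call, the caller holds an ABSOLUTE request for sequence 101, so retrying it waits for the same vblank instead of one relative to the new current count.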
      
      Fixes IGT:kms_flip::modeset-vs-vblank-race-interruptible
      Tested on ChromeOS Trogdor(msm)
      
      Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
      Signed-off-by: Mark Yacoub <markyacoub@chromium.org>
      Signed-off-by: Sean Paul <seanpaul@chromium.org>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210812194917.1703356-1-markyacoub@chromium.org
      
      
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      6fd6e205
    • blk-mq: don't grab rq's refcount in blk_mq_check_expired() · 26ee94ba
      Ming Lei authored
      [ Upstream commit c797b40c ]
      
      Inside blk_mq_queue_tag_busy_iter() we have already grabbed the
      request's refcount before calling ->fn(), so there is no need to grab
      it again in blk_mq_check_expired().

      Meanwhile, remove the extra request expiry check in
      blk_mq_check_expired().
      
      Cc: Keith Busch <kbusch@kernel.org>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: John Garry <john.garry@huawei.com>
      Link: https://lore.kernel.org/r/20210811155202.629575-1-ming.lei@redhat.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • drm/amd/pm: change the workload type for some cards · b00ca567
      Kenneth Feng authored
      [ Upstream commit 93c5701b ]
      
      Change the workload type for some cards as needed.
      
      Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
      Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Revert "drm/amd/pm: fix workload mismatch on vega10" · 3c37ec43
      Kenneth Feng authored
      [ Upstream commit 2fd31689 ]
      
      This reverts commit 0979d432.
      Revert this because it does not apply to all the cards.
      
      Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
      Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • qed: Fix null-pointer dereference in qed_rdma_create_qp() · cc126b40
      Shai Malin authored
      [ Upstream commit d33d19d3 ]
      
      Fix a possible null-pointer dereference in qed_rdma_create_qp().
      
      Changes from V2:
      - Revert checkpatch fixes.
      
      Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
      Signed-off-by: Ariel Elior <aelior@marvell.com>
      Signed-off-by: Shai Malin <smalin@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • qed: qed ll2 race condition fixes · 18a65ba0
      Shai Malin authored
      [ Upstream commit 37110237 ]
      
      Avoid qed ll2 race conditions and NULL pointer dereferences as part
      of the remove and recovery flows.

      Changes from V1:
      - Change (!p_rx->set_prod_addr).
      - qed_ll2.c checkpatch fixes.

      Changes from V2:
      - Revert "qed_ll2.c checkpatch fixes".
      
      Signed-off-by: Ariel Elior <aelior@marvell.com>
      Signed-off-by: Shai Malin <smalin@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • tools/virtio: fix build · 4ac9c81e
      Michael S. Tsirkin authored
      [ Upstream commit a24ce06c ]
      
      We use a spinlock now so add a stub.
      Ignore bogus uninitialized variable warnings.
      
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • vringh: Use wiov->used to check for read/write desc order · c7ee4d22
      Neeraj Upadhyay authored
      [ Upstream commit e74cfa91 ]
      
      As __vringh_iov() traverses a descriptor chain, it places each
      descriptor entry into either the read or the write vring iov and
      increments that iov's ->used member. So, at any point while iterating
      over a descriptor chain, (riov/wriov)->used gives the number of
      descriptor entries available to be read or written by the device.
      As all read iovs must precede the write iovs, wiov->used should be
      zero when we are traversing a read descriptor. The current code checks
      wiov->i to figure out whether any previous entry in the current
      descriptor chain was a write descriptor. However, iov->i is only
      incremented later, when these vring iovs are consumed, and remains 0
      in __vringh_iov(). So, correct the check for read and write descriptor
      order to use wiov->used.
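
      A minimal sketch of the ordering rule, assuming a hypothetical
      `struct iov_sketch` that keeps only the `used` and `i` counters
      described above; `walk_chain()` is illustrative and is not the real
      __vringh_iov(). It rejects a read descriptor once any write descriptor
      has been recorded in wiov->used.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified iov: only the two counters discussed above. */
struct iov_sketch {
	size_t used;	/* entries filled while walking the chain */
	size_t i;	/* entries consumed later; stays 0 during the walk */
};

/* Walks a chain whose descriptors are flagged write (true) or read
 * (false). Returns false if a read descriptor follows a write one. */
static bool walk_chain(const bool *is_write, size_t n,
		       struct iov_sketch *riov, struct iov_sketch *wiov)
{
	for (size_t k = 0; k < n; k++) {
		if (is_write[k]) {
			wiov->used++;
		} else {
			/* Testing wiov->i here would never fire, since it is
			 * still 0; wiov->used counts the writes seen so far. */
			if (wiov->used)
				return false;
			riov->used++;
		}
	}
	return true;
}
```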
      
      Acked-by: Jason Wang <jasowang@redhat.com>
      Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org>
      Link: https://lore.kernel.org/r/1624591502-4827-1-git-send-email-neeraju@codeaurora.org
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • virtio_vdpa: reject invalid vq indices · 6c074eaa
      Vincent Whitchurch authored
      [ Upstream commit cb5d2c1f ]
      
      Do not call vDPA drivers' callbacks with vq indices larger than what
      the drivers indicate that they support.  vDPA drivers do not bounds
      check the indices.
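
      A minimal sketch of the bounds check, assuming a hypothetical
      `struct vdpa_dev_sketch` whose `nvqs` field holds the number of queues
      the driver supports; the function name and the -ENOENT error code are
      illustrative assumptions.

```c
#include <errno.h>

/* Hypothetical stand-in for a vDPA device's reported queue count. */
struct vdpa_dev_sketch {
	unsigned int nvqs;
};

/* Rejects out-of-range indices before any driver callback runs, since
 * the drivers themselves do not bounds check. */
static int setup_vq_sketch(const struct vdpa_dev_sketch *vdpa,
			   unsigned int index)
{
	if (index >= vdpa->nvqs)
		return -ENOENT;
	/* ... the driver's per-vq callbacks would be called here ... */
	return 0;
}
```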
      
      Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
      Link: https://lore.kernel.org/r/20210701114652.21956-1-vincent.whitchurch@axis.com
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • virtio_pci: Support surprise removal of virtio pci device · 0698278e
      Parav Pandit authored
      [ Upstream commit 43bb40c5 ]
      
      When a virtio PCI device undergoes surprise removal (aka async removal
      in the PCIe spec), mark the device as broken so that any upper layer
      drivers can abort any outstanding operation.

      When a virtio net PCI device used by NetworkManager undergoes surprise
      removal, the call trace below was observed.
      
      kernel:watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [kworker/1:1:27059]
      watchdog: BUG: soft lockup - CPU#1 stuck for 52s! [kworker/1:1:27059]
      CPU: 1 PID: 27059 Comm: kworker/1:1 Tainted: G S      W I  L    5.13.0-hotplug+ #8
      Hardware name: Dell Inc. PowerEdge R640/0H28RR, BIOS 2.9.4 11/06/2020
      Workqueue: events linkwatch_event
      RIP: 0010:virtnet_send_command+0xfc/0x150 [virtio_net]
      Call Trace:
       virtnet_set_rx_mode+0xcf/0x2a7 [virtio_net]
       ? __hw_addr_create_ex+0x85/0xc0
       __dev_mc_add+0x72/0x80
       igmp6_group_added+0xa7/0xd0
       ipv6_mc_up+0x3c/0x60
       ipv6_find_idev+0x36/0x80
       addrconf_add_dev+0x1e/0xa0
       addrconf_dev_config+0x71/0x130
       addrconf_notify+0x1f5/0xb40
       ? rtnl_is_locked+0x11/0x20
       ? __switch_to_asm+0x42/0x70
       ? finish_task_switch+0xaf/0x2c0
       ? raw_notifier_call_chain+0x3e/0x50
       raw_notifier_call_chain+0x3e/0x50
       netdev_state_change+0x67/0x90
       linkwatch_do_dev+0x3c/0x50
       __linkwatch_run_queue+0xd2/0x220
       linkwatch_event+0x21/0x30
       process_one_work+0x1c8/0x370
       worker_thread+0x30/0x380
       ? process_one_work+0x370/0x370
       kthread+0x118/0x140
       ? set_kthread_struct+0x40/0x40
       ret_from_fork+0x1f/0x30
      
      Hence, add the ability to abort the command on surprise removal,
      which prevents an infinite loop and system lockup.
      
      Signed-off-by: Parav Pandit <parav@nvidia.com>
      Link: https://lore.kernel.org/r/20210721142648.1525924-5-parav@nvidia.com
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • virtio: Improve vq->broken access to avoid any compiler optimization · 065a13c2
      Parav Pandit authored
      [ Upstream commit 60f07798 ]
      
      Currently the vq->broken field is read by virtqueue_is_broken() in a
      busy loop in one context, by virtnet_send_command().

      vq->broken is set to true in another process context by
      virtio_break_device(). The reader and writer access it without any
      synchronization, so the compiler may optimize the busy loop to read
      vq->broken only once.

      Hence, force reading vq->broken on each invocation of
      virtqueue_is_broken() and also force writing it so that the update
      is visible to readers.
      
      It is a theoretical fix; the problem has not yet been encountered in the field.
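
      The kernel idiom for forcing such accesses is the READ_ONCE() /
      WRITE_ONCE() pair. The sketch below approximates it in user space with
      volatile casts; the `_SKETCH` macro names and `struct vq_sketch` are
      hypothetical, so each call to `vq_is_broken()` performs a fresh load
      instead of one the compiler could hoist out of a polling loop.

```c
#include <stdbool.h>

/* User-space approximations of READ_ONCE()/WRITE_ONCE(): a volatile
 * access forces the compiler to emit a real load or store every time. */
#define READ_ONCE_SKETCH(x)	(*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE_SKETCH(x, v) \
	do { *(volatile __typeof__(x) *)&(x) = (v); } while (0)

struct vq_sketch {
	bool broken;
};

/* Reader side: polled in a busy loop by the command path. */
static bool vq_is_broken(struct vq_sketch *vq)
{
	return READ_ONCE_SKETCH(vq->broken);
}

/* Writer side: called from another context on surprise removal. */
static void vq_break(struct vq_sketch *vq)
{
	WRITE_ONCE_SKETCH(vq->broken, true);
}
```

      Volatile accesses only constrain the compiler; portable user-space
      code that shares a flag between threads would normally use C11
      atomics instead.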
      
      Signed-off-by: Parav Pandit <parav@nvidia.com>
      Link: https://lore.kernel.org/r/20210721142648.1525924-2-parav@nvidia.com
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • cpufreq: blocklist Qualcomm sm8150 in cpufreq-dt-platdev · f41c7462
      Thara Gopinath authored
      [ Upstream commit 5d79e5ce ]
      
      The Qualcomm sm8150 platform uses the qcom-cpufreq-hw driver, so
      add it to the cpufreq-dt-platdev driver's blocklist.
      
      Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>
      Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • opp: remove WARN when no valid OPPs remain · 3dea9315
      Michał Mirosław authored
      [ Upstream commit 335ffab3 ]
      
      This WARN can be triggered per-core and the stack trace is not useful.
      Replace it with plain dev_err(). Fix a comment while at it.
      
      Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • iwlwifi: pnvm: accept multiple HW-type TLVs · be37f7db
      Johannes Berg authored
      [ Upstream commit 0f673c16 ]
      
      Some products (So) may come in two variants with different mac-types
      that are otherwise equivalent and have the same PNVM data, so the
      PNVM file will contain two (or perhaps, later, more) HW-type TLVs.
      Accept the file and use the data section that contains any matching
      entry.
      
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
      Link: https://lore.kernel.org/r/20210719140154.a6a86e903035.Ic0b1b75c45d386698859f251518e8a5144431938@changeid
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • clk: renesas: rcar-usb2-clock-sel: Fix kernel NULL pointer dereference · 9a6a5602
      Adam Ford authored
      [ Upstream commit 1669a941 ]
      
      The probe was manually passing NULL instead of dev to devm_clk_hw_register().
      This caused an "Unable to handle kernel NULL pointer dereference" error.
      Fix this by passing 'dev'.
      
      Signed-off-by: Adam Ford <aford173@gmail.com>
      Fixes: a20a40a8 ("clk: renesas: rcar-usb2-clock-sel: Fix error handling in .probe()")
      Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: Stephen Boyd <sboyd@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • perf/x86/intel/uncore: Fix integer overflow on 23 bit left shift of a u32 · bdc5049c
      Colin Ian King authored
      [ Upstream commit 0b3a8738 ]
      
      The u32 variable pci_dword is being masked with 0x1fffffff and then left
      shifted 23 places. The shift is a u32 operation, so a value of 0x200 or
      more in pci_dword will overflow the u32 and only the bottom 32 bits
      are assigned to addr. I don't believe this was the original intent.
      Fix this by casting pci_dword to a resource_size_t to ensure no
      overflow occurs.
      
      Note that the mask and 12 bit left shift operation does not need this
      because the mask SNR_IMC_MMIO_MEM0_MASK and shift is always a 32 bit
      value.
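
      The arithmetic can be checked with a small sketch. The mask and shift
      come from the message above; the macro and function names are
      illustrative, and `uint64_t` stands in for resource_size_t (64 bits on
      x86-64).

```c
#include <stdint.h>

/* Mask and shift from the commit message; names are illustrative. */
#define MMIO_BASE_MASK	0x1fffffffu
#define MMIO_BASE_SHIFT	23

/* Buggy form: the shift is evaluated in 32 bits, so any masked value of
 * 0x200 or more loses its high bits before the result is widened. */
static uint64_t base_addr_u32(uint32_t pci_dword)
{
	uint64_t addr = (pci_dword & MMIO_BASE_MASK) << MMIO_BASE_SHIFT;

	return addr;
}

/* Fixed form: widen first, then shift, so no bits are lost. */
static uint64_t base_addr_wide(uint32_t pci_dword)
{
	return (uint64_t)(pci_dword & MMIO_BASE_MASK) << MMIO_BASE_SHIFT;
}
```

      For pci_dword = 0x1ff both forms agree (0xff800000), while for 0x200
      the 32-bit form wraps to 0 and the widened form yields 0x100000000.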
      
      Fixes: ee49532b ("perf/x86/intel/uncore: Add IMC uncore support for Snow Ridge")
      Addresses-Coverity: ("Unintentional integer overflow")
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
      Link: https://lore.kernel.org/r/20210706114553.28249-1-colin.king@canonical.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • dt-bindings: sifive-l2-cache: Fix 'select' matching · c5600b91
      Rob Herring authored
      [ Upstream commit 1c8094e3 ]
      
      When the schema fixups are applied to 'select', the result is that a
      single entry is required for a match, but that will never match as there
      should be 2 entries. Also, a 'select' schema should have the widest
      possible match, so use 'contains', which matches the compatible string(s)
      in any position and not just the first position.
      
      Fixes: 993dcfac ("dt-bindings: riscv: sifive-l2-cache: convert bindings to json-schema")
      Signed-off-by: Rob Herring <robh@kernel.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • usb: gadget: u_audio: fix race condition on endpoint stop · ad5329a5
      Jerome Brunet authored
      [ Upstream commit 068fdad2 ]
      
      If the endpoint completion callback is called right after the ep_enabled
      flag is cleared and before usb_ep_dequeue() is called, we could do a
      double free on the request and the associated buffer.
      
      Fix this by clearing ep_enabled after all the endpoint requests have been
      dequeued.
      
      Fixes: 7de8681b ("usb: gadget: u_audio: Free requests only after callback")
      Cc: stable <stable@vger.kernel.org>
      Reported-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
      Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
      Link: https://lore.kernel.org/r/20210827092927.366482-1-jbrunet@baylibre.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • drm/i915: Fix syncmap memory leak · 257ea8a5
      Matthew Brost authored
      [ Upstream commit a63bcf08 ]
      
      A small race exists between intel_gt_retire_requests_timeout and
      intel_timeline_exit which could result in the syncmap not getting
      freed. Rather than working too hard to seal this race, simply clean up
      the syncmap on fini.
      
      unreferenced object 0xffff88813bc53b18 (size 96):
        comm "gem_close_race", pid 5410, jiffies 4294917818 (age 1105.600s)
        hex dump (first 32 bytes):
          01 00 00 00 00 00 00 00 00 00 00 00 0a 00 00 00  ................
          00 00 00 00 00 00 00 00 6b 6b 6b 6b 06 00 00 00  ........kkkk....
        backtrace:
          [<00000000120b863a>] __sync_alloc_leaf+0x1e/0x40 [i915]
          [<00000000042f6959>] __sync_set+0x1bb/0x240 [i915]
          [<0000000090f0e90f>] i915_request_await_dma_fence+0x1c7/0x400 [i915]
          [<0000000056a48219>] i915_request_await_object+0x222/0x360 [i915]
          [<00000000aaac4ee3>] i915_gem_do_execbuffer+0x1bd0/0x2250 [i915]
          [<000000003c9d830f>] i915_gem_execbuffer2_ioctl+0x405/0xce0 [i915]
          [<00000000fd7a8e68>] drm_ioctl_kernel+0xb0/0xf0 [drm]
          [<00000000e721ee87>] drm_ioctl+0x305/0x3c0 [drm]
          [<000000008b0d8986>] __x64_sys_ioctl+0x71/0xb0
          [<0000000076c362a4>] do_syscall_64+0x33/0x80
          [<00000000eb7a4831>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Signed-off-by: Matthew Brost <matthew.brost@intel.com>
      Fixes: 531958f6 ("drm/i915/gt: Track timeline activeness in enter/exit")
      Cc: <stable@vger.kernel.org>
      Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
      Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210730195342.110234-1-matthew.brost@intel.com
      (cherry picked from commit faf89098)
      Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: stmmac: fix kernel panic due to NULL pointer dereference of plat->est · e49b8d9c
      Wong Vee Khee authored
      [ Upstream commit 82a44ae1 ]
      
      When taprio offload is not enabled, the error handling path causes a
      kernel crash due to a kernel NULL pointer dereference.

      Fix this by adding a check for NULL before attempting to access
      'plat->est' in the mutex_lock() call.
      
      The following kernel panic is observed without this patch:
      
      RIP: 0010:mutex_lock+0x10/0x20
      Call Trace:
      tc_setup_taprio+0x482/0x560 [stmmac]
      kmem_cache_alloc_trace+0x13f/0x490
      taprio_disable_offload.isra.0+0x9d/0x180 [sch_taprio]
      taprio_destroy+0x6c/0x100 [sch_taprio]
      qdisc_create+0x2e5/0x4f0
      tc_modify_qdisc+0x126/0x740
      rtnetlink_rcv_msg+0x12b/0x380
      _raw_spin_lock_irqsave+0x19/0x40
      _raw_spin_unlock_irqrestore+0x18/0x30
      create_object+0x212/0x340
      rtnl_calcit.isra.0+0x110/0x110
      netlink_rcv_skb+0x50/0x100
      netlink_unicast+0x191/0x230
      netlink_sendmsg+0x243/0x470
      sock_sendmsg+0x5e/0x60
      ____sys_sendmsg+0x20b/0x280
      copy_msghdr_from_user+0x5c/0x90
      __mod_memcg_state+0x87/0xf0
      ___sys_sendmsg+0x7c/0xc0
      lru_cache_add+0x7f/0xa0
      _raw_spin_unlock+0x16/0x30
      wp_page_copy+0x449/0x890
      handle_mm_fault+0x921/0xfc0
      __sys_sendmsg+0x59/0xa0
      do_syscall_64+0x33/0x40
      entry_SYSCALL_64_after_hwframe+0x44/0xa9
      ---[ end trace b1f19b24368a96aa ]---
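
      A minimal user-space sketch of the guard, assuming hypothetical
      `struct plat_sketch` and `struct est_sketch` types with a pthread mutex
      standing in for the kernel mutex; it only shows the shape of the check
      (skip the EST update when `plat->est` is NULL), not the actual
      tc_setup_taprio() code.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins for the driver structures. */
struct est_sketch {
	pthread_mutex_t lock;
	int enable;
};

struct plat_sketch {
	struct est_sketch *est;	/* NULL when taprio offload was never set up */
};

/* Disable/error path: only take the lock and touch EST state if it
 * exists, instead of dereferencing plat->est unconditionally. */
static void disable_est_sketch(struct plat_sketch *plat)
{
	if (plat->est) {
		pthread_mutex_lock(&plat->est->lock);
		plat->est->enable = 0;
		pthread_mutex_unlock(&plat->est->lock);
	}
}
```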
      
      Fixes: b60189e0 ("net: stmmac: Integrate EST with TAPRIO scheduler API")
      Cc: <stable@vger.kernel.org> # 5.10.x
      Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: stmmac: add mutex lock to protect est parameters · b2091d47
      Xiaoliang Yang authored
      [ Upstream commit b2aae654 ]
      
      Add a mutex lock to protect est structure parameters so that the
      EST parameters can be updated by other threads.
      
      Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Pavel Machek (CIP) <pavel@denx.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Revert "mmc: sdhci-iproc: Set SDHCI_QUIRK_CAP_CLOCK_BASE_BROKEN on BCM2711" · ac874290
      Ulf Hansson authored
      [ Upstream commit 885814a9 ]
      
      This reverts commit 419dd626.

      It turned out that the change from the reverted commit breaks the ACPI
      based rpi's because it causes the 100MHz max clock to be overridden by the
      value returned from sdhci_iproc_get_max_clock(), which is 0 because there
      isn't an OF/DT based clock device.
      
      Reported-by: Jeremy Linton <jeremy.linton@arm.com>
      Fixes: 419dd626 ("mmc: sdhci-iproc: Set SDHCI_QUIRK_CAP_CLOCK_BASE_BROKEN on BCM2711")
      Acked-by: Stefan Wahren <stefan.wahren@i2se.com>
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: hns3: fix get wrong pfc_en when query PFC configuration · 411680a0
      Guangbin Huang authored
      [ Upstream commit 8c1671e0 ]
      
      Currently, when the PFC configuration is queried by dcbtool, the driver
      returns the PFC enable status based on TC. As all priorities are mapped
      to TC0 by default, if TC0 is enabled, then all priorities mapped to TC0
      will be shown as enabled when the PFC setting is queried, even though
      some priorities have never been set.

      For example:
      $ dcb pfc show dev eth0
      pfc-cap 4 macsec-bypass off delay 0
      prio-pfc 0:off 1:off 2:off 3:off 4:off 5:off 6:off 7:off
      $ dcb pfc set dev eth0 prio-pfc 0:on 1:on 2:on 3:on
      $ dcb pfc show dev eth0
      pfc-cap 4 macsec-bypass off delay 0
      prio-pfc 0:on 1:on 2:on 3:on 4:on 5:on 6:on 7:on
      
      To fix this problem, just return the user's PFC config parameter saved
      in the driver.
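
      A minimal sketch of why the TC-based answer over-reports, assuming a
      hypothetical layout where `prio_tc[]` maps each of the 8 priorities to
      a TC and `tc_pfc_en` holds one enable bit per TC; `struct
      dcb_cfg_sketch` stands in for the saved user configuration. None of
      these names come from the driver.

```c
#include <stdint.h>

#define NUM_PRIO 8

/* Old behaviour (sketch): a priority is reported as PFC-enabled whenever
 * the TC it maps to is enabled. */
static uint8_t pfc_en_from_tc(const uint8_t prio_tc[NUM_PRIO],
			      uint8_t tc_pfc_en)
{
	uint8_t pfc_en = 0;

	for (int prio = 0; prio < NUM_PRIO; prio++)
		if (tc_pfc_en & (1u << prio_tc[prio]))
			pfc_en |= 1u << prio;
	return pfc_en;
}

/* New behaviour (sketch): report the bitmap exactly as the user set it. */
struct dcb_cfg_sketch {
	uint8_t user_pfc_en;
};

static uint8_t pfc_en_reported(const struct dcb_cfg_sketch *cfg)
{
	return cfg->user_pfc_en;
}
```

      With every priority mapped to TC0 and TC0 enabled, `pfc_en_from_tc()`
      reports 0xff, matching the all-on output above, whereas the saved
      bitmap for `prio-pfc 0:on 1:on 2:on 3:on` is 0x0f.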
      
      Fixes: cacde272 ("net: hns3: Add hclge_dcb module for the support of DCB feature")
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: hns3: fix duplicate node in VLAN list · e834ca7c
      Guojia Liao authored
      [ Upstream commit 94391fae ]
      
      Duplicate VLAN nodes should not be added to the VLAN list, otherwise
      restoring VLANs from the list fails with "add failed". So this patch
      adds a VLAN ID check before adding a node to the VLAN list.
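
      A minimal sketch of checking for an existing VLAN ID before inserting,
      using a hypothetical singly linked list; the driver's real list type
      and function names differ.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical VLAN list node. */
struct vlan_node {
	uint16_t vlan_id;
	struct vlan_node *next;
};

static bool vlan_list_contains(const struct vlan_node *head, uint16_t id)
{
	for (; head; head = head->next)
		if (head->vlan_id == id)
			return true;
	return false;
}

/* Adds id only if it is not already present. Returns 0 on success
 * (including "already present") and -1 on allocation failure. */
static int vlan_list_add(struct vlan_node **head, uint16_t id)
{
	struct vlan_node *node;

	if (vlan_list_contains(*head, id))
		return 0;

	node = malloc(sizeof(*node));
	if (!node)
		return -1;
	node->vlan_id = id;
	node->next = *head;
	*head = node;
	return 0;
}

static size_t vlan_list_len(const struct vlan_node *head)
{
	size_t n = 0;

	for (; head; head = head->next)
		n++;
	return n;
}
```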
      
      Fixes: c6075b19 ("net: hns3: Record VF vlan tables")
      Signed-off-by: Guojia Liao <liaoguojia@huawei.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: hns3: add waiting time before cmdq memory is released · 5931ec35
      Yufeng Mo authored
      [ Upstream commit a96d9330 ]
      
      After the cmdq registers are cleared, the firmware may take time to
      clear out possible leftover commands in the cmdq. The driver must
      release cmdq memory only after the firmware has completed processing
      of leftover commands.
      
      Fixes: 232d0d55 ("net: hns3: uninitialize command queue while unloading PF driver")
      Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: hns3: clear hardware resource when loading driver · 9820af16
      Yufeng Mo authored
      [ Upstream commit 1a6d2819 ]
      
      If a PF is bound to a virtual machine and the virtual machine exits
      unexpectedly, some hardware resources cannot be cleared. In this case,
      loading the driver may cause exceptions. Therefore, the hardware
      resources need to be cleared when the driver is loaded.
      
      Fixes: 46a3df9f ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
      Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
      Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • rtnetlink: Return correct error on changing device netns · ad0db838
      Andrey Ignatov authored
      [ Upstream commit 96a6b93b ]
      
      Currently, when a device is moved between network namespaces using the
      RTM_NEWLINK message type and one of the netns attributes (IFLA_NET_NS_PID,
      IFLA_NET_NS_FD, IFLA_TARGET_NETNSID) but without specifying IFLA_IFNAME,
      and the target namespace already has a device with the same name,
      userspace gets EINVAL, which is confusing and makes debugging harder.

      Fix it so that userspace gets the more appropriate EEXIST instead, which
      makes debugging much easier.
      
      Before:
      
        # ./ifname.sh
        + ip netns add ns0
        + ip netns exec ns0 ip link add l0 type dummy
        + ip netns exec ns0 ip link show l0
        8: l0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
            link/ether 66:90:b5:d5:78:69 brd ff:ff:ff:ff:ff:ff
        + ip link add l0 type dummy
        + ip link show l0
        10: l0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
            link/ether 6e:c6:1f:15:20:8d brd ff:ff:ff:ff:ff:ff
        + ip link set l0 netns ns0
        RTNETLINK answers: Invalid argument
      
      After:
      
        # ./ifname.sh
        + ip netns add ns0
        + ip netns exec ns0 ip link add l0 type dummy
        + ip netns exec ns0 ip link show l0
        8: l0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
            link/ether 1e:4a:72:e3:e3:8f brd ff:ff:ff:ff:ff:ff
        + ip link add l0 type dummy
        + ip link show l0
        10: l0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
            link/ether f2:fc:fe:2b:7d:a6 brd ff:ff:ff:ff:ff:ff
        + ip link set l0 netns ns0
        RTNETLINK answers: File exists
      
      The problem is that do_setlink() passes its `char *ifname` argument,
      that it gets from a caller, to __dev_change_net_namespace() as is (as
      `const char *pat`), but semantics of ifname and pat can be different.
      
      For example, __rtnl_newlink() does this:
      
      net/core/rtnetlink.c
          3270	char ifname[IFNAMSIZ];
           ...
          3286	if (tb[IFLA_IFNAME])
          3287		nla_strscpy(ifname, tb[IFLA_IFNAME], IFNAMSIZ);
          3288	else
          3289		ifname[0] = '\0';
           ...
          3364	if (dev) {
           ...
          3394		return do_setlink(skb, dev, ifm, extack, tb, ifname, status);
          3395	}
      
      , i.e. do_setlink() gets ifname pointer that is always valid no matter
      if user specified IFLA_IFNAME or not and then do_setlink() passes this
      ifname pointer as is to __dev_change_net_namespace() as pat argument.
      
      But the pat (pattern) in __dev_change_net_namespace() is used as:
      
      net/core/dev.c
         11198	err = -EEXIST;
         11199	if (__dev_get_by_name(net, dev->name)) {
         11200		/* We get here if we can't use the current device name */
         11201		if (!pat)
         11202			goto out;
         11203		err = dev_get_valid_name(net, dev, pat);
         11204		if (err < 0)
         11205			goto out;
         11206	}
      
      As a result, the `goto out` path on line 11202 is never taken and,
      instead of returning the EEXIST set on line 11198,
      __dev_change_net_namespace() returns an error from dev_get_valid_name(),
      which, in turn, will be EINVAL for the ifname[0] = '\0' set earlier.
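
      A minimal sketch of the control flow described above, with
      `change_netns_sketch()` standing in for the quoted part of
      __dev_change_net_namespace() and `setlink_sketch()` for do_setlink();
      both names are hypothetical. The assumed shape of the fix is to pass a
      NULL pattern down whenever ifname is empty, so the `!pat` branch can
      return -EEXIST.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Models the quoted tail of __dev_change_net_namespace(): the current
 * name is taken in the target netns, so either fail with -EEXIST (no
 * pattern) or try to derive a new name from the pattern. */
static int change_netns_sketch(bool name_taken, const char *pat)
{
	if (!name_taken)
		return 0;
	if (!pat)
		return -EEXIST;
	/* Simplified stand-in for dev_get_valid_name(): an empty pattern
	 * is not a valid interface name. */
	if (pat[0] == '\0')
		return -EINVAL;
	return 0;	/* pretend a new name was derived from pat */
}

/* Models do_setlink() after the fix: an empty ifname becomes NULL. */
static int setlink_sketch(bool name_taken, const char *ifname)
{
	const char *pat = (ifname && ifname[0]) ? ifname : NULL;

	return change_netns_sketch(name_taken, pat);
}
```

      Passing the empty buffer straight through reproduces the old EINVAL,
      while `setlink_sketch()` with the same input yields EEXIST.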
      
      Fixes: d8a5ec67 ("[NET]: netlink support for moving devices between network namespaces.")
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>