  1. Dec 21, 2023
    • net: tls, update curr on splice as well · 9b3d3a7f
      John Fastabend authored
      commit c5a59500 upstream.
      
      The curr pointer must also be updated on the splice similar to how
      we do this for other copy types.
      
      Fixes: d829e9c4 ("tls: convert to generic sk_msg interface")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Reported-by: Jann Horn <jannh@google.com>
      Link: https://lore.kernel.org/r/20231206232706.374377-2-john.fastabend@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ring-buffer: Have rb_time_cmpxchg() set the msb counter too · 869aee35
      Steven Rostedt (Google) authored
      commit 0aa0e528 upstream.
      
      The rb_time_cmpxchg() on 32-bit architectures requires setting three
      32-bit words to represent the 64-bit timestamp, with some salt for
      synchronization. Those are: msb, top, and bottom.
      
      The issue is, the rb_time_cmpxchg() did not properly salt the msb portion,
      and the msb that was written was stale.
      
      Link: https://lore.kernel.org/linux-trace-kernel/20231215084114.20899342@rorschach.local.home
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Fixes: f03f2abc ("ring-buffer: Have 32 bit time stamps use all 64 bits")
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ring-buffer: Do not try to put back write_stamp · c425a772
      Steven Rostedt (Google) authored
      commit dd939425 upstream.
      
      If an update to an event is interrupted by another event between the time
      the initial event allocated its buffer and where it wrote to the
      write_stamp, the code tries to reset the write_stamp back to what it had
      just overwritten. It knows that it was overwritten by checking the
      before_stamp: if that no longer matches what it wrote to the before_stamp
      before it allocated its space, it knows it was overwritten.

      To put back the write_stamp, it uses the before_stamp it read. The problem
      here is that by writing the before_stamp to the write_stamp it makes the
      two equal again, which means that the write_stamp can be considered valid
      as the last timestamp written to the ring buffer. But this is not
      necessarily true. The event that interrupted this one could itself have
      been interrupted, and can end up leaving with an invalid write_stamp. If
      this happens and control returns to this context, which then uses the
      before_stamp to update the write_stamp again, it can incorrectly make the
      write_stamp valid, causing later events to have incorrect time stamps.
      
      As it is OK to leave this function with an invalid write_stamp (one that
      doesn't match the before_stamp), there's no reason to try to make it valid
      again in this case. If this race happens, then just leave with the invalid
      write_stamp and the next event to come along will just add an absolute
      timestamp and validate everything again.
      
      Bonus points: This gets rid of another cmpxchg64!
      
      Link: https://lore.kernel.org/linux-trace-kernel/20231214222921.193037a7@gandalf.local.home
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Joel Fernandes <joel@joelfernandes.org>
      Cc: Vincent Donnefort <vdonnefort@google.com>
      Fixes: a389d86f ("ring-buffer: Have nested events still record running time stamp")
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ring-buffer: Fix a race in rb_time_cmpxchg() for 32 bit archs · b15cf148
      Steven Rostedt (Google) authored
      commit fff88fa0 upstream.
      
      Mathieu Desnoyers pointed out an issue in the rb_time_cmpxchg() for 32 bit
      architectures. That is:
      
       static bool rb_time_cmpxchg(rb_time_t *t, u64 expect, u64 set)
       {
      	unsigned long cnt, top, bottom, msb;
      	unsigned long cnt2, top2, bottom2, msb2;
      	u64 val;
      
      	/* The cmpxchg always fails if it interrupted an update */
      	 if (!__rb_time_read(t, &val, &cnt2))
      		 return false;
      
      	 if (val != expect)
      		 return false;
      
      <<<< interrupted here!
      
      	 cnt = local_read(&t->cnt);
      
      The problem is that the synchronization counter in the rb_time_t is read
      *after* the value of the timestamp is read. That means if an interrupt
      were to come in between the value being read and the counter being read,
      it can change the value and the counter and the interrupted process would
      be clueless about it!
      
      The counter needs to be read first and then the value. That way it is easy
      to tell if the value is stale or not. If the counter hasn't been updated,
      then the value is still good.
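
The ordering described above can be sketched in user space as follows. This is only an illustration under stated assumptions: `struct stamp`, `stamp_set()`, `stamp_cmpxchg()` and `nested_writer()` are hypothetical names, the "interrupt" is simulated by a callback, and none of this is the kernel's rb_time_t code (which uses local_t operations and split 32-bit words).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for rb_time_t: a value guarded by an update counter. */
struct stamp {
	unsigned long cnt;	/* bumped on every update */
	uint64_t val;
};

/* An update always bumps the counter together with the value. */
static void stamp_set(struct stamp *s, uint64_t val)
{
	s->cnt++;
	s->val = val;
}

/*
 * Compare-and-set that samples the counter *before* the value.
 * 'interrupt' (may be NULL) runs between the read and the commit and
 * stands in for an interrupt arriving at the point marked in the text.
 */
static bool stamp_cmpxchg(struct stamp *s, uint64_t expect, uint64_t set,
			  void (*interrupt)(struct stamp *))
{
	unsigned long cnt;
	uint64_t val;

	assert(s != NULL);
	cnt = s->cnt;		/* 1. counter first */
	val = s->val;		/* 2. then the value */

	if (val != expect)
		return false;

	if (interrupt)
		interrupt(s);

	/* 3. a changed counter means 'val' is stale, so give up */
	if (s->cnt != cnt)
		return false;

	stamp_set(s, set);
	return true;
}

/* Example interrupting writer used to exercise the failure path. */
static void nested_writer(struct stamp *s)
{
	stamp_set(s, 999);
}
```

Because the counter is sampled before the value, any update that lands after the sample is visible as a counter mismatch, and the compare-and-set fails instead of acting on a stale value.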
      
      Link: https://lore.kernel.org/linux-trace-kernel/20231211201324.652870-1-mathieu.desnoyers@efficios.com/
      Link: https://lore.kernel.org/linux-trace-kernel/20231212115301.7a9c9a64@gandalf.local.home
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Fixes: 10464b4a ("ring-buffer: Add rb_time_t 64 bit operations for speeding up 32 bit")
      Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ring-buffer: Fix writing to the buffer with max_data_size · edbc03d6
      Steven Rostedt (Google) authored
      commit b3ae7b67 upstream.
      
      The maximum ring buffer data size is the maximum size of data that can be
      recorded on the ring buffer. Events must be smaller than the sub buffer
      data size minus any meta data. This size is checked before trying to
      allocate from the ring buffer because the allocation assumes that the size
      will fit on the sub buffer.
      
      The maximum size was calculated as the size of a sub buffer page (which is
      currently PAGE_SIZE minus the sub buffer header) minus the size of the
      meta data of an individual event. But it missed the possible adding of a
      time stamp for events that are added long enough apart that the event meta
      data can't hold the time delta.
      
      When an event is added that is greater than the current BUF_MAX_DATA_SIZE
      minus the size of a time stamp, but still less than or equal to
      BUF_MAX_DATA_SIZE, the ring buffer would go into an infinite loop, looking
      for a page that can hold the event. Luckily, there's a check for this loop:
      after 1000 iterations a warning is emitted and the ring buffer is
      disabled. But this should never happen.
      
      This can happen when a large event is added first, or after a long period
      where an absolute timestamp is prefixed to the event, increasing its size
      by 8 bytes. This passes the check and then goes into the algorithm that
      causes the infinite loop.
      
      For events that are the first event on the sub-buffer, it does not need to
      add a timestamp, because the sub-buffer itself contains an absolute
      timestamp, and adding one is redundant.
      
      The fix is to check if the event is to be the first event on the
      sub-buffer, and if it is, then do not add a timestamp.
      
      This also fixes 32 bit adding a timestamp when a read of before_stamp or
      write_stamp is interrupted. There's still no need to add that timestamp if
      the event is going to be the first event on the sub buffer.
      
      Also, if the buffer has "time_stamp_abs" set, then also check if the
      length plus the timestamp is greater than the BUF_MAX_DATA_SIZE.
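
The size arithmetic behind this fix can be illustrated with a small sketch. The constants below are made-up illustrative values (the kernel derives its own from PAGE_SIZE), and `event_footprint()` and `event_fits_empty_subbuf()` are hypothetical helpers, not ring buffer functions; only the 8-byte timestamp size comes from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative sizes only; the kernel derives its own from PAGE_SIZE. */
#define SUBBUF_DATA_SIZE	4080u	/* hypothetical sub-buffer payload size */
#define EVENT_HDR_SIZE		8u	/* hypothetical per-event meta data */
#define TIMESTAMP_SIZE		8u	/* absolute timestamp prefix, per the text */
#define MAX_DATA_SIZE		(SUBBUF_DATA_SIZE - EVENT_HDR_SIZE)

/*
 * Bytes an event occupies on the sub-buffer. The first event on a
 * sub-buffer carries no timestamp prefix, because the sub-buffer
 * itself already holds an absolute timestamp.
 */
static size_t event_footprint(size_t len, bool wants_timestamp,
			      bool first_on_subbuf)
{
	size_t total = EVENT_HDR_SIZE + len;

	if (wants_timestamp && !first_on_subbuf)
		total += TIMESTAMP_SIZE;
	return total;
}

/* True when the event can be placed as the first event of a sub-buffer. */
static bool event_fits_empty_subbuf(size_t len, bool wants_timestamp)
{
	assert(len <= MAX_DATA_SIZE);
	return event_footprint(len, wants_timestamp, true) <= SUBBUF_DATA_SIZE;
}
```

With these numbers, a maximum-size event plus a timestamp prefix needs 4088 bytes and can never fit in a 4080-byte sub-buffer, which is the endless page search the text describes. Dropping the prefix for the first event brings it back to exactly 4080 bytes.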
      
      Link: https://lore.kernel.org/all/20231212104549.58863438@gandalf.local.home/
      Link: https://lore.kernel.org/linux-trace-kernel/20231212071837.5fdd6c13@gandalf.local.home
      Link: https://lore.kernel.org/linux-trace-kernel/20231212111617.39e02849@gandalf.local.home
      
      Cc: stable@vger.kernel.org
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Fixes: a4543a2f ("ring-buffer: Get timestamp after event is allocated")
      Fixes: 58fbc3c6 ("ring-buffer: Consolidate add_timestamp to remove some branches")
      Reported-by: Kent Overstreet <kent.overstreet@linux.dev> # (on IRC)
      Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ring-buffer: Have saved event hold the entire event · 6d98d594
      Steven Rostedt (Google) authored
      commit b0495258 upstream.
      
      For the ring buffer iterator (non-consuming read), the event needs to be
      copied into the iterator buffer to make sure that a writer does not
      overwrite it while the user is reading it. If a write happens during the
      copy, the buffer is simply discarded.
      
      But the temp buffer itself was not big enough. The allocation of the
      buffer was only BUF_MAX_DATA_SIZE, which is the maximum data size that can
      be passed into the ring buffer and saved. But the temp buffer needs to
      hold the meta data as well. That would be BUF_PAGE_SIZE and not
      BUF_MAX_DATA_SIZE.
      
      Link: https://lore.kernel.org/linux-trace-kernel/20231212072558.61f76493@gandalf.local.home
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Fixes: 785888c5 ("ring-buffer: Have rb_iter_head_event() handle concurrent writer")
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ring-buffer: Do not update before stamp when switching sub-buffers · 7888b607
      Steven Rostedt (Google) authored
      commit 9e45e39d upstream.
      
      The ring buffer timestamps are synchronized by two timestamp placeholders.
      One is the "before_stamp" and the other is the "write_stamp" (sometimes
      referred to as the "after stamp", but only in the comments). These two
      stamps are key to knowing how to handle nested events coming in with a
      lockless system.
      
      When moving across sub-buffers, the before stamp is updated but the write
      stamp is not. There's an effort to put back the before stamp to something
      that seems logical in case there's nested events. But as the current event
      is about to cross sub-buffers, and so will any new nested event that happens,
      updating the before stamp is useless, and could even introduce new race
      conditions.
      
      The first event on a sub-buffer simply uses the sub-buffer's timestamp
      and keeps a "delta" of zero. The "before_stamp" and "write_stamp" are not
      used in the algorithm in this case. There's no reason to try to fix the
      before_stamp when this happens.
      
      As a bonus, it removes a cmpxchg() when crossing sub-buffers!
      
      Link: https://lore.kernel.org/linux-trace-kernel/20231211114420.36dde01b@gandalf.local.home
      
      Cc: stable@vger.kernel.org
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Fixes: a389d86f ("ring-buffer: Have nested events still record running time stamp")
      Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tracing: Update snapshot buffer on resize if it is allocated · 7043c461
      Steven Rostedt (Google) authored
      commit d06aff1c upstream.
      
      The snapshot buffer is meant to mimic the main buffer so that when a
      snapshot is needed, the snapshot and main buffer are swapped. When the
      snapshot buffer is not allocated, it is set to the minimal size that the
      ring buffer may be at and still be functional. When it is allocated it
      becomes the same size as the main ring buffer, and when the main ring
      buffer changes in size, the snapshot buffer should change with it.
      
      Currently, the resize only updates the snapshot buffer if it's used by the
      current tracer (i.e., the preemptirqsoff tracer). But it needs to be updated
      anytime it is allocated.
      
      When changing the size of the main buffer, instead of looking to see if
      the current tracer is utilizing the snapshot buffer, just check if it is
      allocated to know if it should be updated or not.
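
A minimal sketch of that decision, assuming a hypothetical `struct mini_trace` and `mini_resize()` that are not part of the tracing code: the resize looks only at whether the snapshot buffer is allocated and deliberately ignores which tracer is active.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature of a trace instance with a main and a snapshot buffer. */
struct mini_trace {
	size_t main_size;
	size_t snap_size;
	bool snap_allocated;		/* snapshot buffer has been allocated */
	bool tracer_uses_snapshot;	/* current tracer swaps buffers itself */
};

/* Resize keyed on allocation; tracer_uses_snapshot is intentionally not consulted. */
static void mini_resize(struct mini_trace *t, size_t size)
{
	assert(t != NULL);
	t->main_size = size;
	if (t->snap_allocated)
		t->snap_size = size;
}
```

An allocated snapshot buffer therefore follows the main buffer even when the current tracer never touches it, which keeps the two swappable.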
      
      Also fix typo in comment just above the code change.
      
      Link: https://lore.kernel.org/linux-trace-kernel/20231210225447.48476a6a@rorschach.local.home
      
      Cc: stable@vger.kernel.org
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Fixes: ad909e21 ("tracing: Add internal tracing_snapshot() functions")
      Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ring-buffer: Fix memory leak of free page · 31785cf8
      Steven Rostedt (Google) authored
      commit 17d80175 upstream.
      
      Reading the ring buffer does a swap of a sub-buffer within the ring buffer
      with an empty sub-buffer. This allows the reader to have full access to the
      content of the sub-buffer that was swapped out without having to worry
      about contention with the writer.
      
      The readers call ring_buffer_alloc_read_page() to allocate a page that
      will be used to swap with the ring buffer. When the code is finished with
      the reader page, it calls ring_buffer_free_read_page(). Instead of freeing
      the page, it stores it as a spare. The next call to
      ring_buffer_alloc_read_page() will return this spare instead of calling
      into the memory management system to allocate a new page.
      
      Unfortunately, on freeing of the ring buffer, this spare page is not
      freed, and causes a memory leak.
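
The spare-page pattern and the missing teardown step can be sketched as below. Everything here is a hypothetical miniature (`struct mini_buffer` and the `mini_*` functions are invented for illustration, using malloc() rather than the kernel's page allocator). The `live` field exists only so the example can observe whether anything leaked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical miniature of a buffer that caches one spare reader page. */
struct mini_buffer {
	void *spare;	/* page kept for the next reader, or NULL */
	int live;	/* outstanding allocations, tracked for the demo */
};

/* Hand out the cached spare if there is one, otherwise allocate. */
static void *mini_alloc_read_page(struct mini_buffer *b)
{
	void *page;

	assert(b != NULL);
	page = b->spare;
	if (page) {
		b->spare = NULL;
		return page;
	}
	page = malloc(4096);
	if (page)
		b->live++;
	return page;
}

/* Keep the page as the spare when the slot is empty, otherwise free it. */
static void mini_free_read_page(struct mini_buffer *b, void *page)
{
	assert(b != NULL);
	if (!b->spare) {
		b->spare = page;
		return;
	}
	free(page);
	b->live--;
}

/* Teardown must also release the cached spare, or it leaks. */
static void mini_buffer_destroy(struct mini_buffer *b)
{
	assert(b != NULL);
	if (b->spare) {
		free(b->spare);
		b->spare = NULL;
		b->live--;
	}
}
```

Without the block in `mini_buffer_destroy()`, the last page handed back by a reader would stay allocated after the buffer is gone, which mirrors the leak described above.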
      
      Link: https://lore.kernel.org/linux-trace-kernel/20231210221250.7b9cc83c@rorschach.local.home
      
      Cc: stable@vger.kernel.org
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Fixes: 73a757e6 ("ring-buffer: Return reader page back into existing ring buffer")
      Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • smb: client: fix OOB in smb2_query_reparse_point() · 8c3b77ad
      Paulo Alcantara authored
      commit 3a42709f upstream.
      
      Validate @ioctl_rsp->OutputOffset and @ioctl_rsp->OutputCount so that
      their sum does not wrap to a number that is smaller than @reparse_buf
      and we end up with a wild pointer as follows:
      
        BUG: unable to handle page fault for address: ffff88809c5cd45f
        #PF: supervisor read access in kernel mode
        #PF: error_code(0x0000) - not-present page
        PGD 4a01067 P4D 4a01067 PUD 0
        Oops: 0000 [#1] PREEMPT SMP NOPTI
        CPU: 2 PID: 1260 Comm: mount.cifs Not tainted 6.7.0-rc4 #2
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
        rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014
        RIP: 0010:smb2_query_reparse_point+0x3e0/0x4c0 [cifs]
        Code: ff ff e8 f3 51 fe ff 41 89 c6 58 5a 45 85 f6 0f 85 14 fe ff ff
        49 8b 57 48 8b 42 60 44 8b 42 64 42 8d 0c 00 49 39 4f 50 72 40 <8b>
        04 02 48 8b 9d f0 fe ff ff 49 8b 57 50 89 03 48 8b 9d e8 fe ff
        RSP: 0018:ffffc90000347a90 EFLAGS: 00010212
        RAX: 000000008000001f RBX: ffff88800ae11000 RCX: 00000000000000ec
        RDX: ffff88801c5cd440 RSI: 0000000000000000 RDI: ffffffff82004aa4
        RBP: ffffc90000347bb0 R08: 00000000800000cd R09: 0000000000000001
        R10: 0000000000000000 R11: 0000000000000024 R12: ffff8880114d4100
        R13: ffff8880114d4198 R14: 0000000000000000 R15: ffff8880114d4000
        FS: 00007f02c07babc0(0000) GS:ffff88806ba00000(0000)
        knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: ffff88809c5cd45f CR3: 0000000011750000 CR4: 0000000000750ef0
        PKRU: 55555554
        Call Trace:
         <TASK>
         ? __die+0x23/0x70
         ? page_fault_oops+0x181/0x480
         ? search_module_extables+0x19/0x60
         ? srso_alias_return_thunk+0x5/0xfbef5
         ? exc_page_fault+0x1b6/0x1c0
         ? asm_exc_page_fault+0x26/0x30
         ? _raw_spin_unlock_irqrestore+0x44/0x60
         ? smb2_query_reparse_point+0x3e0/0x4c0 [cifs]
         cifs_get_fattr+0x16e/0xa50 [cifs]
         ? srso_alias_return_thunk+0x5/0xfbef5
         ? lock_acquire+0xbf/0x2b0
         cifs_root_iget+0x163/0x5f0 [cifs]
         cifs_smb3_do_mount+0x5bd/0x780 [cifs]
         smb3_get_tree+0xd9/0x290 [cifs]
         vfs_get_tree+0x2c/0x100
         ? capable+0x37/0x70
         path_mount+0x2d7/0xb80
         ? srso_alias_return_thunk+0x5/0xfbef5
         ? _raw_spin_unlock_irqrestore+0x44/0x60
         __x64_sys_mount+0x11a/0x150
         do_syscall_64+0x47/0xf0
         entry_SYSCALL_64_after_hwframe+0x6f/0x77
        RIP: 0033:0x7f02c08d5b1e
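
The kind of validation the message describes can be sketched as follows. `output_data()` is a hypothetical helper, not the function changed by this commit. The point it illustrates is forming the offset + count sum in 64 bits so that two large 32-bit fields cannot wrap to a small number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return a pointer to the output data inside a response of rsp_len
 * bytes, or NULL when [offset, offset + count) does not fit. The sum
 * is computed in 64 bits so it cannot wrap around.
 */
static const uint8_t *output_data(const uint8_t *rsp, size_t rsp_len,
				  uint32_t offset, uint32_t count)
{
	assert(rsp != NULL);

	if ((uint64_t)offset + count > rsp_len)
		return NULL;
	return rsp + offset;
}
```

As a worked case, and assuming RAX and R08 in the trace above hold the two fields, 0x8000001f + 0x800000cd is 0x1000000ec, which truncates to 0xec in 32 bits (the value shown in RCX). A 32-bit sum would pass a length check that the 64-bit sum correctly rejects.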
      
      Fixes: 2e4564b3 ("smb3: add support for stat of WSL reparse points for special file types")
      Cc: stable@vger.kernel.org
      Reported-by: Robert Morris <rtm@csail.mit.edu>
      Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • smb: client: fix NULL deref in asn1_ber_decoder() · d8541c50
      Paulo Alcantara authored
      commit 90d025c2 upstream.
      
      If the server replies to SMB2_NEGOTIATE with a zero SecurityBufferOffset,
      smb2_get_data_area() sets @len to non-zero but returns NULL, so
      decode_negTokenInit() ends up being called with a NULL @security_blob:
      
        BUG: kernel NULL pointer dereference, address: 0000000000000000
        #PF: supervisor read access in kernel mode
        #PF: error_code(0x0000) - not-present page
        PGD 0 P4D 0
        Oops: 0000 [#1] PREEMPT SMP NOPTI
        CPU: 2 PID: 871 Comm: mount.cifs Not tainted 6.7.0-rc4 #2
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014
        RIP: 0010:asn1_ber_decoder+0x173/0xc80
        Code: 01 4c 39 2c 24 75 09 45 84 c9 0f 85 2f 03 00 00 48 8b 14 24 4c 29 ea 48 83 fa 01 0f 86 1e 07 00 00 48 8b 74 24 28 4d 8d 5d 01 <42> 0f b6 3c 2e 89 fa 40 88 7c 24 5c f7 d2 83 e2 1f 0f 84 3d 07 00
        RSP: 0018:ffffc9000063f950 EFLAGS: 00010202
        RAX: 0000000000000002 RBX: 0000000000000000 RCX: 000000000000004a
        RDX: 000000000000004a RSI: 0000000000000000 RDI: 0000000000000000
        RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
        R10: 0000000000000002 R11: 0000000000000001 R12: 0000000000000000
        R13: 0000000000000000 R14: 000000000000004d R15: 0000000000000000
        FS:  00007fce52b0fbc0(0000) GS:ffff88806ba00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000000000000 CR3: 000000001ae64000 CR4: 0000000000750ef0
        PKRU: 55555554
        Call Trace:
         <TASK>
         ? __die+0x23/0x70
         ? page_fault_oops+0x181/0x480
         ? __stack_depot_save+0x1e6/0x480
         ? exc_page_fault+0x6f/0x1c0
         ? asm_exc_page_fault+0x26/0x30
         ? asn1_ber_decoder+0x173/0xc80
         ? check_object+0x40/0x340
         decode_negTokenInit+0x1e/0x30 [cifs]
         SMB2_negotiate+0xc99/0x17c0 [cifs]
         ? smb2_negotiate+0x46/0x60 [cifs]
         ? srso_alias_return_thunk+0x5/0xfbef5
         smb2_negotiate+0x46/0x60 [cifs]
         cifs_negotiate_protocol+0xae/0x130 [cifs]
         cifs_get_smb_ses+0x517/0x1040 [cifs]
         ? srso_alias_return_thunk+0x5/0xfbef5
         ? srso_alias_return_thunk+0x5/0xfbef5
         ? queue_delayed_work_on+0x5d/0x90
         cifs_mount_get_session+0x78/0x200 [cifs]
         dfs_mount_share+0x13a/0x9f0 [cifs]
         ? srso_alias_return_thunk+0x5/0xfbef5
         ? lock_acquire+0xbf/0x2b0
         ? find_nls+0x16/0x80
         ? srso_alias_return_thunk+0x5/0xfbef5
         cifs_mount+0x7e/0x350 [cifs]
         cifs_smb3_do_mount+0x128/0x780 [cifs]
         smb3_get_tree+0xd9/0x290 [cifs]
         vfs_get_tree+0x2c/0x100
         ? capable+0x37/0x70
         path_mount+0x2d7/0xb80
         ? srso_alias_return_thunk+0x5/0xfbef5
         ? _raw_spin_unlock_irqrestore+0x44/0x60
         __x64_sys_mount+0x11a/0x150
         do_syscall_64+0x47/0xf0
         entry_SYSCALL_64_after_hwframe+0x6f/0x77
        RIP: 0033:0x7fce52c2ab1e
      
      Fix this by setting @len to zero when @off == 0 so callers won't
      attempt to dereference non-existing data areas.
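
The contract being restored is that a helper never reports a non-zero length together with a NULL pointer. Below is a simplified sketch of it. `get_data_area()` here is a hypothetical stand-in with an invented signature, not the real smb2_get_data_area().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified "get data area" helper: return a pointer to
 * the payload inside 'msg' and report its length through 'len'. An
 * offset of zero means no data area was sent, so the pointer is NULL
 * and the length is forced to zero to match.
 */
static const uint8_t *get_data_area(const uint8_t *msg, uint32_t off,
				    uint32_t data_len, uint32_t *len)
{
	assert(msg != NULL && len != NULL);

	if (off == 0) {
		*len = 0;	/* never pair a non-zero length with NULL */
		return NULL;
	}
	*len = data_len;
	return msg + off;
}
```

Callers that guard on the length alone then skip the decode step instead of passing NULL to a parser.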
      
      Reported-by: Robert Morris <rtm@csail.mit.edu>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • smb: client: fix OOB in receive_encrypted_standard() · 9f528a8e
      Paulo Alcantara authored
      commit eec04ea1 upstream.
      
      Fix potential OOB in receive_encrypted_standard() if server returned a
      large shdr->NextCommand that would end up writing off the end of
      @next_buffer.
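
As a rough illustration of bounding a "next command" offset before following it, the sketch below walks a made-up compound layout. The layout (each chunk starting with a native-endian 32-bit offset to the next chunk, with zero marking the last) and `count_chunks()` are assumptions for the example, not the SMB2 wire format or the patched function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Count the chunks in a compound buffer. Each chunk begins with a
 * 32-bit "next" offset relative to the chunk start; 0 marks the last
 * chunk. Returns -1 if an offset would step outside the buffer.
 */
static int count_chunks(const uint8_t *buf, size_t len)
{
	size_t pos = 0;
	int n = 0;

	assert(buf != NULL);
	while (len - pos >= sizeof(uint32_t)) {
		uint32_t next;

		memcpy(&next, buf + pos, sizeof(next));
		n++;
		if (next == 0)
			return n;	/* last chunk */
		if (next < sizeof(uint32_t) || next > len - pos)
			return -1;	/* peer-supplied offset is out of range */
		pos += next;
	}
	return -1;			/* no room left for a chunk header */
}
```

The check compares the peer-supplied offset against the bytes remaining (`len - pos`) rather than adding it to `pos` first, so an oversized value is rejected before it is used.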
      
      Fixes: b24df3e3 ("cifs: update receive_encrypted_standard to handle compounded responses")
      Cc: stable@vger.kernel.org
      Reported-by: Robert Morris <rtm@csail.mit.edu>
      Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • drm/i915: Fix remapped stride with CCS on ADL+ · 7b0faa54
      Ville Syrjälä authored
      commit 0ccd963f upstream.
      
      On ADL+ the hardware automagically calculates the CCS AUX surface
      stride from the main surface stride, so when remapping we can't
      really play a lot of tricks with the main surface stride, or else
      the AUX surface stride would get miscalculated and no longer
      match the actual data layout in memory.
      
      Supposedly we could remap in 256 main surface tile units
      (AUX page(4096)/cacheline(64)*4(4x1 main surface tiles per
      AUX cacheline)=256 main surface tiles), but the extra complexity
      is probably not worth the hassle.
      
      So let's just make sure our mapping stride is calculated from
      the full framebuffer stride (instead of the framebuffer width).
      This way the stride we program into PLANE_STRIDE will be the
      original framebuffer stride, and thus there will be no change
      to the AUX stride/layout.
      
      Cc: stable@vger.kernel.org
      Cc: Imre Deak <imre.deak@intel.com>
      Cc: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
      Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20231205180308.7505-1-ville.syrjala@linux.intel.com
      Reviewed-by: Imre Deak <imre.deak@intel.com>
      (cherry picked from commit 2c12eb36)
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • drm/amd/display: Disable PSR-SU on Parade 0803 TCON again · 20907717
      Mario Limonciello authored
      commit e7ab7587 upstream.
      
      When screen brightness is rapidly changed and PSR-SU is enabled the
      display hangs on panels with this TCON even on the latest DCN 3.1.4
      microcode (0x8002a81 at this time).
      
      This was disabled previously as commit 072030b1 ("drm/amd: Disable
      PSR-SU on Parade 0803 TCON") but reverted as commit 1e66a17c ("Revert
      "drm/amd: Disable PSR-SU on Parade 0803 TCON"") in favor of testing for
      a new enough microcode (commit cd2e31a9 ("drm/amd/display: Set minimum
      requirement for using PSR-SU on Phoenix")).
      
      As hangs are still happening specifically with this TCON, disable PSR-SU
      again for it until it can be root caused.
      
      Cc: stable@vger.kernel.org
      Cc: aaron.ma@canonical.com
      Cc: binli@gnome.org
      Cc: Marc Rossi <Marc.Rossi@amd.com>
      Cc: Hamza Mahfooz <Hamza.Mahfooz@amd.com>
      Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
      Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2046131
      Acked-by: Alex Deucher <alexander.deucher@amd.com>
      Reviewed-by: Harry Wentland <harry.wentland@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • drm/amdgpu: fix tear down order in amdgpu_vm_pt_free · a9e2de19
      Christian König authored
      commit ceb9a321 upstream.
      
      When freeing PD/PT with shadows it can happen that the shadow
      destruction races with detaching the PD/PT from the VM causing a NULL
      pointer dereference in the invalidation code.
      
      Fix this by detaching the PD/PT from the VM first and then
      freeing the shadow instead.
      
      Signed-off-by: Christian König <christian.koenig@amd.com>
      Fixes: https://gitlab.freedesktop.org/drm/amd/-/issues/2867
      Cc: <stable@vger.kernel.org>
      Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • btrfs: don't clear qgroup reserved bit in release_folio · 730b3322
      Boris Burkov authored
      commit a8680550 upstream.
      
      The EXTENT_QGROUP_RESERVED bit is used to "lock" regions of the file against
      duplicate reservations. That is, two writes to that range in one
      transaction shouldn't create two reservations, as the reservation will
      only be freed once when the write finally goes down. Therefore, it is
      never OK to clear that bit without freeing the associated qgroup
      reserve. At this point, we don't want to be freeing the reserve, so mask
      off the bit.
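
"Masking off the bit" can be shown with a tiny sketch. The bit values and `release_bits()` are hypothetical (the kernel defines its own extent state bits and helpers). The idea is just to remove the reserved bit from the set of bits a release path is allowed to clear.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values; the kernel defines its own extent state bits. */
#define X_DELALLOC		(1u << 0)
#define X_LOCKED		(1u << 1)
#define X_QGROUP_RESERVED	(1u << 2)

/*
 * Clear 'clear_bits' from 'state', but never the qgroup-reserved bit:
 * clearing it here would drop the "lock" without freeing the reserve.
 */
static uint32_t release_bits(uint32_t state, uint32_t clear_bits)
{
	clear_bits &= ~X_QGROUP_RESERVED;	/* mask the bit off */
	return state & ~clear_bits;
}
```

Even a caller that asks to clear every bit leaves the reserved bit in place, so the reservation is still found and freed exactly once later.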
      
      CC: stable@vger.kernel.org # 5.15+
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: Boris Burkov <boris@bur.io>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • btrfs: free qgroup reserve when ORDERED_IOERR is set · 9b670e1b
      Boris Burkov authored
      commit f63e1164 upstream.
      
      An ordered extent completing is a critical moment in qgroup reserve
      handling, as the ownership of the reservation is handed off from the
      ordered extent to the delayed ref. In the happy path we release (unlock)
      but do not free (decrement counter) the reservation, and the delayed ref
      drives the free. However, on an error, we don't create a delayed ref,
      since there is no ref to add. Therefore, free on the error path.
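
The hand-off can be modelled with two counters. This is a toy model under stated assumptions: `struct mini_qgroup`, `finish_ordered()` and `run_delayed_refs()` are invented names, and real qgroup accounting is far more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: bytes still reserved, and bytes now owned by delayed refs. */
struct mini_qgroup {
	long reserved;
	long pending_refs;
};

/*
 * Completion of an ordered extent. On success the delayed ref takes
 * ownership and frees the reservation later. On error no delayed ref
 * is created, so the reservation has to be freed here.
 */
static void finish_ordered(struct mini_qgroup *q, long bytes, bool io_error)
{
	assert(q != NULL);
	if (io_error) {
		q->reserved -= bytes;
		return;
	}
	q->pending_refs += bytes;
}

/* Delayed refs run later and drive the free for the successful extents. */
static void run_delayed_refs(struct mini_qgroup *q)
{
	assert(q != NULL);
	q->reserved -= q->pending_refs;
	q->pending_refs = 0;
}
```

If the error branch did nothing, the bytes for failed extents would stay in `reserved` forever, since nothing else would ever release them.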
      
      CC: stable@vger.kernel.org # 6.1+
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: Boris Burkov <boris@bur.io>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm/shmem: fix race in shmem_undo_range w/THP · da9b7c65
      David Stevens authored
      commit 55ac8bbe upstream.
      
      Split folios during the second loop of shmem_undo_range.  It's not
      sufficient to only split folios when dealing with partial pages, since
      it's possible for a THP to be faulted in after that point.  Calling
      truncate_inode_folio in that situation can result in throwing away data
      outside of the range being targeted.
      
      [akpm@linux-foundation.org: tidy up comment layout]
      Link: https://lkml.kernel.org/r/20230418084031.3439795-1-stevensd@google.com
      Fixes: b9a8a419 ("truncate,shmem: Handle truncates that split large folios")
      Signed-off-by: David Stevens <stevensd@chromium.org>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Suleiman Souhlal <suleiman@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm/mglru: fix underprotected page cache · 8ec07b06
      Yu Zhao authored
      commit 08148805 upstream.
      
      Unmapped folios accessed through file descriptors can be underprotected.
      Those folios are added to the oldest generation based on:
      
      1. The fact that they are less costly to reclaim (no need to walk the
         rmap and flush the TLB) and have less impact on performance (don't
         cause major PFs and can be non-blocking if needed again).
      2. The observation that they are likely to be single-use. E.g., for
         client use cases like Android, its apps parse configuration files
         and store the data in heap (anon); for server use cases like MySQL,
         it reads from InnoDB files and holds the cached data for tables in
         buffer pools (anon).
      
      However, the oldest generation can be very short lived, and if so, it
      doesn't provide the PID controller with enough time to respond to a surge
      of refaults.  (Note that the PID controller uses weighted refaults and
      those from evicted generations only take a half of the whole weight.) In
      other words, for a short lived generation, the moving average smooths out
      the spike quickly.
      
      To fix the problem:
      1. For folios that are already on LRU, if they can be beyond the
         tracking range of tiers, i.e., five accesses through file
         descriptors, move them to the second oldest generation to give them
         more time to age. (Note that tiers are used by the PID controller
         to statistically determine whether folios accessed multiple times
         through file descriptors are worth protecting.)
      2. When adding unmapped folios to LRU, adjust the placement of them so
         that they are not too close to the tail. The effect of this is
         similar to the above.
      
      On Android, launching 55 apps sequentially:
                                 Before     After      Change
        workingset_refault_anon  25641024   25598972   0%
        workingset_refault_file  115016834  106178438  -8%
      
      Link: https://lkml.kernel.org/r/20231208061407.2125867-1-yuzhao@google.com
      Fixes: ac35a490 ("mm: multi-gen LRU: minimal implementation")
      Signed-off-by: Yu Zhao <yuzhao@google.com>
      Reported-by: Charan Teja Kalla <quic_charante@quicinc.com>
      Tested-by: Kalesh Singh <kaleshsingh@google.com>
      Cc: T.J. Mercier <tjmercier@google.com>
      Cc: Kairui Song <ryncsn@gmail.com>
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      8ec07b06
    • Amelie Delaunay's avatar
      dmaengine: stm32-dma: avoid bitfield overflow assertion · 40f3ad76
      Amelie Delaunay authored
      commit 54bed6ba upstream.
      
      stm32_dma_get_burst() returns a negative error for invalid input, which
      gets turned into a large u32 value in stm32_dma_prep_dma_memcpy() that
      in turn triggers an assertion because it does not fit into a two-bit field:
      drivers/dma/stm32-dma.c: In function 'stm32_dma_prep_dma_memcpy':
      include/linux/compiler_types.h:354:38: error: call to '__compiletime_assert_282' declared with attribute error: FIELD_PREP: value too large for the field
           _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
                                               ^
         include/linux/compiler_types.h:335:4: note: in definition of macro '__compiletime_assert'
             prefix ## suffix();    \
             ^~~~~~
         include/linux/compiler_types.h:354:2: note: in expansion of macro '_compiletime_assert'
           _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
           ^~~~~~~~~~~~~~~~~~~
         include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
          #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
                                              ^~~~~~~~~~~~~~~~~~
         include/linux/bitfield.h:68:3: note: in expansion of macro 'BUILD_BUG_ON_MSG'
            BUILD_BUG_ON_MSG(__builtin_constant_p(_val) ?  \
            ^~~~~~~~~~~~~~~~
         include/linux/bitfield.h:114:3: note: in expansion of macro '__BF_FIELD_CHECK'
            __BF_FIELD_CHECK(_mask, 0ULL, _val, "FIELD_PREP: "); \
            ^~~~~~~~~~~~~~~~
         drivers/dma/stm32-dma.c:1237:4: note: in expansion of macro 'FIELD_PREP'
             FIELD_PREP(STM32_DMA_SCR_PBURST_MASK, dma_burst) |
             ^~~~~~~~~~
      
      As an easy workaround, assume the error can happen, so try to handle this
      by failing stm32_dma_prep_dma_memcpy() before the assertion. It replicates
      what is done in stm32_dma_set_xfer_param() where stm32_dma_get_burst() is
      also used.
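
      A minimal user-space sketch of the idea, not the driver code: the
      field position, the burst encoding, and the names get_burst() and
      prep_burst() are assumptions made for illustration. It shows the
      negative error being rejected before it is packed into a two-bit field.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical two-bit field, standing in for a burst-size register field. */
#define BURST_SHIFT 21
#define BURST_MASK  (UINT32_C(0x3) << BURST_SHIFT)

/* Hypothetical stand-in for stm32_dma_get_burst(): encode the burst or fail. */
static int get_burst(unsigned int beats)
{
	switch (beats) {
	case 1:  return 0;
	case 4:  return 1;
	case 8:  return 2;
	case 16: return 3;
	default: return -EINVAL;
	}
}

/* Build the register value; returns 0 on success or a negative errno. */
static int prep_burst(unsigned int beats, uint32_t *reg)
{
	int burst = get_burst(beats);

	/* Fail first: (uint32_t)-EINVAL would never fit into two bits. */
	if (burst < 0)
		return burst;

	*reg = ((uint32_t)burst << BURST_SHIFT) & BURST_MASK;
	return 0;
}
```

      On failure the output register is left untouched, so the caller can
      propagate the error without having written a bogus field value.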
      
      Fixes: 1c32d6c3 ("dmaengine: stm32-dma: use bitfield helpers")
      Fixes: a2b6103b ("dmaengine: stm32-dma: Improve memory burst management")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Amelie Delaunay <amelie.delaunay@foss.st.com>
      Cc: stable@vger.kernel.org
      Reported-by: kernel test robot <lkp@intel.com>
      Closes: https://lore.kernel.org/oe-kbuild-all/202311060135.Q9eMnpCL-lkp@intel.com/
      Link: https://lore.kernel.org/r/20231106134832.1470305-1-amelie.delaunay@foss.st.com
      Signed-off-by: Vinod Koul <vkoul@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      40f3ad76
    • Alex Deucher's avatar
      drm/amdgpu/sdma5.2: add begin/end_use ring callbacks · 78b2ba39
      Alex Deucher authored
      commit ab475033 upstream.
      
      Add begin/end_use ring callbacks to disallow GFXOFF when
      SDMA work is submitted and allow it again afterward.
      
      This should avoid corner cases where GFXOFF is erroneously
      entered when SDMA is still active.  For now just allow/disallow
      GFXOFF in the begin and end helpers until we root cause the
      issue.  This should not impact power as SDMA usage is pretty
      minimal and GFXOFF should not be active when SDMA is active
      anyway, this just makes it explicit.
      
      v2: move everything into sdma5.2 code.  No reason for this
      to be generic at this point.
      v3: Add comments in new code
      
      Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2220
      
      
      Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> (v1)
      Tested-by: Mario Limonciello <mario.limonciello@amd.com> (v1)
      Reviewed-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Cc: stable@vger.kernel.org # 5.15+
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      78b2ba39
    • Florent Revest's avatar
      team: Fix use-after-free when an option instance allocation fails · 6a1472d9
      Florent Revest authored
      commit c12296bb upstream.
      
      In __team_options_register, team_options are allocated and appended to
      the team's option_list.
      If one option instance allocation fails, the "inst_rollback" cleanup
      path frees the previously allocated options but doesn't remove them from
      the team's option_list.
      This leaves dangling pointers that can be dereferenced later by other
      parts of the team driver that iterate over options.
      
      This patch fixes the cleanup path to remove the dangling pointers from
      the list.
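
      The rollback rule can be sketched in user space as follows. This is an
      illustration under assumptions, not the team driver: it uses a
      hypothetical singly linked list that pushes new entries at the head
      (the driver uses struct list_head), and the alloc_fail_at parameter
      exists only to simulate a failed allocation.

```c
#include <stdlib.h>

/* Hypothetical option node; the real driver links options with list_head. */
struct option {
	struct option *next;
	int id;
};

struct team {
	struct option *options;	/* head of the option list */
};

/* Add count options; alloc_fail_at (0-based, -1 for none) simulates ENOMEM. */
static int options_register(struct team *t, int count, int alloc_fail_at)
{
	struct option *added[16];
	int n = 0;

	if (count > 16)
		return -1;

	for (int i = 0; i < count; i++) {
		struct option *o = (i == alloc_fail_at) ? NULL : malloc(sizeof(*o));

		if (!o)
			goto rollback;
		o->id = i;
		o->next = t->options;	/* push at the head of the list */
		t->options = o;
		added[n++] = o;
	}
	return 0;

rollback:
	/* Unlink every new option before freeing it, newest first. */
	while (n > 0) {
		struct option *o = added[--n];

		t->options = o->next;
		free(o);
	}
	return -1;
}
```

      After a failed registration the list holds only the entries that were
      there before the call, so later iteration never touches freed memory.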
      
      As far as I can tell, this UAF doesn't have many security implications
      since it would be fairly hard to exploit (an attacker would need to make
      the allocation of that specific small object fail) but it's still nice
      to fix.
      
      Cc: stable@vger.kernel.org
      Fixes: 80f7c668 ("team: add support for per-port options")
      Signed-off-by: Florent Revest <revest@chromium.org>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
      Link: https://lore.kernel.org/r/20231206123719.1963153-1-revest@chromium.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      6a1472d9
    • James Houghton's avatar
      arm64: mm: Always make sw-dirty PTEs hw-dirty in pte_modify · b01af928
      James Houghton authored
      commit 3c069607 upstream.
      
      It is currently possible for a userspace application to enter an
      infinite page fault loop when using HugeTLB pages implemented with
      contiguous PTEs when HAFDBS is not available. This happens because:
      
      1. The kernel may sometimes write PTEs that are sw-dirty but hw-clean
         (PTE_DIRTY | PTE_RDONLY | PTE_WRITE).
      
      2. If, during a write, the CPU uses a sw-dirty, hw-clean PTE in handling
         the memory access on a system without HAFDBS, we will get a page
         fault.
      
      3. HugeTLB will check if it needs to update the dirty bits on the PTE.
         For contiguous PTEs, it will check to see if the pgprot bits need
         updating. In this case, HugeTLB wants to write a sequence of
         sw-dirty, hw-dirty PTEs, but it finds that all the PTEs it is about
         to overwrite are all pte_dirty() (pte_sw_dirty() => pte_dirty()),
         so it thinks no update is necessary.
      
      We can get the kernel to write a sw-dirty, hw-clean PTE with the
      following steps (showing the relevant VMA flags and pgprot bits):
      
      i.   Create a valid, writable contiguous PTE.
             VMA vmflags:     VM_SHARED | VM_READ | VM_WRITE
             VMA pgprot bits: PTE_RDONLY | PTE_WRITE
             PTE pgprot bits: PTE_DIRTY | PTE_WRITE
      
      ii.  mprotect the VMA to PROT_NONE.
             VMA vmflags:     VM_SHARED
             VMA pgprot bits: PTE_RDONLY
             PTE pgprot bits: PTE_DIRTY | PTE_RDONLY
      
      iii. mprotect the VMA back to PROT_READ | PROT_WRITE.
             VMA vmflags:     VM_SHARED | VM_READ | VM_WRITE
             VMA pgprot bits: PTE_RDONLY | PTE_WRITE
             PTE pgprot bits: PTE_DIRTY | PTE_WRITE | PTE_RDONLY
      
      Make it impossible to create a writeable sw-dirty, hw-clean PTE with
      pte_modify(). Such a PTE should be impossible to create, and there may
      be places that assume that pte_dirty() implies pte_hw_dirty().
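
      The three steps above can be replayed with a small user-space model.
      This is a sketch under assumptions: the bit positions are invented,
      PTEs are plain integers, and the helpers only mimic the relationships
      between the flags (hw-dirty meaning writable and not PTE_RDONLY).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions; only the relationships between flags matter. */
#define PTE_WRITE   (UINT64_C(1) << 0)	/* writable */
#define PTE_RDONLY  (UINT64_C(1) << 1)	/* hardware read-only */
#define PTE_DIRTY   (UINT64_C(1) << 2)	/* software dirty */

static bool pte_sw_dirty(uint64_t pte)
{
	return (pte & PTE_DIRTY) != 0;
}

static bool pte_hw_dirty(uint64_t pte)
{
	return (pte & PTE_WRITE) && !(pte & PTE_RDONLY);
}

/* Mark dirty in software, and in hardware too when the PTE is writable. */
static uint64_t pte_mkdirty(uint64_t pte)
{
	pte |= PTE_DIRTY;
	if (pte & PTE_WRITE)
		pte &= ~PTE_RDONLY;
	return pte;
}

/* Apply new protection bits, then restore hw-dirtiness for sw-dirty PTEs. */
static uint64_t pte_modify(uint64_t pte, uint64_t newprot)
{
	const uint64_t mask = PTE_WRITE | PTE_RDONLY;

	pte = (pte & ~mask) | (newprot & mask);
	if (pte_sw_dirty(pte))
		pte = pte_mkdirty(pte);
	return pte;
}
```

      With this rule, step iii yields PTE_DIRTY | PTE_WRITE instead of the
      writable, sw-dirty, hw-clean combination described above.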
      
      Signed-off-by: James Houghton <jthoughton@google.com>
      Fixes: 031e6e6b ("arm64: hugetlb: Avoid unnecessary clearing in huge_ptep_set_access_flags")
      Cc: <stable@vger.kernel.org>
      Acked-by: Will Deacon <will@kernel.org>
      Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
      Link: https://lore.kernel.org/r/20231204172646.2541916-3-jthoughton@google.com
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b01af928
    • Baokun Li's avatar
      ext4: prevent the normalized size from exceeding EXT_MAX_BLOCKS · 0b071a32
      Baokun Li authored
      commit 2dcf5fde upstream.
      
      For files with logical blocks close to EXT_MAX_BLOCKS, the file size
      predicted in ext4_mb_normalize_request() may exceed EXT_MAX_BLOCKS.
      This can cause some blocks to be preallocated that will not be used.
      And after [Fixes], the following issue may be triggered:
      
      =========================================================
       kernel BUG at fs/ext4/mballoc.c:4653!
       Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
       CPU: 1 PID: 2357 Comm: xfs_io 6.7.0-rc2-00195-g0f5cc96c367f
       Hardware name: linux,dummy-virt (DT)
       pc : ext4_mb_use_inode_pa+0x148/0x208
       lr : ext4_mb_use_inode_pa+0x98/0x208
       Call trace:
        ext4_mb_use_inode_pa+0x148/0x208
        ext4_mb_new_inode_pa+0x240/0x4a8
        ext4_mb_use_best_found+0x1d4/0x208
        ext4_mb_try_best_found+0xc8/0x110
        ext4_mb_regular_allocator+0x11c/0xf48
        ext4_mb_new_blocks+0x790/0xaa8
        ext4_ext_map_blocks+0x7cc/0xd20
        ext4_map_blocks+0x170/0x600
        ext4_iomap_begin+0x1c0/0x348
      =========================================================
      
      Here is a calculation when adjusting ac_b_ex in ext4_mb_new_inode_pa():
      
      	ex.fe_logical = orig_goal_end - EXT4_C2B(sbi, ex.fe_len);
      	if (ac->ac_o_ex.fe_logical >= ex.fe_logical)
      		goto adjust_bex;
      
      The problem is that when ac_b_ex.fe_len is subtracted from orig_goal_end,
      the result is still greater than EXT_MAX_BLOCKS, which causes
      ex.fe_logical to overflow to a very small value, which ultimately
      triggers a BUG_ON in ext4_mb_new_inode_pa() because pa->pa_free < len.

      The last logical block of an actual write request does not exceed
      EXT_MAX_BLOCKS, so ext4_mb_normalize_request() should likewise avoid
      normalizing the last logical block beyond EXT_MAX_BLOCKS, which avoids
      the above issue.
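
      The wrap-around and the clamp can be reproduced in a few lines of
      user-space C. This is a sketch, assuming the goal end is computed in
      64 bits and then stored in a 32-bit logical block number; the helper
      names best_extent_start() and clamp_normalized_size() are hypothetical.

```c
#include <stdint.h>

#define EXT_MAX_BLOCKS UINT32_C(0xffffffff)

typedef uint32_t lblk_t;	/* 32-bit logical block number */

/* Storing a 64-bit "end - len" in 32 bits wraps to a small block number. */
static lblk_t best_extent_start(uint64_t goal_end, uint32_t len)
{
	return (lblk_t)(goal_end - len);
}

/* Clamp a normalized request so start + size never exceeds EXT_MAX_BLOCKS. */
static uint64_t clamp_normalized_size(lblk_t start, uint64_t size)
{
	if ((uint64_t)start + size > EXT_MAX_BLOCKS)
		size = EXT_MAX_BLOCKS - start;
	return size;
}
```

      For start = 0xfffffff0 and size = 0x100, the unclamped end minus a
      length of 0x20 truncates to 0xd0; after clamping, the same kind of
      subtraction stays within the 32-bit range.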
      
      The test case in [Link] can reproduce the above issue with 64k block size.
      
      Link: https://patchwork.kernel.org/project/fstests/list/?series=804003
      Cc: <stable@kernel.org> # 6.4
      Fixes: 93cdf49f ("ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()")
      Signed-off-by: Baokun Li <libaokun1@huawei.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20231127063313.3734294-1-libaokun1@huawei.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      0b071a32
    • Krzysztof Kozlowski's avatar
      soundwire: stream: fix NULL pointer dereference for multi_link · f2955dd3
      Krzysztof Kozlowski authored
      commit e199bf52 upstream.
      
      If the bus is marked as multi_link, but the number of masters in the
      stream is lower than bus->hw_sync_min_links (i.e. the condition
      bus->multi_link && m_rt_count >= bus->hw_sync_min_links does not hold),
      bank switching should not happen.  The first part of the
      do_bank_switch() code properly takes these conditions into account, but
      the second part (sdw_ml_sync_bank_switch()) relies purely on the
      bus->multi_link property.  This is not balanced and leads to a NULL
      pointer dereference:
      
        Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
        ...
        Call trace:
         wait_for_completion_timeout+0x124/0x1f0
         do_bank_switch+0x370/0x6f8
         sdw_prepare_stream+0x2d0/0x438
         qcom_snd_sdw_prepare+0xa0/0x118
         sm8450_snd_prepare+0x128/0x148
         snd_soc_link_prepare+0x5c/0xe8
         __soc_pcm_prepare+0x28/0x1ec
         dpcm_be_dai_prepare+0x1e0/0x2c0
         dpcm_fe_dai_prepare+0x108/0x28c
         snd_pcm_do_prepare+0x44/0x68
         snd_pcm_action_single+0x54/0xc0
         snd_pcm_action_nonatomic+0xe4/0xec
         snd_pcm_prepare+0xc4/0x114
         snd_pcm_common_ioctl+0x1154/0x1cc0
         snd_pcm_ioctl+0x54/0x74
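
      One way to keep the two halves balanced is to evaluate a single
      predicate in both. The sketch below is a simplified user-space model,
      not the soundwire code: struct bus, the completion pointer, and the
      function names are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

struct bus {
	bool multi_link;
	int hw_sync_min_links;
	int *completion;	/* hypothetical: set up only for hardware sync */
};

/* The one condition both halves of the bank switch must agree on. */
static bool use_hw_sync(const struct bus *bus, int m_rt_count)
{
	return bus->multi_link && m_rt_count >= bus->hw_sync_min_links;
}

/* First half: prepare the completion only when hardware sync is used. */
static void start_bank_switch(struct bus *bus, int m_rt_count, int *slot)
{
	bus->completion = use_hw_sync(bus, m_rt_count) ? slot : NULL;
}

/* Second half: touch the completion only under the same condition. */
static bool finish_bank_switch(struct bus *bus, int m_rt_count)
{
	if (!use_hw_sync(bus, m_rt_count))
		return false;
	*bus->completion = 1;
	return true;
}
```

      Checking only bus->multi_link in the second half would dereference the
      NULL completion whenever the master count is below the threshold.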
      
      Fixes: ce6e74d0 ("soundwire: Add support for multi link bank switch")
      Cc: stable@vger.kernel.org
      Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
      Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
      Link: https://lore.kernel.org/r/20231124180136.390621-1-krzysztof.kozlowski@linaro.org
      Signed-off-by: Vinod Koul <vkoul@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f2955dd3
    • Josef Bacik's avatar
      btrfs: do not allow non subvolume root targets for snapshot · 56f76265
      Josef Bacik authored
      commit a8892fd7 upstream.
      
      Our btrfs subvolume snapshot <source> <destination> utility enforces
      that <source> is the root of the subvolume, however this isn't enforced
      in the kernel.  Update the kernel to also enforce this limitation to
      avoid problems with other users of this ioctl that don't have the
      appropriate checks in place.
      
      Reported-by: Martin Michaelis <code@mgjm.de>
      CC: stable@vger.kernel.org # 4.14+
      Reviewed-by: Neal Gompa <neal@gompa.dev>
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      56f76265
    • Mark Rutland's avatar
      perf: Fix perf_event_validate_size() lockdep splat · 557f7ad0
      Mark Rutland authored
      commit 7e2c1e4b upstream.
      
      When lockdep is enabled, the for_each_sibling_event(sibling, event)
      macro checks that event->ctx->mutex is held. When creating a new group
      leader event, we call perf_event_validate_size() on a partially
      initialized event where event->ctx is NULL, and so when
      for_each_sibling_event() attempts to check event->ctx->mutex, we get a
      splat, as reported by Lucas De Marchi:
      
        WARNING: CPU: 8 PID: 1471 at kernel/events/core.c:1950 __do_sys_perf_event_open+0xf37/0x1080
      
      This only happens for a new event which is its own group_leader, and in
      this case there cannot be any sibling events. Thus it's safe to skip the
      check for siblings, which avoids having to make invasive and ugly
      changes to for_each_sibling_event().
      
      Avoid the splat by bailing out early when the new event is its own
      group_leader.
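
      The early bail-out can be modelled in user space as below. This is a
      sketch under assumptions: the structures are reduced to a few fields,
      READ_SIZE_LIMIT is an invented constant, and sibling_walk_allowed()
      stands in for the lockdep check done by for_each_sibling_event().

```c
#include <stdbool.h>
#include <stddef.h>

#define READ_SIZE_LIMIT 16384u	/* hypothetical limit for this sketch */

struct context {
	bool mutex_held;
};

struct event {
	struct event *group_leader;
	struct context *ctx;	/* NULL while the event is partially initialized */
	struct event *siblings[4];
	size_t nr_siblings;
	size_t read_size;
};

/* Stand-in for the lockdep assertion; needs a valid leader->ctx. */
static bool sibling_walk_allowed(const struct event *leader)
{
	return leader->ctx->mutex_held;
}

static bool validate_size(const struct event *event)
{
	const struct event *leader = event->group_leader;

	if (event->read_size >= READ_SIZE_LIMIT)
		return false;

	/* A new event that is its own leader cannot have siblings yet. */
	if (event == leader)
		return true;

	if (!sibling_walk_allowed(leader))
		return false;

	for (size_t i = 0; i < leader->nr_siblings; i++)
		if (leader->siblings[i]->read_size >= READ_SIZE_LIMIT)
			return false;
	return true;
}
```

      Because the self-leader case returns before the sibling walk, the NULL
      ctx of a partially initialized event is never dereferenced.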
      
      Fixes: 382c27f4 ("perf: Fix perf_event_validate_size()")
      Closes: https://lore.kernel.org/lkml/20231214000620.3081018-1-lucas.demarchi@intel.com/
      Closes: https://lore.kernel.org/lkml/ZXpm6gQ%2Fd59jGsuW@xpf.sh.intel.com/
      
      
      Reported-by: Lucas De Marchi <lucas.demarchi@intel.com>
      Reported-by: Pengfei Xu <pengfei.xu@intel.com>
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20231215112450.3972309-1-mark.rutland@arm.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      557f7ad0
    • Denis Benato's avatar
      HID: hid-asus: add const to read-only outgoing usb buffer · a684235d
      Denis Benato authored
      [ Upstream commit 06ae5afc ]
      
      In asus_kbd_set_report(), the parameter buf is read-only because it is
      copied into a memory region suitable for USB transfer, but the
      parameter is not marked const. Add the missing const, and mark the
      immutable buffers passed to that function const as well.
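
      A small user-space illustration of the pattern, with hypothetical
      names (set_report() and the transfer buffer arguments are not the
      driver's API): the source buffer is only read, so it can be const, and
      callers can then keep their report tables in read-only data.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy a read-only report into a separate buffer used for the transfer. */
static int set_report(const uint8_t *buf, size_t len,
		      uint8_t *xfer, size_t xfer_size)
{
	if (len > xfer_size)
		return -1;
	memcpy(xfer, buf, len);	/* buf is never written */
	return (int)len;
}
```

      Passing a static const array to set_report() now compiles without a
      cast, which is the effect the const annotation is meant to achieve.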
      
      Signed-off-by: Denis Benato <benato.denis96@gmail.com>
      Signed-off-by: Luke D. Jones <luke@ljones.dev>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      a684235d
    • Masahiro Yamada's avatar
      arm64: add dependency between vmlinuz.efi and Image · 2b9e16bc
      Masahiro Yamada authored
      [ Upstream commit c0a85742 ]
      
      A common issue in Makefile is a race in parallel building.
      
      You need to be careful to prevent multiple threads from writing to the
      same file simultaneously.
      
      Commit 3939f334 ("ARM: 8418/1: add boot image dependencies to not
      generate invalid images") addressed such a bad scenario.
      
      A similar symptom occurs with the following command:
      
        $ make -j$(nproc) ARCH=arm64 Image vmlinuz.efi
          [ snip ]
          SORTTAB vmlinux
          OBJCOPY arch/arm64/boot/Image
          OBJCOPY arch/arm64/boot/Image
          AS      arch/arm64/boot/zboot-header.o
          PAD     arch/arm64/boot/vmlinux.bin
          GZIP    arch/arm64/boot/vmlinuz
          OBJCOPY arch/arm64/boot/vmlinuz.o
          LD      arch/arm64/boot/vmlinuz.efi.elf
          OBJCOPY arch/arm64/boot/vmlinuz.efi
      
      The log "OBJCOPY arch/arm64/boot/Image" is displayed twice.
      
      It indicates that two threads simultaneously enter arch/arm64/boot/
      and write to arch/arm64/boot/Image.
      
      It occasionally leads to a build failure:
      
        $ make -j$(nproc) ARCH=arm64 Image vmlinuz.efi
          [ snip ]
          SORTTAB vmlinux
          OBJCOPY arch/arm64/boot/Image
          PAD     arch/arm64/boot/vmlinux.bin
        truncate: Invalid number: 'arch/arm64/boot/vmlinux.bin'
        make[2]: *** [drivers/firmware/efi/libstub/Makefile.zboot:13:
        arch/arm64/boot/vmlinux.bin] Error 1
        make[2]: *** Deleting file 'arch/arm64/boot/vmlinux.bin'
        make[1]: *** [arch/arm64/Makefile:163: vmlinuz.efi] Error 2
        make[1]: *** Waiting for unfinished jobs....
        make: *** [Makefile:234: __sub-make] Error 2
      
      vmlinuz.efi depends on Image, but such a dependency is not specified
      in arch/arm64/Makefile.
      
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      Acked-by: Ard Biesheuvel <ardb@kernel.org>
      Reviewed-by: Simon Glass <sjg@chromium.org>
      Link: https://lore.kernel.org/r/20231119053234.2367621-1-masahiroy@kernel.org
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      2b9e16bc
    • Lech Perczak's avatar
      net: usb: qmi_wwan: claim interface 4 for ZTE MF290 · 6cb0c71c
      Lech Perczak authored
      [ Upstream commit 99360d96 ]
      
      Interface 4 is used for the QMI interface in the stock firmware of the
      MF28D, the router which uses the MF290 modem. Rebind it to qmi_wwan
      after freeing it up from the option driver.
      The proper configuration is:

      Interface mapping is:
      0: QCDM, 1: (unknown), 2: AT (PCUI), 3: AT (Modem), 4: QMI
      
      T:  Bus=01 Lev=02 Prnt=02 Port=00 Cnt=01 Dev#=  4 Spd=480  MxCh= 0
      D:  Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
      P:  Vendor=19d2 ProdID=0189 Rev= 0.00
      S:  Manufacturer=ZTE, Incorporated
      S:  Product=ZTE LTE Technologies MSM
      C:* #Ifs= 5 Cfg#= 1 Atr=e0 MxPwr=500mA
      I:* If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
      E:  Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=4ms
      I:* If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
      E:  Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=4ms
      I:* If#= 2 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
      E:  Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=4ms
      I:* If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
      E:  Ad=84(I) Atr=03(Int.) MxPS=  64 Ivl=2ms
      E:  Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=4ms
      I:* If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan
      E:  Ad=86(I) Atr=03(Int.) MxPS=  64 Ivl=2ms
      E:  Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=4ms
      
      Cc: Bjørn Mork <bjorn@mork.no>
      Signed-off-by: Lech Perczak <lech.perczak@gmail.com>
      Link: https://lore.kernel.org/r/20231117231918.100278-3-lech.perczak@gmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      6cb0c71c
    • Linus Torvalds's avatar
      asm-generic: qspinlock: fix queued_spin_value_unlocked() implementation · f7ce7657
      Linus Torvalds authored
      [ Upstream commit 125b0bb9 ]
      
      We really don't want to do atomic_read() or anything like that, since we
      already have the value, not the lock.  The whole point of this is that
      we've loaded the lock from memory, and we want to check whether the
      value we loaded was a locked one or not.
      
      The main use of this is the lockref code, which loads both the lock and
      the reference count in one atomic operation, and then works on that
      combined value.  With the atomic_read(), the compiler would pointlessly
      spill the value to the stack, in order to then be able to read it back
      "atomically".
      
      This is the qspinlock version of commit c6f4a900 ("asm-generic:
      ticket-lock: Optimize arch_spin_value_unlocked()") which fixed this same
      bug for ticket locks.
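
      The difference between testing a loaded value and re-reading the lock
      can be sketched as follows. The types are simplified stand-ins (a
      single 32-bit word where zero means unlocked), not the kernel's
      struct qspinlock or lockref definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a queued spinlock: zero means unlocked. */
struct qlock {
	uint32_t val;
};

/* Lock and count as loaded together in one operation by the caller. */
struct lockref_snapshot {
	struct qlock lock;
	uint32_t count;
};

/* Takes the lock BY VALUE: it inspects the copy, it does not read memory. */
static bool lock_value_unlocked(struct qlock lock)
{
	return lock.val == 0;
}

/* Lockref-style fast-path test on an already loaded snapshot. */
static bool snapshot_unlocked(struct lockref_snapshot snap)
{
	return lock_value_unlocked(snap.lock);
}
```

      Since the argument is a plain value, the compiler can keep it in a
      register; nothing forces it to be stored to memory and read back.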
      
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Waiman Long <longman@redhat.com>
      Link: https://lore.kernel.org/all/CAHk-=whNRv0v6kQiV5QO6DJhjH4KEL36vWQ6Re8Csrnh4zbRkQ@mail.gmail.com/
      
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      f7ce7657
    • Aoba K's avatar
      HID: multitouch: Add quirk for HONOR GLO-GXXX touchpad · fba6e958
      Aoba K authored
      [ Upstream commit 9ffccb69 ]
      
      The Honor MagicBook 13 2023 has a touchpad which does not switch to
      multitouch mode until the input mode feature is written by the host.  The
      touchpad does report the input mode as touchpad(3), while itself working
      in mouse mode. As a workaround, MT_QUIRK_FORCE_GET_FEATURE can be used to
      force setting the feature in mt_set_input_mode for such devices.

      The touchpad reports itself as BLTP7853; at present, no useful
      manufacturer information can be found on the internet for this string.
      As the serial number of the laptop is GLO-G52, while DMI info reports the
      laptop serial number as GLO-GXXX, this workaround should be applied to
      all models which report GLO-GXXX.
      
      Signed-off-by: Aoba K <nexp_0x17@outlook.com>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      fba6e958
    • Denis Benato's avatar
      HID: hid-asus: reset the backlight brightness level on resume · 8f0c8585
      Denis Benato authored
      [ Upstream commit 546edbd2 ]
      
      Some devices managed by this driver automatically set brightness to 0
      before entering a suspended state and reset it back to a default
      brightness level after the resume:
      this has the effect of having the kernel report wrong brightness
      status after a sleep, and on some devices (like the Asus RC71L) that
      brightness is the intensity of LEDs directly facing the user.
      
      Fix the above issue by setting back brightness to the level it had
      before entering a sleep state.
      
      Signed-off-by: Denis Benato <benato.denis96@gmail.com>
      Signed-off-by: Luke D. Jones <luke@ljones.dev>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      8f0c8585
    • Li Nan's avatar
      nbd: pass nbd_sock to nbd_read_reply() instead of index · de78e4bd
      Li Nan authored
      [ Upstream commit 98c598af ]
      
      If a socket is processing ioctl 'NBD_SET_SOCK', config->socks might be
      krealloc'ed in nbd_add_socket(); if a garbage request is received at
      that moment, a UAF may occur.
      
        T1
        nbd_ioctl
         __nbd_ioctl
          nbd_add_socket
           blk_mq_freeze_queue
      				T2
        				recv_work
        				 nbd_read_reply
        				  sock_xmit
           krealloc config->socks
      				   def config->socks
      
      Pass nbd_sock to nbd_read_reply(), and introduce a new function,
      sock_xmit_recv(), which differs from sock_xmit() only in the way it
      gets the socket.
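
      Why passing the socket object avoids the race can be shown with a
      user-space model. This is a sketch under assumptions: struct
      sock_entry, struct config, and the functions below are simplified
      stand-ins, and realloc() plays the role of krealloc().

```c
#include <stdlib.h>

struct sock_entry {
	int fd;
};

struct config {
	struct sock_entry **socks;	/* pointer array; may move when it grows */
	int num;
};

/* Grow the pointer array; the old array may be freed by realloc(). */
static int add_socket(struct config *c, int fd)
{
	struct sock_entry **grown;
	struct sock_entry *e = malloc(sizeof(*e));

	if (!e)
		return -1;
	grown = realloc(c->socks, ((size_t)c->num + 1) * sizeof(*grown));
	if (!grown) {
		free(e);
		return -1;
	}
	e->fd = fd;
	grown[c->num] = e;
	c->socks = grown;
	c->num++;
	return 0;
}

/* Receive path gets the entry itself and never indexes c->socks. */
static int read_reply(const struct sock_entry *e)
{
	return e->fd;
}
```

      The entries themselves stay where they are when the array grows, so a
      receiver holding the entry pointer does not depend on the array that
      may be freed underneath it.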
      
      ==================================================================
      BUG: KASAN: use-after-free in sock_xmit+0x525/0x550
      Read of size 8 at addr ffff8880188ec428 by task kworker/u12:1/18779
      
      Workqueue: knbd4-recv recv_work
      Call Trace:
       __dump_stack
       dump_stack+0xbe/0xfd
       print_address_description.constprop.0+0x19/0x170
       __kasan_report.cold+0x6c/0x84
       kasan_report+0x3a/0x50
       sock_xmit+0x525/0x550
       nbd_read_reply+0xfe/0x2c0
       recv_work+0x1c2/0x750
       process_one_work+0x6b6/0xf10
       worker_thread+0xdd/0xd80
       kthread+0x30a/0x410
       ret_from_fork+0x22/0x30
      
      Allocated by task 18784:
       kasan_save_stack+0x1b/0x40
       kasan_set_track
       set_alloc_info
       __kasan_kmalloc
       __kasan_kmalloc.constprop.0+0xf0/0x130
       slab_post_alloc_hook
       slab_alloc_node
       slab_alloc
       __kmalloc_track_caller+0x157/0x550
       __do_krealloc
       krealloc+0x37/0xb0
       nbd_add_socket+0x2d3/0x880
       __nbd_ioctl
       nbd_ioctl+0x584/0x8e0
       __blkdev_driver_ioctl
       blkdev_ioctl+0x2a0/0x6e0
       block_ioctl+0xee/0x130
       vfs_ioctl
       __do_sys_ioctl
       __se_sys_ioctl+0x138/0x190
       do_syscall_64+0x33/0x40
       entry_SYSCALL_64_after_hwframe+0x61/0xc6
      
      Freed by task 18784:
       kasan_save_stack+0x1b/0x40
       kasan_set_track+0x1c/0x30
       kasan_set_free_info+0x20/0x40
       __kasan_slab_free.part.0+0x13f/0x1b0
       slab_free_hook
       slab_free_freelist_hook
       slab_free
       kfree+0xcb/0x6c0
       krealloc+0x56/0xb0
       nbd_add_socket+0x2d3/0x880
       __nbd_ioctl
       nbd_ioctl+0x584/0x8e0
       __blkdev_driver_ioctl
       blkdev_ioctl+0x2a0/0x6e0
       block_ioctl+0xee/0x130
       vfs_ioctl
       __do_sys_ioctl
       __se_sys_ioctl+0x138/0x190
       do_syscall_64+0x33/0x40
       entry_SYSCALL_64_after_hwframe+0x61/0xc6
      
      Signed-off-by: Li Nan <linan122@huawei.com>
      Reviewed-by: Yu Kuai <yukuai3@huawei.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Link: https://lore.kernel.org/r/20230911023308.3467802-1-linan666@huaweicloud.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      de78e4bd
    • Oliver Neukum's avatar
      HID: add ALWAYS_POLL quirk for Apple kb · d482bb56
      Oliver Neukum authored
      [ Upstream commit c5509218 ]
      
      These devices disconnect if suspended without remote wakeup. They can operate
      with the standard driver.
      
      Signed-off-by: Oliver Neukum <oneukum@suse.com>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      d482bb56
    • Brett Raye's avatar
      HID: glorious: fix Glorious Model I HID report · 541b183b
      Brett Raye authored
      [ Upstream commit a5e913c2 ]
      
      The Glorious Model I mouse has a buggy HID report descriptor for its
      keyboard endpoint (used for programmable buttons). For report ID 2, there
      is a mismatch between Logical Minimum and Usage Minimum in the array that
      reports keycodes.
      
      The offending portion of the descriptor: (from hid-decode)
      
      0x95, 0x05,                    //  Report Count (5)                   30
      0x75, 0x08,                    //  Report Size (8)                    32
      0x15, 0x00,                    //  Logical Minimum (0)                34
      0x25, 0x65,                    //  Logical Maximum (101)              36
      0x05, 0x07,                    //  Usage Page (Keyboard)              38
      0x19, 0x01,                    //  Usage Minimum (1)                  40
      0x29, 0x65,                    //  Usage Maximum (101)                42
      0x81, 0x00,                    //  Input (Data,Arr,Abs)               44
      
      This bug shifts all programmed keycodes up by 1. Importantly, this causes
      "empty" array indexes of 0x00 to be interpreted as 0x01, ErrorRollOver.
      The presence of ErrorRollOver causes the system to ignore all keypresses
      from the endpoint and breaks the ability to use the programmable buttons.
      
      Setting byte 41 to 0x00 fixes this, and causes keycodes to be interpreted
      correctly.
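      The effect of the mismatch, and of the one-byte fix, can be sketched in
      plain user-space C. This is only an illustration under stated
      assumptions: fix_usage_minimum() and array_value_to_usage() are
      hypothetical helpers, not the driver's actual report fixup code, and
      the offsets 40 and 41 are taken from the hid-decode listing above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offsets from the hid-decode listing: byte 40 is the Usage Minimum
 * item tag (0x19), byte 41 is its value. */
#define USAGE_MIN_TAG_OFFSET   40
#define USAGE_MIN_VALUE_OFFSET 41

/*
 * Hypothetical helper (not the kernel's report_fixup hook): patch the
 * descriptor in place only if it carries the buggy "Usage Minimum (1)".
 * Returns 1 if the descriptor was patched, 0 if it was left untouched.
 */
static int fix_usage_minimum(uint8_t *rdesc, size_t size)
{
	if (size <= USAGE_MIN_VALUE_OFFSET)
		return 0;
	if (rdesc[USAGE_MIN_TAG_OFFSET] != 0x19 ||
	    rdesc[USAGE_MIN_VALUE_OFFSET] != 0x01)
		return 0;
	rdesc[USAGE_MIN_VALUE_OFFSET] = 0x00;
	return 1;
}

/*
 * For an array item, the reported value is an index relative to
 * Logical Minimum, and that index selects a usage starting at
 * Usage Minimum.
 */
static unsigned int array_value_to_usage(unsigned int value,
					 unsigned int logical_min,
					 unsigned int usage_min)
{
	assert(value >= logical_min);
	return usage_min + (value - logical_min);
}
```

      With Logical Minimum 0 and Usage Minimum 1, an empty slot (0x00) maps
      to usage 0x01, which is ErrorRollOver; with Usage Minimum 0 it maps to
      0x00 and is ignored as intended.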
      
      Also, USB_VENDOR_ID_GLORIOUS is changed to USB_VENDOR_ID_SINOWEALTH,
      and a new ID for Laview Technology is added. Glorious seems to be
      white-labeling controller boards or mice from these vendors. There isn't a
      single canonical vendor ID for Glorious products.
      
      Signed-off-by: Brett Raye <braye@fastmail.com>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      541b183b
    • Andy Shevchenko's avatar
      platform/x86: intel_telemetry: Fix kernel doc descriptions · 42b4ab97
      Andy Shevchenko authored
      [ Upstream commit a6584711 ]
      
      LKP found issues with a kernel doc in the driver:
      
      core.c:116: warning: Function parameter or member 'ioss_evtconfig' not described in 'telemetry_update_events'
      core.c:188: warning: Function parameter or member 'ioss_evtconfig' not described in 'telemetry_get_eventconfig'
      
      These look like copy'n'paste typos made when the descriptions
      were introduced. Fix the typos.
      
      Reported-by: kernel test robot <lkp@intel.com>
      Closes: https://lore.kernel.org/oe-kbuild-all/202310070743.WALmRGSY-lkp@intel.com/
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Link: https://lore.kernel.org/r/20231120150756.1661425-1-andriy.shevchenko@linux.intel.com
      Reviewed-by: Rajneesh Bhardwaj <irenic.rajneesh@gmail.com>
      Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      42b4ab97
    • Bibo Mao's avatar
      LoongArch: Implement constant timer shutdown interface · 355170a7
      Bibo Mao authored
      [ Upstream commit d43f37b7 ]
      
      When a CPU is hot-unplugged, it is put in idle state and the function
      arch_cpu_idle_dead() is called. The timer interrupt for this processor
      should be disabled; otherwise a timer interrupt stays pending for the
      unplugged CPU, which prevents the vCPU from giving up scheduling when
      the system is running in VM mode.
      
      This patch implements the timer shutdown interface so that the constant
      timer will be properly disabled when a CPU is hot-unplugged.
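
      As a rough illustration of what a shutdown hook of this kind does, the
      user-space sketch below models the timer configuration as a plain
      variable and clears an enable bit. Everything here is an assumption
      made for illustration: sim_tcfg, SIM_TCFG_EN, SIM_TCFG_PERIODIC and
      sim_timer_shutdown() are hypothetical names, not the LoongArch
      register layout or the driver's code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit layout, for illustration only. */
#define SIM_TCFG_EN       (1ULL << 0)	/* timer enabled */
#define SIM_TCFG_PERIODIC (1ULL << 1)	/* periodic mode */

/* Stand-in for a per-CPU timer configuration register. */
static uint64_t sim_tcfg;

/*
 * Shutdown callback sketch: clear only the enable bit so that no
 * further timer interrupt is raised, and leave the other bits alone.
 * Returns 0 on success, as clock event state callbacks do.
 */
static int sim_timer_shutdown(void)
{
	sim_tcfg &= ~SIM_TCFG_EN;
	return 0;
}
```

      Calling such a hook on the hot-unplug path is what keeps the offline
      CPU from being left with a pending timer interrupt.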
      
      Reviewed-by: WANG Xuerui <git@xen0n.name>
      Signed-off-by: Bibo Mao <maobibo@loongson.cn>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      355170a7
    • Masahiro Yamada's avatar
      LoongArch: Add dependency between vmlinuz.efi and vmlinux.efi · adb6a907
      Masahiro Yamada authored
      [ Upstream commit d3ec75bc ]
      
      A common issue in Makefiles is a race during parallel builds.
      
      You need to be careful to prevent multiple threads from writing to the
      same file simultaneously.
      
      Commit 3939f334 ("ARM: 8418/1: add boot image dependencies to not
      generate invalid images") addressed such a bad scenario.
      
      A similar symptom occurs with the following command:
      
        $ make -j$(nproc) ARCH=loongarch vmlinux.efi vmlinuz.efi
          [ snip ]
          SORTTAB vmlinux
          OBJCOPY arch/loongarch/boot/vmlinux.efi
          OBJCOPY arch/loongarch/boot/vmlinux.efi
          PAD     arch/loongarch/boot/vmlinux.bin
          GZIP    arch/loongarch/boot/vmlinuz
          OBJCOPY arch/loongarch/boot/vmlinuz.o
          LD      arch/loongarch/boot/vmlinuz.efi.elf
          OBJCOPY arch/loongarch/boot/vmlinuz.efi
      
      The log "OBJCOPY arch/loongarch/boot/vmlinux.efi" is displayed twice.
      
      It indicates that two threads simultaneously enter arch/loongarch/boot/
      and write to arch/loongarch/boot/vmlinux.efi.
      
      It occasionally leads to a build failure:
      
        $ make -j$(nproc) ARCH=loongarch vmlinux.efi vmlinuz.efi
          [ snip ]
          SORTTAB vmlinux
          OBJCOPY arch/loongarch/boot/vmlinux.efi
          PAD     arch/loongarch/boot/vmlinux.bin
        truncate: Invalid number: ‘arch/loongarch/boot/vmlinux.bin’
        make[2]: *** [drivers/firmware/efi/libstub/Makefile.zboot:13:
        arch/loongarch/boot/vmlinux.bin] Error 1
        make[2]: *** Deleting file 'arch/loongarch/boot/vmlinux.bin'
        make[1]: *** [arch/loongarch/Makefile:146: vmlinuz.efi] Error 2
        make[1]: *** Waiting for unfinished jobs....
        make: *** [Makefile:234: __sub-make] Error 2
      
      vmlinuz.efi depends on vmlinux.efi, but such a dependency is not
      specified in arch/loongarch/Makefile. Add it.
      
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      adb6a907
    • Eduard Zingerman's avatar
      selftests/bpf: fix bpf_loop_bench for new callback verification scheme · 943cde1f
      Eduard Zingerman authored
      [ Upstream commit f40bfd16 ]
      
      This is a preparatory change. A follow-up patch "bpf: verify callbacks
      as if they are called unknown number of times" changes the logic for
      callback handling. While callbacks were previously verified as a
      single function call, the new scheme takes into account that callbacks
      could be executed an unknown number of times.
      
      This has dire implications for bpf_loop_bench:
      
          SEC("fentry/" SYS_PREFIX "sys_getpgid")
          int benchmark(void *ctx)
          {
                  for (int i = 0; i < 1000; i++) {
                          bpf_loop(nr_loops, empty_callback, NULL, 0);
                          __sync_add_and_fetch(&hits, nr_loops);
                  }
                  return 0;
          }
      
      Without the callbacks change, the verifier sees it as 1000 calls to
      empty_callback(). However, with the callbacks change things become
      exponential:
      - i=0: state exploring empty_callback is scheduled with i=0 (a);
      - i=1: state exploring empty_callback is scheduled with i=1;
        ...
      - i=999: state exploring empty_callback is scheduled with i=999;
      - state (a) is popped from stack;
      - i=1: state exploring empty_callback is scheduled with i=1;
        ...
      
      Avoid this issue by rewriting outer loop as bpf_loop().
      Unfortunately, this adds a function call to a loop at runtime, which
      negatively affects performance:
      
                  throughput               latency
         before:  149.919 ± 0.168 M ops/s, 6.670 ns/op
         after :  137.040 ± 0.187 M ops/s, 7.297 ns/op
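
      The rewrite of the outer loop described above can be sketched in
      user-space C. This is a simulation under stated assumptions, not the
      selftest source: sim_bpf_loop() is a hypothetical stand-in for the
      bpf_loop() helper, outer_loop() is an illustrative name for the new
      callback, and the SEC() annotation and atomic increment are omitted
      because the sketch is single-threaded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int (*loop_cb)(uint32_t index, void *ctx);

/*
 * Hypothetical user-space stand-in for bpf_loop(): call cb up to
 * nr_loops times, stop early if cb returns non-zero, and return the
 * number of iterations performed.
 */
static long sim_bpf_loop(uint32_t nr_loops, loop_cb cb, void *ctx)
{
	uint32_t i;

	for (i = 0; i < nr_loops; i++) {
		if (cb(i, ctx))
			return (long)i + 1;
	}
	return (long)i;
}

static uint32_t nr_loops = 10;
static uint64_t hits;

static int empty_callback(uint32_t index, void *ctx)
{
	(void)index;
	(void)ctx;
	return 0;
}

/* Body of the former for-loop, now a callback of its own. */
static int outer_loop(uint32_t index, void *ctx)
{
	(void)index;
	(void)ctx;
	sim_bpf_loop(nr_loops, empty_callback, NULL);
	hits += nr_loops;
	return 0;
}

/* The outer loop itself is expressed as a loop-helper call. */
static int benchmark(void)
{
	sim_bpf_loop(1000, outer_loop, NULL);
	return 0;
}
```

      The amount of work is unchanged (1000 * nr_loops hits per call to
      benchmark()), but each outer iteration now goes through one extra
      indirect function call, which matches the slowdown reported above.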
      
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
      Link: https://lore.kernel.org/r/20231121020701.26440-4-eddyz87@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      943cde1f