  1. Jun 20, 2023
    • writeback: fix dereferencing NULL mapping->host on writeback_page_template · 54abe19e
      Rafael Aquini authored
      When commit 19343b5b ("mm/page-writeback: introduce tracepoint for
      wait_on_page_writeback()") repurposed the writeback_dirty_page trace event
      as a template to create its new wait_on_page_writeback trace event, it
      ended up opening a window to NULL pointer dereference crashes due to the
      (infrequent) occurrence of a race where an access to a page in the
      swap-cache happens concurrently with the moment this page is being written
      to disk and the tracepoint is enabled:
      
          BUG: kernel NULL pointer dereference, address: 0000000000000040
          #PF: supervisor read access in kernel mode
          #PF: error_code(0x0000) - not-present page
          PGD 800000010ec0a067 P4D 800000010ec0a067 PUD 102353067 PMD 0
          Oops: 0000 [#1] PREEMPT SMP PTI
          CPU: 1 PID: 1320 Comm: shmem-worker Kdump: loaded Not tainted 6.4.0-rc5+ #13
          Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20230301gitf80f052277c8-1.fc37 03/01/2023
          RIP: 0010:trace_event_raw_event_writeback_folio_template+0x76/0xf0
          Code: 4d 85 e4 74 5c 49 8b 3c 24 e8 06 98 ee ff 48 89 c7 e8 9e 8b ee ff ba 20 00 00 00 48 89 ef 48 89 c6 e8 fe d4 1a 00 49 8b 04 24 <48> 8b 40 40 48 89 43 28 49 8b 45 20 48 89 e7 48 89 43 30 e8 a2 4d
          RSP: 0000:ffffaad580b6fb60 EFLAGS: 00010246
          RAX: 0000000000000000 RBX: ffff90e38035c01c RCX: 0000000000000000
          RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff90e38035c044
          RBP: ffff90e38035c024 R08: 0000000000000002 R09: 0000000000000006
          R10: ffff90e38035c02e R11: 0000000000000020 R12: ffff90e380bac000
          R13: ffffe3a7456d9200 R14: 0000000000001b81 R15: ffffe3a7456d9200
          FS:  00007f2e4e8a15c0(0000) GS:ffff90e3fbc80000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          CR2: 0000000000000040 CR3: 00000001150c6003 CR4: 0000000000170ee0
          Call Trace:
           <TASK>
           ? __die+0x20/0x70
           ? page_fault_oops+0x76/0x170
           ? kernelmode_fixup_or_oops+0x84/0x110
           ? exc_page_fault+0x65/0x150
           ? asm_exc_page_fault+0x22/0x30
           ? trace_event_raw_event_writeback_folio_template+0x76/0xf0
           folio_wait_writeback+0x6b/0x80
           shmem_swapin_folio+0x24a/0x500
           ? filemap_get_entry+0xe3/0x140
           shmem_get_folio_gfp+0x36e/0x7c0
           ? find_busiest_group+0x43/0x1a0
           shmem_fault+0x76/0x2a0
           ? __update_load_avg_cfs_rq+0x281/0x2f0
           __do_fault+0x33/0x130
           do_read_fault+0x118/0x160
           do_pte_missing+0x1ed/0x2a0
           __handle_mm_fault+0x566/0x630
           handle_mm_fault+0x91/0x210
           do_user_addr_fault+0x22c/0x740
           exc_page_fault+0x65/0x150
           asm_exc_page_fault+0x22/0x30
      
      This problem arises from the fact that the repurposed writeback_dirty_page
      trace event code was written assuming that every pointer to mapping
      (struct address_space) would come from a file-mapped page-cache object,
      thus mapping->host would always be populated, and that was a valid case
      before commit 19343b5b.  The swap-cache address space
      (swapper_spaces), however, doesn't populate its ->host (struct inode)
      pointer, thus leading to the crashes in the corner-case aforementioned.
      
      Commit 19343b5b ended up breaking the assignment of __entry->name and
      __entry->ino for the wait_on_page_writeback tracepoint -- both dependent
      on mapping->host carrying a pointer to a valid inode.  The assignment of
      __entry->name was fixed by commit 68f23b89 ("memcg: fix a crash in
      wb_workfn when a device disappears"), and this commit fixes the remaining
      case, for __entry->ino.
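
      A minimal userspace sketch of the guard described above, assuming
      simplified stand-ins for struct inode and struct address_space.
      trace_ino() is a hypothetical helper name used only for illustration;
      the idea is to fall back to inode number 0 when there is no mapping or
      no host inode, as with the swap-cache address space.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (assumption: only the
 * fields used in this sketch are modelled). */
struct inode {
    unsigned long i_ino;
};

struct address_space {
    struct inode *host; /* NULL for the swap-cache address space */
};

/*
 * Hypothetical helper: pick the inode number to record in the trace
 * entry, falling back to 0 when mapping or mapping->host is NULL
 * instead of dereferencing a NULL pointer.
 */
unsigned long trace_ino(const struct address_space *mapping)
{
    return (mapping && mapping->host) ? mapping->host->i_ino : 0;
}
```

      With this shape, a swap-cache folio is reported with ino 0 rather than
      crashing the tracepoint.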
      
      Link: https://lkml.kernel.org/r/20230606233613.1290819-1-aquini@redhat.com
      
      
      Fixes: 19343b5b ("mm/page-writeback: introduce tracepoint for wait_on_page_writeback()")
      Signed-off-by: Rafael Aquini <aquini@redhat.com>
      Reviewed-by: Yafang Shao <laoar.shao@gmail.com>
      Cc: Aristeu Rozanski <aris@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  2. Jun 13, 2023
  3. May 29, 2023
    • Merge tag 'trace-v6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · 8b817fde
      Linus Torvalds authored
      Pull tracing fixes from Steven Rostedt:
       "User events:
      
         - Use long instead of int for storing the enable set/clear bit, as it
           was found that big endian machines could end up using the wrong
           bits.
      
         - Split allocating mm and attaching it. This keeps the allocation
           separate from the registration and avoids various races.
      
         - Remove RCU locking around pin_user_pages_remote() as that can
           schedule. The RCU protection is no longer needed with the above
           split of mm allocation and attaching.
      
         - Rename the "link" fields of the various structs to something more
           meaningful.
      
         - Add comments around user_event_mm struct usage and locking
           requirements.
      
        Timerlat tracer:
      
         - Fix missed wakeup of timerlat thread caused by the timerlat
           interrupt triggering when tracing is off. The timer interrupt
           handler needs to always wake up the timerlat thread regardless of
           whether tracing is enabled; otherwise, it will never wake up.
      
        Histograms:
      
         - Fix regression of breaking the "stacktrace" modifier for variables.
           That modifier cannot be used for values, but can be used for
           variables that are passed from one histogram to the next. This was
           broken when adding the restriction to values as the variable logic
           used the same code.
      
         - Rename the special field "stacktrace" to "common_stacktrace".
      
           Special fields (that are not actually part of the event, but can
           act just like event fields, like 'comm' and 'timestamp') should be
           prefixed with 'common_' for consistency. To keep backward
           compatibility, 'stacktrace' can still be used (as with the special
           field 'cpu'), but can be overridden if the event has a field called
           'stacktrace'.
      
         - Update the synthetic event selftests to use the new name (synthetic
           events are created by histograms)
      
        Tracing bootup selftests:
      
         - Reorganize the code to keep artifacts of the selftests not compiled
           in when selftests are not configured.
      
         - Add various cond_resched() calls around the selftest code, as the
           soft lockup watchdog was triggering much more often. It appears
           that the kernel runs slower now with full debugging enabled.
      
         - While debugging ftrace with ftrace (using an instance ring buffer
           instead of the top level one), I found that the selftests were
           disabling prints to the debug instance.
      
           This should not happen, as the selftests only disable printing to
           the main buffer as the selftests examine the main buffer to see if
           it has what it expects, and prints can make the tests fail.
      
           Make the selftests only disable printing to the toplevel buffer,
           and leave the instance buffers alone"
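
      For the user events item about using long instead of int, a small
      sketch of why the two widths disagree on big endian machines. It
      assumes an 8-byte long and a 4-byte int, and byte_holding_bit() is a
      hypothetical helper written for illustration, not kernel code.

```c
#include <stddef.h>

/*
 * Return the index (counted from the word's start address) of the byte
 * that holds bit `bit` of a word that is `word_bytes` bytes wide.
 * Illustration only: assumes 8-bit bytes and bit < word_bytes * 8.
 */
size_t byte_holding_bit(unsigned bit, size_t word_bytes, int big_endian)
{
    size_t significance = bit / 8; /* 0 = least significant byte */

    /* Little endian stores the least significant byte first,
     * big endian stores it last. */
    return big_endian ? word_bytes - 1 - significance : significance;
}
```

      On little endian, bit 0 lands in byte 0 for both widths. On big endian,
      bit 0 of a 4-byte word is in byte 3 while bit 0 of an 8-byte word is in
      byte 7, so operating on the same address with the wrong width touches
      different bits.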
      
      * tag 'trace-v6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        tracing: Have function_graph selftest call cond_resched()
        tracing: Only make selftest conditionals affect the global_trace
        tracing: Make tracing_selftest_running/delete nops when not used
        tracing: Have tracer selftests call cond_resched() before running
        tracing: Move setting of tracing_selftest_running out of register_tracer()
        tracing/selftests: Update synthetic event selftest to use common_stacktrace
        tracing: Rename stacktrace field to common_stacktrace
        tracing/histograms: Allow variables to have some modifiers
        tracing/user_events: Document user_event_mm one-shot list usage
        tracing/user_events: Rename link fields for clarity
        tracing/user_events: Remove RCU lock while pinning pages
        tracing/user_events: Split up mm alloc and attach
        tracing/timerlat: Always wakeup the timerlat thread
        tracing/user_events: Use long vs int for atomic bit ops
    • Merge tag 'v6.4-p3' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · 7a6c8e51
      Linus Torvalds authored
      Pull crypto fix from Herbert Xu:
       "Fix an alignment crash in x86/aria"
      
      * tag 'v6.4-p3' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
        crypto: x86/aria - Use 16 byte alignment for GFNI constant vectors
    • Revert "module: error out early on concurrent load of the same module file" · ac2263b5
      Linus Torvalds authored
      
      
      This reverts commit 9828ed3f.
      
      Sadly, it does seem to cause failures to load modules. Johan Hovold reports:
      
       "This change breaks module loading during boot on the Lenovo Thinkpad
        X13s (aarch64).
      
        Specifically it results in indefinite probe deferral of the display
        and USB (ethernet) which makes it a pain to debug. Typing in the dark
        to acquire some logs reveals that other modules are missing as well"
      
      Since this was applied late as a "let's try this", I'm reverting it
      asap, and we can try to figure out what goes wrong later.  The excessive
      parallel module loading problem is annoying, but not noticeable in
      normal situations, and this was only meant as an optimistic workaround
      for a user-space bug.
      
      One possible solution may be to do the optimistic exclusive open first,
      and then use a lock to serialize loading if that fails.
      
      Reported-by: Johan Hovold <johan@kernel.org>
      Link: https://lore.kernel.org/lkml/ZHRpH-JXAxA6DnzR@hovoldconsulting.com/
      
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • tracing: Have function_graph selftest call cond_resched() · a2d910f0
      Steven Rostedt (Google) authored
      When all kernel debugging is enabled (lockdep, KASAN, etc), the function
      graph enabling and disabling can take several seconds to complete. The
      function_graph selftest enables and disables function graph tracing
      several times. With full debugging enabled, the soft lockup watchdog was
      triggering because the selftest was running without ever scheduling.
      
      Add cond_resched() throughout the test to make sure it does not trigger
      the soft lockup detector.
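
      A minimal userspace sketch of the pattern, assuming sched_yield() as a
      stand-in for the kernel's cond_resched(). The names cond_resched_stub()
      and run_rounds() are hypothetical; the point is only that a long loop
      offers to reschedule between slow steps.

```c
#include <sched.h>
#include <stddef.h>

/* Counts how often the loop offered to reschedule (for illustration). */
int resched_calls;

/* Userspace stand-in for cond_resched(): yield the CPU voluntarily. */
void cond_resched_stub(void)
{
    resched_calls++;
    sched_yield();
}

/*
 * Hypothetical selftest loop: run a slow step several times and give
 * the scheduler a chance to run something else after each one.
 * Returns the number of completed rounds.
 */
int run_rounds(int rounds, void (*slow_step)(void))
{
    int done = 0;

    for (int i = 0; i < rounds; i++) {
        if (slow_step)
            slow_step();
        cond_resched_stub();
        done++;
    }
    return done;
}
```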
      
      Link: https://lkml.kernel.org/r/20230528051742.1325503-6-rostedt@goodmis.org
      
      
      
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • tracing: Only make selftest conditionals affect the global_trace · ac9d2cb1
      Steven Rostedt (Google) authored
      The tracing_selftest_running and tracing_selftest_disabled variables were
      added to keep trace_printk() and other writes from affecting the tracing
      selftests, as the tracing selftests would examine the ring buffer to see
      if it contained what it expected or not. trace_printk() and friends could
      add to the ring buffer and cause the selftests to fail (and then disable
      the tracer that was being tested). To keep that from happening, these
      variables were added and would keep trace_printk() and friends from
      writing to the ring buffer while the tests were going on.
      
      But this was only the top level ring buffer (owned by the global_trace
      instance). There is no reason to prevent writing into ring buffers of
      other instances via trace_array_printk() and friends. For the
      functions that could be used by other instances, check if the global_trace
      is the tracer instance that is being written to before deciding to not
      allow the write.
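
      A minimal sketch of the check described above, assuming a simplified
      struct trace_array. write_blocked_by_selftest() is a hypothetical
      helper name; the kernel places the comparison inline in the affected
      functions.

```c
#include <stdbool.h>

/* Simplified stand-in for the kernel's struct trace_array. */
struct trace_array {
    const char *name;
};

/* The top-level instance, whose ring buffer the selftests examine. */
struct trace_array global_trace = { .name = "global" };

bool tracing_selftest_running;

/*
 * Hypothetical predicate: refuse a write only when a selftest is
 * running AND the target is the top-level instance. Writes to other
 * instances are always allowed.
 */
bool write_blocked_by_selftest(const struct trace_array *tr)
{
    return tracing_selftest_running && tr == &global_trace;
}
```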
      
      Link: https://lkml.kernel.org/r/20230528051742.1325503-5-rostedt@goodmis.org
      
      
      
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • tracing: Make tracing_selftest_running/delete nops when not used · a3ae76d7
      Steven Rostedt (Google) authored
      There's no reason to test the condition variables tracing_selftest_running
      or tracing_selftest_disabled when tracing selftests are not enabled.
      Define them as 0 when the selftests are not configured in.
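
      A sketch of the preprocessor pattern, assuming the two condition
      variables are plain booleans. may_write() is a hypothetical caller
      added to show that the test compiles away when the selftests are not
      configured in.

```c
#include <stdbool.h>

#ifdef CONFIG_FTRACE_STARTUP_TEST
/* Real variables exist only when the startup selftests are built. */
bool tracing_selftest_running;
bool tracing_selftest_disabled;
#else
/* Otherwise they are constant 0, so tests against them fold away. */
#define tracing_selftest_running  0
#define tracing_selftest_disabled 0
#endif

/* Hypothetical caller: returns 1 when a write may proceed. */
int may_write(void)
{
    if (tracing_selftest_running || tracing_selftest_disabled)
        return 0;
    return 1;
}
```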
      
      Link: https://lkml.kernel.org/r/20230528051742.1325503-4-rostedt@goodmis.org
      
      
      
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • tracing: Have tracer selftests call cond_resched() before running · 9da705d4
      Steven Rostedt (Google) authored
      As there are more and more internal selftests being added to the Linux
      kernel (KASAN, lockdep, etc), the selftests are taking longer to run when
      these are enabled. Add a cond_resched() to the calling of
      do_run_tracer_selftest() to force a schedule if NEED_RESCHED is set,
      otherwise the soft lockup watchdog may trigger on boot up.
      
      Link: https://lkml.kernel.org/r/20230528051742.1325503-3-rostedt@goodmis.org
      
      
      
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • tracing: Move setting of tracing_selftest_running out of register_tracer() · e8352cf5
      Steven Rostedt (Google) authored
      The variables tracing_selftest_running and tracing_selftest_disabled are
      only used when CONFIG_FTRACE_STARTUP_TEST is enabled. Make them only
      visible within the selftest code. Those variables are currently set in
      the register_tracer() call, in a location where they do not need
      to be. Create a wrapper around run_tracer_selftest() called
      do_run_tracer_selftest() which sets those variables, and have
      register_tracer() call that instead.
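
      A minimal sketch of the wrapper shape, assuming a simplified
      run_tracer_selftest() that takes no arguments (the kernel function
      takes the tracer being registered). The stand-in body only reports
      whether the flag was set while it ran.

```c
#include <stdbool.h>

bool tracing_selftest_running;

/*
 * Stand-in for the real selftest (assumption: simplified signature).
 * Returns 0 if the running flag was set during the test, -1 otherwise.
 */
int run_tracer_selftest(void)
{
    return tracing_selftest_running ? 0 : -1;
}

/*
 * Wrapper that owns the flag: set it, run the selftest, clear it.
 * The registration path calls this instead of touching the flag itself.
 */
int do_run_tracer_selftest(void)
{
    int ret;

    tracing_selftest_running = true;
    ret = run_tracer_selftest();
    tracing_selftest_running = false;

    return ret;
}
```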
      
      Setting those variables only within the CONFIG_FTRACE_STARTUP_TEST
      scope gets rid of them (and makes it possible to remove the tests
      against them) when the startup tests are not enabled (most cases).
      
      Link: https://lkml.kernel.org/r/20230528051742.1325503-2-rostedt@goodmis.org
      
      
      
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • Merge tag 'phy-fixes-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy · e338142b
      Linus Torvalds authored
      Pull phy fixes from Vinod Koul:
      
       - init count imbalance fix in qcom-qmp-pcie and combo drivers
      
       - kernel doc header fix for qcom-snps driver
      
       - mediatek floating point comparison fix
      
       - amlogic register value fix
      
      * tag 'phy-fixes-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy:
        phy: qcom-snps: correct struct qcom_snps_hsphy kerneldoc
        phy: amlogic: phy-meson-g12a-mipi-dphy-analog: fix CNTL2_DIF_TX_CTL0 value
        phy: mediatek: rework the floating point comparisons to fixed point
        phy: qcom-qmp-pcie-msm8996: fix init-count imbalance
        phy: qcom-qmp-combo: fix init-count imbalance
    • Merge tag 'dmaengine-fix-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine · dca389eb
      Linus Torvalds authored
      Pull dmaengine fixes from Vinod Koul:
       "Driver fixes for the at-hdmac, pl330, TI and IDXD drivers:
      
         - AT HDMAC driver fixes for Flow Controller bitfield, peripheral ID
           handling and potential NULL dereference check
      
         - PL330 function rename to avoid conflicts
      
         - build warning fix for pm function in TI driver
      
         - IDXD driver fix for passing freed memory"
      
      * tag 'dmaengine-fix-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine:
        dmaengine: at_hdmac: Extend the Flow Controller bitfield to three bits
        dmaengine: at_hdmac: Repair bitfield macros for peripheral ID handling
        dmaengine: pl330: rename _start to prevent build error
        dmaengine: at_xdmac: fix potential Oops in at_xdmac_prep_interleaved()
        dmaengine: ti: k3-udma: annotate pm function with __maybe_unused
        dmaengine: idxd: Fix passing freed memory in idxd_cdev_open()
  4. May 28, 2023