  1. Apr 16, 2021
    • perf: Add support for SIGTRAP on perf events · 97ba62b2
      Marco Elver authored
      
      
      Adds bit perf_event_attr::sigtrap, which can be set to cause events to
      send SIGTRAP (with si_code TRAP_PERF) to the task where the event
      occurred. The primary motivation is to support synchronous signals on
      perf events in the task where an event (such as breakpoints) triggered.
      
      To distinguish perf events based on the event type, the type is set in
      si_errno. For events that are associated with an address, si_addr is
      copied from perf_sample_data.
      
      The new field perf_event_attr::sig_data is copied to si_perf, which
      allows user space to disambiguate which event (of the same type)
      triggered the signal. For example, user space could encode the relevant
      information it cares about in sig_data.
      
      We note that the choice of an opaque u64 provides the simplest and most
      flexible option. Alternatives where a reference to some user space data
      is passed back suffer from the problem that modification of referenced
      data (be it the event fd, or the perf_event_attr) can race with the
      signal being delivered (of course, the same caveat applies if user space
      decides to store a pointer in sig_data, but the ABI explicitly avoids
      prescribing such a design).
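      As an illustration of encoding information in sig_data, an application
      might pack an event kind and a slot index into the u64 and unpack it in
      its SIGTRAP handler from si_perf. The layout (kind in bits 63..56, index
      in bits 31..0) and the helper names are hypothetical choices made by user
      space. The kernel only copies the value back verbatim.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical application-defined layout, not part of the ABI:
 * bits 63..56 hold the event kind, bits 31..0 hold a slot index. */
enum watch_kind { WATCH_READ = 1, WATCH_WRITE = 2 };

static uint64_t encode_sig_data(enum watch_kind kind, uint32_t index)
{
    return ((uint64_t)kind << 56) | index;
}

/* Recover the kind from the value the handler reads back. */
static enum watch_kind sig_data_kind(uint64_t sig_data)
{
    return (enum watch_kind)(sig_data >> 56);
}

/* Recover the slot index from the low 32 bits. */
static uint32_t sig_data_index(uint64_t sig_data)
{
    return (uint32_t)(sig_data & 0xffffffffu);
}
```

      The encoded value would be stored in attr.sig_data before
      perf_event_open(), so the handler can tell apart several events of the
      same type without dereferencing any user-space pointer.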
      
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Marco Elver <elver@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Dmitry Vyukov <dvyukov@google.com>
      Link: https://lore.kernel.org/lkml/YBv3rAT566k+6zjg@hirez.programming.kicks-ass.net/
    • signal: Introduce TRAP_PERF si_code and si_perf to siginfo · fb6cc127
      Marco Elver authored
      
      
      Introduces the TRAP_PERF si_code, and associated siginfo_t field
      si_perf. These will be used by the perf event subsystem to send signals
      (if requested) to the task where an event occurred.
      
      Signed-off-by: Marco Elver <elver@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> # m68k
      Acked-by: Arnd Bergmann <arnd@arndb.de> # asm-generic
      Link: https://lkml.kernel.org/r/20210408103605.1676875-6-elver@google.com
    • perf: Add support for event removal on exec · 2e498d0a
      Marco Elver authored
      
      
      Adds bit perf_event_attr::remove_on_exec, to support removing an event
      from a task on exec.
      
      This option supports the case where an event is supposed to be
      process-wide only, and should not propagate beyond exec, to limit
      monitoring to the original process image only.
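      A toy model of the remove_on_exec semantics, assuming a task's events
      can be represented as a flat array. The kernel's actual removal path,
      which must cope with concurrent close() and with inherited child events,
      is considerably more involved; the struct and function names here are
      hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an event; only the remove_on_exec bit of
 * perf_event_attr is modeled. */
struct model_event {
    int id;
    bool remove_on_exec;
};

/* On exec, drop events flagged remove_on_exec and keep the rest in their
 * original order. Returns the new number of events in 'events'. */
static size_t model_exec(struct model_event *events, size_t n)
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++) {
        if (!events[i].remove_on_exec)
            events[kept++] = events[i];
    }
    return kept;
}
```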
      
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Marco Elver <elver@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20210408103605.1676875-5-elver@google.com
    • perf: Support only inheriting events if cloned with CLONE_THREAD · 2b26f0aa
      Marco Elver authored
      
      
      Adds bit perf_event_attr::inherit_thread, to restrict inheritance of
      events to children cloned with CLONE_THREAD.
      
      This option supports the case where an event is supposed to be
      process-wide only (including subthreads), but should not propagate
      beyond the current process's shared environment.
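      The resulting inheritance decision can be sketched as a predicate over
      the clone flags. This is a simplified model, not the kernel's
      inheritance code: the struct is a hypothetical subset of
      perf_event_attr, and the sketch treats inherit_thread as a refinement
      that only matters when inherit is also set.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdbool.h>

#ifndef CLONE_THREAD
#define CLONE_THREAD 0x00010000 /* value from the Linux uapi headers */
#endif

/* Hypothetical subset of perf_event_attr bits relevant to inheritance. */
struct model_attr {
    bool inherit;
    bool inherit_thread;
};

/* Decide whether a child created with clone_flags inherits the event. */
static bool model_should_inherit(const struct model_attr *attr,
                                 unsigned long clone_flags)
{
    if (!attr->inherit)
        return false;
    /* inherit_thread: only threads sharing the process get the event. */
    if (attr->inherit_thread && !(clone_flags & CLONE_THREAD))
        return false;
    return true;
}
```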
      
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Marco Elver <elver@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lore.kernel.org/lkml/YBvj6eJR%2FDY2TsEB@hirez.programming.kicks-ass.net/
    • perf: Apply PERF_EVENT_IOC_MODIFY_ATTRIBUTES to children · 47f661ec
      Marco Elver authored
      
      
      As with other ioctls (such as PERF_EVENT_IOC_{ENABLE,DISABLE}), fix up
      handling of PERF_EVENT_IOC_MODIFY_ATTRIBUTES to also apply to children.
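      A hypothetical model of the fan-out: the new attribute (here just a
      breakpoint address) is applied to the parent and then to each inherited
      child. Inherited events hang off the original parent in one flat list,
      which the sketch mirrors with an array of pointers; all names are
      illustrative only and the kernel's locking is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event with a single modifiable attribute. */
struct bp_event {
    uint64_t bp_addr;
    struct bp_event **children;
    size_t nr_children;
};

/* Update the parent first, then every inherited child, the same way
 * ENABLE/DISABLE already fan out to children. */
static void bp_modify_attributes(struct bp_event *parent, uint64_t bp_addr)
{
    parent->bp_addr = bp_addr;
    for (size_t i = 0; i < parent->nr_children; i++)
        parent->children[i]->bp_addr = bp_addr;
}
```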
      
      Suggested-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Marco Elver <elver@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
      Link: https://lkml.kernel.org/r/20210408103605.1676875-3-elver@google.com
    • perf: Rework perf_event_exit_event() · ef54c1a4
      Peter Zijlstra authored
      
      
      Make perf_event_exit_event() more robust, such that we can use it from
      other contexts, specifically the upcoming remove_on_exec.
      
      For this to work we need to address a few issues. Remove_on_exec will
      not destroy the entire context, so we cannot rely on TASK_TOMBSTONE to
      disable event_function_call() and we thus have to use
      perf_remove_from_context().
      
      When using perf_remove_from_context(), there are two races to consider.
      The first is against close(), where we can have concurrent tear-down
      of the event. The second is against child_list iteration, which should
      not find a half-baked event.
      
      To address this, teach perf_remove_from_context() to special-case
      !ctx->is_active and to handle DETACH_CHILD.
      
      [ elver@google.com: fix racing parent/child exit in sync_child_event(). ]
      Signed-off-by: Marco Elver <elver@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20210408103605.1676875-2-elver@google.com
    • perf intel-pt: Use aux_watermark · 874fc35c
      Alexander Shishkin authored
      
      
      It turns out that the default setting of attr.aux_watermark to half of
      the total buffer size is not very useful, especially with smaller
      buffers. The problem is that, after half of the buffer is filled up,
      the kernel updates ->aux_head and sets up the next "transaction",
      while observing that ->aux_tail is still zero (as userspace hasn't had
      the chance to update it), meaning that the trace will have to stop at
      the end of this second "transaction". This means, for example, that
      the second PERF_RECORD_AUX in every trace comes with the TRUNCATED
      flag set.
      
      Setting attr.aux_watermark to a quarter of the buffer gives enough
      space for the ->aux_tail update to be observed and prevents the data
      loss.
      
      The obligatory before/after showcase:
      
      > # perf_before record -e intel_pt//u -m,8 uname
      > Linux
      > [ perf record: Woken up 6 times to write data ]
      > Warning:
      > AUX data lost 4 times out of 10!
      >
      > [ perf record: Captured and wrote 0.099 MB perf.data ]
      > # perf record -e intel_pt//u -m,8 uname
      > Linux
      > [ perf record: Woken up 4 times to write data ]
      > [ perf record: Captured and wrote 0.039 MB perf.data ]
      
      The effect is still visible with large workloads and large buffers,
      although less pronounced.
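      The space accounting behind this can be sketched with simple
      arithmetic. The model assumes 4 KiB pages (so -m,8 gives a 32 KiB AUX
      buffer), assumes user space has not yet moved aux_tail away from 0 at
      the first wakeup, and ignores wraparound; it is an illustration, not the
      kernel's ring-buffer code.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes the kernel may still write before reaching user space's aux_tail. */
static uint64_t aux_space(uint64_t size, uint64_t head, uint64_t tail)
{
    assert(head - tail <= size);
    return size - (head - tail);
}

/* Watermark-sized transactions that still fit after the first wakeup,
 * i.e. with head == watermark and tail still 0. */
static uint64_t transactions_left(uint64_t size, uint64_t watermark)
{
    return aux_space(size, watermark, 0) / watermark;
}
```

      With a 32 KiB buffer, a half-size watermark leaves room for exactly one
      more transaction, after which the trace has to stop, while a
      quarter-size watermark leaves room for three.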
      
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20210414154955.49603-3-alexander.shishkin@linux.intel.com
    • perf: Cap allocation order at aux_watermark · d68e6799
      Alexander Shishkin authored
      
      
      Currently, we allocate AUX pages in chunks of up to half the total
      requested AUX buffer size, ignoring the attr.aux_watermark setting.
      This, in turn, makes the intel_pt driver disregard the watermark as
      well, as it uses the page order for its SG (ToPA) configuration.
      
      Now, this can be fixed in the intel_pt PMU driver, but seeing as it's the
      only one currently making use of high order allocations, there is no
      reason not to fix the allocator instead. This way, any other driver
      wishing to add this support would not have to worry about this.
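      The effect of the cap can be illustrated by computing the largest
      allocation order whose chunk (page_size << order) still fits within a
      byte limit. The helper is hypothetical and is not the kernel's
      rb_alloc_aux() logic; it only shows how deriving the limit from the
      watermark rather than from half the buffer lowers the maximum order.

```c
#include <assert.h>

/* Hypothetical helper: largest order such that (page_size << order) still
 * fits within limit_bytes. Requires limit_bytes >= page_size. */
static unsigned int max_order_within(unsigned long limit_bytes,
                                     unsigned long page_size)
{
    unsigned int order = 0;

    assert(page_size > 0 && limit_bytes >= page_size);
    while ((page_size << (order + 1)) <= limit_bytes)
        order++;
    return order;
}
```

      For a 16 MiB buffer with 4 KiB pages, half the buffer (8 MiB) allows
      order-11 chunks, whereas a 4 MiB watermark caps chunks at order 10.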
      
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20210414154955.49603-2-alexander.shishkin@linux.intel.com
  2. Apr 02, 2021
    • perf/x86/intel/uncore: Enable IIO stacks to PMON mapping for multi-segment SKX · cface032
      Alexander Antonov authored
      
      
      The IIO stacks to PMON mapping on Skylake servers is exposed through the
      earlier introduced attributes /sys/devices/uncore_iio_<pmu_idx>/dieX,
      where dieX is a file which holds "Segment:Root Bus" for the PCIe root
      port which can be monitored by that IIO PMON block. These sysfs
      attributes are disabled for multi-segment topologies, except for VMD
      domains, which start at 0x10000. This patch removes the limitation and
      enables the IIO stacks to PMON mapping for multi-segment Skylake
      servers by introducing a segment-aware intel_uncore_topology structure
      and attributing the topology configuration to the segment in the
      skx_iio_get_topology() function.
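      From user space, the mapping can be read by parsing a dieX file such as
      /sys/devices/uncore_iio_0/die0. A minimal parser follows, assuming the
      file holds a hexadecimal Segment:Root Bus pair like 0000:17; the exact
      formatting and the helper name are assumptions, so check the files on a
      real system.

```c
#include <assert.h>
#include <stdio.h>

/* Parse a "Segment:Root Bus" pair such as "0000:17" (both fields hex).
 * Returns 0 on success, -1 if the string is malformed or the bus number
 * does not fit in 8 bits. */
static int parse_iio_mapping(const char *s, unsigned int *segment,
                             unsigned int *bus)
{
    if (sscanf(s, "%x:%x", segment, bus) != 2)
        return -1;
    return (*bus <= 0xff) ? 0 : -1;
}
```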
      
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Alexander Antonov <alexander.antonov@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      Tested-by: Kyle Meyer <kyle.meyer@hpe.com>
      Link: https://lkml.kernel.org/r/20210323150507.2013-1-alexander.antonov@linux.intel.com
    • perf/x86/intel/uncore: Generic support for the MMIO type of uncore blocks · c4c55e36
      Kan Liang authored
      
      
      The discovery table provides the generic uncore block information
      for the MMIO type of uncore blocks, which is good enough to provide
      basic uncore support.
      
      The box control field is composed of the BAR address and box control
      offset. When initializing the uncore blocks, perf should ioremap the
      address from the box control field.
      
      Implement the generic support for the MMIO type of uncore block.
      
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/1616003977-90612-6-git-send-email-kan.liang@linux.intel.com
    • perf/x86/intel/uncore: Generic support for the PCI type of uncore blocks · 42839ef4
      Kan Liang authored
      
      
      The discovery table provides the generic uncore block information
      for the PCI type of uncore blocks, which is good enough to provide
      basic uncore support.
      
      The PCI BUS and DEVFN information can be retrieved from the box control
      field. Introduce the uncore_pci_pmus_register() to register all the
      PCICFG type of uncore blocks. The old PCI probe/remove way is dropped.
      
      The PCI BUS and DEVFN information differs among dies. Add box_ctls
      to store the box control field of each die.
      
      Add a new BUS notifier for the PCI type of uncore block to support
      hotplug. If the device is hot-removed, the corresponding registered
      PMU has to be unregistered. Perf cannot locate the PMU by searching a
      const pci_device_id table, because the discovery tables don't provide
      such information. Introduce uncore_pci_find_dev_pmu_from_types() to
      search the whole uncore_pci_uncores for the PMU.
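      For illustration, decoding such a box control value might look like
      the sketch below. The bit layout (domain in bits 30..28, bus in 27..20,
      devfn in 19..12, box control offset in 11..0) is an assumption and
      should be checked against the kernel's uncore discovery header; the
      split of devfn into slot and function follows the standard PCI
      encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of a PCI-type box control value (verify before use):
 *   bits 30..28 domain, 27..20 bus, 19..12 devfn, 11..0 control offset */
struct pci_box_loc {
    unsigned int domain, bus, devfn, ctl_offset;
};

static struct pci_box_loc decode_pci_box_ctl(uint64_t data)
{
    struct pci_box_loc loc = {
        .domain     = (unsigned int)((data >> 28) & 0x7),
        .bus        = (unsigned int)((data >> 20) & 0xff),
        .devfn      = (unsigned int)((data >> 12) & 0xff),
        .ctl_offset = (unsigned int)(data & 0xfff),
    };
    return loc;
}

/* Standard PCI split of devfn into device (slot) and function numbers. */
static unsigned int devfn_slot(unsigned int devfn) { return devfn >> 3; }
static unsigned int devfn_func(unsigned int devfn) { return devfn & 0x7; }
```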
      
      Implement generic support for the PCI type of uncore block.
      
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/1616003977-90612-5-git-send-email-kan.liang@linux.intel.com
    • perf/x86/intel/uncore: Rename uncore_notifier to uncore_pci_sub_notifier · 6477dc39
      Kan Liang authored
      
      
      Perf will use a similar method to the PCI sub driver to register
      the PMUs for the PCI type of uncore blocks. The method requires a BUS
      notifier to support hotplug. The current BUS notifier cannot be reused,
      because it searches a const id_table for the corresponding registered
      PMU. The PCI type of uncore blocks in the discovery tables doesn't
      provide an id_table.
      
      Factor out uncore_bus_notify() and add the pointer of an id_table as a
      parameter. The uncore_bus_notify() will be reused in the following
      patch.
      
      The current BUS notifier is only used by the PCI sub driver. Its name is
      too generic. Rename it to uncore_pci_sub_notifier, which is specific for
      the PCI sub driver.
      
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/1616003977-90612-4-git-send-email-kan.liang@linux.intel.com
    • perf/x86/intel/uncore: Generic support for the MSR type of uncore blocks · d6c75413
      Kan Liang authored
      
      
      The discovery table provides the generic uncore block information for
      the MSR type of uncore blocks, e.g., the counter width, the number of
      counters, the location of control/counter registers, which is good
      enough to provide basic uncore support. It can be used as a fallback
      solution when the kernel doesn't support a platform.
      
      The name of the uncore box cannot be retrieved from the discovery table.
      uncore_type_<typeID>_<boxID> will be used as its name. Save the type ID
      and the box ID information in struct intel_uncore_type.
      Factor out uncore_get_pmu_name() to handle different naming methods.
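      The fallback naming can be sketched with snprintf(). The helper is
      hypothetical and only produces the uncore_type_<typeID>_<boxID> form,
      whereas uncore_get_pmu_name() also has to handle the other naming
      methods.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: build "uncore_type_<typeID>_<boxID>" into buf.
 * Returns the snprintf() result (length that would have been written). */
static int generic_uncore_name(char *buf, size_t len,
                               unsigned int type_id, unsigned int box_id)
{
    return snprintf(buf, len, "uncore_type_%u_%u", type_id, box_id);
}
```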
      
      Implement generic support for the MSR type of uncore block.
      
      Some advanced features, such as filters and constraints, cannot be
      retrieved from discovery tables. Features that rely on that
      information are not supported here.
      
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/1616003977-90612-3-git-send-email-kan.liang@linux.intel.com
    • perf/x86/intel/uncore: Parse uncore discovery tables · edae1f06
      Kan Liang authored
      
      
      A self-describing mechanism for the uncore PerfMon hardware has been
      introduced with the latest Intel platforms. By reading through an MMIO
      page worth of information, perf can 'discover' all the standard uncore
      PerfMon registers in a machine.
      
      The discovery mechanism relies on BIOS support. With a proper BIOS,
      a PCI device with the unique capability ID 0x23 can be found on each
      die. Perf can retrieve the information of all available uncore PerfMons
      from the device via MMIO. The information is composed of one global
      discovery table and several unit discovery tables.
      - The global discovery table includes global uncore information of the
        die, e.g., the address of the global control register, the offset of
        the global status register, the number of uncore units, the offset of
        unit discovery tables, etc.
      - The unit discovery table includes generic uncore unit information,
        e.g., the access type, the counter width, the address of counters,
        the address of the counter control, the unit ID, the unit type, etc.
        The unit is also called "box" in the code.
      Perf can provide basic uncore support based on this information
      with the following patches.
      
      To locate the PCI device with the discovery tables, check the generic
      PCI ID first. If it doesn't match, go through the entire PCI device tree
      and locate the device with the unique capability ID.
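      The capability search can be illustrated by walking the PCIe extended
      capability list of a config-space snapshot. Each extended capability
      header holds the ID in bits 15..0, a version in bits 19..16 and the
      offset of the next capability in bits 31..20, with the list starting at
      offset 0x100. In the kernel, pci_find_next_ext_capability() performs
      this walk; the standalone function below is only a sketch of it, and
      its names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define CFG_SPACE_SIZE        256   /* start of extended config space */
#define CFG_SPACE_EXP_SIZE    4096  /* size of PCIe config space */
#define EXT_CAP_ID_DISCOVERY  0x23  /* capability ID named in the text */

/* Read a little-endian 32-bit dword from a config-space snapshot. */
static uint32_t cfg_read32(const uint8_t *cfg, unsigned int off)
{
    return (uint32_t)cfg[off] | (uint32_t)cfg[off + 1] << 8 |
           (uint32_t)cfg[off + 2] << 16 | (uint32_t)cfg[off + 3] << 24;
}

/* Return the offset of the first extended capability with cap_id,
 * or 0 if it is not present. */
static unsigned int find_ext_cap(const uint8_t *cfg, uint16_t cap_id)
{
    unsigned int pos = CFG_SPACE_SIZE;
    /* Bound the walk so a corrupt next pointer cannot loop forever. */
    int ttl = (CFG_SPACE_EXP_SIZE - CFG_SPACE_SIZE) / 8;

    while (ttl-- > 0 && pos >= CFG_SPACE_SIZE &&
           pos <= CFG_SPACE_EXP_SIZE - 4) {
        uint32_t header = cfg_read32(cfg, pos);

        if (header == 0)
            return 0;
        if ((header & 0xffff) == cap_id)
            return pos;
        pos = (header >> 20) & 0xffc; /* next pointer, dword aligned */
    }
    return 0;
}
```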
      
      The uncore information is similar among dies. To save parsing time and
      space, only completely parse and store the discovery tables on the first
      die and the first box of each die. The parsed information is stored in
      an RB tree structure, intel_uncore_discovery_type. The size of the
      stored discovery tables varies among platforms. It's around 4KB for a
      Sapphire Rapids server.
      
      If a BIOS doesn't support the 'discovery' mechanism, the uncore driver
      will exit with -ENODEV, so nothing changes for such systems.
      
      Add a module parameter to disable the discovery feature. If a BIOS gets
      the discovery tables wrong, users then have an option to disable the
      feature. For the current patchset, the uncore driver will exit with
      -ENODEV. In the future, it may fall back to the hardcoded uncore driver
      on a known platform.
      
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/1616003977-90612-2-git-send-email-kan.liang@linux.intel.com
  3. Mar 17, 2021
    • perf/core: Fix unconditional security_locked_down() call · 08ef1af4
      Ondrej Mosnacek authored
      Currently, the lockdown state is queried unconditionally, even though
      its result is used only if the PERF_SAMPLE_REGS_INTR bit is set in
      attr.sample_type. While that doesn't matter in the case of the Lockdown
      LSM, it causes trouble with SELinux's lockdown hook implementation.
      
      SELinux implements the locked_down hook with a check whether the current
      task's type has the corresponding "lockdown" class permission
      ("integrity" or "confidentiality") allowed in the policy. This means
      that calling the hook when the access control decision would be ignored
      generates a bogus permission check and audit record.
      
      Fix this by checking sample_type first and only calling the hook when
      its result would be honored.
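      The ordering fix can be modeled with a counting stand-in for the hook.
      The function names are hypothetical; the bit value matches
      PERF_SAMPLE_REGS_INTR from the perf uapi header. The point is only that
      the hook, and thus any audit record it emits, is reached solely when the
      REGS_INTR bit is requested.

```c
#include <assert.h>
#include <stdint.h>

/* Same bit value as PERF_SAMPLE_REGS_INTR in <linux/perf_event.h>. */
#define MODEL_SAMPLE_REGS_INTR (1ULL << 18)

static int hook_calls; /* each call could produce an audit record */

/* Hypothetical stand-in for the LSM lockdown hook: 0 means allowed. */
static int model_locked_down(void)
{
    hook_calls++;
    return 0;
}

/* Fixed ordering: test sample_type first, call the hook only if needed. */
static int model_check_sample_type(uint64_t sample_type)
{
    if (sample_type & MODEL_SAMPLE_REGS_INTR) {
        int err = model_locked_down();

        if (err)
            return err;
    }
    return 0;
}
```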
      
      Fixes: b0c8fdc7 ("lockdown: Lock down perf when in confidentiality mode")
      Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Paul Moore <paul@paul-moore.com>
      Link: https://lkml.kernel.org/r/20210224215628.192519-1-omosnace@redhat.com
    • perf core: Allocate perf_event in the target node memory · ff65338e
      Namhyung Kim authored
      
      
      For CPU events, it is better to allocate them in the corresponding
      node's memory, as they will mostly be accessed by the target CPU.
      Although the perf tool sets the CPU affinity before calling
      perf_event_open(), there are places where it doesn't (notably perf
      record), and we should consider other external users too.
      
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20210311115413.444407-2-namhyung@kernel.org
    • perf core: Add a kmem_cache for struct perf_event · bdacfaf2
      Namhyung Kim authored
      
      
      The kernel can allocate a lot of struct perf_event when profiling. For
      example, 256 CPUs x 8 events x 20 cgroups = 40K instances of the struct
      would be allocated on a large system.
      
      The size of struct perf_event in my setup is 1152 bytes. As it's
      allocated by kmalloc, the actual allocation size would be rounded up
      to 2K.
      
      That leaves 896 bytes (~44%) of waste per instance, resulting in a
      total of ~35MB with 40K instances. We can create a dedicated kmem_cache
      to avoid such a big unnecessary memory consumption.
      
      With this change, I see the following (note that this machine has 112
      CPUs).
      
        # grep perf_event /proc/slabinfo
        perf_event    224    784   1152    7    2 : tunables   24   12    8 : slabdata    112    112      0
      
      The sixth column is pages-per-slab, which is 2, and the fifth column is
      objs-per-slab, which is 7. Thus it can actually use 1152 x 7 = 8064
      bytes of the 8K, and the wasted memory is (8192 - 8064) / 7 = ~18 bytes
      per instance.
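      The waste figures in this commit message can be reproduced as follows,
      assuming (as stated) that kmalloc serves a 1152-byte object from its
      2 KiB power-of-two cache, and taking the slab geometry (2 pages, 7
      objects) from the /proc/slabinfo output. The helper names are
      illustrative.

```c
#include <assert.h>

/* Smallest power of two >= n (n > 0): a kmalloc-style size class,
 * ignoring the non-power-of-two caches that kmalloc also has. */
static unsigned long round_up_pow2(unsigned long n)
{
    unsigned long p = 1;

    while (p < n)
        p <<= 1;
    return p;
}

/* Bytes wasted per object when served from a power-of-two cache. */
static unsigned long kmalloc_waste(unsigned long obj_size)
{
    return round_up_pow2(obj_size) - obj_size;
}

/* Average per-object waste in a dedicated slab of slab_bytes. */
static unsigned long slab_waste_per_obj(unsigned long slab_bytes,
                                        unsigned long obj_size,
                                        unsigned long objs_per_slab)
{
    return (slab_bytes - obj_size * objs_per_slab) / objs_per_slab;
}
```

      For 256 x 8 x 20 = 40960 instances, 896 bytes each comes to 36700160
      bytes, exactly 35 MiB.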
      
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20210311115413.444407-1-namhyung@kernel.org
    • perf core: Allocate perf_buffer in the target node memory · 9483409a
      Namhyung Kim authored
      
      
      I found that the ring buffer pages are allocated on the target node,
      but the ring buffer structure itself is not. Let's convert it to use
      kzalloc_node() too.
      
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20210315033436.682438-1-namhyung@kernel.org
  4. Mar 15, 2021
    • Linux 5.12-rc3 · 1e28eed1
      Linus Torvalds authored
    • prctl: fix PR_SET_MM_AUXV kernel stack leak · c995f12a
      Alexey Dobriyan authored
      
      
      Doing a
      
      	prctl(PR_SET_MM, PR_SET_MM_AUXV, addr, 1);
      
      will copy 1 byte from userspace into a (quite big) on-stack array
      and then stash everything into mm->saved_auxv.
      An AT_NULL terminator will be inserted at the very end.
      
      The /proc/*/auxv handler will find that AT_NULL terminator
      and copy the original stack contents to userspace.
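      The bug class is an on-stack array that is only partially filled before
      being copied somewhere long-lived. A user-space sketch of the usual
      remedy, zero-initializing the array before the partial copy, follows.
      The array size and function name are hypothetical, and the sketch does
      not claim to reproduce the exact upstream patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define AUXV_WORDS 46 /* hypothetical on-stack array size */

/* Copy a short user-supplied prefix into a fixed-size array and publish
 * the whole array. Zero-initializing tmp ensures the bytes past len can
 * never carry stale stack contents to a later reader. */
static void copy_auxv_prefix(unsigned long out[AUXV_WORDS],
                             const void *src, size_t len)
{
    unsigned long tmp[AUXV_WORDS] = {0}; /* the remedy: no uninitialized tail */

    if (len > sizeof(tmp))
        len = sizeof(tmp);
    memcpy(tmp, src, len);
    tmp[AUXV_WORDS - 2] = 0; /* AT_NULL terminator pair at the very end */
    tmp[AUXV_WORDS - 1] = 0;
    memcpy(out, tmp, sizeof(tmp));
}
```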
      
      This devious scheme requires CAP_SYS_RESOURCE.
      
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'irq-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 70404fe3
      Linus Torvalds authored
      Pull irq fixes from Thomas Gleixner:
       "A set of irqchip updates:
      
         - Make the GENERIC_IRQ_MULTI_HANDLER configuration correct
      
         - Add a missing DT compatible string for the Ingenic driver
      
         - Remove the pointless debugfs_file pointer from struct irqdomain"
      
      * tag 'irq-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        irqchip/ingenic: Add support for the JZ4760
        dt-bindings/irq: Add compatible string for the JZ4760B
        irqchip: Do not blindly select CONFIG_GENERIC_IRQ_MULTI_HANDLER
        ARM: ep93xx: Select GENERIC_IRQ_MULTI_HANDLER directly
        irqdomain: Remove debugfs_file from struct irq_domain
    • Merge tag 'timers-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 802b31c0
      Linus Torvalds authored
      Pull timer fix from Thomas Gleixner:
       "A single fix for hrtimers to prevent an interrupt storm caused by
        the lack of reevaluation of the timers which expire in softirq context
        under certain circumstances, e.g. when the clock was set"
      
      * tag 'timers-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        hrtimer: Update softirq_expires_next correctly after __hrtimer_get_next_event()
    • Merge tag 'sched-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · c72cbc93
      Linus Torvalds authored
      Pull scheduler fixes from Thomas Gleixner:
       "A set of scheduler updates:
      
         - Prevent a NULL pointer dereference in the migration_cpu_stop()
           mechanism
      
         - Prevent self concurrency of affine_move_task()
      
         - Small fixes and cleanups related to task migration/affinity setting
      
         - Ensure that sync_runqueues_membarrier_state() is invoked on the
           current CPU when it is in the cpu mask"
      
      * tag 'sched-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/membarrier: fix missing local execution of ipi_sync_rq_state()
        sched: Simplify set_affinity_pending refcounts
        sched: Fix affine_move_task() self-concurrency
        sched: Optimize migration_cpu_stop()
        sched: Collate affine_move_task() stoppers
        sched: Simplify migration_cpu_stop()
        sched: Fix migration_cpu_stop() requeueing
    • Merge tag 'objtool-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 19469d2a
      Linus Torvalds authored
      Pull objtool fix from Thomas Gleixner:
       "A single objtool fix to handle the PUSHF/POPF validation correctly for
        the paravirt changes which modified arch_local_irq_restore not to use
        popf"
      
      * tag 'objtool-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        objtool,x86: Fix uaccess PUSHF/POPF validation
    • Merge tag 'locking-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · fa509ff8
      Linus Torvalds authored
      Pull locking fixes from Thomas Gleixner:
       "A couple of locking fixes:
      
         - A fix for the static_call mechanism so it handles unaligned
           addresses correctly.
      
         - Make u64_stats_init() a macro so every instance gets a separate
           lockdep key.
      
         - Make seqcount_latch_init() a macro as well to preserve the static
           variable which is used for the lockdep key"
      
      * tag 'locking-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        seqlock,lockdep: Fix seqcount_latch_init()
        u64_stats,lockdep: Fix u64_stats_init() vs lockdep
        static_call: Fix the module key fixup
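      The reason for turning these init functions into macros can be shown in
      plain C: a static variable inside a function exists once for all
      callers, while a static variable inside a macro body is instantiated
      separately at every expansion (call) site. Lockdep identifies a lock
      class by the address of such a static key, so only the macro form gives
      each call site its own class. The key type below is a hypothetical
      stand-in for struct lock_class_key.

```c
#include <assert.h>

/* Hypothetical stand-in for a lockdep class key: identity is its address. */
struct model_key { char dummy; };

struct model_stats { const struct model_key *key; };

/* Function version: one static key shared by every caller. */
static void stats_init_fn(struct model_stats *s)
{
    static struct model_key key;

    s->key = &key;
}

/* Macro version: each expansion site defines its own static key. */
#define stats_init_macro(s)                 \
    do {                                    \
        static struct model_key key_;       \
        (s)->key = &key_;                   \
    } while (0)
```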
    • Merge tag 'perf_urgent_for_v5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 75013c6c
      Linus Torvalds authored
      Pull perf fixes from Borislav Petkov:
      
       - Make sure PMU internal buffers are flushed for per-CPU events too and
         properly handle PID/TID for large PEBS.
      
       - Handle the case properly when there's no PMU and therefore return an
         empty list of perf MSRs for VMX to switch instead of reading random
         garbage from the stack.
      
      * tag 'perf_urgent_for_v5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/perf: Use RET0 as default for guest_get_msrs to handle "no PMU" case
        perf/x86/intel: Set PERF_ATTACH_SCHED_CB for large PEBS and LBR
        perf/core: Flush PMU internal buffers for per-CPU events
    • Merge tag 'efi-urgent-for-v5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 836d7f05
      Linus Torvalds authored
      Pull EFI fix from Ard Biesheuvel via Borislav Petkov:
       "Fix an oversight in the handling of EFI_RT_PROPERTIES_TABLE, which was
        added in v5.10, but failed to take the SetVirtualAddressMap() RT service
        into account"
      
      * tag 'efi-urgent-for-v5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        efi: stub: omit SetVirtualAddressMap() if marked unsupported in RT_PROP table
    • Merge tag 'x86_urgent_for_v5.12_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 0a7c10df
      Linus Torvalds authored
      Pull x86 fixes from Borislav Petkov:
      
       - A couple of SEV-ES fixes and robustifications: verify that the
         usermode stack pointer in NMI is not coming from the syscall gap,
         correctly track IRQ states in the #VC handler, and access user insn
         bytes atomically in the same handler, as the latter cannot sleep.
      
       - Balance 32-bit fast syscall exit path to do the proper work on exit
         and thus not confuse audit and ptrace frameworks.
      
       - Two fixes for the ORC unwinder going "off the rails" into KASAN
         redzones and when ORC data is missing.
      
      * tag 'x86_urgent_for_v5.12_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/sev-es: Use __copy_from_user_inatomic()
        x86/sev-es: Correctly track IRQ states in runtime #VC handler
        x86/sev-es: Check regs->sp is trusted before adjusting #VC IST stack
        x86/sev-es: Introduce ip_within_syscall_gap() helper
        x86/entry: Fix entry/exit mismatch on failed fast 32-bit syscalls
        x86/unwind/orc: Silence warnings caused by missing ORC data
        x86/unwind/orc: Disable KASAN checking in the ORC unwinder, part 2
    • Merge tag 'powerpc-5.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · c3c7579f
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
       "Some more powerpc fixes for 5.12:
      
         - Fix wrong instruction encoding for lis in ppc_function_entry(),
           which could potentially lead to missed kprobes.
      
         - Fix SET_FULL_REGS on 32-bit and 64e, which prevented ptrace of
           non-volatile GPRs immediately after exec.
      
         - Clean up a missed SRR specifier in the recent interrupt rework.
      
         - Don't treat unrecoverable_exception() as an interrupt handler, it's
           called from other handlers so shouldn't do the interrupt entry/exit
           accounting itself.
      
         - Fix build errors caused by missing declarations for
           [en/dis]able_kernel_vsx().
      
        Thanks to Christophe Leroy, Daniel Axtens, Geert Uytterhoeven, Jiri
        Olsa, Naveen N. Rao, and Nicholas Piggin"
      
      * tag 'powerpc-5.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/traps: unrecoverable_exception() is not an interrupt handler
        powerpc: Fix missing declaration of [en/dis]able_kernel_vsx()
        powerpc/64s/exception: Clean up a missed SRR specifier
        powerpc: Fix inverted SET_FULL_REGS bitop
        powerpc/64s: Use symbolic macros for function entry encoding
        powerpc/64s: Fix instruction encoding for lis in ppc_function_entry()
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 9d0c8e79
      Linus Torvalds authored
      Pull KVM fixes from Paolo Bonzini:
       "More fixes for ARM and x86"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: LAPIC: Advancing the timer expiration on guest initiated write
        KVM: x86/mmu: Skip !MMU-present SPTEs when removing SP in exclusive mode
        KVM: kvmclock: Fix vCPUs > 64 can't be online/hotpluged
        kvm: x86: annotate RCU pointers
        KVM: arm64: Fix exclusive limit for IPA size
        KVM: arm64: Reject VM creation when the default IPA size is unsupported
        KVM: arm64: Ensure I-cache isolation between vcpus of a same VM
        KVM: arm64: Don't use cbz/adr with external symbols
        KVM: arm64: Fix range alignment when walking page tables
        KVM: arm64: Workaround firmware wrongly advertising GICv2-on-v3 compatibility
        KVM: arm64: Rename __vgic_v3_get_ich_vtr_el2() to __vgic_v3_get_gic_config()
        KVM: arm64: Don't access PMSELR_EL0/PMUSERENR_EL0 when no PMU is available
        KVM: arm64: Turn kvm_arm_support_pmu_v3() into a static key
        KVM: arm64: Fix nVHE hyp panic host context restore
        KVM: arm64: Avoid corrupting vCPU context register in guest exit
        KVM: arm64: nvhe: Save the SPE context early
        kvm: x86: use NULL instead of using plain integer as pointer
        KVM: SVM: Connect 'npt' module param to KVM's internal 'npt_enabled'
        KVM: x86: Ensure deadline timer has truly expired before posting its IRQ
      9d0c8e79
    • Merge branch 'akpm' (patches from Andrew) · 50eb842f
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "28 patches.
      
        Subsystems affected by this series: mm (memblock, pagealloc, hugetlb,
        highmem, kfence, oom-kill, madvise, kasan, userfaultfd, memcg, and
        zram), core-kernel, kconfig, fork, binfmt, MAINTAINERS, kbuild, and
        ia64"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (28 commits)
        zram: fix broken page writeback
        zram: fix return value on writeback_store
        mm/memcg: set memcg when splitting page
        mm/memcg: rename mem_cgroup_split_huge_fixup to split_page_memcg and add nr_pages argument
        ia64: fix ptrace(PTRACE_SYSCALL_INFO_EXIT) sign
        ia64: fix ia64_syscall_get_set_arguments() for break-based syscalls
        mm/userfaultfd: fix memory corruption due to writeprotect
        kasan: fix KASAN_STACK dependency for HW_TAGS
        kasan, mm: fix crash with HW_TAGS and DEBUG_PAGEALLOC
        mm/madvise: replace ptrace attach requirement for process_madvise
        include/linux/sched/mm.h: use rcu_dereference in in_vfork()
        kfence: fix reports if constant function prefixes exist
        kfence, slab: fix cache_alloc_debugcheck_after() for bulk allocations
        kfence: fix printk format for ptrdiff_t
        linux/compiler-clang.h: define HAVE_BUILTIN_BSWAP*
        MAINTAINERS: exclude uapi directories in API/ABI section
        binfmt_misc: fix possible deadlock in bm_register_write
        mm/highmem.c: fix zero_user_segments() with start > end
        hugetlb: do early cow when page pinned on src mm
        mm: use is_cow_mapping() across tree where proper
        ...
      50eb842f
  5. Mar 14, 2021
    • Merge tag 'irqchip-fixes-5.12-1' of... · b470ebc9
      Thomas Gleixner authored
      Merge tag 'irqchip-fixes-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent
      
      Pull irqchip fixes from Marc Zyngier:
      
        - More compatible strings for the Ingenic irqchip (introducing the
          JZ4760B SoC)
        - Select GENERIC_IRQ_MULTI_HANDLER on the ARM ep93xx platform
        - Drop all GENERIC_IRQ_MULTI_HANDLER selections from the irqchip
          Kconfig, now relying on the architecture to get it right
        - Drop the debugfs_file field from struct irq_domain, now that
          debugfs can track things on its own
      b470ebc9
    • Merge tag 'char-misc-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · 88fe4924
      Linus Torvalds authored
      Pull char/misc driver fixes from Greg KH:
       "Here are some small misc/char driver fixes to resolve some reported
        problems:
      
         - habanalabs driver fixes
      
         - ACRN build fixes (reported many times)
      
         - pvpanic module table export fix
      
        All of these have been in linux-next for a while with no reported
        issues"
      
      * tag 'char-misc-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
        misc/pvpanic: Export module FDT device table
        misc: fastrpc: restrict user apps from sending kernel RPC messages
        virt: acrn: Correct type casting of argument of copy_from_user()
        virt: acrn: Use EPOLLIN instead of POLLIN
        virt: acrn: Use vfs_poll() instead of f_op->poll()
        virt: acrn: Make remove_cpu sysfs invisible with !CONFIG_HOTPLUG_CPU
        cpu/hotplug: Fix build error of using {add,remove}_cpu() with !CONFIG_SMP
        habanalabs: fix debugfs address translation
        habanalabs: Disable file operations after device is removed
        habanalabs: Call put_pid() when releasing control device
        drivers: habanalabs: remove unused dentry pointer for debugfs files
        habanalabs: mark hl_eq_inc_ptr() as static
      88fe4924
    • Merge tag 'staging-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · be61af33
      Linus Torvalds authored
      Pull staging driver fixes from Greg KH:
       "Here are some small staging driver fixes for reported problems. They
        include:
      
         - wfx header file cleanup patch reverted as it could cause problems
      
         - comedi driver endian fixes
      
         - buffer overflow problems for staging wifi drivers
      
         - build dependency issue for rtl8192e driver
      
        All have been in linux-next for a while with no reported problems"
      
      * tag 'staging-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (23 commits)
        Revert "staging: wfx: remove unused included header files"
        staging: rtl8188eu: prevent ->ssid overflow in rtw_wx_set_scan()
        staging: rtl8188eu: fix potential memory corruption in rtw_check_beacon_data()
        staging: rtl8192u: fix ->ssid overflow in r8192_wx_set_scan()
        staging: comedi: pcl726: Use 16-bit 0 for interrupt data
        staging: comedi: ni_65xx: Use 16-bit 0 for interrupt data
        staging: comedi: ni_6527: Use 16-bit 0 for interrupt data
        staging: comedi: comedi_parport: Use 16-bit 0 for interrupt data
        staging: comedi: amplc_pc236_common: Use 16-bit 0 for interrupt data
        staging: comedi: pcl818: Fix endian problem for AI command data
        staging: comedi: pcl711: Fix endian problem for AI command data
        staging: comedi: me4000: Fix endian problem for AI command data
        staging: comedi: dmm32at: Fix endian problem for AI command data
        staging: comedi: das800: Fix endian problem for AI command data
        staging: comedi: das6402: Fix endian problem for AI command data
        staging: comedi: adv_pci1710: Fix endian problem for AI command data
        staging: comedi: addi_apci_1500: Fix endian problem for command sample
        staging: comedi: addi_apci_1032: Fix endian problem for COS sample
        staging: ks7010: prevent buffer overflow in ks_wlan_set_scan()
        staging: rtl8712: Fix possible buffer overflow in r8712_sitesurvey_cmd
        ...
      be61af33
    • Merge tag 'tty-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty · cc14086f
      Linus Torvalds authored
      Pull tty/serial fixes from Greg KH:
       "Here are some small tty and serial driver fixes to resolve some
        reported problems:
      
         - LED tty trigger fixes based on review, acked by the LED
           maintainer
      
         - revert a max310x serial driver patch as it was causing problems
      
         - revert a pty change as it was also causing problems
      
        All of these have been in linux-next for a while with no reported
        problems"
      
      * tag 'tty-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
        Revert "drivers:tty:pty: Fix a race causing data loss on close"
        Revert "serial: max310x: rework RX interrupt handling"
        leds: trigger/tty: Use led_set_brightness_sync() from workqueue
        leds: trigger: Fix error path to not unlock the unlocked mutex
      cc14086f
    • Merge tag 'usb-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · 5c7bdbf8
      Linus Torvalds authored
      Pull USB fixes from Greg KH:
       "Here are a small number of USB fixes for 5.12-rc3 to resolve a bunch
        of reported issues:
      
         - usbip fixups for issues found by syzbot
      
         - xhci driver fixes and quirk additions
      
         - gadget driver fixes
      
         - dwc3 QCOM driver fix
      
         - usb-serial new ids and fixes
      
         - usblp fix for a long-time issue
      
         - cdc-acm quirk addition
      
         - other tiny fixes for reported problems
      
        All of these have been in linux-next for a while with no reported
        issues"
      
      * tag 'usb-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (25 commits)
        xhci: Fix repeated xhci wake after suspend due to uncleared internal wake state
        usb: xhci: Fix ASMedia ASM1042A and ASM3242 DMA addressing
        xhci: Improve detection of device initiated wake signal.
        usb: xhci: do not perform Soft Retry for some xHCI hosts
        usbip: fix vudc usbip_sockfd_store races leading to gpf
        usbip: fix vhci_hcd attach_store() races leading to gpf
        usbip: fix stub_dev usbip_sockfd_store() races leading to gpf
        usbip: fix vudc to check for stream socket
        usbip: fix vhci_hcd to check for stream socket
        usbip: fix stub_dev to check for stream socket
        usb: dwc3: qcom: Add missing DWC3 OF node refcount decrement
        USB: usblp: fix a hang in poll() if disconnected
        USB: gadget: udc: s3c2410_udc: fix return value check in s3c2410_udc_probe()
        usb: renesas_usbhs: Clear PIPECFG for re-enabling pipe with other EPNUM
        usb: dwc3: qcom: Honor wakeup enabled/disabled state
        usb: gadget: f_uac1: stop playback on function disable
        usb: gadget: f_uac2: always increase endpoint max_packet_size by one audio slot
        USB: gadget: u_ether: Fix a configfs return code
        usb: dwc3: qcom: add ACPI device id for sc8180x
        Goodix Fingerprint device is not a modem
        ...
      5c7bdbf8
    • Merge tag 'erofs-for-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs · 42062343
      Linus Torvalds authored
      Pull erofs fix from Gao Xiang:
       "Fix an urgent regression introduced by commit baa2c7c9 ("block:
        set .bi_max_vecs as actual allocated vector number"), which could
        cause unexpected hangs since Linux 5.12-rc1.
      
        Resolve it by no longer using bio->bi_max_vecs at all"
      
      * tag 'erofs-for-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
        erofs: fix bio->bi_max_vecs behavior change
      42062343
    • Merge tag 'kbuild-fixes-v5.12-2' of... · e83bad7f
      Linus Torvalds authored
      Merge tag 'kbuild-fixes-v5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
      
      Pull Kbuild fixes from Masahiro Yamada:
      
       - avoid 'make image_name' invoking syncconfig
      
       - fix a couple of bugs in scripts/dummy-tools
      
       - fix LLD_VENDOR and locale issues in scripts/ld-version.sh
      
       - rebuild GCC plugins when the compiler is upgraded
      
       - allow LTO to be enabled with KASAN_HW_TAGS
      
       - allow LTO to be enabled without LLVM=1
      
      * tag 'kbuild-fixes-v5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
        kbuild: fix ld-version.sh to not be affected by locale
        kbuild: remove meaningless parameter to $(call if_changed_rule,dtc)
        kbuild: remove LLVM=1 test from HAS_LTO_CLANG
        kbuild: remove unneeded -O option to dtc
        kbuild: dummy-tools: adjust to scripts/cc-version.sh
        kbuild: Allow LTO to be selected with KASAN_HW_TAGS
        kbuild: dummy-tools: support MPROFILE_KERNEL checks for ppc
        kbuild: rebuild GCC plugins when the compiler is upgraded
        kbuild: Fix ld-version.sh script if LLD was built with LLD_VENDOR
        kbuild: dummy-tools: fix inverted tests for gcc
        kbuild: add image_name to no-sync-config-targets
      e83bad7f
    • zram: fix broken page writeback · 2766f182
      Minchan Kim authored
      Commit 0d835962 ("zram: support page writeback") introduced two
      problems.  First, it overwrites writeback_store's return value with
      kstrtol's return value, which is zero on success, so the write syscall
      could return zero to userspace even though the data was written
      successfully.
      
      Second, it breaks the index in the writeback loop, which no longer
      increments it.  As a result, only the first (starting) block index is
      written back, so the user cannot write back all idle pages in zram and
      loses the chance to save memory.
      
      This patch fixes both issues.
      
      Link: https://lkml.kernel.org/r/20210312173949.2197662-2-minchan@kernel.org
      Fixes: 0d835962 ("zram: support page writeback")
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Reported-by: Amos Bianchi <amosbianchi@google.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: John Dias <joaodias@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2766f182
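Both zram problems follow a pattern that is easy to hit in sysfs store handlers, which report success by returning the number of bytes consumed. The sketch below is a userspace simulation under stated assumptions, not zram's writeback_store(): `parse_long()` is a hypothetical stand-in that mimics kstrtol()'s convention of returning 0 on success, and the "page=<n>" command format, the "any other input means all pages" rule, and the `written[]` counters are invented for illustration. `store_buggy()` reuses `ret` for the parse status and never advances `index`; `store_fixed()` keeps the two concerns separate.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

#define NR_PAGES 4

/* Hypothetical counters: how many times each page was written back. */
static int written[NR_PAGES];

/* Hypothetical kstrtol()-like helper: 0 on success, -EINVAL on bad input. */
static int parse_long(const char *s, long *out)
{
	char *end;
	long v;

	errno = 0;
	v = strtol(s, &end, 10);
	if (errno || end == s || *end != '\0')
		return -EINVAL;
	*out = v;
	return 0;
}

/*
 * Buggy pattern: "page=<n>" writes back page n, any other input writes
 * back all pages. ret starts as len but is overwritten by the parse
 * status, and the loop never advances index.
 */
static ssize_t store_buggy(const char *buf, size_t len)
{
	ssize_t ret = (ssize_t)len;
	long index = 0, nr_pages = NR_PAGES;

	if (strncmp(buf, "page=", 5) == 0) {
		ret = parse_long(buf + 5, &index);  /* ret becomes 0 on success */
		if (ret || index < 0 || index >= NR_PAGES)
			return -EINVAL;
		nr_pages = 1;
	}
	for (; nr_pages != 0; nr_pages--)         /* index never changes */
		written[index]++;
	return ret;
}

/* Fixed pattern: parse status kept out of the return value; index advances. */
static ssize_t store_fixed(const char *buf, size_t len)
{
	long index = 0, nr_pages = NR_PAGES;

	if (strncmp(buf, "page=", 5) == 0) {
		if (parse_long(buf + 5, &index) || index < 0 || index >= NR_PAGES)
			return -EINVAL;
		nr_pages = 1;
	}
	for (; nr_pages != 0; index++, nr_pages--)
		written[index]++;
	return (ssize_t)len;
}
```

With these definitions, `store_buggy("page=2", 6)` returns 0 instead of 6, and `store_buggy("all", 3)` writes page 0 four times; `store_fixed()` returns 6 for the first call and writes each of the four pages exactly once for the second.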
    • zram: fix return value on writeback_store · 57e0076e
      Minchan Kim authored
      writeback_store's return value is overwritten by submit_bio_wait's return
      value, so writeback_store returns zero whenever there is no I/O error.
      The write syscall in userspace then sees zero as its return value, which
      can make the process stall, retrying the write in the expectation that it
      will eventually succeed.
      
      Link: https://lkml.kernel.org/r/20210312173949.2197662-1-minchan@kernel.org
      Fixes: 3b82a051 ("drivers/block/zram/zram_drv.c: fix error return codes not being returned in writeback_store")
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Colin Ian King <colin.king@canonical.com>
      Cc: John Dias <joaodias@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      57e0076e
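Why a zero return can stall a process: a conventional userspace "write everything" loop treats each return value as the number of bytes accepted and retries the remainder, so a write that keeps returning 0 never makes progress. The sketch below simulates this without a real sysfs file; the `write_fn` type, the two simulated handlers, and the `max_calls` cap (added only so the stall is observable instead of looping forever) are all hypothetical.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical signature for a write-like operation. */
typedef ssize_t (*write_fn)(const char *buf, size_t len);

/* Simulated handler that consumed the input but reports 0 bytes written. */
static ssize_t store_returns_zero(const char *buf, size_t len)
{
	(void)buf;
	(void)len;
	return 0;
}

/* Simulated handler that reports every byte as consumed. */
static ssize_t store_returns_len(const char *buf, size_t len)
{
	(void)buf;
	return (ssize_t)len;
}

/*
 * Conventional "write everything" loop: keep calling fn() until all bytes
 * are accepted, advancing past whatever each call reports as written.
 * Returns the number of calls made, or -1 on error or when the
 * hypothetical max_calls cap is reached (a real loop would spin forever).
 */
static int write_all(write_fn fn, const char *buf, size_t len, int max_calls)
{
	int calls = 0;

	while (len > 0) {
		if (calls == max_calls)
			return -1;
		ssize_t n = fn(buf, len);
		calls++;
		if (n < 0)
			return -1;
		/* A return of 0 leaves buf and len unchanged: no progress. */
		buf += n;
		len -= (size_t)n;
	}
	return calls;
}
```

Here `write_all(store_returns_len, "idle", 4, 100)` finishes after one call, while the same call with `store_returns_zero` exhausts the cap and returns -1.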