Skip to content
  1. Apr 26, 2011
  2. Apr 24, 2011
  3. Apr 23, 2011
  4. Apr 22, 2011
    • Peter Zijlstra's avatar
      perf, x86: Update/fix Intel Nehalem cache events · f4929bd3
      Peter Zijlstra authored
      
      
      Change the Nehalem cache events to use retired memory instruction counters
      (similar to Westmere), this greatly improves the provided stats.
      
      Using:
      
      main ()
      {
              int i;
      
              for (i = 0; i < 1000000000; i++) {
                      asm("mov (%%rsp), %%rbx;"
                          "mov %%rbx, (%%rsp);" : : : "rbx");
              }
      }
      
      We find:
      
       $ perf stat --repeat 10 -e instructions:u -e l1-dcache-loads:u -e l1-dcache-stores:u ./loop_1b_loads+stores
        Performance counter stats for './loop_1b_loads+stores' (10 runs):
            4,000,081,056 instructions:u           #      0.000 IPC ( +-   0.000% )
            4,999,502,846 l1-dcache-loads:u          ( +-   0.008% )
            1,000,034,832 l1-dcache-stores:u         ( +-   0.000% )
               1.565184942  seconds time elapsed   ( +-   0.005% )
      
      The 5b is surprising - we'd expect 1b:
      
       $ perf stat --repeat 10 -e instructions:u -e r10b:u -e l1-dcache-stores:u ./loop_1b_loads+stores
        Performance counter stats for './loop_1b_loads+stores' (10 runs):
            4,000,081,054 instructions:u           #      0.000 IPC ( +-   0.000% )
            1,000,021,961 r10b:u                     ( +-   0.000% )
            1,000,030,951 l1-dcache-stores:u         ( +-   0.000% )
               1.565055422  seconds time elapsed   ( +-   0.003% )
      
      Which this patch thus fixes.
      
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Lin Ming <ming.m.lin@intel.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Link: http://lkml.kernel.org/n/tip-q9rtru7b7840tws75xzboapv@git.kernel.org
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      f4929bd3
    • Cyrill Gorcunov's avatar
      perf, x86: P4 PMU - Don't forget to clear cpuc->active_mask on overflow · 1ea5a6af
      Cyrill Gorcunov authored
      
      
      It's not enough to simply disable event on overflow the
      cpuc->active_mask should be cleared as well otherwise counter
      may stall in "active" even in real being already disabled (which
      potentially may lead to the situation that user may not use this
      counter further).
      
      Don pointed out that:
      
       " I also noticed this patch fixed some unknown NMIs
         on a P4 when I stressed the box".
      
      Tested-by: default avatarLin Ming <ming.m.lin@intel.com>
      Signed-off-by: default avatarCyrill Gorcunov <gorcunov@openvz.org>
      Acked-by: default avatarDon Zickus <dzickus@redhat.com>
      Signed-off-by: default avatarDon Zickus <dzickus@redhat.com>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Link: http://lkml.kernel.org/r/1303398203-2918-3-git-send-email-dzickus@redhat.com
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      1ea5a6af
    • Ingo Molnar's avatar
      x86, perf event: Turn off unstructured raw event access to offcore registers · b52c55c6
      Ingo Molnar authored
      Andi Kleen pointed out that the Intel offcore support patches were merged
      without user-space tool support to the functionality:
      
       |
       | The offcore_msr perf kernel code was merged into 2.6.39-rc*, but the
       | user space bits were not. This made it impossible to set the extra mask
       | and actually do the OFFCORE profiling
       |
      
      Andi submitted a preliminary patch for user-space support, as an
      extension to perf's raw event syntax:
      
       |
       | Some raw events -- like the Intel OFFCORE events -- support additional
       | parameters. These can be appended after a ':'.
       |
       | For example on a multi socket Intel Nehalem:
       |
       |    perf stat -e r1b7:20ff -a sleep 1
       |
       | Profile the OFFCORE_RESPONSE.ANY_REQUEST with event mask REMOTE_DRAM_0
       | that measures any access to DRAM on another socket.
       |
      
      But this kind of usability is absolutely unacceptable - users should not
      be expected to type in magic, CPU and model specific incantations to get
      access to useful hardware functionality.
      
      The proper solution is to expose useful offcore functionality via
      generalized events - that way users do not have to care which specific
      CPU model they are using, they can use the conceptual event and not some
      model specific quirky hexa number.
      
      We already have such generalization in place for CPU cache events,
      and it's all very extensible.
      
      "Offcore" events measure general DRAM access patters along various
      parameters. They are particularly useful in NUMA systems.
      
      We want to support them via generalized DRAM events: either as the
      fourth level of cache (after the last-level cache), or as a separate
      generalization category.
      
      That way user-space support would be very obvious, memory access
      profiling could be done via self-explanatory commands like:
      
        perf record -e dram ./myapp
        perf record -e dram-remote ./myapp
      
      ... to measure DRAM accesses or more expensive cross-node NUMA DRAM
      accesses.
      
      These generalized events would work on all CPUs and architectures that
      have comparable PMU features.
      
      ( Note, these are just examples: actual implementation could have more
        sophistication and more parameter - as long as they center around
        similarly simple usecases. )
      
      Now we do not want to revert *all* of the current offcore bits, as they
      are still somewhat useful for generic last-level-cache events, implemented
      in this commit:
      
        e994d7d2: perf: Fix LLC-* events on Intel Nehalem/Westmere
      
      But we definitely do not yet want to expose the unstructured raw events
      to user-space, until better generalization and usability is implemented
      for these hardware event features.
      
      ( Note: after generalization has been implemented raw offcore events can be
        supported as well: there can always be an odd event that is marginally
        useful but not useful enough to generalize. DRAM profiling is definitely
        *not* such a category so generalization must be done first. )
      
      Furthermore, PERF_TYPE_RAW access to these registers was not intended
      to go upstream without proper support - it was a side-effect of the above
      e994d7d2
      
       commit, not mentioned in the changelog.
      
      As v2.6.39 is nearing release we go for the simplest approach: disable
      the PERF_TYPE_RAW offcore hack for now, before it escapes into a released
      kernel and becomes an ABI.
      
      Once proper structure is implemented for these hardware events and users
      are offered usable solutions we can revisit this issue.
      
      Reported-by: default avatarAndi Kleen <ak@linux.intel.com>
      Acked-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1302658203-4239-1-git-send-email-andi@firstfloor.org
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      b52c55c6
    • Andi Kleen's avatar
      perf: Support Xeon E7's via the Westmere PMU driver · b2508e82
      Andi Kleen authored
      
      
      There's a new model number public, 47, for Xeon E7 (aka Westmere EX).
      
      Signed-off-by: default avatarAndi Kleen <ak@linux.intel.com>
      Cc: a.p.zijlstra@chello.nl
      Link: http://lkml.kernel.org/r/1303429715-10202-1-git-send-email-andi@firstfloor.org
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      b2508e82
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block · 91e8549b
      Linus Torvalds authored
      * 'for-linus' of git://git.kernel.dk/linux-2.6-block:
        ide: unexport DISK_EVENT_MEDIA_CHANGE for ide-gd and ide-cd
        block: don't propagate unlisted DISK_EVENTs to userland
        elevator: check for ELEVATOR_INSERT_SORT_MERGE in !elvpriv case too
      91e8549b
    • Tejun Heo's avatar
      ide: unexport DISK_EVENT_MEDIA_CHANGE for ide-gd and ide-cd · 7eec77a1
      Tejun Heo authored
      
      
      check_events() implementations in both ide-gd and ide-cd are
      inadequate for in-kernel event polling.  Both generate media change
      events continuously when certain conditions are met causing infinite
      event loop between the driver and userland event handler.
      
      As disk event now supports suppression of unlisted events, simply
      de-listing DISK_EVENT_MEDIA_CHANGE from disk->events resolves the
      problem.  Internal handling around media revalidation will behave the
      same while userland will fall back to userland event polling after
      detecting the device doesn't support disk events.
      
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reported-by: default avatarJens Axboe <jaxboe@fusionio.com>
      Acked-by: default avatar"David S. Miller" <davem@davemloft.net>
      Signed-off-by: default avatarJens Axboe <jaxboe@fusionio.com>
      7eec77a1
    • Tejun Heo's avatar
      block: don't propagate unlisted DISK_EVENTs to userland · 7c88a168
      Tejun Heo authored
      
      
      DISK_EVENT_MEDIA_CHANGE is used for both userland visible event and
      internal event for revalidation of removeable devices.  Some legacy
      drivers don't implement proper event detection and continuously
      generate events under certain circumstances.  For example, ide-cd
      generates media changed continuously if there's no media in the drive,
      which can lead to infinite loop of events jumping back and forth
      between the driver and userland event handler.
      
      This patch updates disk event infrastructure such that it never
      propagates events not listed in disk->events to userland.  Those
      events are processed the same for internal purposes but uevent
      generation is suppressed.
      
      This also ensures that userland only gets events which are advertised
      in the @events sysfs node lowering risk of confusion.
      
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarJens Axboe <jaxboe@fusionio.com>
      7c88a168
    • Jens Axboe's avatar
      elevator: check for ELEVATOR_INSERT_SORT_MERGE in !elvpriv case too · 3aa72873
      Jens Axboe authored
      
      
      The sort insert is the one that goes to the IO scheduler. With
      the SORT_MERGE addition, we could bypass IO scheduler setup
      but still ask the IO scheduler to insert the request. This would
      cause an oops on switching IO schedulers through the sysfs
      interface, unless the disk just happened to be idle while it
      occured.
      
      Signed-off-by: default avatarJens Axboe <jaxboe@fusionio.com>
      3aa72873