Skip to content
  1. Oct 29, 2013
    • Peter Zijlstra's avatar
      perf/x86: Fix NMI measurements · e8a923cc
      Peter Zijlstra authored
      
      
      OK, so what I'm actually seeing on my WSM is that sched/clock.c is
      'broken' for the purpose we're using it for.
      
      What triggered it is that my WSM-EP is broken :-(
      
        [    0.001000] tsc: Fast TSC calibration using PIT
        [    0.002000] tsc: Detected 2533.715 MHz processor
        [    0.500180] TSC synchronization [CPU#0 -> CPU#6]:
        [    0.505197] Measured 3 cycles TSC warp between CPUs, turning off TSC clock.
        [    0.004000] tsc: Marking TSC unstable due to check_tsc_sync_source failed
      
      For some reason it consistently detects TSC skew, even though NHM+
      should have a single clock domain for 'reasonable' systems.
      
      This marks sched_clock_stable=0, which means that we do fancy stuff to
      try and get a 'sane' clock. Part of this fancy stuff relies on the tick,
      clearly that's gone when NOHZ=y. So for idle cpus time gets stuck, until
      it either wakes up or gets kicked by another cpu.
      
      While this is perfectly fine for the scheduler -- it only cares about
      actually running stuff, and when we're running stuff we're obviously not
      idle. This does somewhat break down for perf which can trigger events
      just fine on an otherwise idle cpu.
      
      So I've got NMIs get get 'measured' as taking ~1ms, which actually
      don't last nearly that long:
      
                <idle>-0     [013] d.h.   886.311970: rcu_nmi_enter <-do_nmi
        ...
                <idle>-0     [013] d.h.   886.311997: perf_sample_event_took: HERE!!! : 1040990
      
      So ftrace (which uses sched_clock(), not the fancy bits) only sees
      ~27us, but we measure ~1ms !!
      
      Now since all this measurement stuff lives in x86 code, we can actually
      fix it.
      
      Signed-off-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Cc: mingo@kernel.org
      Cc: dave.hansen@linux.intel.com
      Cc: eranian@google.com
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: jmario@redhat.com
      Cc: acme@infradead.org
      Link: http://lkml.kernel.org/r/20131017133350.GG3364@laptop.programming.kicks-ass.net
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      e8a923cc
    • Peter Zijlstra's avatar
      perf: Fix perf ring buffer memory ordering · bf378d34
      Peter Zijlstra authored
      
      
      The PPC64 people noticed a missing memory barrier and crufty old
      comments in the perf ring buffer code. So update all the comments and
      add the missing barrier.
      
      When the architecture implements local_t using atomic_long_t there
      will be double barriers issued; but short of introducing more
      conditional barrier primitives this is the best we can do.
      
      Reported-by: default avatarVictor Kaplansky <victork@il.ibm.com>
      Tested-by: default avatarVictor Kaplansky <victork@il.ibm.com>
      Signed-off-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: michael@ellerman.id.au
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Michael Neuling <mikey@neuling.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: anton@samba.org
      Cc: benh@kernel.crashing.org
      Link: http://lkml.kernel.org/r/20131025173749.GG19466@laptop.lan
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      bf378d34
    • Ingo Molnar's avatar
      Merge tag 'perf-urgent-for-mingo' of... · cd657187
      Ingo Molnar authored
      Merge tag 'perf-urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
      
       into perf/urgent
      
      Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
      
       * Add color overhead for stdio output buffer, which fixes
         --stdio output being chopped up on the hot (red) entries,
         fix from Jiri Olsa.
      
       * Get 'perf record -g -a sleep 1' working again, removing the
         need for -- separating perf options from the workload, restoring
         ages old behaviour, fix from Jiri Olsa.
         More patches allowing ~/.perfconfig setting up of default
         callchain collecting method ("fp" or "dwarf") left for next
         merge window.
      
       * Fixup mmap event consumption, where we were acking the
         consumption by writing the tail before actually accessing
         the event, which could lead to using overwritten records
         in things like 'perf record --call-graph'. From Zhouyi Zhou.
      
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      cd657187
    • Zhouyi Zhou's avatar
      perf tools: Fixup mmap event consumption · 8e50d384
      Zhouyi Zhou authored
      
      
      The tail position of the event buffer should only be modified after
      actually use that event.
      
      If not the event buffer could be invalid before use, and segment fault
      occurs when invoking perf top -G.
      
      Signed-off-by: default avatarZhouyi Zhou <yizhouzhou@ict.ac.cn>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Zhouyi Zhou <yizhouzhou@ict.ac.cn>
      Link: http://lkml.kernel.org/r/1382600613-32177-1-git-send-email-zhouzhouyi@gmail.com
      
      
      [ Simplified the logic using exit gotos and renamed write_tail method to mmap_consume ]
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      8e50d384
    • Jiri Olsa's avatar
      perf top: Split -G and --call-graph · ae779a63
      Jiri Olsa authored
      
      
      Splitting -G and --call-graph for record command, so we could use '-G'
      with no option.
      
      The '-G' option now takes NO argument and enables the configured unwind
      method, which is currently the frame pointers method.
      
      It will be possible to configure unwind method via config file in
      upcoming patches.
      
      All current '-G' arguments is overtaken by --call-graph option.
      
      NOTE: The documentation for top --call-graph option
            was wrongly copied from report command.
      
      Signed-off-by: default avatarJiri Olsa <jolsa@redhat.com>
      Tested-by: default avatarDavid Ahern <dsahern@gmail.com>
      Tested-by: default avatarIngo Molnar <mingo@kernel.org>
      Reviewed-by: default avatarDavid Ahern <dsahern@gmail.com>
      Acked-by: default avatarIngo Molnar <mingo@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1382797536-32303-3-git-send-email-jolsa@redhat.com
      
      
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      ae779a63
    • Jiri Olsa's avatar
      perf record: Split -g and --call-graph · 09b0fd45
      Jiri Olsa authored
      
      
      Splitting -g and --call-graph for record command, so we could use '-g'
      with no option.
      
      The '-g' option now takes NO argument and enables the configured unwind
      method, which is currently the frame pointers method.
      
      It will be possible to configure unwind method via config file in
      upcoming patches.
      
      All current '-g' arguments is overtaken by --call-graph option.
      
      Signed-off-by: default avatarJiri Olsa <jolsa@redhat.com>
      Tested-by: default avatarDavid Ahern <dsahern@gmail.com>
      Tested-by: default avatarIngo Molnar <mingo@kernel.org>
      Reviewed-by: default avatarDavid Ahern <dsahern@gmail.com>
      Acked-by: default avatarIngo Molnar <mingo@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1382797536-32303-2-git-send-email-jolsa@redhat.com
      
      
      [ reordered -g/--call-graph on --help and expanded the man page
        according to comments by David Ahern and Namhyung Kim ]
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      09b0fd45
    • Jiri Olsa's avatar
      perf hists: Add color overhead for stdio output buffer · 9754c4f9
      Jiri Olsa authored
      
      
      Following commit tightened up the buffer size for output to strict width
      of used format columns:
      
        99cf666c perf hists: Fix formatting of long symbol names
      
      This works fine until you hit color overhead output which places extra
      bytes into output buffer. We need to account for color overhead in the
      output buffer. Adding maximum color byte size to the output buffer size.
      
      Signed-off-by: default avatarJiri Olsa <jolsa@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1382700293-1803-1-git-send-email-jolsa@redhat.com
      
      
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      9754c4f9
  2. Oct 28, 2013
  3. Oct 27, 2013
    • Helge Deller's avatar
      parisc: Do not crash 64bit SMP kernels on machines with >= 4GB RAM · 54e181e0
      Helge Deller authored
      
      
      Since the beginning of the parisc-linux port, sometimes 64bit SMP kernels were
      not able to bring up other CPUs than the monarch CPU and instead crashed the
      kernel.  The reason was unclear, esp. since it involved various machines (e.g.
      J5600, J6750 and SuperDome). Testing showed, that those crashes didn't happened
      when less than 4GB were installed, or if a 32bit Linux kernel was booted.
      
      In the end, the fix for those SMP problems is trivial:
      During the early phase of the initialization of the CPUs, including the monarch
      CPU, the PDC_PSW firmware function to enable WIDE (=64bit) mode is called.
      It's documented that this firmware function may clobber various registers, and
      one one of those possibly clobbered registers is %cr30 which holds the task
      thread info pointer.
      
      Now, if %cr30 would always have been clobbered, then this bug would have been
      detected much earlier. But lots of testing finally showed, that - at least for
      %cr30 - on some machines only the upper 32bits of the 64bit register suddenly
      turned zero after the firmware call.
      
      So, after finding the root cause, the explanation for the various crashes
      became clear:
      - On 32bit SMP Linux kernels all upper 32bit were zero, so we didn't faced this
        problem.
      - Monarch CPUs in 64bit mode always booted sucessfully, because the inital task
        thread info pointer was below 4GB.
      - Secondary CPUs booted sucessfully on machines with less than 4GB RAM because
        the upper 32bit were zero anyay.
      - Secondary CPus failed to boot if we had more than 4GB RAM and the task thread
        info pointer was located above the 4GB boundary.
      
      Finally, the patch to fix this problem is trivial by saving the %cr30 register
      before the firmware call and restoring it afterwards.
      
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      Signed-off-by: default avatarJohn David Anglin <dave.anglin@bell.net>
      Cc: <stable@vger.kernel.org> # 2.6.12+
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      54e181e0
  4. Oct 26, 2013
    • Linus Torvalds's avatar
      Merge tag 'pm+acpi-3.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 20582e34
      Linus Torvalds authored
      Pull ACPI and power management fixes from
       "These fix two bugs in the intel_pstate driver, a hibernate bug leading
        to nasty resume failures sometimes and acpi-cpufreq initialization bug
        that causes problems to happen during module unload when intel_pstate
        is in use.
      
        Specifics:
      
         - Fix for rounding errors in intel_pstate causing CPU utilization to
           be underestimated from Brennan Shacklett.
      
         - intel_pstate fix to always use the correct max pstate value when
           computing the min pstate from Dirk Brandewie.
      
         - Hibernation fix for deadlocking resume in cases when the probing of
           the device containing the image is deferred from Russ Dill.
      
         - acpi-cpufreq fix to prevent the module from staying in memory when
           the driver cannot be registered and then attempting to unregister
           things that have never been registered on exit"
      
      * tag 'pm+acpi-3.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        acpi-cpufreq: Fail initialization if driver cannot be registered
        PM / hibernate: Move software_resume to late_initcall_sync
        intel_pstate: Correct calculation of min pstate value
        intel_pstate: Improve accuracy by not truncating until final result
      20582e34
    • Linus Torvalds's avatar
      Merge tag 'for-linus-20131025' of git://git.infradead.org/linux-mtd · d255c59a
      Linus Torvalds authored
      Pull final mtd fixes from Brian Norris:
       "A few more last-minute regression fixes, prepared jointly by me and
        David Woodhouse:
      
         - Revert pxa3xx to its old name to avoid breaking existing
           'mtdparts=' boot strings.
      
         - Return GPMI NAND to its legacy ECC layout for backwards
           compatibility.  We will revisit this in 3.13.
      
        A note from David on the latter fix: 'This leaves a harmless cosmetic
        warning about an unused function.  At this point in the cycle I really
        don't care.'"
      
      * tag 'for-linus-20131025' of git://git.infradead.org/linux-mtd:
        mtd: gpmi: fix ECC regression
        mtd: nand: pxa3xx: Fix registered MTD name
      d255c59a
    • Nicholas Bellinger's avatar
      vhost/scsi: Fix incorrect usage of get_user_pages_fast write parameter · 60a01f55
      Nicholas Bellinger authored
      
      
      This patch addresses a long-standing bug where the get_user_pages_fast()
      write parameter used for setting the underlying page table entry permission
      bits was incorrectly set to write=1 for data_direction=DMA_TO_DEVICE, and
      passed into get_user_pages_fast() via vhost_scsi_map_iov_to_sgl().
      
      However, this parameter is intended to signal WRITEs to pinned userspace
      PTEs for the virtio-scsi DMA_FROM_DEVICE -> READ payload case, and *not*
      for the virtio-scsi DMA_TO_DEVICE -> WRITE payload case.
      
      This bug would manifest itself as random process segmentation faults on
      KVM host after repeated vhost starts + stops and/or with lots of vhost
      endpoints + LUNs.
      
      Cc: Stefan Hajnoczi <stefanha@redhat.com>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Asias He <asias@redhat.com>
      Cc: <stable@vger.kernel.org> # 3.6+
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      60a01f55
    • Wei Yongjun's avatar
      target/pscsi: fix return value check · 58932e96
      Wei Yongjun authored
      
      
      In case of error, the function scsi_host_lookup() returns NULL
      pointer not ERR_PTR(). The IS_ERR() test in the return value check
      should be replaced with NULL test.
      
      Signed-off-by: default avatarWei Yongjun <yongjun_wei@trendmicro.com.cn>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      58932e96
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · f55ac56d
      Linus Torvalds authored
      Pull vfs fixes (try two) from Al Viro:
       "nfsd performance regression fix + seq_file lseek(2) fix"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        seq_file: always update file->f_pos in seq_lseek()
        nfsd regression since delayed fput()
      f55ac56d
    • David Woodhouse's avatar
      mtd: gpmi: fix ECC regression · 031e2777
      David Woodhouse authored
      
      
      The "legacy" ECC layout used until 3.12-rc1 uses all the OOB area by
      computing the ECC strength and ECC step size ourselves.
      
      Commit 2febcdf8 ("mtd: gpmi: set the BCHs geometry with the ecc info")
      makes the driver use the ECC info (ECC strength and ECC step size)
      provided by the MTD code, and creates a different NAND ECC layout
      for the BCH, and use the new ECC layout. This causes a regression:
      
         We can not mount the ubifs which was created by the old NAND ECC layout.
      
      This patch fixes this issue by reverting to the legacy ECC layout.
      
      We will probably introduce a new device-tree property to indicate that
      the new ECC layout can be used. For now though, for the imminent 3.12
      release, we just unconditionally revert to the 3.11 behaviour.
      
      This leaves a harmless cosmetic warning about an unused function. At
      this point in the cycle I really don't care.
      
      Signed-off-by: default avatarDavid Woodhouse <David.Woodhouse@intel.com>
      Signed-off-by: default avatarBrian Norris <computersforpeace@gmail.com>
      Acked-by: default avatarHuang Shijie <b32955@freescale.com>
      Acked-by: default avatarMarek Vasut <marex@denx.de>
      Tested-by: default avatarMarek Vasut <marex@denx.de>
      031e2777
  5. Oct 25, 2013
  6. Oct 24, 2013