  1. Oct 28, 2013
  2. Oct 24, 2013
      perf script python: Fix mem leak due to missing Py_DECREFs on dict entries · c0268e8d
      Joseph Schuchart authored
      We are using the Python scripting interface in perf to extract kernel
      events relevant for performance analysis of HPC codes. We noticed that
      the "perf script" call allocates a significant amount of memory (on the
      order of several hundred MiB) during its run, e.g. 125 MiB for a 25 MiB
      input file:
      
        $> perf record -o perf.data -a -R -g fp \
             -e power:cpu_frequency -e sched:sched_switch \
             -e sched:sched_migrate_task -e sched:sched_process_exit \
             -e sched:sched_process_fork -e sched:sched_process_exec \
             -e cycles  -m 4096 --freq 4000
        $> /usr/bin/time perf script -i perf.data -s dummy_script.py
        0.84user 0.13system 0:01.92elapsed 51%CPU (0avgtext+0avgdata
        125532maxresident)k
        73072inputs+0outputs (57major+33086minor)pagefaults 0swaps
      
      Upon further investigation using the valgrind massif tool, we noticed
      that Python objects that are created in trace-event-python.c via
      PyString_FromString*() (and their Integer and Long counterparts) are
      never freed.
      
      The reason for this seems to be missing Py_DECREF calls on the objects
      that are returned by these functions and stored in the Python
      dictionaries. The Python dictionaries do not steal references (as
      opposed to Python tuples and lists) but instead add their own reference.
      
      Hence, the reference that is returned by these object creation functions
      is never released and the memory is leaked (see [1,2]).
      
      The attached patch fixes this by wrapping all relevant calls to
      PyDict_SetItemString() and decrementing the reference counter
      immediately after the Python function call.
      
      This reduces the allocated memory to a reasonable amount:
      
        $> /usr/bin/time perf script -i perf.data -s dummy_script.py
        0.73user 0.05system 0:00.79elapsed 99%CPU (0avgtext+0avgdata
        49132maxresident)k
        0inputs+0outputs (0major+14045minor)pagefaults 0swaps
      
      For comparison, with a 120 MiB input file the memory consumption
      reported by time drops from almost 600 MiB to 146 MiB.
      
      The patch has been tested using Linux 3.8.2 with Python 2.7.4 and Linux
      3.11.6 with Python 2.7.5.
      
      Please let me know if you need any further information.
      
      [1] http://docs.python.org/2/c-api/tuple.html#PyTuple_SetItem
      [2] http://docs.python.org/2/c-api/dict.html#PyDict_SetItemString
      
      
      
      Signed-off-by: Joseph Schuchart <joseph.schuchart@tu-dresden.de>
      Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
      Link: http://lkml.kernel.org/r/1381468543-25334-4-git-send-email-namhyung@kernel.org
      
      
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  3. Oct 20, 2013
  4. Oct 19, 2013
  5. Oct 18, 2013
  6. Oct 17, 2013