  1. Feb 07, 2014
    • swap: add a simple detector for inappropriate swapin readahead · 579f8290
      Shaohua Li authored
      
      
      This patch improves the swap readahead algorithm.  It is from Hugh, and
      I changed it slightly.
      
      Hugh's original changelog:
      
      swapin readahead does a blind readahead, whether or not the swapin is
      sequential.  This may be ok on harddisk, because large reads have
      relatively small costs, and if the readahead pages are unneeded they can
      be reclaimed easily - though, what if their allocation forced reclaim of
      useful pages? But on SSD devices large reads are more expensive than
      small ones: if the readahead pages are unneeded, reading them in caused
      significant overhead.
      
      This patch adds very simplistic random read detection.  Stealing the
      PageReadahead technique from Konstantin Khlebnikov's patch, avoiding the
      vma/anon_vma sophistications of Shaohua Li's patch, swapin_nr_pages()
      simply looks at readahead's current success rate, and narrows or widens
      its readahead window accordingly.  There is little science to its
      heuristic: it's about as stupid as can be whilst remaining effective.
      
      The table below shows elapsed times (in centiseconds) when running a
      single repetitive swapping load across a 1000MB mapping in 900MB ram
      with 1GB swap (the harddisk tests had taken painfully too long when I
      used mem=500M, but SSD shows similar results for that).
      
      Vanilla is the 3.6-rc7 kernel on which I started; Shaohua denotes his
      Sep 3 patch in mmotm and linux-next; HughOld denotes my Oct 1 patch
      which Shaohua showed to be defective; HughNew this Nov 14 patch, with
      page_cluster as usual at default of 3 (8-page reads); HughPC4 this same
      patch with page_cluster 4 (16-page reads); HughPC0 with page_cluster 0
      (1-page reads: no readahead).
      
      HDD for swapping to harddisk, SSD for swapping to VertexII SSD.  Seq for
      sequential access to the mapping, cycling five times around; Rand for
      the same number of random touches.  Anon for a MAP_PRIVATE anon mapping;
      Shmem for a MAP_SHARED anon mapping, equivalent to tmpfs.
      
      One weakness of Shaohua's vma/anon_vma approach was that it did not
      optimize Shmem: seen below.  Konstantin's approach was perhaps mistuned,
      50% slower on Seq: did not compete and is not shown below.
      
      HDD        Vanilla Shaohua HughOld HughNew HughPC4 HughPC0
      Seq Anon     73921   76210   75611   76904   78191  121542
      Seq Shmem    73601   73176   73855   72947   74543  118322
      Rand Anon   895392  831243  871569  845197  846496  841680
      Rand Shmem 1058375 1053486  827935  764955  764376  756489
      
      SSD        Vanilla Shaohua HughOld HughNew HughPC4 HughPC0
      Seq Anon     24634   24198   24673   25107   21614   70018
      Seq Shmem    24959   24932   25052   25703   22030   69678
      Rand Anon    43014   26146   28075   25989   26935   25901
      Rand Shmem   45349   45215   28249   24268   24138   24332
      
      These tests are, of course, two extremes of a very simple case: under
      heavier mixed loads I've not yet observed any consistent improvement or
      degradation, and wider testing would be welcome.
      
      Shaohua Li:
      
      Tests show that Vanilla is slightly better than Hugh's patch on the
      sequential workload.  I observed that with Hugh's patch the readahead
      size sometimes shrinks too fast (from 8 to 1 immediately) in a
      sequential workload if there is no hit.  In that case, continuing to do
      readahead is actually good.
      
      I have not prepared a sophisticated algorithm for the sequential
      workload, because so far we can't guarantee that sequentially accessed
      pages are swapped out sequentially.  So I slightly changed Hugh's
      heuristic: don't shrink the readahead size too fast.
      
      Here are my test results (in seconds, average of 3 runs):
                Vanilla    Hugh     New
      Seq           356     370     360
      Random       4525    2447    2444
      
      The attached graph shows the swapin/swapout throughput I collected with
      'vmstat 2'.  The first part is a random workload (until around 1200 on
      the x-axis) and the second part is a sequential workload.  Swapin and
      swapout throughput are almost identical in steady state in both
      workloads, which is the expected behavior.  In Vanilla, by contrast,
      swapin is much larger than swapout, especially in the random workload
      (because of wrong readahead).
      
      Original patches by: Shaohua Li and Konstantin Khlebnikov.
      
      [fengguang.wu@intel.com: swapin_nr_pages() can be static]
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Shaohua Li <shli@fusionio.com>
      Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      579f8290
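
The window-sizing heuristic described in the changelog above can be modelled in userspace roughly as follows. This is a minimal sketch, not the patch itself: the struct `ra_state`, the function `ra_window()`, the `+ 2` bias, the minimum window of 4 and the "at most halve" rule are assumptions chosen for illustration. Only `page_cluster`, widening on readahead hits, and not shrinking too fast come from the text.

```c
#include <assert.h>

/* Hypothetical per-system readahead state (illustrative names). */
struct ra_state {
	unsigned int page_cluster;  /* log2 of the maximum window (3 means 8 pages) */
	unsigned int hits;          /* readahead pages that were actually used */
	unsigned long prev_offset;  /* swap offset of the previous fault */
	unsigned int last_pages;    /* window chosen at the previous fault */
};

/* Return how many pages to read for a swapin fault at `offset`. */
unsigned int ra_window(struct ra_state *s, unsigned long offset)
{
	unsigned int max_pages, pages, floor;

	assert(s->page_cluster < 16);       /* keep the shift well defined */
	max_pages = 1u << s->page_cluster;
	if (max_pages <= 1)
		return 1;                   /* page_cluster 0: no readahead */

	pages = s->hits + 2;                /* assumed bias, see lead-in */
	s->hits = 0;
	if (pages == 2) {
		/* No hits to judge by: read ahead only for an adjacent offset. */
		if (offset != s->prev_offset + 1 && offset != s->prev_offset - 1)
			pages = 1;
		s->prev_offset = offset;
	} else {
		/* Hits were seen: round up to a power of two, at least 4. */
		unsigned int roundup = 4;
		while (roundup < pages)
			roundup <<= 1;
		pages = roundup;
	}
	if (pages > max_pages)
		pages = max_pages;

	/* Do not shrink too fast: never go below half the previous window. */
	floor = s->last_pages / 2;
	if (pages < floor)
		pages = floor;
	s->last_pages = pages;
	return pages;
}
```

In this model, with `page_cluster` 3 and a previous window of 8, a run of random misses decays the window through 4, 2 and 1 rather than dropping from 8 to 1 in a single step.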
    • ocfs2: free allocated clusters if error occurs after ocfs2_claim_clusters · fb951eb5
      Zongxun Wang authored
      
      
      Even when using the same jbd2 handle, we cannot roll back a transaction.
      So once an error occurs after clusters have been successfully allocated,
      the allocated clusters will never be used, which means they are lost.
      For example, ocfs2_claim_clusters succeeds when expanding a file, but
      ocfs2_insert_extent then fails.  So we need to free the allocated
      clusters if they end up unused.
      
      Signed-off-by: Zongxun Wang <wangzongxun@huawei.com>
      Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
      Acked-by: Joel Becker <jlbec@evilplan.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fb951eb5
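
The error path described in the ocfs2 changelog above follows a common pattern: when a later step fails and the transaction cannot be rolled back, the allocation has to be undone by hand. Below is a minimal sketch of that pattern with a toy in-memory allocator. All names (`claim_clusters`, `insert_extent`, `free_clusters`, `extend_file`) are hypothetical stand-ins and are not the ocfs2 functions or their signatures.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define TOTAL_CLUSTERS 64

bool cluster_used[TOTAL_CLUSTERS];      /* toy allocation bitmap */

/* Claim `count` contiguous free clusters; return first index or -ENOSPC. */
int claim_clusters(int count)
{
	assert(count > 0);
	for (int start = 0; start + count <= TOTAL_CLUSTERS; start++) {
		int n = 0;
		while (n < count && !cluster_used[start + n])
			n++;
		if (n == count) {
			for (int i = 0; i < count; i++)
				cluster_used[start + i] = true;
			return start;
		}
	}
	return -ENOSPC;
}

void free_clusters(int start, int count)
{
	assert(start >= 0 && start + count <= TOTAL_CLUSTERS);
	for (int i = 0; i < count; i++)
		cluster_used[start + i] = false;
}

/* Stand-in for the step that may fail after a successful allocation. */
int insert_extent(int start, int count, bool inject_error)
{
	(void)start;
	(void)count;
	return inject_error ? -EIO : 0;
}

int count_used(void)
{
	int used = 0;
	for (int i = 0; i < TOTAL_CLUSTERS; i++)
		used += cluster_used[i];
	return used;
}

/* Extend a file: on failure after allocation, give the clusters back. */
int extend_file(int count, bool inject_error)
{
	int start = claim_clusters(count);
	if (start < 0)
		return start;

	int ret = insert_extent(start, count, inject_error);
	if (ret < 0) {
		free_clusters(start, count);    /* otherwise they would be lost */
		return ret;
	}
	return start;
}
```

Without the `free_clusters()` call in the failure branch, the clusters would stay marked as used while no file references them.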
    • Documentation/kernel-parameters.txt: fix memmap= language · 277cba1d
      Randy Dunlap authored
      
      
      Clean up descriptions of memmap= boot options.
      
      Add periods (full stops), drop commas, change "used" to "reserved" or
      "marked".
      
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Andiry Xu <andiry.xu@gmail.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      277cba1d
  2. Feb 06, 2014
  3. Feb 05, 2014
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs · 878a876b
      Linus Torvalds authored
      Pull btrfs fixes from Chris Mason:
       "Filipe is fixing compile and boot problems with our crc32c rework, and
        Josef has disabled snapshot aware defrag for now.
      
        As the number of snapshots increases, we're hitting OOM.  For the
        short term we're disabling things until a bigger fix is ready"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
        Btrfs: use late_initcall instead of module_init
        Btrfs: use btrfs_crc32c everywhere instead of libcrc32c
        Btrfs: disable snapshot aware defrag for now
      878a876b
    • Merge tag 'nfs-for-3.14-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs · d7512f79
      Linus Torvalds authored
      Pull NFS client bugfixes from Trond Myklebust:
       "Highlights:
      
         - Fix NFSv3 acl regressions
         - Fix NFSv4 memory corruption due to slot table abuse in
           nfs4_proc_open_confirm
         - nfs4_destroy_session must call rpc_destroy_waitqueue"
      
      * tag 'nfs-for-3.14-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
        fs: get_acl() must be allowed to return EOPNOTSUPP
        NFSv3: Fix return value of nfs3_proc_setacls
        NFSv3: Remove unused function nfs3_proc_set_default_acl
        NFSv4.1: nfs4_destroy_session must call rpc_destroy_waitqueue
        NFSv4: Fix memory corruption in nfs4_proc_open_confirm
        nfs: fix setting of ACLs on file creation.
      d7512f79
    • kbuild: don't enable DEBUG_INFO when building for COMPILE_TEST · 12b13835
      Linus Torvalds authored
      
      
      It really isn't very interesting to have DEBUG_INFO when doing compile
      coverage stuff (you wouldn't want to run the result anyway, that's kind
      of the whole point of COMPILE_TEST), and it currently makes the build
      take longer and use much more disk space for "all{yes,mod}config".
      
      There's somewhat active discussion about this still, and we might end up
      with some new config option for things like this (Andi points out that
      the silly X86_DECODER_SELFTEST option also slows down the normal
      coverage tests hugely), but I'm starting the ball rolling with this
      simple one-liner.
      
      DEBUG_INFO isn't that noticeable if you have tons of memory and a good
      IO subsystem, but it hurts you a lot if you don't - for very little
      upside for the common use.
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      12b13835
  4. Feb 04, 2014
  5. Feb 03, 2014
    • Revert "xen/grant-table: Avoid m2p_override during mapping" · e85fc980
      Konrad Rzeszutek Wilk authored
      This reverts commit 08ece5bb, as it breaks ARM builds and needs more
      attention on the ARM side.
      
      Acked-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      e85fc980
    • Linus 3.14-rc1 · 38dbfb59
      Linus Torvalds authored
      38dbfb59
    • Merge branch 'parisc-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux · 69048e01
      Linus Torvalds authored
      Pull parisc updates from Helge Deller:
       "The three major changes in this patchset are an implementation for
        flexible userspace memory maps, cache-flushing fixes (again), and a
        long-discussed ABI change to make EWOULDBLOCK the same value as
        EAGAIN.
      
        parisc has been the only platform where we had EWOULDBLOCK != EAGAIN
        to keep HP-UX compatibility.  Since we will probably never implement
        full HP-UX support, we prefer to drop this compatibility to make it
        easier for us with Linux userspace programs which mostly never checked
        for both values.  We don't expect major fall-outs because of this
        change, and if we face some, we will simply rebuild the necessary
        applications in the debian archives"
      
      * 'parisc-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
        parisc: add flexible mmap memory layout support
        parisc: Make EWOULDBLOCK be equal to EAGAIN on parisc
        parisc: convert uapi/asm/stat.h to use native types only
        parisc: wire up sched_setattr and sched_getattr
        parisc: fix cache-flushing
        parisc/sti_console: prefer Linux fonts over built-in ROM fonts
      69048e01
    • hpfs: optimize quad buffer loading · 1c0b8a7a
      Mikulas Patocka authored
      
      
      HPFS needs to load 4 consecutive 512-byte sectors when accessing the
      directory nodes or bitmaps.  We can't switch to 2048-byte block size
      because files are allocated in the units of 512-byte sectors.
      
      Previously, the driver would allocate a 2048-byte area using kmalloc,
      copy the data from four buffers to this area and eventually copy them
      back if they were modified.
      
      In the current implementation of the buffer cache, buffers are allocated
      in the pagecache.  That means that 4 consecutive 512-byte buffers are
      stored in consecutive areas in the kernel address space.  So, we don't
      need to allocate extra memory and copy the content of the buffers there.
      
      This patch optimizes the code to avoid copying the buffers.  It checks
      if the four buffers are stored in contiguous memory - if they are not,
      it falls back to allocating a 2048-byte area and copying data there.
      
      Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1c0b8a7a
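
The contiguity check with a copy fallback described in the hpfs changelog above can be sketched as follows. This is an illustrative userspace model under stated assumptions, not the hpfs driver code: `struct quad_map`, `quad_map_init()` and `quad_map_release()` are hypothetical names, and plain pointers stand in for buffer heads.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SECTOR_SIZE 512
#define QUAD_SIZE   (4 * SECTOR_SIZE)

struct quad_map {
	unsigned char *data;  /* 2048 bytes of directory node or bitmap data */
	int copied;           /* nonzero if `data` is a private copy */
};

/* Map four 512-byte sectors as one 2048-byte area. Returns 0 or -1. */
int quad_map_init(struct quad_map *q, unsigned char *sector[4])
{
	int contiguous = 1;

	assert(q != NULL && sector != NULL);
	for (int i = 0; i < 3; i++)
		if (sector[i + 1] != sector[i] + SECTOR_SIZE)
			contiguous = 0;

	if (contiguous) {
		q->data = sector[0];   /* use the buffers in place, no copy */
		q->copied = 0;
		return 0;
	}

	/* Fallback: allocate a 2048-byte area and copy the sectors into it. */
	q->data = malloc(QUAD_SIZE);
	if (q->data == NULL)
		return -1;
	for (int i = 0; i < 4; i++)
		memcpy(q->data + i * SECTOR_SIZE, sector[i], SECTOR_SIZE);
	q->copied = 1;
	return 0;
}

/* Copy modified data back (fallback path only) and release the area. */
void quad_map_release(struct quad_map *q, unsigned char *sector[4], int dirty)
{
	if (q->copied) {
		if (dirty)
			for (int i = 0; i < 4; i++)
				memcpy(sector[i], q->data + i * SECTOR_SIZE,
				       SECTOR_SIZE);
		free(q->data);
	}
	q->data = NULL;
}
```

In the contiguous case nothing is allocated or copied, which is the saving the changelog describes. The fallback path keeps the old copy-in, copy-back behavior.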
    • hpfs: remember free space · 2cbe5c76
      Mikulas Patocka authored
      
      
      Previously, hpfs scanned all bitmaps each time the user asked for free
      space using statfs.  This patch changes it so that hpfs scans the
      bitmaps only once, remembers the free space and on next invocation of
      statfs it returns the value instantly.
      
      New versions of wine are hammering on the statfs syscall very heavily,
      making some games unplayable when they're stored on hpfs, with load
      times in minutes.
      
      This should be backported to the stable kernels because it fixes a
      user-visible problem (excessive level load times in wine).
      
      Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2cbe5c76
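
The "scan once, then remember" behavior described in the hpfs changelog above is a simple caching pattern. The sketch below is a userspace illustration only. `struct fs_info`, `fs_free_space()`, `fs_alloc_sector()` and the sentinel `NO_CACHED_VALUE` are hypothetical names, and the bitmap convention (bit set means free) is an assumption. Keeping the cached count in step with allocations is also an assumption about how such a cache stays valid, since the changelog does not spell it out.

```c
#include <assert.h>
#include <limits.h>

#define NO_CACHED_VALUE UINT_MAX

struct fs_info {
	unsigned char *bitmap;   /* one bit per sector; 1 = free (assumed) */
	unsigned int n_sectors;
	unsigned int n_free;     /* cached count, or NO_CACHED_VALUE */
	unsigned int scans;      /* number of full scans, for demonstration */
};

/* Expensive path: walk the whole bitmap and count free sectors. */
static unsigned int scan_bitmap(struct fs_info *fs)
{
	unsigned int free_count = 0;

	for (unsigned int s = 0; s < fs->n_sectors; s++)
		if (fs->bitmap[s / 8] & (1u << (s % 8)))
			free_count++;
	fs->scans++;
	return free_count;
}

/* statfs-like query: scan only when no cached value is available. */
unsigned int fs_free_space(struct fs_info *fs)
{
	if (fs->n_free == NO_CACHED_VALUE)
		fs->n_free = scan_bitmap(fs);
	return fs->n_free;
}

/* Allocate one sector and keep the cached count consistent. */
void fs_alloc_sector(struct fs_info *fs, unsigned int sec)
{
	unsigned char mask;

	assert(sec < fs->n_sectors);
	mask = (unsigned char)(1u << (sec % 8));
	if (fs->bitmap[sec / 8] & mask) {
		fs->bitmap[sec / 8] &= (unsigned char)~mask;
		if (fs->n_free != NO_CACHED_VALUE)
			fs->n_free--;
	}
}
```

Repeated calls to `fs_free_space()` then cost a single comparison instead of a full bitmap walk, which matters when a caller issues the query in a tight loop.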
    • parisc: add flexible mmap memory layout support · 9dabf60d
      Helge Deller authored
      Add support for the flexible mmap memory layout (as described in
      http://lwn.net/Articles/91829). This is especially interesting on
      parisc since we currently only support 32bit userspace (even with a
      64bit Linux kernel).
      
      Signed-off-by: Helge Deller <deller@gmx.de>
      9dabf60d
    • parisc: Make EWOULDBLOCK be equal to EAGAIN on parisc · f5a408d5
      Guy Martin authored
      
      
      On Linux, only parisc uses a different value for EWOULDBLOCK, which
      causes a lot of trouble for applications not checking for both values.
      Since the hpux compat is long dead, make EWOULDBLOCK behave the same as
      all other architectures.
      
      Signed-off-by: Guy Martin <gmsoft@tuxicoman.be>
      Signed-off-by: Helge Deller <deller@gmx.de>
      f5a408d5
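
The application-side problem mentioned in the changelog above is that portable code has to test for both error values, since POSIX allows `EAGAIN` and `EWOULDBLOCK` to differ. A minimal sketch of such a check follows. The helper name `is_would_block()` is hypothetical, while `EAGAIN` and `EWOULDBLOCK` are the standard `<errno.h>` macros.

```c
#include <errno.h>
#include <stdbool.h>

/* True if `err` means "the operation would block, try again later". */
bool is_would_block(int err)
{
#if EAGAIN == EWOULDBLOCK
	/* Same value on this platform: one comparison is enough. */
	return err == EAGAIN;
#else
	/* Distinct values: a portable caller must accept either one. */
	return err == EAGAIN || err == EWOULDBLOCK;
#endif
}
```

A caller would typically use it after a failed non-blocking `read()` or `write()`, for example `if (n < 0 && is_would_block(errno))`, and retry later. Programs that compare against only one of the two macros are the ones that misbehave when the values differ.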
    • parisc: convert uapi/asm/stat.h to use native types only · 9391bc77
      Helge Deller authored
      
      
      The stat.h header file is exported to userspace. Some userspace
      applications failed to compile due to missing/unknown types, so we
      better convert it to use native types only (like it's done on other
      architectures too).
      
      Signed-off-by: Helge Deller <deller@gmx.de>
      9391bc77
    • parisc: wire up sched_setattr and sched_getattr · 998bbb2f
      Helge Deller authored
      
      
      Signed-off-by: Helge Deller <deller@gmx.de>
      998bbb2f
    • parisc: fix cache-flushing · 57737c49
      Helge Deller authored
      Commit f8dae006 ("parisc: Ensure full cache coherency for kmap/kunmap")
      caused negative caching side-effects, e.g. hanging processes with expect and
      too many inequivalent alias messages from flush_dcache_page() on Debian 5 systems.
      
      This patch now partly reverts it and has been in production use on our debian buildd
      makeservers for a week without any major problems.
      
      Signed-off-by: Helge Deller <deller@gmx.de>
      Signed-off-by: John David Anglin <dave.anglin@bell.net>
      Cc: stable@vger.kernel.org # v3.9+
      Signed-off-by: Helge Deller <deller@gmx.de>
      57737c49
    • parisc/sti_console: prefer Linux fonts over built-in ROM fonts · 8a10bc9d
      Helge Deller authored
      
      
      The built-in ROM fonts lack many necessary ASCII characters, which is
      why it makes sense to prefer the Linux fonts instead if they are
      available.  This makes consoles on STI graphics cards which are not
      supported by the stifb driver (e.g. Visualize FXe) look much nicer.
      
      Signed-off-by: Helge Deller <deller@gmx.de>
      Cc: stable@vger.kernel.org # v3.13
      8a10bc9d
    • Merge branch 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging · 602456bf
      Linus Torvalds authored
      Pull hwmon kconfig fixes from Jean Delvare.
      
      * 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
        hwmon: Fix SENSORS_TMP102 dependencies to eliminate build errors
        hwmon: Fix SENSORS_LM75 dependencies to eliminate build errors
      602456bf
    • Merge branch 'slab/next' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux · 7b383bef
      Linus Torvalds authored
      Pull SLAB changes from Pekka Enberg:
       "Random bug fixes that have accumulated in my inbox over the past few
        months"
      
      * 'slab/next' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux:
        mm: Fix warning on make htmldocs caused by slab.c
        mm: slub: work around unneeded lockdep warning
        mm: sl[uo]b: fix misleading comments
        slub: Fix possible format string bug.
        slub: use lockdep_assert_held
        slub: Fix calculation of cpu slabs
        slab.h: remove duplicate kmalloc declaration and fix kernel-doc warnings
      7b383bef
    • Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux · 87af5e5c
      Linus Torvalds authored
      Pull turbostat updates from Len Brown.
      
      * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
        tools/power turbostat: introduce -s to dump counters
        tools/power turbostat: remove unused command line option
        turbostat: Add option to report joules consumed per sample
        turbostat: run on HSX
        turbostat: Add a .gitignore to ignore the compiled turbostat binary
        turbostat: Clean up error handling; disambiguate error messages; use err and errx
        turbostat: Factor out common function to open file and exit on failure
        turbostat: Add a helper to parse a single int out of a file
        turbostat: Check return value of fscanf
        turbostat: Use GCC's CPUID functions to support PIC
        turbostat: Don't attempt to printf an off_t with %zx
        turbostat: Don't put unprocessed uapi headers in the include path
      87af5e5c
    • Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · e4c0da21
      Linus Torvalds authored
      Pull ARM SoC fixes from Olof Johansson:
       "Here's a set of patches for (hopefully) -rc1.  Some of them are fixes,
        but a good number of them also do things such as enable new drivers in
        the defconfigs for platforms that have such devices, increases
        coverage of the multiplatform defconfig and some DTS changes that
        plumbs up some of the devices that now have bindings and driver
        support.
      
        The commit dates are recent; we've mostly collected these fixes in the
        last few days but I also had to rebuild the branch yesterday to sort
        out some internal conflicts which reset the timestamps.  The changes
        should have been tested by each platform maintainer already (and few
        of them have cross-platform impact) so I'm personally not too
        concerned by it at this time"
      
      * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (23 commits)
        ARM: multi_v7_defconfig: remove redundant entries and re-enable TI_EDMA
        ARM: multi_v7_defconfig: add mvebu drivers
        clocksource: kona: Add basic use of external clock
        drivers: bus: fix CCI driver kcalloc call parameters swap
        ARM: dts: bcm28155-ap: Fix Card Detection GPIO
        ARM: multi_v7_defconfig: Select CONFIG_AT803X_PHY
        ARM: keystone: config: fix build warning when CONFIG_DMADEVICES is not set
        MAINTAINERS: ARM: SiRF: use regex patterns to involve all SiRF drivers
        ARM: dts: zynq: Add SDHCI nodes
        ARM: hisi: don't select SMP
        ARM: tegra: rebuild tegra_defconfig to add DEBUG_FS
        ARM: multi_v7: copy most options from tegra_defconfig
        ARM: iop32x: fix power off handling for the EM7210 board
        ARM: integrator: restore static map on the CP
        ARM: msm_defconfig: Enable MSM clock drivers
        ARM: dts: msm: Add clock controller nodes and hook into uart
        ARM: OMAP4+: move errata initialization to omap4_pm_init_early
        ARM: OMAP4460: cpuidle: Extend PM_OMAP4_ROM_SMP_BOOT_ERRATUM_GICD on cpuidle
        ARM: mvebu: fix compilation warning on Armada 370 (i.e. non-SMP)
        ARM: shmobile: r8a7790.dtsi: ficx i2c[0-3] clock reference
        ...
      e4c0da21
    • NVMe: Namespace use after free on surprise removal · 9ac27090
      Keith Busch authored
      
      
      An nvme block device may have open references when the device is
      removed. New commands may still be sent on the removed device, so we
      need to ref count the opens, return errors for new commands, and not
      free the namespace and nvme_dev until all references are closed.
      
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
      9ac27090
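
The lifetime rule described in the NVMe changelog above (count the opens, fail new commands after removal, free only when the last reference is closed) can be sketched in a single-threaded userspace model. All names here (`struct blk_dev`, `dev_open()`, `dev_submit()`, `dev_remove()`, `live_devices`) are hypothetical. A real driver would need locking and would likely use the kernel's reference-counting helpers, which this sketch leaves out.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

int live_devices;       /* demo-only counter so that frees are observable */

struct blk_dev {
	int refs;       /* 1 for the driver binding, plus 1 per open handle */
	bool removed;   /* set once the hardware has gone away */
};

struct blk_dev *dev_create(void)
{
	struct blk_dev *d = calloc(1, sizeof(*d));

	if (d != NULL) {
		d->refs = 1;
		live_devices++;
	}
	return d;
}

/* Drop one reference; free the device when the last one goes away. */
static void dev_put(struct blk_dev *d)
{
	if (--d->refs == 0) {
		live_devices--;
		free(d);
	}
}

int dev_open(struct blk_dev *d)
{
	if (d->removed)
		return -ENODEV;         /* no new opens on a removed device */
	d->refs++;
	return 0;
}

void dev_close(struct blk_dev *d)
{
	dev_put(d);
}

int dev_submit(struct blk_dev *d)
{
	return d->removed ? -ENODEV : 0; /* new commands fail after removal */
}

/* Surprise removal: mark the device dead and drop the driver's reference. */
void dev_remove(struct blk_dev *d)
{
	d->removed = true;
	dev_put(d);
}
```

If a handle is still open when `dev_remove()` runs, the structure stays allocated until the matching `dev_close()`, so a late command sees `-ENODEV` instead of touching freed memory.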