Skip to content
  1. May 21, 2016
    • Johannes Weiner's avatar
      mm: filemap: only do access activations on reads · bbddabe2
      Johannes Weiner authored
      
      
      Andres observed that his database workload is struggling with the
      transaction journal creating pressure on frequently read pages.
      
      Access patterns like transaction journals frequently write the same
      pages over and over, but in the majority of cases those pages are never
      read back.  There are no caching benefits to be had for those pages, so
      activating them and having them put pressure on pages that do benefit
      from caching is a bad choice.
      
      Leave page activations to read accesses and don't promote pages based on
      writes alone.
      
      It could be said that partially written pages do contain cache-worthy
      data, because even if *userspace* does not access the unwritten part,
      the kernel still has to read it from the filesystem for correctness.
      However, a counter argument is that these pages enjoy at least *some*
      protection over other inactive file pages through the writeback cache,
      in the sense that dirty pages are written back with a delay and cache
      reclaim leaves them alone until they have been written back to disk.
      Should that turn out to be insufficient and we see increased read IO
      from partial writes under memory pressure, we can always go back and
      update grab_cache_page_write_begin() to take (pos, len) so that it can
      tell partial writes from pages that don't need partial reads.  But for
      now, keep it simple.
      
      Signed-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Reported-by: default avatarAndres Freund <andres@anarazel.de>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      bbddabe2
    • Rik van Riel's avatar
      mm: workingset: only do workingset activations on reads · f0281a00
      Rik van Riel authored
      This is a follow-up to
      
        http://www.spinics.net/lists/linux-mm/msg101739.html
      
      
      
      where Andres reported his database workingset being pushed out by the
      minimum size enforcement of the inactive file list - currently 50% of
      cache - as well as repeatedly written file pages that are never actually
      read.
      
      Two changes fell out of the discussions.  The first change observes that
      pages that are only ever written don't benefit from caching beyond what
      the writeback cache does for partial page writes, and so we shouldn't
      promote them to the active file list where they compete with pages whose
      cached data is actually accessed repeatedly.  This change comes in two
      patches - one for in-cache write accesses and one for refaults triggered
      by writes, neither of which should promote a cache page.
      
      Second, with the refault detection we don't need to set 50% of the cache
      aside for used-once cache anymore since we can detect frequently used
      pages even when they are evicted between accesses.  We can allow the
      active list to be bigger and thus protect a bigger workingset that isn't
      challenged by streamers.  Depending on the access patterns, this can
      increase major faults during workingset transitions for better
      performance during stable phases.
      
      This patch (of 3):
      
      When rewriting a page, the data in that page is replaced with new data.
      This means that evicting something else from the active file list, in
      order to cache data that will be replaced by something else, is likely
      to be a waste of memory.
      
      It is better to save the active list for frequently read pages, because
      reads actually use the data that is in the page.
      
      This patch ignores partial writes, because it is unclear whether the
      complexity of identifying those is worth any potential performance gain
      obtained from better caching pages that see repeated partial writes at
      large enough intervals to not get caught by the use-twice promotion code
      used for the inactive file list.
      
      Signed-off-by: default avatarRik van Riel <riel@redhat.com>
      Signed-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Reported-by: default avatarAndres Freund <andres@anarazel.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f0281a00
    • Linus Torvalds's avatar
      Merge tag 'mfd-for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd · 6eb59af5
      Linus Torvalds authored
      Pull MFD updates from Lee Jones:
       "New Drivers:
         - Add new driver for MAXIM MAX77620/MAX20024 PMIC
         - Add new driver for Hisilicon HI665X PMIC
      
        New Device Support:
         - Add support for AXP809 in axp20x-rsb
         - Add support for Power Supply in axp20x
      
        New core features:
         - devm_mfd_* managed resources
      
        Fix-ups:
         - Remove unused code (da9063-irq, wm8400-core, tps6105x,
           smsc-ece1099, twl4030-power)
         - Improve clean-up in error path (intel_quark_i2c_gpio)
         - Explicitly include headers (syscon.h)
         - Allow building as modules (max77693)
         - Use IS_ENABLED() instead of rolling your own (dm355evm_msp,
           wm8400-core)
         - DT adaptions (axp20x, hi655x, arizona, max77620)
         - Remove CLK_IS_ROOT flag (intel-lpss, intel_quark)
         - Move to gpiochip API (asic3, dm355evm_msp, htc-egpio, htc-i2cpld,
           sm501, tc6393xb, tps65010, ucb1x00, vexpress)
         - Make use of devm_mfd_* calls (act8945a, as3711, atmel-hlcdc,
           bcm590xx, hi6421-pmic-core, lp3943, menf21bmc, mt6397, rdc321x,
           rk808, rn5t618, rt5033, sky81452, stw481x, tps6507x, tps65217,
           wm8400)
      
        Bug Fixes"
         - Fix ACPI child matching (mfd-core)
         - Fix start-up ordering issues (mt6397-core, arizona-core)
         - Fix forgotten register state on resume (intel-lpss)
         - Fix Clock related issues (twl6040)
         - Fix scheduling whilst atomic (omap-usb-tll)
         - Kconfig changes (vexpress)"
      
      * tag 'mfd-for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (73 commits)
        mfd: hi655x: Add MFD driver for hi655x
        mfd: ab8500-debugfs: Trivial fix of spelling mistake on "between"
        mfd: vexpress: Add !ARCH_USES_GETTIMEOFFSET dependency
        mfd: Add device-tree binding doc for PMIC MAX77620/MAX20024
        mfd: max77620: Add core driver for MAX77620/MAX20024
        mfd: arizona: Add defines for GPSW values that can be used from DT
        mfd: omap-usb-tll: Fix scheduling while atomic BUG
        mfd: wm5110: ARIZONA_CLOCK_CONTROL should be volatile
        mfd: axp20x: Add a cell for the ac power_supply part of the axp20x PMICs
        mfd: intel_soc_pmic_core: Terminate panel control GPIO lookup table correctly
        mfd: wl1273-core: Use devm_mfd_add_devices() for mfd_device registration
        mfd: tps65910: Use devm_mfd_add_devices and devm_regmap_add_irq_chip
        mfd: sec: Use devm_mfd_add_devices and devm_regmap_add_irq_chip
        mfd: rc5t583: Use devm_mfd_add_devices and devm_request_threaded_irq
        mfd: max77686: Use devm_mfd_add_devices and devm_regmap_add_irq_chip
        mfd: as3722: Use devm_mfd_add_devices and devm_regmap_add_irq_chip
        mfd: twl4030-power: Remove driver path in file comment
        MAINTAINERS: Add entry for X-Powers AXP family PMIC drivers
        mfd: smsc-ece1099: Remove unnecessarily remove callback
        mfd: Use IS_ENABLED(CONFIG_FOO) instead of checking FOO || FOO_MODULE
        ...
      6eb59af5
    • Linus Torvalds's avatar
      Merge tag 'hsi-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-hsi · 4d230d4d
      Linus Torvalds authored
      Pull HSI updates from Sebastian Reichel:
      
       - merge omap-ssi and omap-ssi-port modules
      
       - fix omap-ssi module reloading
      
       - add DVFS support to omap-ssi
      
      * tag 'hsi-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-hsi:
        HSI: omap-ssi: move omap_ssi_port_update_fclk
        HSI: omap-ssi: include pinctrl header files
        HSI: omap-ssi: add COMMON_CLK dependency
        HSI: omap-ssi: add clk change support
        HSI: omap_ssi: built omap_ssi and omap_ssi_port into one module
        HSI: omap_ssi: fix removal of port platform device
        HSI: omap_ssi: make sure probe stays available
        HSI: omap_ssi: fix module unloading
        HSI: omap_ssi_port: switch to gpiod API
      4d230d4d
    • Linus Torvalds's avatar
      Merge tag 'fbdev-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux · 410b4297
      Linus Torvalds authored
      Pull fbdev updates from Tomi Valkeinen:
      
       - imxfb: fix lcd power up
      
       - small fixes and cleanups
      
      * tag 'fbdev-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux:
        fbdev: Use IS_ENABLED() instead of checking for built-in or module
        efifb: Don't show the mapping VA
        video: AMBA CLCD: Remove unncessary include in amba-clcd.c
        fbdev: ssd1307fb: Fix charge pump setting
        Documentation: fb: fix spelling mistakes
        fbdev: fbmem: implement error handling in fbmem_init()
        fbdev: sh_mipi_dsi: remove driver
        video: fbdev: imxfb: add some error handling
        video: fbdev: imxfb: fix semantic of .get_power and .set_power
        video: fbdev: omap2: Remove deprecated regulator_can_change_voltage() usage
      410b4297
    • Linus Torvalds's avatar
      Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · e4fba88d
      Linus Torvalds authored
      Pull crypto fix from Herbert Xu:
       "Fix a regression that causes sha-mb to crash"
      
      * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
        crypto: sha1-mb - make sha1_x8_avx2() conform to C function ABI
      e4fba88d
    • Arnd Bergmann's avatar
      irqchip: nps: add 64BIT dependency · ffd565e3
      Arnd Bergmann authored
      
      
      The newly added nps irqchip driver causes build warnings on ARM64.
      
        include/soc/nps/common.h: In function 'nps_host_reg_non_cl':
        include/soc/nps/common.h:148:9: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
      
      As the driver is only used on ARC, we don't need to see it without
      COMPILE_TEST elsewhere, and we can avoid the warnings by only building
      on 32-bit architectures even with CONFIG_COMPILE_TEST.
      
      Acked-by: default avatarMarc Zyngier <narc.zyngier@arm.com>
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarVineet Gupta <vgupta@synopsys.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ffd565e3
    • Linus Torvalds's avatar
      Merge tag 'powerpc-4.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · c04a5880
      Linus Torvalds authored
      Pull powerpc updates from Michael Ellerman:
       "Highlights:
         - Support for Power ISA 3.0 (Power9) Radix Tree MMU from Aneesh Kumar K.V
         - Live patching support for ppc64le (also merged via livepatching.git)
      
        Various cleanups & minor fixes from:
         - Aaro Koskinen, Alexey Kardashevskiy, Andrew Donnellan, Aneesh Kumar K.V,
           Chris Smart, Daniel Axtens, Frederic Barrat, Gavin Shan, Ian Munsie,
           Lennart Sorensen, Madhavan Srinivasan, Mahesh Salgaonkar, Markus Elfring,
           Michael Ellerman, Oliver O'Halloran, Paul Gortmaker, Paul Mackerras,
           Rashmica Gupta, Russell Currey, Suraj Jitindar Singh, Thiago Jung
           Bauermann, Valentin Rothberg, Vipin K Parashar.
      
        General:
         - Update LMB associativity index during DLPAR add/remove from Nathan
           Fontenot
         - Fix branching to OOL handlers in relocatable kernel from Hari Bathini
         - Add support for userspace Power9 copy/paste from Chris Smart
         - Always use STRICT_MM_TYPECHECKS from Michael Ellerman
         - Add mask of possible MMU features from Michael Ellerman
      
        PCI:
         - Enable pass through of NVLink to guests from Alexey Kardashevskiy
         - Cleanups in preparation for powernv PCI hotplug from Gavin Shan
         - Don't report error in eeh_pe_reset_and_recover() from Gavin Shan
         - Restore initial state in eeh_pe_reset_and_recover() from Gavin Shan
         - Revert "powerpc/eeh: Fix crash in eeh_add_device_early() on Cell"
           from Guilherme G Piccoli
         - Remove the dependency on EEH struct in DDW mechanism from Guilherme
           G Piccoli
      
        selftests:
         - Test cp_abort during context switch from Chris Smart
         - Add several tests for transactional memory support from Rashmica
           Gupta
      
        perf:
         - Add support for sampling interrupt register state from Anju T
         - Add support for unwinding perf-stackdump from Chandan Kumar
      
        cxl:
         - Configure the PSL for two CAPI ports on POWER8NVL from Philippe
           Bergheaud
         - Allow initialization on timebase sync failures from Frederic Barrat
         - Increase timeout for detection of AFU mmio hang from Frederic
           Barrat
         - Handle num_of_processes larger than can fit in the SPA from Ian
           Munsie
         - Ensure PSL interrupt is configured for contexts with no AFU IRQs
           from Ian Munsie
         - Add kernel API to allow a context to operate with relocate disabled
           from Ian Munsie
         - Check periodically the coherent platform function's state from
           Christophe Lombard
      
        Freescale:
         - Updates from Scott: "Contains 86xx fixes, minor device tree fixes,
           an erratum workaround, and a kconfig dependency fix."
      
      * tag 'powerpc-4.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (192 commits)
        powerpc/86xx: Fix PCI interrupt map definition
        powerpc/86xx: Move pci1 definition to the include file
        powerpc/fsl: Fix build of the dtb embedded kernel images
        powerpc/fsl: Fix rcpm compatible string
        powerpc/fsl: Remove FSL_SOC dependency from FSL_LBC
        powerpc/fsl-pci: Add a workaround for PCI 5 errata
        powerpc/fsl: Fix SPI compatible on t208xrdb and t1040rdb
        powerpc/powernv/npu: Add PE to PHB's list
        powerpc/powernv: Fix insufficient memory allocation
        powerpc/iommu: Remove the dependency on EEH struct in DDW mechanism
        Revert "powerpc/eeh: Fix crash in eeh_add_device_early() on Cell"
        powerpc/eeh: Drop unnecessary label in eeh_pe_change_owner()
        powerpc/eeh: Ignore handlers in eeh_pe_reset_and_recover()
        powerpc/eeh: Restore initial state in eeh_pe_reset_and_recover()
        powerpc/eeh: Don't report error in eeh_pe_reset_and_recover()
        Revert "powerpc/powernv: Exclude root bus in pnv_pci_reset_secondary_bus()"
        powerpc/powernv/npu: Enable NVLink pass through
        powerpc/powernv/npu: Rework TCE Kill handling
        powerpc/powernv/npu: Add set/unset window helpers
        powerpc/powernv/ioda2: Export debug helper pe_level_printk()
        ...
      c04a5880
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm · a1c28b75
      Linus Torvalds authored
      Pull ARM updates from Russell King:
       "Changes included in this pull request:
      
         - revert pxa2xx-flash back to using ioremap_cached() and switch
           memremap() to use arch_memremap_wb()
      
         - remove pci=firmware command line argument handling
      
         - remove unnecessary arm_dma_set_mask() implementation, the generic
           implementation will do for ARM
      
         - removal of the ARM kallsyms "hack" to work around mode switching
           veneers and vectors located below PAGE_OFFSET
      
         - tidy up build system output a little
      
         - add L2 cache power management DT bindings
      
         - remove duplicated local_irq_disable() in reboot paths
      
         - handle AMBA primecell devices better at registration time with PM
           domains (needed for Samsung SoCs)
      
         - ARM specific preparation to support Keystone II kexec"
      
      * 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
        ARM: 8567/1: cache-uniphier: activate ways for secondary CPUs
        ARM: 8570/2: Documentation: devicetree: Add PL310 PM bindings
        ARM: 8569/1: pl2x0: Add OF control of cache power management
        ARM: 8568/1: reboot: remove duplicated local_irq_disable()
        ARM: 8566/1: drivers: amba: properly handle devices with power domains
        ARM: provide arm_has_idmap_alias() helper
        ARM: kexec: remove 512MB restriction on kexec crashdump
        ARM: provide improved virt_to_idmap() functionality
        ARM: kexec: fix crashkernel= handling
        ARM: 8557/1: specify install, zinstall, and uinstall as PHONY targets
        ARM: 8562/1: suppress "include/generated/mach-types.h is up to date."
        ARM: 8553/1: kallsyms: remove --page-offset command line option
        ARM: 8552/1: kallsyms: remove special lower address limit for CONFIG_ARM
        ARM: 8555/1: kallsyms: ignore ARM mode switching veneers
        ARM: 8548/1: dma-mapping: remove arm_dma_set_mask()
        ARM: 8554/1: kernel: pci: remove pci=firmware command line parameter handling
        ARM: memremap: implement arch_memremap_wb()
        memremap: add arch specific hook for MEMREMAP_WB mappings
        mtd: pxa2xx-flash: switch back from memremap to ioremap_cached
        ARM: reintroduce ioremap_cached() for creating cached I/O mappings
      a1c28b75
  2. May 20, 2016
    • Linus Torvalds's avatar
      Merge branch 'akpm' (patches from Andrew) · a05a70db
      Linus Torvalds authored
      Merge updates from Andrew Morton:
      
       - fsnotify fix
      
       - poll() timeout fix
      
       - a few scripts/ tweaks
      
       - debugobjects updates
      
       - the (small) ocfs2 queue
      
       - Minor fixes to kernel/padata.c
      
       - Maybe half of the MM queue
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (117 commits)
        mm, page_alloc: restore the original nodemask if the fast path allocation failed
        mm, page_alloc: uninline the bad page part of check_new_page()
        mm, page_alloc: don't duplicate code in free_pcp_prepare
        mm, page_alloc: defer debugging checks of pages allocated from the PCP
        mm, page_alloc: defer debugging checks of freed pages until a PCP drain
        cpuset: use static key better and convert to new API
        mm, page_alloc: inline pageblock lookup in page free fast paths
        mm, page_alloc: remove unnecessary variable from free_pcppages_bulk
        mm, page_alloc: pull out side effects from free_pages_check
        mm, page_alloc: un-inline the bad part of free_pages_check
        mm, page_alloc: check multiple page fields with a single branch
        mm, page_alloc: remove field from alloc_context
        mm, page_alloc: avoid looking up the first zone in a zonelist twice
        mm, page_alloc: shortcut watermark checks for order-0 pages
        mm, page_alloc: reduce cost of fair zone allocation policy retry
        mm, page_alloc: shorten the page allocator fast path
        mm, page_alloc: check once if a zone has isolated pageblocks
        mm, page_alloc: move __GFP_HARDWALL modifications out of the fastpath
        mm, page_alloc: simplify last cpupid reset
        mm, page_alloc: remove unnecessary initialisation from __alloc_pages_nodemask()
        ...
      a05a70db
    • Mel Gorman's avatar
      mm, page_alloc: restore the original nodemask if the fast path allocation failed · 4741526b
      Mel Gorman authored
      The page allocator fast path uses either the requested nodemask or
      cpuset_current_mems_allowed if cpusets are enabled.  If the allocation
      context allows watermarks to be ignored then it can also ignore memory
      policies.  However, on entering the allocator slowpath the nodemask may
      still be cpuset_current_mems_allowed and the policies are enforced.
      This patch resets the nodemask appropriately before entering the
      slowpath.
      
      Link: http://lkml.kernel.org/r/20160504143628.GU2858@techsingularity.net
      
      
      Signed-off-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      4741526b
    • Vlastimil Babka's avatar
      mm, page_alloc: uninline the bad page part of check_new_page() · 4e611801
      Vlastimil Babka authored
      
      
      Bad pages should be rare so the code handling them doesn't need to be
      inline for performance reasons.  Put it to separate function which
      returns void.  This also assumes that the initial page_expected_state()
      result will match the result of the thorough check, i.e.  the page
      doesn't become "good" in the meanwhile.  This matches the same
      expectations already in place in free_pages_check().
      
      !DEBUG_VM bloat-o-meter:
      
        add/remove: 1/0 grow/shrink: 0/1 up/down: 134/-274 (-140)
        function                                     old     new   delta
        check_new_page_bad                             -     134    +134
        get_page_from_freelist                      3468    3194    -274
      
      Signed-off-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Acked-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      4e611801
    • Mel Gorman's avatar
      mm, page_alloc: don't duplicate code in free_pcp_prepare · e2769dbd
      Mel Gorman authored
      
      
      The new free_pcp_prepare() function shares a lot of code with
      free_pages_prepare(), which makes this a maintenance risk when some
      future patch modifies only one of them.  We should be able to achieve
      the same effect (skipping free_pages_check() from !DEBUG_VM configs) by
      adding a parameter to free_pages_prepare() and making it inline, so the
      checks (and the order != 0 parts) are eliminated from the call from
      free_pcp_prepare().
      
      !DEBUG_VM: bloat-o-meter reports no difference, as my gcc was already
      inlining free_pages_prepare() and the elimination seems to work as
      expected
      
      DEBUG_VM bloat-o-meter:
      
        add/remove: 0/1 grow/shrink: 2/0 up/down: 1035/-778 (257)
        function                                     old     new   delta
        __free_pages_ok                              297    1060    +763
        free_hot_cold_page                           480     752    +272
        free_pages_prepare                           778       -    -778
      
      Here inlining didn't occur before, and added some code, but it's ok for
      a debug option.
      
      [akpm@linux-foundation.org: fix build]
      Signed-off-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      e2769dbd
    • Mel Gorman's avatar
      mm, page_alloc: defer debugging checks of pages allocated from the PCP · 479f854a
      Mel Gorman authored
      
      
      Every page allocated checks a number of page fields for validity.  This
      catches corruption bugs of pages that are already freed but it is
      expensive.  This patch weakens the debugging check by checking PCP pages
      only when the PCP lists are being refilled.  All compound pages are
      checked.  This potentially avoids debugging checks entirely if the PCP
      lists are never emptied and refilled so some corruption issues may be
      missed.  Full checking requires DEBUG_VM.
      
      With the two deferred debugging patches applied, the impact to a page
      allocator microbenchmark is
      
                                                   4.6.0-rc3                  4.6.0-rc3
                                                 inline-v3r6            deferalloc-v3r7
        Min      alloc-odr0-1               344.00 (  0.00%)           317.00 (  7.85%)
        Min      alloc-odr0-2               248.00 (  0.00%)           231.00 (  6.85%)
        Min      alloc-odr0-4               209.00 (  0.00%)           192.00 (  8.13%)
        Min      alloc-odr0-8               181.00 (  0.00%)           166.00 (  8.29%)
        Min      alloc-odr0-16              168.00 (  0.00%)           154.00 (  8.33%)
        Min      alloc-odr0-32              161.00 (  0.00%)           148.00 (  8.07%)
        Min      alloc-odr0-64              158.00 (  0.00%)           145.00 (  8.23%)
        Min      alloc-odr0-128             156.00 (  0.00%)           143.00 (  8.33%)
        Min      alloc-odr0-256             168.00 (  0.00%)           154.00 (  8.33%)
        Min      alloc-odr0-512             178.00 (  0.00%)           167.00 (  6.18%)
        Min      alloc-odr0-1024            186.00 (  0.00%)           174.00 (  6.45%)
        Min      alloc-odr0-2048            192.00 (  0.00%)           180.00 (  6.25%)
        Min      alloc-odr0-4096            198.00 (  0.00%)           184.00 (  7.07%)
        Min      alloc-odr0-8192            200.00 (  0.00%)           188.00 (  6.00%)
        Min      alloc-odr0-16384           201.00 (  0.00%)           188.00 (  6.47%)
        Min      free-odr0-1                189.00 (  0.00%)           180.00 (  4.76%)
        Min      free-odr0-2                132.00 (  0.00%)           126.00 (  4.55%)
        Min      free-odr0-4                104.00 (  0.00%)            99.00 (  4.81%)
        Min      free-odr0-8                 90.00 (  0.00%)            85.00 (  5.56%)
        Min      free-odr0-16                84.00 (  0.00%)            80.00 (  4.76%)
        Min      free-odr0-32                80.00 (  0.00%)            76.00 (  5.00%)
        Min      free-odr0-64                78.00 (  0.00%)            74.00 (  5.13%)
        Min      free-odr0-128               77.00 (  0.00%)            73.00 (  5.19%)
        Min      free-odr0-256               94.00 (  0.00%)            91.00 (  3.19%)
        Min      free-odr0-512              108.00 (  0.00%)           112.00 ( -3.70%)
        Min      free-odr0-1024             115.00 (  0.00%)           118.00 ( -2.61%)
        Min      free-odr0-2048             120.00 (  0.00%)           125.00 ( -4.17%)
        Min      free-odr0-4096             123.00 (  0.00%)           129.00 ( -4.88%)
        Min      free-odr0-8192             126.00 (  0.00%)           130.00 ( -3.17%)
        Min      free-odr0-16384            126.00 (  0.00%)           131.00 ( -3.97%)
      
      Note that the free paths for large numbers of pages is impacted as the
      debugging cost gets shifted into that path when the page data is no
      longer necessarily cache-hot.
      
      Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      479f854a
    • Mel Gorman's avatar
      mm, page_alloc: defer debugging checks of freed pages until a PCP drain · 4db7548c
      Mel Gorman authored
      
      
      Every page free checks a number of page fields for validity.  This
      catches premature frees and corruptions but it is also expensive.  This
      patch weakens the debugging check by checking PCP pages at the time they
      are drained from the PCP list.  This will trigger the bug but the site
      that freed the corrupt page will be lost.  To get the full context, a
      kernel rebuild with DEBUG_VM is necessary.
      
      [akpm@linux-foundation.org: fix build]
      Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      4db7548c
    • Vlastimil Babka's avatar
      cpuset: use static key better and convert to new API · 002f2906
      Vlastimil Babka authored
      
      
      An important function for cpusets is cpuset_node_allowed(), which
      optimizes on the fact if there's a single root CPU set, it must be
      trivially allowed.  But the check "nr_cpusets() <= 1" doesn't use the
      cpusets_enabled_key static key the right way where static keys eliminate
      branching overhead with jump labels.
      
      This patch converts it so that static key is used properly.  It's also
      switched to the new static key API and the checking functions are
      converted to return bool instead of int.  We also provide a new variant
      __cpuset_zone_allowed() which expects that the static key check was
      already done and they key was enabled.  This is needed for
      get_page_from_freelist() where we want to also avoid the relatively
      slower check when ALLOC_CPUSET is not set in alloc_flags.
      
      The impact on the page allocator microbenchmark is less than expected
      but the cleanup in itself is worthwhile.
      
                                                   4.6.0-rc2                  4.6.0-rc2
                                             multcheck-v1r20               cpuset-v1r20
        Min      alloc-odr0-1               348.00 (  0.00%)           348.00 (  0.00%)
        Min      alloc-odr0-2               254.00 (  0.00%)           254.00 (  0.00%)
        Min      alloc-odr0-4               213.00 (  0.00%)           213.00 (  0.00%)
        Min      alloc-odr0-8               186.00 (  0.00%)           183.00 (  1.61%)
        Min      alloc-odr0-16              173.00 (  0.00%)           171.00 (  1.16%)
        Min      alloc-odr0-32              166.00 (  0.00%)           163.00 (  1.81%)
        Min      alloc-odr0-64              162.00 (  0.00%)           159.00 (  1.85%)
        Min      alloc-odr0-128             160.00 (  0.00%)           157.00 (  1.88%)
        Min      alloc-odr0-256             169.00 (  0.00%)           166.00 (  1.78%)
        Min      alloc-odr0-512             180.00 (  0.00%)           180.00 (  0.00%)
        Min      alloc-odr0-1024            188.00 (  0.00%)           187.00 (  0.53%)
        Min      alloc-odr0-2048            194.00 (  0.00%)           193.00 (  0.52%)
        Min      alloc-odr0-4096            199.00 (  0.00%)           198.00 (  0.50%)
        Min      alloc-odr0-8192            202.00 (  0.00%)           201.00 (  0.50%)
        Min      alloc-odr0-16384           203.00 (  0.00%)           202.00 (  0.49%)
      
      Signed-off-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Acked-by: default avatarZefan Li <lizefan@huawei.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      002f2906
    • Mel Gorman's avatar
      mm, page_alloc: inline pageblock lookup in page free fast paths · 0b423ca2
      Mel Gorman authored
      
      
      The function call overhead of get_pfnblock_flags_mask() is measurable in
      the page free paths.  This patch uses an inlined version that is faster.
      
      Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      0b423ca2
    • Mel Gorman's avatar
      mm, page_alloc: remove unnecessary variable from free_pcppages_bulk · e5b31ac2
      Mel Gorman authored
      
      
      The original count is never reused so it can be removed.
      
      Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      e5b31ac2
    • Mel Gorman's avatar
      mm, page_alloc: pull out side effects from free_pages_check · da838d4f
      Mel Gorman authored
      
      
      Check without side-effects should be easier to maintain.  It also
      removes the duplicated cpupid and flags reset done in !DEBUG_VM variant
      of both free_pcp_prepare() and then bulkfree_pcp_prepare().  Finally, it
      enables the next patch.
      
      It shouldn't result in new branches, thanks to inlining of the check.
      
      !DEBUG_VM bloat-o-meter:
      
        add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-27 (-27)
        function                                     old     new   delta
        __free_pages_ok                              748     739      -9
        free_pcppages_bulk                          1403    1385     -18
      
      DEBUG_VM:
      
        add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-28 (-28)
        function                                     old     new   delta
        free_pages_prepare                           806     778     -28
      
      This is also slightly faster because cpupid information is not set on
      tail pages so we can avoid resets there.
      
      Signed-off-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      da838d4f
    • Mel Gorman's avatar
      mm, page_alloc: un-inline the bad part of free_pages_check · bb552ac6
      Mel Gorman authored
      
      
      From: Vlastimil Babka <vbabka@suse.cz>
      
      !DEBUG_VM size and bloat-o-meter:
      
        add/remove: 1/0 grow/shrink: 0/2 up/down: 124/-370 (-246)
        function                                     old     new   delta
        free_pages_check_bad                           -     124    +124
        free_pcppages_bulk                          1288    1171    -117
        __free_pages_ok                              948     695    -253
      
      DEBUG_VM:
      
        add/remove: 1/0 grow/shrink: 0/1 up/down: 124/-214 (-90)
        function                                     old     new   delta
        free_pages_check_bad                           -     124    +124
        free_pages_prepare                          1112     898    -214
      
      [akpm@linux-foundation.org: fix whitespace]
      Signed-off-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      bb552ac6
    • Mel Gorman's avatar
      mm, page_alloc: check multiple page fields with a single branch · 7bfec6f4
      Mel Gorman authored
      
      
      Every page allocated or freed is checked for sanity to avoid corruptions
      that are difficult to detect later.  A bad page could be due to a number
      of fields.  Instead of using multiple branches, this patch combines
      multiple fields into a single branch.  A detailed check is only
      necessary if that check fails.
      
                                                   4.6.0-rc2                  4.6.0-rc2
                                              initonce-v1r20            multcheck-v1r20
        Min      alloc-odr0-1               359.00 (  0.00%)           348.00 (  3.06%)
        Min      alloc-odr0-2               260.00 (  0.00%)           254.00 (  2.31%)
        Min      alloc-odr0-4               214.00 (  0.00%)           213.00 (  0.47%)
        Min      alloc-odr0-8               186.00 (  0.00%)           186.00 (  0.00%)
        Min      alloc-odr0-16              173.00 (  0.00%)           173.00 (  0.00%)
        Min      alloc-odr0-32              165.00 (  0.00%)           166.00 ( -0.61%)
        Min      alloc-odr0-64              162.00 (  0.00%)           162.00 (  0.00%)
        Min      alloc-odr0-128             161.00 (  0.00%)           160.00 (  0.62%)
        Min      alloc-odr0-256             170.00 (  0.00%)           169.00 (  0.59%)
        Min      alloc-odr0-512             181.00 (  0.00%)           180.00 (  0.55%)
        Min      alloc-odr0-1024            190.00 (  0.00%)           188.00 (  1.05%)
        Min      alloc-odr0-2048            196.00 (  0.00%)           194.00 (  1.02%)
        Min      alloc-odr0-4096            202.00 (  0.00%)           199.00 (  1.49%)
        Min      alloc-odr0-8192            205.00 (  0.00%)           202.00 (  1.46%)
        Min      alloc-odr0-16384           205.00 (  0.00%)           203.00 (  0.98%)
      
      Again, the benefit is marginal but avoiding excessive branches is
      important.  Ideally the paths would not have to check these conditions
      at all but regrettably abandoning the tests would make use-after-free
      bugs much harder to detect.
      
      Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7bfec6f4
    • Mel Gorman's avatar
      mm, page_alloc: remove field from alloc_context · 93ea9964
      Mel Gorman authored
      
      
      The classzone_idx can be inferred from preferred_zoneref so remove the
      unnecessary field and save stack space.
      
      Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      93ea9964
    • Mel Gorman's avatar
      mm, page_alloc: avoid looking up the first zone in a zonelist twice · c33d6c06
      Mel Gorman authored
      
      
      The allocator fast path looks up the first usable zone in a zonelist and
      then get_page_from_freelist does the same job in the zonelist iterator.
      This patch preserves the necessary information.
      
                                                   4.6.0-rc2                  4.6.0-rc2
                                              fastmark-v1r20             initonce-v1r20
        Min      alloc-odr0-1               364.00 (  0.00%)           359.00 (  1.37%)
        Min      alloc-odr0-2               262.00 (  0.00%)           260.00 (  0.76%)
        Min      alloc-odr0-4               214.00 (  0.00%)           214.00 (  0.00%)
        Min      alloc-odr0-8               186.00 (  0.00%)           186.00 (  0.00%)
        Min      alloc-odr0-16              173.00 (  0.00%)           173.00 (  0.00%)
        Min      alloc-odr0-32              165.00 (  0.00%)           165.00 (  0.00%)
        Min      alloc-odr0-64              161.00 (  0.00%)           162.00 ( -0.62%)
        Min      alloc-odr0-128             159.00 (  0.00%)           161.00 ( -1.26%)
        Min      alloc-odr0-256             168.00 (  0.00%)           170.00 ( -1.19%)
        Min      alloc-odr0-512             180.00 (  0.00%)           181.00 ( -0.56%)
        Min      alloc-odr0-1024            190.00 (  0.00%)           190.00 (  0.00%)
        Min      alloc-odr0-2048            196.00 (  0.00%)           196.00 (  0.00%)
        Min      alloc-odr0-4096            202.00 (  0.00%)           202.00 (  0.00%)
        Min      alloc-odr0-8192            206.00 (  0.00%)           205.00 (  0.49%)
        Min      alloc-odr0-16384           206.00 (  0.00%)           205.00 (  0.49%)
      
      The benefit is negligible and the results are within the noise but each
      cycle counts.
      
      Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c33d6c06
    • Mel Gorman's avatar
      mm, page_alloc: shortcut watermark checks for order-0 pages · 48ee5f36
      Mel Gorman authored
      
      
      Watermarks have to be checked on every allocation including the number
      of pages being allocated and whether reserves can be accessed.  The
      reserves only matter if memory is limited and the free_pages adjustment
      only applies to high-order pages.  This patch adds a shortcut for
      order-0 pages that avoids numerous calculations if there is plenty of
      free memory yielding the following performance difference in a page
      allocator microbenchmark;
      
                                                   4.6.0-rc2                  4.6.0-rc2
                                               optfair-v1r20             fastmark-v1r20
        Min      alloc-odr0-1               380.00 (  0.00%)           364.00 (  4.21%)
        Min      alloc-odr0-2               273.00 (  0.00%)           262.00 (  4.03%)
        Min      alloc-odr0-4               227.00 (  0.00%)           214.00 (  5.73%)
        Min      alloc-odr0-8               196.00 (  0.00%)           186.00 (  5.10%)
        Min      alloc-odr0-16              183.00 (  0.00%)           173.00 (  5.46%)
        Min      alloc-odr0-32              173.00 (  0.00%)           165.00 (  4.62%)
        Min      alloc-odr0-64              169.00 (  0.00%)           161.00 (  4.73%)
        Min      alloc-odr0-128             169.00 (  0.00%)           159.00 (  5.92%)
        Min      alloc-odr0-256             180.00 (  0.00%)           168.00 (  6.67%)
        Min      alloc-odr0-512             190.00 (  0.00%)           180.00 (  5.26%)
        Min      alloc-odr0-1024            198.00 (  0.00%)           190.00 (  4.04%)
        Min      alloc-odr0-2048            204.00 (  0.00%)           196.00 (  3.92%)
        Min      alloc-odr0-4096            209.00 (  0.00%)           202.00 (  3.35%)
        Min      alloc-odr0-8192            213.00 (  0.00%)           206.00 (  3.29%)
        Min      alloc-odr0-16384           214.00 (  0.00%)           206.00 (  3.74%)
      
      Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      48ee5f36
    • Mel Gorman's avatar
      mm, page_alloc: reduce cost of fair zone allocation policy retry · 30534755
      Mel Gorman authored
      
      
      The fair zone allocation policy is not without cost but it can be
      reduced slightly.  This patch removes an unnecessary local variable,
      checks the likely conditions of the fair zone policy first, uses a bool
      instead of a flags check and falls through when a remote node is
      encountered instead of doing a full restart.  The benefit is marginal
      but it's there
      
                                                   4.6.0-rc2                  4.6.0-rc2
                                               decstat-v1r20              optfair-v1r20
        Min      alloc-odr0-1               377.00 (  0.00%)           380.00 ( -0.80%)
        Min      alloc-odr0-2               273.00 (  0.00%)           273.00 (  0.00%)
        Min      alloc-odr0-4               226.00 (  0.00%)           227.00 ( -0.44%)
        Min      alloc-odr0-8               196.00 (  0.00%)           196.00 (  0.00%)
        Min      alloc-odr0-16              183.00 (  0.00%)           183.00 (  0.00%)
        Min      alloc-odr0-32              175.00 (  0.00%)           173.00 (  1.14%)
        Min      alloc-odr0-64              172.00 (  0.00%)           169.00 (  1.74%)
        Min      alloc-odr0-128             170.00 (  0.00%)           169.00 (  0.59%)
        Min      alloc-odr0-256             183.00 (  0.00%)           180.00 (  1.64%)
        Min      alloc-odr0-512             191.00 (  0.00%)           190.00 (  0.52%)
        Min      alloc-odr0-1024            199.00 (  0.00%)           198.00 (  0.50%)
        Min      alloc-odr0-2048            204.00 (  0.00%)           204.00 (  0.00%)
        Min      alloc-odr0-4096            210.00 (  0.00%)           209.00 (  0.48%)
        Min      alloc-odr0-8192            213.00 (  0.00%)           213.00 (  0.00%)
        Min      alloc-odr0-16384           214.00 (  0.00%)           214.00 (  0.00%)
      
      The benefit is marginal at best but one of the most important benefits,
      avoiding a second search when falling back to another node is not
      triggered by this particular test so the benefit for some corner cases
      is understated.
      
      Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      30534755
    • Mel Gorman's avatar
      mm, page_alloc: shorten the page allocator fast path · 4fcb0971
      Mel Gorman authored
      
      
      The page allocator fast path checks page multiple times unnecessarily.
      This patch avoids all the slowpath checks if the first allocation
      attempt succeeds.
      
      Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      4fcb0971
    • Mel Gorman's avatar
      mm, page_alloc: check once if a zone has isolated pageblocks · 3777999d
      Mel Gorman authored
      
      
      When bulk freeing pages from the per-cpu lists the zone is checked for
      isolated pageblocks on every release.  This patch checks it once per
      drain.
      
      [mgorman@techsingularity.net: fix locking radce, per Vlastimil]
      Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Signed-off-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      3777999d
    • Mel Gorman's avatar
      mm, page_alloc: move __GFP_HARDWALL modifications out of the fastpath · 83d4ca81
      Mel Gorman authored
      
      
      __GFP_HARDWALL only has meaning in the context of cpusets but the fast
      path always applies the flag on the first attempt.  Move the
      manipulations into the cpuset paths where they will be masked by a
      static branch in the common case.
      
      With the other micro-optimisations in this series combined, the impact
      on a page allocator microbenchmark is
      
                                                   4.6.0-rc2                  4.6.0-rc2
                                               decstat-v1r20                micro-v1r20
        Min      alloc-odr0-1               381.00 (  0.00%)           377.00 (  1.05%)
        Min      alloc-odr0-2               275.00 (  0.00%)           273.00 (  0.73%)
        Min      alloc-odr0-4               229.00 (  0.00%)           226.00 (  1.31%)
        Min      alloc-odr0-8               199.00 (  0.00%)           196.00 (  1.51%)
        Min      alloc-odr0-16              186.00 (  0.00%)           183.00 (  1.61%)
        Min      alloc-odr0-32              179.00 (  0.00%)           175.00 (  2.23%)
        Min      alloc-odr0-64              174.00 (  0.00%)           172.00 (  1.15%)
        Min      alloc-odr0-128             172.00 (  0.00%)           170.00 (  1.16%)
        Min      alloc-odr0-256             181.00 (  0.00%)           183.00 ( -1.10%)
        Min      alloc-odr0-512             193.00 (  0.00%)           191.00 (  1.04%)
        Min      alloc-odr0-1024            201.00 (  0.00%)           199.00 (  1.00%)
        Min      alloc-odr0-2048            206.00 (  0.00%)           204.00 (  0.97%)
        Min      alloc-odr0-4096            212.00 (  0.00%)           210.00 (  0.94%)
        Min      alloc-odr0-8192            215.00 (  0.00%)           213.00 (  0.93%)
        Min      alloc-odr0-16384           216.00 (  0.00%)           214.00 (  0.93%)
      
      Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      83d4ca81
    • Mel Gorman's avatar
      mm, page_alloc: simplify last cpupid reset · 09940a4f
      Mel Gorman authored
      
      
      The current reset unnecessarily clears flags and makes pointless
      calculations.
      
      Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      09940a4f
    • Mel Gorman's avatar
      mm, page_alloc: remove unnecessary initialisation from __alloc_pages_nodemask() · 5bb1b169
      Mel Gorman authored
      
      
      page is guaranteed to be set before it is read with or without the
      initialisation.
      
      [akpm@linux-foundation.org: fix warning]
      Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5bb1b169
    • Mel Gorman's avatar
      mm, page_alloc: remove unnecessary initialisation in get_page_from_freelist · be06af00
      Mel Gorman authored
      
      
      Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      be06af00
    • Mel Gorman's avatar
      mm, page_alloc: remove unnecessary local variable in get_page_from_freelist · 4dfa6cd8
      Mel Gorman authored
      
      
      zonelist here is a copy of a struct field that is used once.  Ditch it.
      
      Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      4dfa6cd8
    • Mel Gorman's avatar
      mm, page_alloc: convert nr_fair_skipped to bool · fa379b95
      Mel Gorman authored
      
      
      The number of zones skipped to a zone expiring its fair zone allocation
      quota is irrelevant.  Convert to bool.
      
      Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      fa379b95
    • Mel Gorman's avatar
      mm, page_alloc: convert alloc_flags to unsigned · c603844b
      Mel Gorman authored
      
      
      alloc_flags is a bitmask of flags but it is signed which does not
      necessarily generate the best code depending on the compiler.  Even
      without an impact, it makes more sense that this be unsigned.
      
      Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c603844b
    • Mel Gorman's avatar
      mm, page_alloc: avoid unnecessary zone lookups during pageblock operations · f75fb889
      Mel Gorman authored
      
      
      Pageblocks have an associated bitmap to store migrate types and whether
      the pageblock should be skipped during compaction.  The bitmap may be
      associated with a memory section or a zone but the zone is looked up
      unconditionally.  The compiler should optimise this away automatically
      so this is a cosmetic patch only in many cases.
      
      Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f75fb889
    • Mel Gorman's avatar
      mm, page_alloc: use __dec_zone_state for order-0 page allocation · 754078eb
      Mel Gorman authored
      
      
      __dec_zone_state is cheaper to use for removing an order-0 page as it
      has fewer conditions to check.
      
      The performance difference on a page allocator microbenchmark is;
      
                                                   4.6.0-rc2                  4.6.0-rc2
                                               optiter-v1r20              decstat-v1r20
        Min      alloc-odr0-1               382.00 (  0.00%)           381.00 (  0.26%)
        Min      alloc-odr0-2               282.00 (  0.00%)           275.00 (  2.48%)
        Min      alloc-odr0-4               233.00 (  0.00%)           229.00 (  1.72%)
        Min      alloc-odr0-8               203.00 (  0.00%)           199.00 (  1.97%)
        Min      alloc-odr0-16              188.00 (  0.00%)           186.00 (  1.06%)
        Min      alloc-odr0-32              182.00 (  0.00%)           179.00 (  1.65%)
        Min      alloc-odr0-64              177.00 (  0.00%)           174.00 (  1.69%)
        Min      alloc-odr0-128             175.00 (  0.00%)           172.00 (  1.71%)
        Min      alloc-odr0-256             184.00 (  0.00%)           181.00 (  1.63%)
        Min      alloc-odr0-512             197.00 (  0.00%)           193.00 (  2.03%)
        Min      alloc-odr0-1024            203.00 (  0.00%)           201.00 (  0.99%)
        Min      alloc-odr0-2048            209.00 (  0.00%)           206.00 (  1.44%)
        Min      alloc-odr0-4096            214.00 (  0.00%)           212.00 (  0.93%)
        Min      alloc-odr0-8192            218.00 (  0.00%)           215.00 (  1.38%)
        Min      alloc-odr0-16384           219.00 (  0.00%)           216.00 (  1.37%)
      
      Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      754078eb
    • Mel Gorman's avatar
      mm, page_alloc: inline the fast path of the zonelist iterator · 682a3385
      Mel Gorman authored
      
      
      The page allocator iterates through a zonelist for zones that match the
      addressing limitations and nodemask of the caller but many allocations
      will not be restricted.  Despite this, there is always functional call
      overhead which builds up.
      
      This patch inlines the optimistic basic case and only calls the iterator
      function for the complex case.  A hindrance was the fact that
      cpuset_current_mems_allowed is used in the fastpath as the allowed
      nodemask even though all nodes are allowed on most systems.  The patch
      handles this by only considering cpuset_current_mems_allowed if a cpuset
      exists.  As well as being faster in the fast-path, this removes some
      junk in the slowpath.
      
      The performance difference on a page allocator microbenchmark is;
      
                                                   4.6.0-rc2                  4.6.0-rc2
                                            statinline-v1r20              optiter-v1r20
        Min      alloc-odr0-1               412.00 (  0.00%)           382.00 (  7.28%)
        Min      alloc-odr0-2               301.00 (  0.00%)           282.00 (  6.31%)
        Min      alloc-odr0-4               247.00 (  0.00%)           233.00 (  5.67%)
        Min      alloc-odr0-8               215.00 (  0.00%)           203.00 (  5.58%)
        Min      alloc-odr0-16              199.00 (  0.00%)           188.00 (  5.53%)
        Min      alloc-odr0-32              191.00 (  0.00%)           182.00 (  4.71%)
        Min      alloc-odr0-64              187.00 (  0.00%)           177.00 (  5.35%)
        Min      alloc-odr0-128             185.00 (  0.00%)           175.00 (  5.41%)
        Min      alloc-odr0-256             193.00 (  0.00%)           184.00 (  4.66%)
        Min      alloc-odr0-512             207.00 (  0.00%)           197.00 (  4.83%)
        Min      alloc-odr0-1024            213.00 (  0.00%)           203.00 (  4.69%)
        Min      alloc-odr0-2048            220.00 (  0.00%)           209.00 (  5.00%)
        Min      alloc-odr0-4096            226.00 (  0.00%)           214.00 (  5.31%)
        Min      alloc-odr0-8192            229.00 (  0.00%)           218.00 (  4.80%)
        Min      alloc-odr0-16384           229.00 (  0.00%)           219.00 (  4.37%)
      
      perf indicated that next_zones_zonelist disappeared in the profile and
      __next_zones_zonelist did not appear.  This is expected as the
      micro-benchmark would hit the inlined fast-path every time.
      
      Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      682a3385
    • Mel Gorman's avatar
      mm, page_alloc: inline zone_statistics · 060e7417
      Mel Gorman authored
      
      
      zone_statistics has one call-site but it's a public function.  Make it
      static and inline.
      
      The performance difference on a page allocator microbenchmark is;
      
                                                   4.6.0-rc2                  4.6.0-rc2
                                            statbranch-v1r20           statinline-v1r20
        Min      alloc-odr0-1               419.00 (  0.00%)           412.00 (  1.67%)
        Min      alloc-odr0-2               305.00 (  0.00%)           301.00 (  1.31%)
        Min      alloc-odr0-4               250.00 (  0.00%)           247.00 (  1.20%)
        Min      alloc-odr0-8               219.00 (  0.00%)           215.00 (  1.83%)
        Min      alloc-odr0-16              203.00 (  0.00%)           199.00 (  1.97%)
        Min      alloc-odr0-32              195.00 (  0.00%)           191.00 (  2.05%)
        Min      alloc-odr0-64              191.00 (  0.00%)           187.00 (  2.09%)
        Min      alloc-odr0-128             189.00 (  0.00%)           185.00 (  2.12%)
        Min      alloc-odr0-256             198.00 (  0.00%)           193.00 (  2.53%)
        Min      alloc-odr0-512             210.00 (  0.00%)           207.00 (  1.43%)
        Min      alloc-odr0-1024            216.00 (  0.00%)           213.00 (  1.39%)
        Min      alloc-odr0-2048            221.00 (  0.00%)           220.00 (  0.45%)
        Min      alloc-odr0-4096            227.00 (  0.00%)           226.00 (  0.44%)
        Min      alloc-odr0-8192            232.00 (  0.00%)           229.00 (  1.29%)
        Min      alloc-odr0-16384           232.00 (  0.00%)           229.00 (  1.29%)
      
      Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      060e7417
    • Mel Gorman's avatar
      mm, page_alloc: reduce branches in zone_statistics · b9f00e14
      Mel Gorman authored
      
      
      zone_statistics has more branches than it really needs to take an
      unlikely GFP flag into account.  Reduce the number and annotate the
      unlikely flag.
      
      The performance difference on a page allocator microbenchmark is;
      
                                                   4.6.0-rc2                  4.6.0-rc2
                                            nocompound-v1r10           statbranch-v1r10
        Min      alloc-odr0-1               417.00 (  0.00%)           419.00 ( -0.48%)
        Min      alloc-odr0-2               308.00 (  0.00%)           305.00 (  0.97%)
        Min      alloc-odr0-4               253.00 (  0.00%)           250.00 (  1.19%)
        Min      alloc-odr0-8               221.00 (  0.00%)           219.00 (  0.90%)
        Min      alloc-odr0-16              205.00 (  0.00%)           203.00 (  0.98%)
        Min      alloc-odr0-32              199.00 (  0.00%)           195.00 (  2.01%)
        Min      alloc-odr0-64              193.00 (  0.00%)           191.00 (  1.04%)
        Min      alloc-odr0-128             191.00 (  0.00%)           189.00 (  1.05%)
        Min      alloc-odr0-256             200.00 (  0.00%)           198.00 (  1.00%)
        Min      alloc-odr0-512             212.00 (  0.00%)           210.00 (  0.94%)
        Min      alloc-odr0-1024            219.00 (  0.00%)           216.00 (  1.37%)
        Min      alloc-odr0-2048            225.00 (  0.00%)           221.00 (  1.78%)
        Min      alloc-odr0-4096            231.00 (  0.00%)           227.00 (  1.73%)
        Min      alloc-odr0-8192            234.00 (  0.00%)           232.00 (  0.85%)
        Min      alloc-odr0-16384           234.00 (  0.00%)           232.00 (  0.85%)
      
      Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b9f00e14
    • Mel Gorman's avatar
      mm, page_alloc: use new PageAnonHead helper in the free page fast path · 17514574
      Mel Gorman authored
      
      
      The PageAnon check always checks for compound_head but this is a
      relatively expensive check if the caller already knows the page is a
      head page.  This patch creates a helper and uses it in the page free
      path which only operates on head pages.
      
      With this patch and "Only check PageCompound for high-order pages", the
      performance difference on a page allocator microbenchmark is;
      
                                                   4.6.0-rc2                  4.6.0-rc2
                                                     vanilla           nocompound-v1r20
        Min      alloc-odr0-1               425.00 (  0.00%)           417.00 (  1.88%)
        Min      alloc-odr0-2               313.00 (  0.00%)           308.00 (  1.60%)
        Min      alloc-odr0-4               257.00 (  0.00%)           253.00 (  1.56%)
        Min      alloc-odr0-8               224.00 (  0.00%)           221.00 (  1.34%)
        Min      alloc-odr0-16              208.00 (  0.00%)           205.00 (  1.44%)
        Min      alloc-odr0-32              199.00 (  0.00%)           199.00 (  0.00%)
        Min      alloc-odr0-64              195.00 (  0.00%)           193.00 (  1.03%)
        Min      alloc-odr0-128             192.00 (  0.00%)           191.00 (  0.52%)
        Min      alloc-odr0-256             204.00 (  0.00%)           200.00 (  1.96%)
        Min      alloc-odr0-512             213.00 (  0.00%)           212.00 (  0.47%)
        Min      alloc-odr0-1024            219.00 (  0.00%)           219.00 (  0.00%)
        Min      alloc-odr0-2048            225.00 (  0.00%)           225.00 (  0.00%)
        Min      alloc-odr0-4096            230.00 (  0.00%)           231.00 ( -0.43%)
        Min      alloc-odr0-8192            235.00 (  0.00%)           234.00 (  0.43%)
        Min      alloc-odr0-16384           235.00 (  0.00%)           234.00 (  0.43%)
        Min      free-odr0-1                215.00 (  0.00%)           191.00 ( 11.16%)
        Min      free-odr0-2                152.00 (  0.00%)           136.00 ( 10.53%)
        Min      free-odr0-4                119.00 (  0.00%)           107.00 ( 10.08%)
        Min      free-odr0-8                106.00 (  0.00%)            96.00 (  9.43%)
        Min      free-odr0-16                97.00 (  0.00%)            87.00 ( 10.31%)
        Min      free-odr0-32                91.00 (  0.00%)            83.00 (  8.79%)
        Min      free-odr0-64                89.00 (  0.00%)            81.00 (  8.99%)
        Min      free-odr0-128               88.00 (  0.00%)            80.00 (  9.09%)
        Min      free-odr0-256              106.00 (  0.00%)            95.00 ( 10.38%)
        Min      free-odr0-512              116.00 (  0.00%)           111.00 (  4.31%)
        Min      free-odr0-1024             125.00 (  0.00%)           118.00 (  5.60%)
        Min      free-odr0-2048             133.00 (  0.00%)           126.00 (  5.26%)
        Min      free-odr0-4096             136.00 (  0.00%)           130.00 (  4.41%)
        Min      free-odr0-8192             138.00 (  0.00%)           130.00 (  5.80%)
        Min      free-odr0-16384            137.00 (  0.00%)           130.00 (  5.11%)
      
      There is a sizable boost to the free allocator performance.  While there
      is an apparent boost on the allocation side, it's likely a co-incidence
      or due to the patches slightly reducing cache footprint.
      
      Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      17514574