  1. Jul 28, 2016
    • Merge tag 'xfs-for-linus-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs · 0e6acf02
      Linus Torvalds authored
      Pull xfs updates from Dave Chinner:
       "The major addition is the new iomap based block mapping
        infrastructure.  We've been kicking this about locally for years, but
        there are other filesystems that want to use it too (e.g. gfs2).  Now it
        is fully working, reviewed and ready to be merged and used by other
        filesystems.
      
        There are a lot of other fixes and cleanups in the tree, but those are
        XFS internal things and none are of the scale or visibility of the
        iomap changes.  See below for details.
      
        I am likely to send another pull request next week - we're just about
        ready to merge some new functionality (on disk block->owner reverse
        mapping infrastructure), but that's a huge chunk of code (74 files
        changed, 7283 insertions(+), 1114 deletions(-)) so I'm keeping that
        separate to all the "normal" pull request changes so they don't get
        lost in the noise.
      
        Summary of changes in this update:
         - generic iomap based IO path infrastructure
         - generic iomap based fiemap implementation
         - xfs iomap based IO path implementation
         - buffer error handling fixes
         - tracking of in flight buffer IO for unmount serialisation
         - direct IO and DAX IO path separation and simplification
         - shortform directory format definition changes for wider platform
           compatibility
         - various buffer cache fixes
         - cleanups in preparation for rmap merge
         - error injection cleanups and fixes
         - log item format buffer memory allocation restructuring to prevent
           rare OOM reclaim deadlocks
         - sparse inode chunks are now fully supported"
      
      * tag 'xfs-for-linus-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (53 commits)
        xfs: remove EXPERIMENTAL tag from sparse inode feature
        xfs: bufferhead chains are invalid after end_page_writeback
        xfs: allocate log vector buffers outside CIL context lock
        libxfs: directory node splitting does not have an extra block
        xfs: remove dax code from object file when disabled
        xfs: skip dirty pages in ->releasepage()
        xfs: remove __arch_pack
        xfs: kill xfs_dir2_inou_t
        xfs: kill xfs_dir2_sf_off_t
        xfs: split direct I/O and DAX path
        xfs: direct calls in the direct I/O path
        xfs: stop using generic_file_read_iter for direct I/O
        xfs: split xfs_file_read_iter into buffered and direct I/O helpers
        xfs: remove s_maxbytes enforcement in xfs_file_read_iter
        xfs: kill ioflags
        xfs: don't pass ioflags around in the ioctl path
        xfs: track and serialize in-flight async buffers against unmount
        xfs: exclude never-released buffers from buftarg I/O accounting
        xfs: don't reset b_retries to 0 on every failure
        xfs: remove extraneous buffer flag changes
        ...
      0e6acf02
  2. Jul 27, 2016
    • Merge branch 'akpm' (patches from Andrew) · 0e06f5c0
      Linus Torvalds authored
      Merge updates from Andrew Morton:
      
       - a few misc bits
      
       - ocfs2
      
       - most(?) of MM
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (125 commits)
        thp: fix comments of __pmd_trans_huge_lock()
        cgroup: remove unnecessary 0 check from css_from_id()
        cgroup: fix idr leak for the first cgroup root
        mm: memcontrol: fix documentation for compound parameter
        mm: memcontrol: remove BUG_ON in uncharge_list
        mm: fix build warnings in <linux/compaction.h>
        mm, thp: convert from optimistic swapin collapsing to conservative
        mm, thp: fix comment inconsistency for swapin readahead functions
        thp: update Documentation/{vm/transhuge,filesystems/proc}.txt
        shmem: split huge pages beyond i_size under memory pressure
        thp: introduce CONFIG_TRANSPARENT_HUGE_PAGECACHE
        khugepaged: add support of collapse for tmpfs/shmem pages
        shmem: make shmem_inode_info::lock irq-safe
        khugepaged: move up_read(mmap_sem) out of khugepaged_alloc_page()
        thp: extract khugepaged from mm/huge_memory.c
        shmem, thp: respect MADV_{NO,}HUGEPAGE for file mappings
        shmem: add huge pages support
        shmem: get_unmapped_area align huge page
        shmem: prepare huge= mount option and sysfs knob
        mm, rmap: account shmem thp pages
        ...
      0e06f5c0
    • Merge tag 'for-v4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply · f7816ad0
      Linus Torvalds authored
      Pull power supply and reset updates from Sebastian Reichel:
       - introduce reboot mode driver
       - add DT support to max8903
       - add power supply support for axp221
       - misc fixes
      
      * tag 'for-v4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply:
        power: reset: add reboot mode driver
        dt-bindings: power: reset: add document for reboot-mode driver
        power_supply: fix return value of get_property
        power: qcom_smbb: Make an extcon for usb cable detection
        max8903: adds support for initiation via device tree
        max8903: adds documentation for device tree bindings.
        max8903: remove unnecessary 'out of memory' error message.
        max8903: removes non zero validity checks on gpios.
        max8903: adds requesting of gpios.
        max8903: cleans up confusing relationship between dc_valid, dok and dcm.
        max8903: store pointer to pdata instead of copying it.
        power_supply: bq27xxx_battery: Group register mappings into one table
        docs: Move brcm,bcm21664-resetmgr.txt
        power/reset: make syscon_poweroff() static
        power: axp20x_usb: Add support for usb power-supply on axp22x pmics
        power_supply: bq27xxx_battery: Index register numbers by enum
        power_supply: bq27xxx_battery: Fix copy/paste error in header comment
        MAINTAINERS: Add file patterns for power supply device tree bindings
        power: reset: keystone: Enable COMPILE_TEST
      f7816ad0
    • Merge tag 'regulator-v4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator · 6097d55e
      Linus Torvalds authored
      Pull regulator updates from Mark Brown:
       "A quiet regulator API release, a few new drivers and some fixes but
        nothing too notable.  There will also be some updates for the PWM
        regulator coming through the PWM tree which provide much smoother
        operation when taking over an already running PWM regulator after boot
        using some new PWM APIs.
      
        Summary:
      
         - Support for configuration of the initial suspend state from DT.
      
         - New drivers for Mediatek MT6323, Ricoh RN5T567 and X-Powers AXP809"
      
      * tag 'regulator-v4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: (38 commits)
        regulator: da9053/52: Fix incorrectly stated minimum and maximum voltage limits
        regulator: mt6323: Constify struct regulator_ops
        regulator: mt6323: Fix module description
        regulator: mt6323: Add support for MT6323 regulator
        regulator: Add document for MT6323 regulator
        regulator: da9210: addition of device tree support
        regulator: act8865: Fix missing of_node_put() in act8865_pdata_from_dt()
        regulator: qcom_smd: Avoid overlapping linear voltage ranges
        regulator: s2mps11: Fix the voltage linear range for s2mps15
        regulator: pwm: Fix regulator ramp delay for continuous mode
        regulator: da9211: add descriptions for da9212/da9214
        mfd: rn5t618: Register restart handler
        mfd: rn5t618: Register power off callback optionally
        regulator: rn5t618: Add RN5T567 PMIC support
        mfd: rn5t618: Add Ricoh RN5T567 PMIC support
        ARM: dts: meson: minix-neo-x8: define PMIC as power controller
        regulator: tps65218: force set power-up/down strobe to 3 for dcdc3
        regulator: tps65218: Enable suspend configuration
        regulator: tps65217: Enable suspend configuration
        regulator: qcom_spmi: Add support for get_mode/set_mode on switches
        ...
      6097d55e
    • Merge tag 'regmap-v4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap · ae979997
      Linus Torvalds authored
      Pull regmap updates from Mark Brown:
       "Several small updates and API enhancements:
      
         - provide transparent unrolling of bulk writes into individual writes
           so they can be used with devices without raw formatting.
      
         - fix compatibility between I2C controllers supporting block commands
           and devices with more than 8 bit wide registers.
      
         - add some helpers for iopoll-like functionality and workarounds for
           weird interrupt controllers"
      
      * tag 'regmap-v4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
        regmap: add iopoll-like polling macro
        regmap: Support bulk writes for devices without raw formatting
        regmap-i2c: Use i2c block command only if register value width is 8 bit
        regmap: irq: Add support to call client specific pre/post interrupt service
        regmap: Add file patterns for regmap device tree bindings
      ae979997
    • Merge tag 'gpio-v4.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio · 1cd04d29
      Linus Torvalds authored
      Pull GPIO updates from Linus Walleij:
       "This is the bulk of GPIO changes for the v4.8 kernel cycle.  The big
        news is the completion of the chardev ABI which I'm very happy about
        and apart from that it's an ordinary, quite busy cycle.  The details
        are below.
      
        The patches have been tested in linux-next for some time; patches to
        other subsystems mostly have ACKs.
      
        I got overly ambitious with configuring lines as input for IRQ lines
        but it turns out that some controllers have their interrupt-enable and
        input-enabling in orthogonal settings so the assumption that all IRQ
        lines are input lines does not hold.  Oh well, revert and back to the
        drawing board with that.
      
        Core changes:
      
         - The big item is of course the completion of the character device
           ABI.  It has now replaced and surpassed the former unmaintainable
           sysfs ABI: we can now hammer (bitbang) individual lines or sets of
           lines and read individual lines or sets of lines from userspace,
           and we can also register to listen to GPIO events from userspace.
      
           As a tie-in we have two new tools in tools/gpio: gpio-hammer and
           gpio-event-mon that illustrate the proper use of the new ABI.  As
           someone said: the wild west days of GPIO are now over.
      
         - Continued to remove the pointless ARCH_[WANT_OPTIONAL|REQUIRE]_GPIOLIB
           Kconfig symbols.  I'm patching hexagon, openrisc, powerpc, sh,
           unicore, ia64 and microblaze.  These are either ACKed by their
           maintainers or patched anyways after a grace period and no response
           from maintainers.
      
           Some archs (ARM) come in from their trees, and others (x86) are
           still not fixed, so I might send a second pull request to root it
           out later in this merge window, or just defer to v4.9.
      
         - The GPIO tools are moved to the tools build system.
      
        New drivers:
      
         - New driver for the MAX77620/MAX20024.
      
         - New driver for the Intel Merrifield.
      
         - Enabled PCA953x for the TI PCA9536.
      
         - Enabled PCA953x for the Intel Edison.
      
         - Enabled R8A7792 in the RCAR driver.
      
        Driver improvements:
      
         - The STMPE and F7188x now support the .get_direction() callback.
      
         - The Xilinx driver supports setting multiple lines at once.
      
         - ACPI support for the Vulcan GPIO controller.
      
         - The MMIO GPIO driver supports device tree probing.
      
         - The Acer One 10 is supported through the _DEP ACPI attribute.
      
        Cleanups:
      
         - A major cleanup of the OF/DT support code.  It is way easier to
           read and understand now, probably this improves performance too.
      
         - Drop a few redundant .owner assignments.
      
         - Remove CLPS711x boardfile support: we are 100% DT"
      
      * tag 'gpio-v4.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (67 commits)
        MAINTAINERS: Add INTEL MERRIFIELD GPIO entry
        gpio: dwapb: add missing fwnode_handle_put() in dwapb_gpio_get_pdata()
        gpio: merrifield: Protect irq_ack() and gpio_set() by lock
        gpio: merrifield: Introduce GPIO driver to support Merrifield
        gpio: intel-mid: Make it depend to X86_INTEL_MID
        gpio: intel-mid: Sort header block alphabetically
        gpio: intel-mid: Remove potentially harmful code
        gpio: rcar: add R8A7792 support
        gpiolib: remove duplicated include from gpiolib.c
        Revert "gpio: convince line to become input in irq helper"
        gpiolib: of_find_gpio(): Don't discard errors
        gpio: of: Allow overriding the device node
        gpio: free handles in fringe cases
        gpio: tps65218: Add platform_device_id table
        gpio: max77620: get gpio value based on direction
        gpio: lynxpoint: avoid potential warning on error path
        tools/gpio: add install section
        tools/gpio: move to tools buildsystem
        gpio: intel-mid: switch to devm_gpiochip_add_data()
        gpio: 74x164: Use spi_write() helper instead of open coding
        ...
      1cd04d29
    • Merge tag 'media/v4.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media · 9c1958fc
      Linus Torvalds authored
      Pull media updates from Mauro Carvalho Chehab:
      
       - new framework support for HDMI CEC and remote control support
      
       - new encoding codec driver for Mediatek SoC
      
       - new frontend driver: helene tuner
      
       - added support for NetUp almost universal devices, which support
         DVB-C/S/S2/T/T2 and ISDB-T
      
       - the mn88472 frontend driver got promoted from staging
      
       - a new driver for RCar video input
      
       - some soc_camera legacy drivers got removed: timb, omap1, mx2, mx3
      
       - lots of driver cleanups, improvements and fixups
      
      * tag 'media/v4.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (377 commits)
        [media] cec: always check all_device_types and features
        [media] cec: poll should check if there is room in the tx queue
        [media] vivid: support monitor all mode
        [media] cec: fix test for unconfigured adapter in main message loop
        [media] cec: limit the size of the transmit queue
        [media] cec: zero unused msg part after msg->len
        [media] cec: don't set fh to NULL in CEC_TRANSMIT
        [media] cec: clear all status fields before transmit and always fill in sequence
        [media] cec: CEC_RECEIVE overwrote the timeout field
        [media] cxd2841er: Reading SNR for DVB-C added
        [media] cxd2841er: Reading BER and UCB for DVB-C added
        [media] cxd2841er: fix switch-case for DVB-C
        [media] cxd2841er: fix signal strength scale for ISDB-T
        [media] cxd2841er: adjust the dB scale for DVB-C
        [media] cxd2841er: provide signal strength for DVB-C
        [media] cxd2841er: fix BER report via DVBv5 stats API
        [media] mb86a20s: apply mask to val after checking for read failure
        [media] airspy: fix error logic during device register
        [media] s5p-cec/TODO: add TODO item
        [media] cec/TODO: drop comment about sphinx documentation
        ...
      9c1958fc
    • Merge tag 'pstore-v4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · 1b3fc0be
      Linus Torvalds authored
      Pull pstore subsystem updates from Kees Cook:
       "This expands the supported compressors, fixes some bugs, and finally
        adds DT bindings"
      
      * tag 'pstore-v4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
        pstore/ram: add Device Tree bindings
        efi-pstore: implement efivars_pstore_exit()
        pstore: drop file opened reference count
        pstore: add lzo/lz4 compression support
        pstore: Cleanup pstore_dump()
        pstore: Enable compression on normal path (again)
        ramoops: Only unregister when registered
      1b3fc0be
    • Merge tag 'for-linus-4.8-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux · d31dcd92
      Linus Torvalds authored
      Pull orangefs updates from Mike Marshall:
       "Orangefs cleanups and enablement of O_DIRECT in open.
      
        Cleanups:
      
         - remove some unused defines, and also some obfuscatory ones.
      
         - remove a redundant xattr handler.
      
         - Remove useless xattr prefix arguments.
      
         - Be more picky about uid and gid handling WRT namespaces.
      
           Our use of current_user_ns() instead of init_user_ns left open the
           possibility that users could spoof their uids or gids when the
           server was running in a different namespace in "default security"
           mode.
      
         - Allow open(2) to succeed with O_DIRECT"
      
      * tag 'for-linus-4.8-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
        orangefs: fix namespace handling
        Orangefs: allow O_DIRECT in open
        orangefs: Remove useless xattr prefix arguments
        orangefs: Remove redundant "trusted." xattr handler
        orangefs: Remove useless defines
      d31dcd92
    • Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · 396d1099
      Linus Torvalds authored
      Pull ext4 updates from Ted Ts'o:
       "The major change this cycle is deleting ext4's copy of the file system
        encryption code and switching things over to using the copies in
        fs/crypto.  I've updated the MAINTAINERS file to add an entry for
        fs/crypto listing Jaegeuk Kim and myself as the maintainers.
      
        There are also a number of bug fixes, most notably for some problems
        found by American Fuzzy Lop (AFL) courtesy of Vegard Nossum.  Also
        fixed is a writeback deadlock detected by generic/130, and some
        potential races in the metadata checksum code"
      
      * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (21 commits)
        ext4: verify extent header depth
        ext4: short-cut orphan cleanup on error
        ext4: fix reference counting bug on block allocation error
        MAINTAINRES: fs-crypto maintainers update
        ext4 crypto: migrate into vfs's crypto engine
        ext2: fix filesystem deadlock while reading corrupted xattr block
        ext4: fix project quota accounting without quota limits enabled
        ext4: validate s_reserved_gdt_blocks on mount
        ext4: remove unused page_idx
        ext4: don't call ext4_should_journal_data() on the journal inode
        ext4: Fix WARN_ON_ONCE in ext4_commit_super()
        ext4: fix deadlock during page writeback
        ext4: correct error value of function verifying dx checksum
        ext4: avoid modifying checksum fields directly during checksum verification
        ext4: check for extents that wrap around
        jbd2: make journal y2038 safe
        jbd2: track more dependencies on transaction commit
        jbd2: move lockdep tracking to journal_s
        jbd2: move lockdep instrumentation for jbd2 handles
        ext4: respect the nobarrier mount option in nojournal mode
        ...
      396d1099
    • Merge tag 'pnp-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 59ebc44e
      Linus Torvalds authored
      Pull PNP update from Rafael Wysocki:
       "One simple change to make the PNP core use device_initcall() instead
        of module_init() to run pnpbios_thread_init() (Paul Gortmaker)"
      
      * tag 'pnp-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        PNP: make pnpbios core explicitly non-modular
      59ebc44e
    • Merge tag 'acpi-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · e663107f
      Linus Torvalds authored
      Pull ACPI updates from Rafael Wysocki:
       "The new features here are the support for ACPI overlays (allowing ACPI
        tables to be loaded at any time from EFI variables or via configfs)
        and the LPI (Low-Power Idle) support.  Also notable is the ACPI-based
        NUMA support for ARM64.
      
        Apart from that we have two new drivers, for the DPTF (Dynamic Power
        and Thermal Framework) power participant device and for the Intel
        Broxton WhiskeyCove PMIC, some more PMIC-related changes, support for
        the Boot Error Record Table (BERT) in APEI and support for
        platform-initiated graceful shutdown.
      
        Plus two new pieces of documentation and usual assorted fixes and
        cleanups in quite a few places.
      
        Specifics:
      
         - Support for ACPI SSDT overlays allowing Secondary System
           Description Tables (SSDTs) to be loaded at any time from EFI
           variables or via configfs (Octavian Purdila, Mika Westerberg).
      
         - Support for the ACPI LPI (Low-Power Idle) feature introduced in
           ACPI 6.0 and allowing processor idle states to be represented in
           ACPI tables in a hierarchical way (with the help of Processor
           Container objects) and support for ACPI idle states management on
           ARM64, based on LPI (Sudeep Holla).
      
         - General improvements of ACPI support for NUMA and ARM64 support for
           ACPI-based NUMA (Hanjun Guo, David Daney, Robert Richter).
      
         - General improvements of the ACPI table upgrade mechanism and ARM64
           support for that feature (Aleksey Makarov, Jon Masters).
      
         - Support for the Boot Error Record Table (BERT) in APEI and
           improvements of kernel messages printed by the error injection code
           (Huang Ying, Borislav Petkov).
      
         - New driver for the Intel Broxton WhiskeyCove PMIC operation region
           and support for the REGS operation region on Broxton, PMIC code
           cleanups (Bin Gao, Felipe Balbi, Paul Gortmaker).
      
         - New driver for the power participant device which is part of the
           Dynamic Power and Thermal Framework (DPTF) and DPTF-related code
           reorganization (Srinivas Pandruvada).
      
         - Support for the platform-initiated graceful shutdown feature
           introduced in ACPI 6.1 (Prashanth Prakash).
      
         - ACPI button driver update related to lid input events generated
           automatically on initialization and system resume that have been
           problematic for some time (Lv Zheng).
      
         - ACPI EC driver cleanups (Lv Zheng).
      
         - Documentation of the ACPICA release automation process and the
           in-kernel ACPI AML debugger (Lv Zheng).
      
         - New blacklist entry and two fixes for the ACPI backlight driver
           (Alex Hung, Arvind Yadav, Ralf Gerbig).
      
         - Cleanups of the ACPI pci_slot driver (Joe Perches, Paul Gortmaker).
      
         - ACPI CPPC code changes to make it more robust against possible
           defects in ACPI tables and new symbol definitions for PCC (Hoan
           Tran).
      
         - System reboot code modification to execute the ACPI _PTS (Prepare
           To Sleep) method in addition to _TTS (Ocean He).
      
         - ACPICA-related change to carry out lock ordering checks in ACPICA
           if ACPICA debug is enabled in the kernel (Lv Zheng).
      
         - Assorted minor fixes and cleanups (Andy Shevchenko, Baoquan He,
           Bhaktipriya Shridhar, Paul Gortmaker, Rafael Wysocki)"
      
      * tag 'acpi-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (71 commits)
        ACPI: enable ACPI_PROCESSOR_IDLE on ARM64
        arm64: add support for ACPI Low Power Idle(LPI)
        drivers: firmware: psci: initialise idle states using ACPI LPI
        cpuidle: introduce CPU_PM_CPU_IDLE_ENTER macro for ARM{32, 64}
        arm64: cpuidle: drop __init section marker to arm_cpuidle_init
        ACPI / processor_idle: Add support for Low Power Idle(LPI) states
        ACPI / processor_idle: introduce ACPI_PROCESSOR_CSTATE
        ACPI / DPTF: move int340x_thermal.c to the DPTF folder
        ACPI / DPTF: Add DPTF power participant driver
        ACPI / lpat: make it explicitly non-modular
        ACPI / dock: make dock explicitly non-modular
        ACPI / PCI: make pci_slot explicitly non-modular
        ACPI / PMIC: remove modular references from non-modular code
        ACPICA: Linux: Enable ACPI_MUTEX_DEBUG for Linux kernel
        ACPI: Rename configfs.c to acpi_configfs.c to prevent link error
        ACPI / debugger: Add AML debugger documentation
        ACPI: Add documentation describing ACPICA release automation
        ACPI: add support for loading SSDTs via configfs
        ACPI: add support for configfs
        efi / ACPI: load SSTDs from EFI variables
        ...
      e663107f
    • Merge tag 'pm-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 6453dbdd
      Linus Torvalds authored
      Pull power management updates from Rafael Wysocki:
       "Again, the majority of changes go into the cpufreq subsystem, but
        there are no big features this time.  The cpufreq changes that stand
        out somewhat are the governor interface rework and improvements
        related to the handling of frequency tables.  Apart from those, there
        are fixes and new device/CPU IDs in drivers, cleanups and an
        improvement of the new schedutil governor.
      
        Next, there are some changes in the hibernation core, including a fix
        for a nasty problem related to the MONITOR/MWAIT usage by CPU offline
        during resume from hibernation, a few core improvements related to
        memory management during resume, a couple of additional debug features
        and cleanups.
      
        Finally, we have some fixes and cleanups in the devfreq subsystem,
        generic power domains framework improvements related to system
        suspend/resume, support for some new chips in intel_idle and in the
        power capping RAPL driver, a new version of the AnalyzeSuspend utility
        and some assorted fixes and cleanups.
      
        Specifics:
      
         - Rework the cpufreq governor interface to make it more
           straightforward and modify the conservative governor to avoid using
           transition notifications (Rafael Wysocki).
      
         - Rework the handling of frequency tables by the cpufreq core to make
           it more efficient (Viresh Kumar).
      
         - Modify the schedutil governor to reduce the number of wakeups it
           causes to occur in cases when the CPU frequency doesn't need to be
           changed (Steve Muckle, Viresh Kumar).
      
         - Fix some minor issues and clean up code in the cpufreq core and
           governors (Rafael Wysocki, Viresh Kumar).
      
         - Add Intel Broxton support to the intel_pstate driver (Srinivas
           Pandruvada).
      
         - Fix problems related to the config TDP feature and to the validity
           of the MSR_HWP_INTERRUPT register in intel_pstate (Jan Kiszka,
           Srinivas Pandruvada).
      
         - Make intel_pstate update the cpu_frequency tracepoint even if the
           frequency doesn't change to avoid confusing powertop (Rafael
           Wysocki).
      
         - Clean up the usage of __init/__initdata in intel_pstate, mark some
           of its internal variables as __read_mostly and drop an unused
           structure element from it (Jisheng Zhang, Carsten Emde).
      
         - Clean up the usage of some duplicate MSR symbols in intel_pstate
           and turbostat (Srinivas Pandruvada).
      
         - Update/fix the powernv, s3c24xx and mvebu cpufreq drivers (Akshay
           Adiga, Viresh Kumar, Ben Dooks).
      
         - Fix a regression (introduced during the 4.5 cycle) in the
           pcc-cpufreq driver by reverting the problematic commit (Andreas
           Herrmann).
      
         - Add support for Intel Denverton to intel_idle, clean up Broxton
           support in it and make it explicitly non-modular (Jacob Pan, Jan
           Beulich, Paul Gortmaker).
      
         - Add support for Denverton and Ivy Bridge server to the Intel RAPL
            power capping driver and make it more careful about the handling of
           MSRs that may not be present (Jacob Pan, Xiaolong Wang).
      
         - Fix resume from hibernation on x86-64 by making the CPU offline
           during resume avoid using MONITOR/MWAIT in the "play dead" loop
           which may lead to an inadvertent "revival" of a "dead" CPU and a
           page fault leading to a kernel crash from it (Rafael Wysocki).
      
         - Make memory management during resume from hibernation more
           straightforward (Rafael Wysocki).
      
         - Add debug features that should help to detect problems related to
           hibernation and resume from it (Rafael Wysocki, Chen Yu).
      
         - Clean up hibernation core somewhat (Rafael Wysocki).
      
         - Prevent KASAN from instrumenting the hibernation core which leads
           to large numbers of false-positives from it (James Morse).
      
         - Prevent PM (hibernate and suspend) notifiers from being called
           during the cleanup phase if they have not been called during the
           corresponding preparation phase which is possible if one of the
           other notifiers returns an error at that time (Lianwei Wang).
      
         - Improve suspend-related debug printout in the tasks freezer and
           clean up suspend-related console handling (Roger Lu, Borislav
           Petkov).
      
         - Update the AnalyzeSuspend script in the kernel sources to version
           4.2 (Todd Brandt).
      
         - Modify the generic power domains framework to make it handle system
           suspend/resume better (Ulf Hansson).
      
         - Make the runtime PM framework avoid resuming devices synchronously
           when user space changes the runtime PM settings for them and
           improve its error reporting (Rafael Wysocki, Linus Walleij).
      
         - Fix error paths in devfreq drivers (exynos, exynos-ppmu,
           exynos-bus) and in the core, make some devfreq code explicitly
           non-modular and change some of it into tristate (Bartlomiej
           Zolnierkiewicz, Peter Chen, Paul Gortmaker).
      
         - Add DT support to the generic PM clocks management code and make it
           export some more symbols (Jon Hunter, Paul Gortmaker).
      
         - Make the PCI PM core code slightly more robust against possible
           driver errors (Andy Shevchenko).
      
         - Make it possible to change DESTDIR and PREFIX in turbostat (Andy
           Shevchenko)"
      
      * tag 'pm-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (89 commits)
        Revert "cpufreq: pcc-cpufreq: update default value of cpuinfo_transition_latency"
        PM / hibernate: Introduce test_resume mode for hibernation
        cpufreq: export cpufreq_driver_resolve_freq()
        cpufreq: Disallow ->resolve_freq() for drivers providing ->target_index()
        PCI / PM: check all fields in pci_set_platform_pm()
        cpufreq: acpi-cpufreq: use cached frequency mapping when possible
        cpufreq: schedutil: map raw required frequency to driver frequency
        cpufreq: add cpufreq_driver_resolve_freq()
        cpufreq: intel_pstate: Check cpuid for MSR_HWP_INTERRUPT
        intel_pstate: Update cpu_frequency tracepoint every time
        cpufreq: intel_pstate: clean remnant struct element
        PM / tools: scripts: AnalyzeSuspend v4.2
        x86 / hibernate: Use hlt_play_dead() when resuming from hibernation
        cpufreq: powernv: Replacing pstate_id with frequency table index
        intel_pstate: Fix MSR_CONFIG_TDP_x addressing in core_get_max_pstate()
        PM / hibernate: Image data protection during restoration
        PM / hibernate: Add missing braces in __register_nosave_region()
        PM / hibernate: Clean up comments in snapshot.c
        PM / hibernate: Clean up function headers in snapshot.c
        PM / hibernate: Add missing braces in hibernate_setup()
        ...
      6453dbdd
    • Linus Torvalds's avatar
      Merge tag 'platform-drivers-x86-v4.8-1' of... · 27b79027
      Linus Torvalds authored
      Merge tag 'platform-drivers-x86-v4.8-1' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86
      
      Pull x86 platform driver updates from Darren Hart:
       "Several new quirks and tweaks for new platforms to existing laptop
        drivers.  A new ACPI virtual power button driver, similar to the
        intel-hid driver.  A rework of the dell keymap, using a single sparse
        keymap for all machines.  A few fixes and cleanups.
      
        Summary:
      
        intel-vbtn:
         - new driver for Intel Virtual Button
      
        intel_pmc_core:
         - Convert to DEFINE_DEBUGFS_ATTRIBUTE
      
        fujitsu-laptop:
         - Rework brightness of eco led
      
        asus-wmi:
         - Add quirk_no_rfkill_wapf4 for the Asus X456UA
         - Add quirk_no_rfkill_wapf4 for the Asus X456UF
         - Add quirk_no_rfkill for the Asus Z550MA
         - Add quirk_no_rfkill for the Asus U303LB
         - Add quirk_no_rfkill for the Asus N552VW
         - Create quirk for airplane_mode LED
         - Add ambient light sensor toggle key
      
        asus-wireless:
         - Toggle airplane mode LED
      
        intel_telemetry:
         - Remove Monitor MWAIT feature dependency
      
        intel-hid:
         - Remove duplicated acpi_remove_notify_handler
      
        fujitsu-laptop:
         - Add support for eco LED
         - Support touchpad toggle hotkey on Skylake-based models
         - Remove unused macros
         - Use module name in debug messages
      
        hp-wmi:
         - Fix wifi cannot be hard-unblocked
      
        toshiba_acpi:
         - Bump driver version and update copyright year
         - Remove the position sysfs entry
         - Add IIO interface for accelerometer axis data
      
        dell-wmi:
         - Add a WMI event code for display on/off
         - Generate one sparse keymap for all machines
         - Add information about other WMI event codes
         - Sort WMI event codes and update comments
         - Ignore WMI event code 0xe045"
      
      * tag 'platform-drivers-x86-v4.8-1' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86: (26 commits)
        intel-vbtn: new driver for Intel Virtual Button
        intel_pmc_core: Convert to DEFINE_DEBUGFS_ATTRIBUTE
        fujitsu-laptop: Rework brightness of eco led
        asus-wmi: Add quirk_no_rfkill_wapf4 for the Asus X456UA
        asus-wmi: Add quirk_no_rfkill_wapf4 for the Asus X456UF
        asus-wmi: Add quirk_no_rfkill for the Asus Z550MA
        asus-wmi: Add quirk_no_rfkill for the Asus U303LB
        asus-wmi: Add quirk_no_rfkill for the Asus N552VW
        asus-wmi: Create quirk for airplane_mode LED
        asus-wireless: Toggle airplane mode LED
        intel_telemetry: Remove Monitor MWAIT feature dependency
        intel-hid: Remove duplicated acpi_remove_notify_handler
        asus-wmi: Add ambient light sensor toggle key
        fujitsu-laptop: Add support for eco LED
        fujitsu-laptop: Support touchpad toggle hotkey on Skylake-based models
        fujitsu-laptop: Remove unused macros
        fujitsu-laptop: Use module name in debug messages
        hp-wmi: Fix wifi cannot be hard-unblocked
        toshiba_acpi: Bump driver version and update copyright year
        toshiba_acpi: Remove the position sysfs entry
        ...
      27b79027
    • Linus Torvalds's avatar
      Merge tag 'dm-4.8-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm · f7e68169
      Linus Torvalds authored
      Pull device mapper updates from Mike Snitzer:
      
       - initially based on Jens' 'for-4.8/core' (given all the flag churn)
         and later merged with 'for-4.8/core' to pick up the QUEUE_FLAG_DAX
         commits that DM depends on to provide its DAX support
      
       - clean up the bio-based vs request-based DM core code by moving the
         request-based DM core code out to dm-rq.[hc]
      
       - reinstate bio-based support in the DM multipath target (done with the
         idea that fast storage like NVMe over Fabrics could benefit) -- while
         preserving support for request_fn and blk-mq request-based DM mpath
      
       - SCSI and DM multipath persistent reservation fixes that were
         coordinated with Martin Petersen.
      
       - the DM raid target saw the most extensive change this cycle; it now
         provides reshape and takeover support (by layering on top of the
         corresponding MD capabilities)
      
       - DAX support for DM core and the linear, stripe and error targets
      
       - a DM thin-provisioning block discard vs allocation race fix that
         addresses potential for corruption
      
       - a stable fix for DM verity-fec's block calculation during decode
      
       - a few cleanups and fixes to DM core and various targets
      
      * tag 'dm-4.8-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (73 commits)
        dm: allow bio-based table to be upgraded to bio-based with DAX support
        dm snap: add fake origin_direct_access
        dm stripe: add DAX support
        dm error: add DAX support
        dm linear: add DAX support
        dm: add infrastructure for DAX support
        dm thin: fix a race condition between discarding and provisioning a block
        dm btree: fix a bug in dm_btree_find_next_single()
        dm raid: fix random optimal_io_size for raid0
        dm raid: address checkpatch.pl complaints
        dm: call PR reserve/unreserve on each underlying device
        sd: don't use the ALL_TG_PT bit for reservations
        dm: fix second blk_delay_queue() parameter to be in msec units not jiffies
        dm raid: change logical functions to actually return bool
        dm raid: use rdev_for_each in status
        dm raid: use rs->raid_disks to avoid memory leaks on free
        dm raid: support delta_disks for raid1, fix table output
        dm raid: enhance reshape check and factor out reshape setup
        dm raid: allow resize during recovery
        dm raid: fix rs_is_recovering() to allow for lvextend
        ...
      f7e68169
    • Johannes Weiner's avatar
      cgroup: remove unnecessary 0 check from css_from_id() · cb773df8
      Johannes Weiner authored
      css_idr allocation starts at 1, so index 0 will never point to an item.
      css_from_id() currently filters that before asking idr_find(), but
      idr_find() would also just return NULL, so this is not needed.
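
      The reasoning above can be illustrated with a small user-space sketch.
      It is not the kernel IDR: toy_idr, toy_idr_alloc() and toy_idr_find()
      are hypothetical names invented here, and the fixed-size table is an
      assumption made for brevity. It only shows that when allocation starts
      at 1, a lookup of ID 0 returns NULL without any special case.

```c
#include <stddef.h>

#define TOY_IDR_SLOTS 8

/* Hypothetical stand-in for an IDR: slot i holds the pointer for ID i. */
struct toy_idr {
    void *slots[TOY_IDR_SLOTS];
};

/* Allocate the lowest free ID >= start; return -1 if the table is full. */
static int toy_idr_alloc(struct toy_idr *idr, void *ptr, int start)
{
    for (int id = start; id < TOY_IDR_SLOTS; id++) {
        if (!idr->slots[id]) {
            idr->slots[id] = ptr;
            return id;
        }
    }
    return -1;
}

/* Look up an ID; out-of-range or unused IDs yield NULL. */
static void *toy_idr_find(const struct toy_idr *idr, int id)
{
    if (id < 0 || id >= TOY_IDR_SLOTS)
        return NULL;
    return idr->slots[id];
}
```

      With every allocation made as toy_idr_alloc(&idr, ptr, 1), slot 0 is
      never populated, so the lookup itself already filters ID 0.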
      
      Link: http://lkml.kernel.org/r/20160617162427.GC19084@cmpxchg.org
      
      
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Nikolay Borisov <kernel@kyup.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cb773df8
    • Johannes Weiner's avatar
      cgroup: fix idr leak for the first cgroup root · 1fe4d021
      Johannes Weiner authored
      The valid cgroup hierarchy ID range includes 0, so we can't filter for
      positive numbers when freeing it, or it'll leak the first ID.  No big
      deal, just disruptive when reading the code.
      
      The ID is freed during error handling and when the reference count hits
      zero, so the double-free test is not necessary; remove it.
      
      Link: http://lkml.kernel.org/r/20160617162359.GB19084@cmpxchg.org
      
      
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Nikolay Borisov <kernel@kyup.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1fe4d021
    • Li RongQing's avatar
      mm: memcontrol: fix documentation for compound parameter · 25843c2b
      Li RongQing authored
      Commit f627c2f5 ("memcg: adjust to support new THP refcounting")
      added a compound parameter to several functions and changed one
      parameter of mem_cgroup_move_account() to compound, but it did not
      update the comments.
      
      Link: http://lkml.kernel.org/r/1465368216-9393-1-git-send-email-roy.qing.li@gmail.com
      
      
      Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      25843c2b
    • Li RongQing's avatar
      mm: memcontrol: remove BUG_ON in uncharge_list · 17408d78
      Li RongQing authored
      When calling uncharge_list, if a page is transparent huge we don't need
      to BUG_ON about non-transparent huge, since nobody should be able to see
      the page at this stage and this page cannot be raced against with a THP
      split.
      
      This check became unneeded after 0a31bc97 ("mm: memcontrol: rewrite
      uncharge API").
      
      [mhocko@suse.com: changelog enhancements]
      Link: http://lkml.kernel.org/r/1465369248-13865-1-git-send-email-roy.qing.li@gmail.com
      
      
      Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      17408d78
    • Minchan Kim's avatar
      mm: fix build warnings in <linux/compaction.h> · dd4123f3
      Minchan Kim authored
      Randy reported the build warnings below.
      
      > In file included from ../include/linux/balloon_compaction.h:48:0,
      >                  from ../mm/balloon_compaction.c:11:
      > ../include/linux/compaction.h:237:51: warning: 'struct node' declared inside parameter list [enabled by default]
      >  static inline int compaction_register_node(struct node *node)
      > ../include/linux/compaction.h:237:51: warning: its scope is only this definition or declaration, which is probably not what you want [enabled by default]
      > ../include/linux/compaction.h:242:54: warning: 'struct node' declared inside parameter list [enabled by default]
      >  static inline void compaction_unregister_node(struct node *node)
      >
      
      The warnings were caused by non-lru page migration, which needs
      compaction.h, but compaction.h does not include the headers it needs
      to be standalone.

      I think the proper header for non-lru page migration is migrate.h rather
      than compaction.h, because migrate.h already pulls in, indirectly, what
      non-lru page migration needs, such as isolate_mode_t, migrate_mode and
      MIGRATEPAGE_SUCCESS.
      
      [akpm@linux-foundation.org: revert mm-balloon-use-general-non-lru-movable-page-feature-fix.patch temp fix]
      Link: http://lkml.kernel.org/r/20160610003304.GE29779@bbox
      
      
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Reported-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Gioh Kim <gi-oh.kim@profitbricks.com>
      Cc: Rafael Aquini <aquini@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      dd4123f3
    • Ebru Akagunduz's avatar
      mm, thp: convert from optimistic swapin collapsing to conservative · 0db501f7
      Ebru Akagunduz authored
      To detect whether khugepaged swapin is worthwhile, this patch checks the
      number of young pages.  At least half of HPAGE_PMD_NR pages should be
      young for swapin to go ahead.
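
      A minimal sketch of that threshold, not the kernel code itself:
      toy_swapin_worthwhile() and TOY_HPAGE_PMD_NR are hypothetical names,
      and the value 512 assumes 2 MiB huge pages built from 4 KiB base pages.

```c
#include <stdbool.h>

/* Assumed number of base pages per huge page (2 MiB / 4 KiB). */
#define TOY_HPAGE_PMD_NR 512

/*
 * Decide whether swapin is worth attempting: at least half of the
 * pages in the candidate range must have been found young.
 */
static bool toy_swapin_worthwhile(int young_pages)
{
    return young_pages >= TOY_HPAGE_PMD_NR / 2;
}
```

      Under these assumptions a range with 256 young pages qualifies and one
      with 255 does not.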
      
      Link: http://lkml.kernel.org/r/1468109451-1615-1-git-send-email-ebru.akagunduz@gmail.com
      
      
      Signed-off-by: Ebru Akagunduz <ebru.akagunduz@gmail.com>
      Suggested-by: Minchan Kim <minchan@kernel.org>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Boaz Harrosh <boaz@plexistor.com>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0db501f7
    • Ebru Akagunduz's avatar
      mm, thp: fix comment inconsistency for swapin readahead functions · 47f863ea
      Ebru Akagunduz authored
      After the swapin issues were fixed, the comments stayed as they were in
      the old version.  This patch updates them.
      
      Link: http://lkml.kernel.org/r/1468109345-32258-1-git-send-email-ebru.akagunduz@gmail.com
      
      
      Signed-off-by: Ebru Akagunduz <ebru.akagunduz@gmail.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Boaz Harrosh <boaz@plexistor.com>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      47f863ea
    • Kirill A. Shutemov's avatar
      shmem: split huge pages beyond i_size under memory pressure · 779750d2
      Kirill A. Shutemov authored
      Even if the user asked to always allocate huge pages (huge=always), we
      should be able to free up some memory by splitting pages which are
      partly beyond i_size if memory pressure comes or once we hit the limit
      on filesystem size (-o size=).

      In order to do this we maintain a per-superblock list of inodes which
      potentially have huge pages on the border of file size.

      The per-fs shrinker can reclaim memory by splitting such pages.

      If we hit -ENOSPC during shmem_getpage_gfp(), we try to split a page to
      free up space on the filesystem and retry the allocation if the split
      succeeds.
      
      Link: http://lkml.kernel.org/r/1466021202-61880-37-git-send-email-kirill.shutemov@linux.intel.com
      
      
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      779750d2
    • Kirill A. Shutemov's avatar
      thp: introduce CONFIG_TRANSPARENT_HUGE_PAGECACHE · e496cf3d
      Kirill A. Shutemov authored
      For file mappings, we don't deposit page tables on THP allocation
      because it's not strictly required to implement split_huge_pmd(): we can
      just clear the pmd and let subsequent page faults reconstruct the page
      table.

      But Power makes use of the deposited page table to address an MMU quirk.

      Let's hide THP page cache, including huge tmpfs, under a separate config
      option, so it can be forbidden on Power.

      We can revert the patch later, once a solution for Power is found.
      
      Link: http://lkml.kernel.org/r/1466021202-61880-36-git-send-email-kirill.shutemov@linux.intel.com
      
      
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e496cf3d
    • Kirill A. Shutemov's avatar
      khugepaged: add support of collapse for tmpfs/shmem pages · f3f0e1d2
      Kirill A. Shutemov authored
      This patch extends khugepaged to support collapse of tmpfs/shmem pages.
      We share a fair amount of infrastructure with anon-THP collapse.

      A few design points:

        - First we look for a VMA which is suitable for mapping a huge
          page;

        - If the VMA maps a shmem file, the rest of the scan/collapse
          operations work on the page cache, not on page tables as in the
          anon VMA case.

        - khugepaged_scan_shmem() finds a range which is suitable for a huge
          page. The scan is lockless and shouldn't disturb the system too
          much.

        - once a candidate for collapse is found, collapse_shmem() attempts
          to create a huge page:

            + scan over the radix tree, making the range point to the new
              huge page;

            + the new huge page is not-uptodate, locked and frozen (refcount
              is 0), so nobody can touch it until we say so.

            + we swap in pages during the scan. khugepaged_scan_shmem()
              filters out ranges with more than khugepaged_max_ptes_swap
              swapped out pages. It's HPAGE_PMD_NR/8 by default.

            + old pages are isolated, unmapped and put on a local list so
              they can be restored if the collapse fails.

        - if the collapse succeeds, we retract pte page tables from VMAs
          where a huge page mapping is possible. The huge page will be mapped
          as a PMD on the next minor fault into the range.
      
      Link: http://lkml.kernel.org/r/1466021202-61880-35-git-send-email-kirill.shutemov@linux.intel.com
      
      
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f3f0e1d2
    • Kirill A. Shutemov's avatar
      shmem: make shmem_inode_info::lock irq-safe · 4595ef88
      Kirill A. Shutemov authored
      We are going to need to call shmem_charge() under tree_lock to get
      accounting right on collapse of small tmpfs pages into a huge one.

      The problem is that tree_lock is irq-safe and lockdep is not happy that
      we take an irq-unsafe lock under an irq-safe one[1].
      
      Let's convert the lock to irq-safe.
      
      [1] https://gist.github.com/kiryl/80c0149e03ed35dfaf26628b8e03cdbc
      
      Link: http://lkml.kernel.org/r/1466021202-61880-34-git-send-email-kirill.shutemov@linux.intel.com
      
      
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4595ef88
    • Kirill A. Shutemov's avatar
      khugepaged: move up_read(mmap_sem) out of khugepaged_alloc_page() · 988ddb71
      Kirill A. Shutemov authored
      Both variants of khugepaged_alloc_page() do up_read(&mm->mmap_sem)
      first: there is no point in keeping it inside the function.
      
      Link: http://lkml.kernel.org/r/1466021202-61880-33-git-send-email-kirill.shutemov@linux.intel.com
      
      
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      988ddb71
    • Kirill A. Shutemov's avatar
      thp: extract khugepaged from mm/huge_memory.c · b46e756f
      Kirill A. Shutemov authored
      The khugepaged implementation has grown to the point where it deserves
      a separate source file.
      
      Let's move it to mm/khugepaged.c.
      
      Link: http://lkml.kernel.org/r/1466021202-61880-32-git-send-email-kirill.shutemov@linux.intel.com
      
      
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b46e756f
    • Kirill A. Shutemov's avatar
      shmem, thp: respect MADV_{NO,}HUGEPAGE for file mappings · 657e3038
      Kirill A. Shutemov authored
      Let's wire up existing madvise() hugepage hints for file mappings.
      
      MADV_HUGEPAGE advises shmem to allocate a huge page on page fault in the
      VMA.  It only has an effect if the filesystem is mounted with huge=advise
      or huge=within_size.

      MADV_NOHUGEPAGE prevents a huge page from being allocated on page fault
      in the VMA.  It doesn't prevent a huge page from being allocated by other
      means, e.g. a page fault into a different mapping or write(2) into the
      file.
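
      From user space, the hints are applied with madvise(2) on an existing
      mapping. The sketch below assumes Linux with glibc headers;
      toy_map_shared_with_advice() is a hypothetical helper invented for
      illustration. Whether the hint has any effect depends on the huge page
      policy described in this series, and madvise() may fail (for example
      with EINVAL) on kernels built without transparent huge pages, so the
      result is handed back to the caller instead of being treated as fatal.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/*
 * Map len bytes of shared anonymous memory (shmem-backed on Linux) and
 * apply a madvise() hint such as MADV_HUGEPAGE or MADV_NOHUGEPAGE.
 * Returns the mapping, or NULL if mmap() fails. *advice_rc receives
 * the madvise() return value (0 on success, -1 with errno set).
 */
static void *toy_map_shared_with_advice(size_t len, int advice, int *advice_rc)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED)
        return NULL;

    *advice_rc = madvise(p, len, advice);
    return p;
}
```

      A caller would pass MADV_HUGEPAGE to opt a range in, or MADV_NOHUGEPAGE
      to opt it out, and release the range with munmap() when done.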
      
      Link: http://lkml.kernel.org/r/1466021202-61880-31-git-send-email-kirill.shutemov@linux.intel.com
      
      
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      657e3038
    • Kirill A. Shutemov's avatar
      shmem: add huge pages support · 800d8c63
      Kirill A. Shutemov authored
      Here's a basic implementation of huge pages support for shmem/tmpfs.

      It's all pretty straightforward:

        - shmem_getpage() allocates a huge page if it can and tries to insert
          it into the radix tree with shmem_add_to_page_cache();

        - shmem_add_to_page_cache() puts the page into the radix tree if
          there's space for it;

        - shmem_undo_range() removes a huge page if it is fully within the
          range. Partial truncation of a huge page zeroes out the affected
          part of the THP.

          This has a visible effect on fallocate(FALLOC_FL_PUNCH_HOLE)
          behaviour. As we don't really create a hole in this case,
          lseek(SEEK_HOLE) may have inconsistent results depending on which
          pages happened to be allocated.

        - no need to change shmem_fault: core-mm will map a compound page as
          huge if the VMA is suitable;
      
      Link: http://lkml.kernel.org/r/1466021202-61880-30-git-send-email-kirill.shutemov@linux.intel.com
      
      
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      800d8c63
    • Hugh Dickins's avatar
      shmem: get_unmapped_area align huge page · c01d5b30
      Hugh Dickins authored
      Provide a shmem_get_unmapped_area method in file_operations, called at
      mmap time to decide the mapping address.  It could be conditional on
      CONFIG_TRANSPARENT_HUGEPAGE, but save #ifdefs in other places by making
      it unconditional.
      
      shmem_get_unmapped_area() first calls the usual mm->get_unmapped_area
      (which we treat as a black box, highly dependent on architecture and
      config and executable layout).  Lots of conditions, and in most cases it
      just goes with the address that call chose; but when our huge stars are
      rightly aligned, yet that call did not provide a suitable address, go
      back to ask for a larger arena, within which to align the mapping
      suitably.
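
      The final step, aligning the mapping inside a larger arena, comes down
      to address arithmetic. The sketch below is a simplified illustration
      and not the kernel function: toy_align_in_arena() and
      TOY_HPAGE_PMD_SIZE are hypothetical names, a 2 MiB huge page size is
      assumed, and the arena is assumed to be at least one huge page larger
      than the mapping so that the adjusted address still fits.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed huge page size: 2 MiB. */
#define TOY_HPAGE_PMD_SIZE (2ULL << 20)

/*
 * Return the lowest address >= arena_start whose position inside a
 * huge page equals offset (the file offset modulo the huge page size).
 */
static uint64_t toy_align_in_arena(uint64_t arena_start, uint64_t offset)
{
    uint64_t mask = TOY_HPAGE_PMD_SIZE - 1;
    uint64_t arena_offset = arena_start & mask;
    uint64_t addr;

    assert(offset <= mask);

    /* Move to the wanted offset within the arena's first huge page. */
    addr = arena_start - arena_offset + offset;

    /* If that moved us below the arena, step up by one huge page. */
    if (arena_offset > offset)
        addr += TOY_HPAGE_PMD_SIZE;

    return addr;
}
```

      For an arena starting at 0x7f0000123000 and a file offset of 0, this
      yields 0x7f0000200000, the next 2 MiB boundary.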
      
      There have to be some direct calls to shmem_get_unmapped_area(), not via
      the file_operations: because of the way shmem_zero_setup() is called to
      create a shmem object late in the mmap sequence, when MAP_SHARED is
      requested with MAP_ANONYMOUS or /dev/zero.  Though this only matters
      when /proc/sys/vm/shmem_huge has been set.
      
      Link: http://lkml.kernel.org/r/1466021202-61880-29-git-send-email-kirill.shutemov@linux.intel.com
      
      
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c01d5b30
    • Kirill A. Shutemov's avatar
      shmem: prepare huge= mount option and sysfs knob · 5a6e75f8
      Kirill A. Shutemov authored
      This patch adds a new mount option "huge=".  It can have the following
      values:

        - "always":
              Attempt to allocate huge pages every time we need a new page;

        - "never":
              Do not allocate huge pages;

        - "within_size":
              Only allocate a huge page if it will be fully within i_size.
              Also respect fadvise()/madvise() hints;

        - "advise":
              Only allocate huge pages if requested with fadvise()/madvise();
      
      Default is "never" for now.
      
      "mount -o remount,huge= /mountpoint" works fine after mount: remounting
      huge=never will not attempt to break up huge pages at all, just stop
      more from being allocated.
      
      No new config option: put this under CONFIG_TRANSPARENT_HUGEPAGE, which
      is the appropriate option to protect those who don't want the new bloat,
      and with which we shall share some pmd code.
      
      Prohibit the option when !CONFIG_TRANSPARENT_HUGEPAGE, just as mpol is
      invalid without CONFIG_NUMA (was hidden in mpol_parse_str(): make it
      explicit).
      
      Allow enabling THP only if the machine has_transparent_hugepage().
      
      But what about Shmem with no user-visible mount? SysV SHM, memfds,
      shared anonymous mmaps (of /dev/zero or MAP_ANONYMOUS), GPU drivers' DRM
      objects, Ashmem.  Though unlikely to suit all usages, provide sysfs knob
      /sys/kernel/mm/transparent_hugepage/shmem_enabled to experiment with
      huge on those.
      
      And allow shmem_enabled two further values:
      
        - "deny":
              For use in emergencies, to force the huge option off from
              all mounts;

        - "force":
              Force the huge option on for all - very useful for testing;
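
      The accepted values can be summarised as a small string-to-policy
      mapping. This is a sketch only: enum toy_huge and toy_parse_huge() are
      hypothetical names, the numeric values are arbitrary, and it assumes,
      as the text above suggests, that "deny" and "force" are accepted only
      through the sysfs knob and not as mount options.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical policy values; the kernel's own constants may differ. */
enum toy_huge {
    TOY_HUGE_INVALID = -1,
    TOY_HUGE_NEVER,
    TOY_HUGE_ALWAYS,
    TOY_HUGE_WITHIN_SIZE,
    TOY_HUGE_ADVISE,
    TOY_HUGE_DENY,
    TOY_HUGE_FORCE
};

/*
 * Map a "huge=" string to a policy. from_sysfs selects whether the two
 * extra shmem_enabled values ("deny", "force") are accepted.
 */
static enum toy_huge toy_parse_huge(const char *value, bool from_sysfs)
{
    if (!strcmp(value, "never"))
        return TOY_HUGE_NEVER;
    if (!strcmp(value, "always"))
        return TOY_HUGE_ALWAYS;
    if (!strcmp(value, "within_size"))
        return TOY_HUGE_WITHIN_SIZE;
    if (!strcmp(value, "advise"))
        return TOY_HUGE_ADVISE;

    if (from_sysfs) {
        if (!strcmp(value, "deny"))
            return TOY_HUGE_DENY;
        if (!strcmp(value, "force"))
            return TOY_HUGE_FORCE;
    }

    return TOY_HUGE_INVALID;
}
```

      Under these assumptions, toy_parse_huge("force", false) is rejected
      while toy_parse_huge("force", true) is accepted.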
      
      Based on patch by Hugh Dickins.
      
      Link: http://lkml.kernel.org/r/1466021202-61880-28-git-send-email-kirill.shutemov@linux.intel.com
      
      
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5a6e75f8
    • Kirill A. Shutemov's avatar
      mm, rmap: account shmem thp pages · 65c45377
      Kirill A. Shutemov authored
      Let's add ShmemHugePages and ShmemPmdMapped fields into meminfo and
      smaps.  They indicate how much shmem THP memory is allocated and mapped.
      
      NR_ANON_TRANSPARENT_HUGEPAGES is renamed to NR_ANON_THPS.
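
      A monitoring tool could pick the new fields out of meminfo-style text
      with a small parser. The sketch below is illustrative only:
      toy_meminfo_kb() is a hypothetical helper, and it assumes the usual
      "Name:   value kB" line layout of /proc/meminfo.

```c
#include <stdio.h>
#include <string.h>

/*
 * Find "key:" at the start of a line in meminfo-style text and return
 * the number that follows it, or -1 if the key is absent or malformed.
 */
static long toy_meminfo_kb(const char *text, const char *key)
{
    size_t klen = strlen(key);
    const char *line = text;

    while (line && *line) {
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            long kb;

            if (sscanf(line + klen + 1, "%ld", &kb) == 1)
                return kb;
            return -1;
        }
        /* Advance to the start of the next line, if any. */
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return -1;
}
```

      Reading /proc/meminfo into a buffer and calling
      toy_meminfo_kb(buf, "ShmemHugePages") would then give the reported
      value for that field.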
      
      Link: http://lkml.kernel.org/r/1466021202-61880-27-git-send-email-kirill.shutemov@linux.intel.com
      
      
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      65c45377
    • Kirill A. Shutemov's avatar
      truncate: handle file thp · fc127da0
      Kirill A. Shutemov authored
      For shmem/tmpfs we only need to tweak truncate_inode_page() and
      invalidate_mapping_pages().
      
      truncate_inode_pages_range() and invalidate_inode_pages2_range() are
      adjusted to use page_to_pgoff().
      
      Link: http://lkml.kernel.org/r/1466021202-61880-26-git-send-email-kirill.shutemov@linux.intel.com
      
      
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fc127da0
    • Kirill A. Shutemov's avatar
      filemap: prepare find and delete operations for huge pages · 83929372
      Kirill A. Shutemov authored
      For now, we would have HPAGE_PMD_NR entries in radix tree for every huge
      page.  That's suboptimal and it will be changed to use Matthew's
      multi-order entries later.
      
      'add' operation is not changed, because we don't need it to implement
      hugetmpfs: shmem uses its own implementation.
      
      Link: http://lkml.kernel.org/r/1466021202-61880-25-git-send-email-kirill.shutemov@linux.intel.com
      
      
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      83929372
    • Kirill A. Shutemov's avatar
      radix-tree: implement radix_tree_maybe_preload_order() · c78c66d1
      Kirill A. Shutemov authored
      The new helper is similar to radix_tree_maybe_preload(), but tries to
      preload the number of nodes required to insert (1 << order) contiguous
      naturally-aligned elements.
      
      This is required to push huge pages into pagecache.
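
      What "(1 << order) naturally-aligned elements" means can be shown with
      two small helpers. They are hypothetical (toy_nr_entries() and
      toy_is_naturally_aligned() are not kernel functions) and only
      illustrate the arithmetic, not the preload itself.

```c
#include <stdbool.h>

/* Number of radix tree slots covered by one entry of the given order. */
static unsigned long toy_nr_entries(unsigned int order)
{
    return 1UL << order;
}

/* A range is naturally aligned if its first index is a multiple of its size. */
static bool toy_is_naturally_aligned(unsigned long index, unsigned int order)
{
    return (index & (toy_nr_entries(order) - 1)) == 0;
}
```

      Assuming order 9 (a 2 MiB huge page made of 4 KiB pages), a huge page
      covers 512 slots and may start at index 0, 512, 1024 and so on.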
      
      Link: http://lkml.kernel.org/r/1466021202-61880-24-git-send-email-kirill.shutemov@linux.intel.com
      
      
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c78c66d1
    • Kirill A. Shutemov's avatar
      vmscan: split file huge pages before paging them out · 7751b2da
      Kirill A. Shutemov authored
      This is preparation of vmscan for file huge pages.  We cannot write out
      huge pages, so we need to split them on the way out.
      
      Link: http://lkml.kernel.org/r/1466021202-61880-22-git-send-email-kirill.shutemov@linux.intel.com
      
      
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7751b2da