Skip to content
  1. Apr 02, 2016
    • Rafael J. Wysocki's avatar
      cpufreq: schedutil: New governor based on scheduler utilization data · 9bdcb44e
      Rafael J. Wysocki authored
      Add a new cpufreq scaling governor, called "schedutil", that uses
      scheduler-provided CPU utilization information as input for making
      its decisions.
      
      Doing that is possible after commit 34e2c555
      
       (cpufreq: Add
      mechanism for registering utilization update callbacks) that
      introduced cpufreq_update_util() called by the scheduler on
      utilization changes (from CFS) and RT/DL task status updates.
      In particular, CPU frequency scaling decisions may be based on
      the the utilization data passed to cpufreq_update_util() by CFS.
      
      The new governor is relatively simple.
      
      The frequency selection formula used by it depends on whether or not
      the utilization is frequency-invariant.  In the frequency-invariant
      case the new CPU frequency is given by
      
      	next_freq = 1.25 * max_freq * util / max
      
      where util and max are the last two arguments of cpufreq_update_util().
      In turn, if util is not frequency-invariant, the maximum frequency in
      the above formula is replaced with the current frequency of the CPU:
      
      	next_freq = 1.25 * curr_freq * util / max
      
      The coefficient 1.25 corresponds to the frequency tipping point at
      (util / max) = 0.8.
      
      All of the computations are carried out in the utilization update
      handlers provided by the new governor.  One of those handlers is
      used for cpufreq policies shared between multiple CPUs and the other
      one is for policies with one CPU only (and therefore it doesn't need
      to use any extra synchronization means).
      
      The governor supports fast frequency switching if that is supported
      by the cpufreq driver in use and possible for the given policy.
      In the fast switching case, all operations of the governor take
      place in its utilization update handlers.  If fast switching cannot
      be used, the frequency switch operations are carried out with the
      help of a work item which only calls __cpufreq_driver_target()
      (under a mutex) to trigger a frequency update (to a value already
      computed beforehand in one of the utilization update handlers).
      
      Currently, the governor treats all of the RT and DL tasks as
      "unknown utilization" and sets the frequency to the allowed
      maximum when updated from the RT or DL sched classes.  That
      heavy-handed approach should be replaced with something more
      subtle and specifically targeted at RT and DL tasks.
      
      The governor shares some tunables management code with the
      "ondemand" and "conservative" governors and uses some common
      definitions from cpufreq_governor.h, but apart from that it
      is stand-alone.
      
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      9bdcb44e
    • Rafael J. Wysocki's avatar
      cpufreq: Support for fast frequency switching · b7898fda
      Rafael J. Wysocki authored
      
      
      Modify the ACPI cpufreq driver to provide a method for switching
      CPU frequencies from interrupt context and update the cpufreq core
      to support that method if available.
      
      Introduce a new cpufreq driver callback, ->fast_switch, to be
      invoked for frequency switching from interrupt context by (future)
      governors supporting that feature via (new) helper function
      cpufreq_driver_fast_switch().
      
      Add two new policy flags, fast_switch_possible, to be set by the
      cpufreq driver if fast frequency switching can be used for the
      given policy and fast_switch_enabled, to be set by the governor
      if it is going to use fast frequency switching for the given
      policy.  Also add a helper for setting the latter.
      
      Since fast frequency switching is inherently incompatible with
      cpufreq transition notifiers, make it possible to set the
      fast_switch_enabled only if there are no transition notifiers
      already registered and make the registration of new transition
      notifiers fail if fast_switch_enabled is set for at least one
      policy.
      
      Implement the ->fast_switch callback in the ACPI cpufreq driver
      and make it set fast_switch_possible during policy initialization
      as appropriate.
      
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      b7898fda
    • Rafael J. Wysocki's avatar
      cpufreq: Move governor symbols to cpufreq.h · 379480d8
      Rafael J. Wysocki authored
      
      
      Move definitions of symbols related to transition latency and
      sampling rate to include/linux/cpufreq.h so they can be used by
      (future) goverernors located outside of drivers/cpufreq/.
      
      No functional changes.
      
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      379480d8
    • Rafael J. Wysocki's avatar
      cpufreq: Move governor attribute set headers to cpufreq.h · 66893b6a
      Rafael J. Wysocki authored
      
      
      Move definitions and function headers related to struct gov_attr_set
      to include/linux/cpufreq.h so they can be used by (future) goverernors
      located outside of drivers/cpufreq/.
      
      No functional changes.
      
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      66893b6a
    • Rafael J. Wysocki's avatar
      cpufreq: governor: Move abstract gov_attr_set code to seperate file · 2d0c58ad
      Rafael J. Wysocki authored
      
      
      Move abstract code related to struct gov_attr_set to a separate (new)
      file so it can be shared with (future) goverernors that won't share
      more code with "ondemand" and "conservative".
      
      No intentional functional changes.
      
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      2d0c58ad
    • Rafael J. Wysocki's avatar
      cpufreq: governor: New data type for management part of dbs_data · 0dd3c1d6
      Rafael J. Wysocki authored
      
      
      In addition to fields representing governor tunables, struct dbs_data
      contains some fields needed for the management of objects of that
      type.  As it turns out, that part of struct dbs_data may be shared
      with (future) governors that won't use the common code used by
      "ondemand" and "conservative", so move it to a separate struct type
      and modify the code using struct dbs_data to follow.
      
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      0dd3c1d6
    • Rafael J. Wysocki's avatar
      cpufreq: sched: Helpers to add and remove update_util hooks · 0bed612b
      Rafael J. Wysocki authored
      
      
      Replace the single helper for adding and removing cpufreq utilization
      update hooks, cpufreq_set_update_util_data(), with a pair of helpers,
      cpufreq_add_update_util_hook() and cpufreq_remove_update_util_hook(),
      and modify the users of cpufreq_set_update_util_data() accordingly.
      
      With the new helpers, the code using them doesn't need to worry
      about the internals of struct update_util_data and in particular
      it doesn't need to worry about populating the func field in it
      properly upfront.
      
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      0bed612b
    • Rafael J. Wysocki's avatar
      Merge back intel_pstate fixes for v4.6. · 9fa64d64
      Rafael J. Wysocki authored
      * pm-cpufreq:
        intel_pstate: Avoid extra invocation of intel_pstate_sample()
        intel_pstate: Do not set utilization update hook too early
      9fa64d64
    • Rafael J. Wysocki's avatar
      intel_pstate: Avoid extra invocation of intel_pstate_sample() · febce40f
      Rafael J. Wysocki authored
      The initialization of intel_pstate for a given CPU involves populating
      the fields of its struct cpudata that represent the previous sample,
      but currently that is done in a problematic way.
      
      Namely, intel_pstate_init_cpu() makes an extra call to
      intel_pstate_sample() so it reads the current register values that
      will be used to populate the "previous sample" record during the
      next invocation of intel_pstate_sample().  However, after commit
      a4675fbc
      
       (cpufreq: intel_pstate: Replace timers with utilization
      update callbacks) that doesn't work for last_sample_time, because
      the time value is passed to intel_pstate_sample() as an argument now.
      Passing 0 to it from intel_pstate_init_cpu() is problematic, because
      that causes cpu->last_sample_time == 0 to be visible in
      get_target_pstate_use_performance() (and hence the extra
      cpu->last_sample_time > 0 check in there) and effectively allows
      the first invocation of intel_pstate_sample() from
      intel_pstate_update_util() to happen immediately after the
      initialization which may lead to a significant "turn on"
      effect in the governor algorithm.
      
      To mitigate that issue, rework the initialization to avoid the
      extra intel_pstate_sample() call from intel_pstate_init_cpu().
      Instead, make intel_pstate_sample() return false if it has been
      called with cpu->sample.time equal to zero, which will make
      intel_pstate_update_util() skip the sample in that case, and
      reset cpu->sample.time from intel_pstate_set_update_util_hook()
      to make the algorithm start properly every time the hook is set.
      
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      febce40f
  2. Mar 31, 2016
    • Rafael J. Wysocki's avatar
      intel_pstate: Do not set utilization update hook too early · bb6ab52f
      Rafael J. Wysocki authored
      
      
      The utilization update hook in the intel_pstate driver is set too
      early, as it only should be set after the policy has been fully
      initialized by the core.  That may cause intel_pstate_update_util()
      to use incorrect data and put the CPUs into incorrect P-states as
      a result.
      
      To prevent that from happening, make intel_pstate_set_policy() set
      the utilization update hook instead of intel_pstate_init_cpu() so
      intel_pstate_update_util() only runs when all things have been
      initialized as appropriate.
      
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      bb6ab52f
  3. Mar 27, 2016
    • Linus Torvalds's avatar
      Linux 4.6-rc1 · f55532a0
      Linus Torvalds authored
      v4.6-rc1
      f55532a0
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client · d5a38f6e
      Linus Torvalds authored
      Pull Ceph updates from Sage Weil:
       "There is quite a bit here, including some overdue refactoring and
        cleanup on the mon_client and osd_client code from Ilya, scattered
        writeback support for CephFS and a pile of bug fixes from Zheng, and a
        few random cleanups and fixes from others"
      
      [ I already decided not to pull this because of it having been rebased
        recently, but ended up changing my mind after all.  Next time I'll
        really hold people to it.  Oh well.   - Linus ]
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (34 commits)
        libceph: use KMEM_CACHE macro
        ceph: use kmem_cache_zalloc
        rbd: use KMEM_CACHE macro
        ceph: use lookup request to revalidate dentry
        ceph: kill ceph_get_dentry_parent_inode()
        ceph: fix security xattr deadlock
        ceph: don't request vxattrs from MDS
        ceph: fix mounting same fs multiple times
        ceph: remove unnecessary NULL check
        ceph: avoid updating directory inode's i_size accidentally
        ceph: fix race during filling readdir cache
        libceph: use sizeof_footer() more
        ceph: kill ceph_empty_snapc
        ceph: fix a wrong comparison
        ceph: replace CURRENT_TIME by current_fs_time()
        ceph: scattered page writeback
        libceph: add helper that duplicates last extent operation
        libceph: enable large, variable-sized OSD requests
        libceph: osdc->req_mempool should be backed by a slab pool
        libceph: make r_request msg_size calculation clearer
        ...
      d5a38f6e
    • Linus Torvalds's avatar
      Merge tag 'ofs-pull-tag-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux · 698f415c
      Linus Torvalds authored
      Pull orangefs filesystem from Mike Marshall.
      
      This finally merges the long-pending orangefs filesystem, which has been
      much cleaned up with input from Al Viro over the last six months.  From
      the documentation file:
      
       "OrangeFS is an LGPL userspace scale-out parallel storage system.  It
        is ideal for large storage problems faced by HPC, BigData, Streaming
        Video, Genomics, Bioinformatics.
      
        Orangefs, originally called PVFS, was first developed in 1993 by Walt
        Ligon and Eric Blumer as a parallel file system for Parallel Virtual
        Machine (PVM) as part of a NASA grant to study the I/O patterns of
        parallel programs.
      
        Orangefs features include:
      
          - Distributes file data among multiple file servers
          - Supports simultaneous access by multiple clients
          - Stores file data and metadata on servers using local file system
            and access methods
          - Userspace implementation is easy to install and maintain
          - Direct MPI support
          - Stateless"
      
      see Documentation/filesystems/orangefs.txt for more in-depth details.
      
      * tag 'ofs-pull-tag-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux: (174 commits)
        orangefs: fix orangefs_superblock locking
        orangefs: fix do_readv_writev() handling of error halfway through
        orangefs: have ->kill_sb() evict the VFS side of things first
        orangefs: sanitize ->llseek()
        orangefs-bufmap.h: trim unused junk
        orangefs: saner calling conventions for getting a slot
        orangefs_copy_{to,from}_bufmap(): don't pass bufmap pointer
        orangefs: get rid of readdir_handle_s
        ornagefs: ensure that truncate has an up to date inode size
        orangefs: move code which sets i_link to orangefs_inode_getattr
        orangefs: remove needless wrapper around GFP_KERNEL
        orangefs: remove wrapper around mutex_lock(&inode->i_mutex)
        orangefs: refactor inode type or link_target change detection
        orangefs: use new getattr for revalidate and remove old getattr
        orangefs: use new getattr in inode getattr and permission
        orangefs: use new orangefs_inode_getattr to get size in write and llseek
        orangefs: use new orangefs_inode_getattr to create new inodes
        orangefs: rename orangefs_inode_getattr to orangefs_inode_old_getattr
        orangefs: remove inode->i_lock wrapper
        orangefs: put register_chrdev immediately before register_filesystem
        ...
      698f415c
    • Linus Torvalds's avatar
      Merge tag 'ntb-4.6' of git://github.com/jonmason/ntb · b4cec5f6
      Linus Torvalds authored
      Pull NTB bug fixes from Jon Mason:
       "NTB bug fixes for tasklet from spinning forever, link errors,
        translation window setup, NULL ptr dereference, and ntb-perf errors.
      
        Also, a modification to the driver API that makes _addr functions
        optional"
      
      * tag 'ntb-4.6' of git://github.com/jonmason/ntb:
        NTB: Remove _addr functions from ntb_hw_amd
        NTB: Make _addr functions optional in the API
        NTB: Fix incorrect clean up routine in ntb_perf
        NTB: Fix incorrect return check in ntb_perf
        ntb: fix possible NULL dereference
        ntb: add missing setup of translation window
        ntb: stop link work when we do not have memory
        ntb: stop tasklet from spinning forever during shutdown.
        ntb: perf test: fix address space confusion
      b4cec5f6
    • Linus Torvalds's avatar
      Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 895a1067
      Linus Torvalds authored
      Pull more SCSI updates from James Bottomley:
       "The only new stuff which missed the first pull request is an update to
        the UFS driver.
      
        The rest is an assortment of bug fixes and minor tweaks which appeared
        recently (some are fixes for recent code and some are stuff spotted
        recently by the checkers or the new gcc-6 compiler [most of Arnd's
        stuff])"
      
      * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (32 commits)
        scsi_common: do not clobber fixed sense information
        scsi: ufs: select CONFIG_NLS
        scsi: fc: use get/put_unaligned64 for wwn access
        fnic: move printk()s outside of the critical code section.
        qla2xxx: avoid maybe_uninitialized warning
        megaraid_sas: add missing curly braces in ioctl handler
        lpfc: fix misleading indentation
        scsi_transport_sas: add 'scsi_target_id' sysfs attribute
        scsi_dh_alua: uninitialized variable in alua_check_vpd()
        scsi: ufs-qcom: add printouts of testbus debug registers
        scsi: ufs-qcom: enable/disable the device ref clock
        scsi: ufs-qcom: set PA_Local_TX_LCC_Enable before link startup
        scsi: ufs: add device quirk delay before putting UFS rails in LPM
        scsi: ufs: fix leakage during link off state
        scsi: ufs: tune UniPro parameters to optimize hibern8 exit time
        scsi: ufs: handle non spec compliant bkops behaviour by device
        scsi: ufs: add retry for query descriptors
        scsi: ufs: add error recovery after DL NAC error
        scsi: ufs: make error handling bit faster
        scsi: ufs: disable vccq if it's not needed by UFS device
        ...
      895a1067
    • Linus Torvalds's avatar
      f2fs/crypto: fix xts_tweak initialization · 02fc59a0
      Linus Torvalds authored
      Commit 0b81d077 ("fs crypto: move per-file encryption from f2fs
      tree to fs/crypto") moved the f2fs crypto files to fs/crypto/ and
      renamed the symbol prefixes from "f2fs_" to "fscrypt_" (and from "F2FS_"
      to just "FS" for preprocessor symbols).
      
      Because of the symbol renaming, it's a bit hard to see it as a file
      move: use
      
          git show -M30 0b81d077
      
      
      
      to lower the rename detection to just 30% similarity and make git show
      the files as renamed (the header file won't be shown as a rename even
      then - since all it contains is symbol definitions, it looks almost
      completely different).
      
      Even with the renames showing as renames, the diffs are not all that
      easy to read, since so much is just the renames.  But Eric Biggers
      noticed that it's not just all renames: the initialization of the
      xts_tweak had been broken too, using the inode number rather than the
      page offset.
      
      That's not right - it makes the xfs_tweak the same for all pages of each
      inode.  It _might_ make sense to make the xfs_tweak contain both the
      offset _and_ the inode number, but not just the inode number.
      
      Reported-by: default avatarEric Biggers <ebiggers3@gmail.com>
      Cc: Jaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      02fc59a0
  4. Mar 26, 2016
    • Allen Hubbe's avatar
      NTB: Remove _addr functions from ntb_hw_amd · 4f1b50c3
      Allen Hubbe authored
      Kernel zero day testing warned about address space confusion.  A virtual
      iomem address was used where a physical address is expected.  The
      offending functions implement an optional part of the api, so they are
      removed.  They can be added later, after testing.
      
      Fixes: a1b36958
      
      
      
      Signed-off-by: default avatarAllen Hubbe <Allen.Hubbe@emc.com>
      Acked-by: default avatarXiangliang Yu <Xiangliang.Yu@amd.com>
      Signed-off-by: default avatarJon Mason <jdmason@kudzu.us>
      4f1b50c3
    • Al Viro's avatar
      orangefs: fix orangefs_superblock locking · 45996492
      Al Viro authored
      
      
      * switch orangefs_remount() to taking ORANGEFS_SB(sb) instead of sb
      * remove from the list _before_ orangefs_unmount() - request_mutex
      in the latter will make sure that nothing observed in the loop in
      ORANGEFS_DEV_REMOUNT_ALL handling will get freed until the end
      of loop
      * on removal, keep the forward pointer and zero the back one.  That
      way we can drop and regain the spinlock in the loop body (again,
      ORANGEFS_DEV_REMOUNT_ALL one) and still be able to get to the
      rest of the list.
      
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarMike Marshall <hubcap@omnibond.com>
      45996492
    • Al Viro's avatar
      orangefs: fix do_readv_writev() handling of error halfway through · 6d4c1a30
      Al Viro authored
      
      
      Error should only be returned if nothing had been read/written.
      Otherwise we need to report a short read/write instead.
      
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarMike Marshall <hubcap@omnibond.com>
      6d4c1a30
    • Al Viro's avatar
    • Al Viro's avatar
      orangefs: sanitize ->llseek() · 177f8fc4
      Al Viro authored
      
      
      a) open files can't have NULL inodes
      b) it's SEEK_END, not ORANGEFS_SEEK_END; no need to get cute.
      c) make_bad_inode() on lseek()?
      
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarMike Marshall <hubcap@omnibond.com>
      177f8fc4
    • Al Viro's avatar
      orangefs-bufmap.h: trim unused junk · 7df240d7
      Al Viro authored
      
      
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarMike Marshall <hubcap@omnibond.com>
      7df240d7
    • Al Viro's avatar
      orangefs: saner calling conventions for getting a slot · b8a99a8f
      Al Viro authored
      
      
      just have it return the slot number or -E... - the caller checks
      the sign anyway
      
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarMike Marshall <hubcap@omnibond.com>
      b8a99a8f
    • Al Viro's avatar
      orangefs_copy_{to,from}_bufmap(): don't pass bufmap pointer · bf6bf606
      Al Viro authored
      
      
      it's always __orangefs_bufmap
      
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarMike Marshall <hubcap@omnibond.com>
      bf6bf606
    • Al Viro's avatar
      orangefs: get rid of readdir_handle_s · 9f5e2f7f
      Al Viro authored
      
      
      no point, really - we couldn't keep those across the calls of
      getdents(); it would be too easy to DoS, having all slots exhausted.
      
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarMike Marshall <hubcap@omnibond.com>
      9f5e2f7f
    • Linus Torvalds's avatar
      Merge branch 'akpm' (patches from Andrew) · 606c61a0
      Linus Torvalds authored
      Merge fourth patch-bomb from Andrew Morton:
       "A lot more stuff than expected, sorry.  A bunch of ocfs2 reviewing was
        finished off.
      
         - mhocko's oom-reaper out-of-memory-handler changes
      
         - ocfs2 fixes and features
      
         - KASAN feature work
      
         - various fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (42 commits)
        thp: fix typo in khugepaged_scan_pmd()
        MAINTAINERS: fill entries for KASAN
        mm/filemap: generic_file_read_iter(): check for zero reads unconditionally
        kasan: test fix: warn if the UAF could not be detected in kmalloc_uaf2
        mm, kasan: stackdepot implementation. Enable stackdepot for SLAB
        arch, ftrace: for KASAN put hard/soft IRQ entries into separate sections
        mm, kasan: add GFP flags to KASAN API
        mm, kasan: SLAB support
        kasan: modify kmalloc_large_oob_right(), add kmalloc_pagealloc_oob_right()
        include/linux/oom.h: remove undefined oom_kills_count()/note_oom_kill()
        mm/page_alloc: prevent merging between isolated and other pageblocks
        drivers/memstick/host/r592.c: avoid gcc-6 warning
        ocfs2: extend enough credits for freeing one truncate record while replaying truncate records
        ocfs2: extend transaction for ocfs2_remove_rightmost_path() and ocfs2_update_edge_lengths() before to avoid inconsistency between inode and et
        ocfs2/dlm: move lock to the tail of grant queue while doing in-place convert
        ocfs2: solve a problem of crossing the boundary in updating backups
        ocfs2: fix occurring deadlock by changing ocfs2_wq from global to local
        ocfs2/dlm: fix BUG in dlm_move_lockres_to_recovery_list
        ocfs2/dlm: fix race between convert and recovery
        ocfs2: fix a deadlock issue in ocfs2_dio_end_io_write()
        ...
      606c61a0
    • Linus Torvalds's avatar
      Merge tag 'pm+acpi-4.6-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 15dbc136
      Linus Torvalds authored
      Pull power management fixlet from Rafael Wysocki:
       "One of commits in my previous pull request changed the permissions of
        drivers/power/avs/rockchip-io-domain.c to executable by mistake"
      
      * tag 'pm+acpi-4.6-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        Fix permissions of drivers/power/avs/rockchip-io-domain.c
      15dbc136
    • Linus Torvalds's avatar
      Merge tag 'please-pull-preadv2' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux · dad44dec
      Linus Torvalds authored
      Pull ia64 update from Tony Luck:
       "Wire up new system calls p{read,write}v2 for ia64"
      
      * tag 'please-pull-preadv2' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
        [IA64] Enable preadv2 and pwritev2 syscalls for ia64
      dad44dec
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · c155c749
      Linus Torvalds authored
      Pull more input updates from Dmitry Torokhov:
       "Second round of updates for the input subsystem.
      
        The BYD PS/2 protocol driver now uses absolute reporting mode and
        should behave more like other touchpads; Synaptics driver needed to
        extend one of its quirks to a newer firmware version, and a few USB
        drivers got tightened up checks for the contents of their descriptors"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
        Input: sur40 - fix DMA on stack
        Input: ati_remote2 - fix crashes on detecting device with invalid descriptor
        Input: synaptics - handle spurious release of trackstick buttons, again
        Input: synaptics-rmi4 - remove check of Non-NULL array
        Input: byd - enable absolute mode
        Input: ims-pcu - sanity check against missing interfaces
        Input: melfas_mip4 - add hw_version sysfs attribute
      c155c749
    • Kirill A. Shutemov's avatar
      thp: fix typo in khugepaged_scan_pmd() · 0fda2788
      Kirill A. Shutemov authored
      
      
      !PageLRU should lead to SCAN_PAGE_LRU, not SCAN_SCAN_ABORT result.
      
      Signed-off-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Ebru Akagunduz <ebru.akagunduz@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      0fda2788
    • Andrey Ryabinin's avatar
      MAINTAINERS: fill entries for KASAN · 0ba1d91d
      Andrey Ryabinin authored
      
      
      Signed-off-by: default avatarAndrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Alexander Potapenko <glider@google.com>
      Acked-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      0ba1d91d
    • Nicolai Stange's avatar
      mm/filemap: generic_file_read_iter(): check for zero reads unconditionally · e7080a43
      Nicolai Stange authored
      
      
      If
       - generic_file_read_iter() gets called with a zero read length,
       - the read offset is at a page boundary,
       - IOCB_DIRECT is not set
      -  and the page in question hasn't made it into the page cache yet,
      then do_generic_file_read() will trigger a readahead with a req_size hint
      of zero.
      
      Since roundup_pow_of_two(0) is undefined, UBSAN reports
      
        UBSAN: Undefined behaviour in include/linux/log2.h:63:13
        shift exponent 64 is too large for 64-bit type 'long unsigned int'
        CPU: 3 PID: 1017 Comm: sa1 Tainted: G L 4.5.0-next-20160318+ #14
        [...]
        Call Trace:
         [...]
         [<ffffffff813ef61a>] ondemand_readahead+0x3aa/0x3d0
         [<ffffffff813ef61a>] ? ondemand_readahead+0x3aa/0x3d0
         [<ffffffff813c73bd>] ? find_get_entry+0x2d/0x210
         [<ffffffff813ef9c3>] page_cache_sync_readahead+0x63/0xa0
         [<ffffffff813cc04d>] do_generic_file_read+0x80d/0xf90
         [<ffffffff813cc955>] generic_file_read_iter+0x185/0x420
         [...]
         [<ffffffff81510b06>] __vfs_read+0x256/0x3d0
         [...]
      
      when get_init_ra_size() gets called from ondemand_readahead().
      
      The net effect is that the initial readahead size is arch dependent for
      requested read lengths of zero: for example, since
      
        1UL << (sizeof(unsigned long) * 8)
      
      evaluates to 1 on x86 while its result is 0 on ARMv7, the initial readahead
      size becomes 4 on the former and 0 on the latter.
      
      What's more, whether or not the file access timestamp is updated for zero
      length reads is decided differently for the two cases of IOCB_DIRECT
      being set or cleared: in the first case, generic_file_read_iter()
      explicitly skips updating that timestamp while in the latter case, it is
      always updated through the call to do_generic_file_read().
      
      According to POSIX, zero length reads "do not modify the last data access
      timestamp" and thus, the IOCB_DIRECT behaviour is POSIXly correct.
      
      Let generic_file_read_iter() unconditionally check the requested read
      length at its entry and return immediately with success if it is zero.
      
      Signed-off-by: default avatarNicolai Stange <nicstange@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      e7080a43
    • Alexander Potapenko's avatar
      kasan: test fix: warn if the UAF could not be detected in kmalloc_uaf2 · 9dcadd38
      Alexander Potapenko authored
      
      
      Signed-off-by: default avatarAlexander Potapenko <glider@google.com>
      Acked-by: default avatarAndrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Andrey Konovalov <adech.fo@gmail.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Konstantin Serebryany <kcc@google.com>
      Cc: Dmitry Chernenkov <dmitryc@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9dcadd38
    • Alexander Potapenko's avatar
      mm, kasan: stackdepot implementation. Enable stackdepot for SLAB · cd11016e
      Alexander Potapenko authored
      
      
      Implement the stack depot and provide CONFIG_STACKDEPOT.  Stack depot
      will allow KASAN store allocation/deallocation stack traces for memory
      chunks.  The stack traces are stored in a hash table and referenced by
      handles which reside in the kasan_alloc_meta and kasan_free_meta
      structures in the allocated memory chunks.
      
      IRQ stack traces are cut below the IRQ entry point to avoid unnecessary
      duplication.
      
      Right now stackdepot support is only enabled in SLAB allocator.  Once
      KASAN features in SLAB are on par with those in SLUB we can switch SLUB
      to stackdepot as well, thus removing the dependency on SLUB stack
      bookkeeping, which wastes a lot of memory.
      
      This patch is based on the "mm: kasan: stack depots" patch originally
      prepared by Dmitry Chernenkov.
      
      Joonsoo has said that he plans to reuse the stackdepot code for the
      mm/page_owner.c debugging facility.
      
      [akpm@linux-foundation.org: s/depot_stack_handle/depot_stack_handle_t]
      [aryabinin@virtuozzo.com: comment style fixes]
      Signed-off-by: default avatarAlexander Potapenko <glider@google.com>
      Signed-off-by: default avatarAndrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Andrey Konovalov <adech.fo@gmail.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Konstantin Serebryany <kcc@google.com>
      Cc: Dmitry Chernenkov <dmitryc@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      cd11016e
    • Alexander Potapenko's avatar
      arch, ftrace: for KASAN put hard/soft IRQ entries into separate sections · be7635e7
      Alexander Potapenko authored
      
      
      KASAN needs to know whether the allocation happens in an IRQ handler.
      This lets us strip everything below the IRQ entry point to reduce the
      number of unique stack traces needed to be stored.
      
      Move the definition of __irq_entry to <linux/interrupt.h> so that the
      users don't need to pull in <linux/ftrace.h>.  Also introduce the
      __softirq_entry macro which is similar to __irq_entry, but puts the
      corresponding functions to the .softirqentry.text section.
      
      Signed-off-by: default avatarAlexander Potapenko <glider@google.com>
      Acked-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Andrey Konovalov <adech.fo@gmail.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Konstantin Serebryany <kcc@google.com>
      Cc: Dmitry Chernenkov <dmitryc@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      be7635e7
    • Alexander Potapenko's avatar
      mm, kasan: add GFP flags to KASAN API · 505f5dcb
      Alexander Potapenko authored
      
      
      Add GFP flags to KASAN hooks for future patches to use.
      
      This patch is based on the "mm: kasan: unified support for SLUB and SLAB
      allocators" patch originally prepared by Dmitry Chernenkov.
      
      Signed-off-by: default avatarAlexander Potapenko <glider@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Andrey Konovalov <adech.fo@gmail.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Konstantin Serebryany <kcc@google.com>
      Cc: Dmitry Chernenkov <dmitryc@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      505f5dcb
    • Alexander Potapenko's avatar
      mm, kasan: SLAB support · 7ed2f9e6
      Alexander Potapenko authored
      
      
      Add KASAN hooks to SLAB allocator.
      
      This patch is based on the "mm: kasan: unified support for SLUB and SLAB
      allocators" patch originally prepared by Dmitry Chernenkov.
      
      Signed-off-by: default avatarAlexander Potapenko <glider@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Andrey Konovalov <adech.fo@gmail.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Konstantin Serebryany <kcc@google.com>
      Cc: Dmitry Chernenkov <dmitryc@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7ed2f9e6
    • Alexander Potapenko's avatar
      kasan: modify kmalloc_large_oob_right(), add kmalloc_pagealloc_oob_right() · e6e8379c
      Alexander Potapenko authored
      
      
      This patchset implements SLAB support for KASAN
      
      Unlike SLUB, SLAB doesn't store allocation/deallocation stacks for heap
      objects, therefore we reimplement this feature in mm/kasan/stackdepot.c.
      The intention is to ultimately switch SLUB to use this implementation as
      well, which will save a lot of memory (right now SLUB bloats each object
      by 256 bytes to store the allocation/deallocation stacks).
      
      Also neither SLUB nor SLAB delay the reuse of freed memory chunks, which
      is necessary for better detection of use-after-free errors.  We
      introduce memory quarantine (mm/kasan/quarantine.c), which allows
      delayed reuse of deallocated memory.
      
      This patch (of 7):
      
      Rename kmalloc_large_oob_right() to kmalloc_pagealloc_oob_right(), as
      the test only checks the page allocator functionality.  Also reimplement
      kmalloc_large_oob_right() so that the test allocates a large enough
      chunk of memory that still does not trigger the page allocator fallback.
      
      Signed-off-by: default avatarAlexander Potapenko <glider@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Andrey Konovalov <adech.fo@gmail.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Konstantin Serebryany <kcc@google.com>
      Cc: Dmitry Chernenkov <dmitryc@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      e6e8379c
    • Tetsuo Handa's avatar
      include/linux/oom.h: remove undefined oom_kills_count()/note_oom_kill() · aaf4fb71
      Tetsuo Handa authored
      A leftover from commit c32b3cbe
      
       ("oom, PM: make OOM detection in the
      freezer path raceless").
      
      Signed-off-by: default avatarTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      aaf4fb71
    • Vlastimil Babka's avatar
      mm/page_alloc: prevent merging between isolated and other pageblocks · d9dddbf5
      Vlastimil Babka authored
      Hanjun Guo has reported that a CMA stress test causes broken accounting of
      CMA and free pages:
      
      > Before the test, I got:
      > -bash-4.3# cat /proc/meminfo | grep Cma
      > CmaTotal:         204800 kB
      > CmaFree:          195044 kB
      >
      >
      > After running the test:
      > -bash-4.3# cat /proc/meminfo | grep Cma
      > CmaTotal:         204800 kB
      > CmaFree:         6602584 kB
      >
      > So the freed CMA memory is more than total..
      >
      > Also the the MemFree is more than mem total:
      >
      > -bash-4.3# cat /proc/meminfo
      > MemTotal:       16342016 kB
      > MemFree:        22367268 kB
      > MemAvailable:   22370528 kB
      
      Laura Abbott has confirmed the issue and suspected the freepage accounting
      rewrite around 3.18/4.0 by Joonsoo Kim.  Joonsoo had a theory that this is
      caused by unexpected merging between MIGRATE_ISOLATE and MIGRATE_CMA
      pageblocks:
      
      > CMA isolates MAX_ORDER aligned blocks, but, during the process,
      > partialy isolated block exists. If MAX_ORDER is 11 and
      > pageblock_order is 9, two pageblocks make up MAX_ORDER
      > aligned block and I can think following scenario because pageblock
      > (un)isolation would be done one by one.
      >
      > (each character means one pageblock. 'C', 'I' means MIGRATE_CMA,
      > MIGRATE_ISOLATE, respectively.
      >
      > CC -> IC -> II (Isolation)
      > II -> CI -> CC (Un-isolation)
      >
      > If some pages are freed at this intermediate state such as IC or CI,
      > that page could be merged to the other page that is resident on
      > different type of pageblock and it will cause wrong freepage count.
      
      This was supposed to be prevented by CMA operating on MAX_ORDER blocks,
      but since it doesn't hold the zone->lock between pageblocks, a race
      window does exist.
      
      It's also likely that unexpected merging can occur between
      MIGRATE_ISOLATE and non-CMA pageblocks.  This should be prevented in
      __free_one_page() since commit 3c605096 ("mm/page_alloc: restrict
      max order of merging on isolated pageblock").  However, we only check
      the migratetype of the pageblock where buddy merging has been initiated,
      not the migratetype of the buddy pageblock (or group of pageblocks)
      which can be MIGRATE_ISOLATE.
      
      Joonsoo has suggested checking for buddy migratetype as part of
      page_is_buddy(), but that would add extra checks in allocator hotpath
      and bloat-o-meter has shown significant code bloat (the function is
      inline).
      
      This patch reduces the bloat at some expense of more complicated code.
      The buddy-merging while-loop in __free_one_page() is initially bounded
      to pageblock_border and without any migratetype checks.  The checks are
      placed outside, bumping the max_order if merging is allowed, and
      returning to the while-loop with a statement which can't be possibly
      considered harmful.
      
      This fixes the accounting bug and also removes the arguably weird state
      in the original commit 3c605096 where buddies could be left
      unmerged.
      
      Fixes: 3c605096
      
       ("mm/page_alloc: restrict max order of merging on isolated pageblock")
      Link: https://lkml.org/lkml/2016/3/2/280
      Signed-off-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Reported-by: default avatarHanjun Guo <guohanjun@huawei.com>
      Tested-by: default avatarHanjun Guo <guohanjun@huawei.com>
      Acked-by: default avatarJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Debugged-by: default avatarLaura Abbott <labbott@redhat.com>
      Debugged-by: default avatarJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: <stable@vger.kernel.org>	[3.18+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d9dddbf5