  1. Oct 29, 2018
    • Merge tag 'drm-next-2018-10-24' of git://anongit.freedesktop.org/drm/drm · 53b3b6bb
      Linus Torvalds authored
      Pull drm updates from Dave Airlie:
       "This is going to rebuild more than drm as it adds a new helper to
        list.h for doing bulk updates. Seemed like a reasonable addition to
        me.
      
        Otherwise the usual merge window stuff: lots of i915 and amdgpu, not so
        much nouveau, and piles of everything else.
      
        Core:
         - Adds a new list.h helper for doing bulk list updates for TTM.
         - Don't leak fb address in smem_start to userspace (comes with EXPORT
           workaround for people using mali out of tree hacks)
         - udmabuf device to turn memfd regions into dma-buf
         - Per-plane blend mode property
         - ref/unref replacements with get/put
         - fbdev conflicting framebuffers code cleaned up
         - host-endian format variants
         - panel orientation quirk for Acer One 10
      
        bridge:
         - TI SN65DSI86 chip support
      
        vkms:
         - GEM support.
         - Cursor support
      
        amdgpu:
         - Merge amdkfd and amdgpu into one module
         - CEC over DP AUX support
         - Picasso APU support + VCN dynamic powergating
         - Raven2 APU support
         - Vega20 enablement + kfd support
         - ACP powergating improvements
         - ABGR/XBGR display support
         - VCN jpeg support
         - xGMI support
         - DC i2c/aux cleanup
         - Ycbcr 4:2:0 support
         - GPUVM improvements
         - Powerplay and powerplay endian fixes
         - Display underflow fixes
      
        vmwgfx:
         - Move vmwgfx specific TTM code to vmwgfx
         - Split out vmwgfx buffer/resource validation code
         - Atomic operation rework
      
        bochs:
         - use more helpers
         - format/byteorder improvements
      
        qxl:
         - use more helpers
      
        i915:
         - GGTT coherency getparam
         - Turn off resource streamer API
         - More Icelake enablement + DMC firmware
         - Full PPGTT for Ivybridge, Haswell and Valleyview
         - DDB distribution based on resolution
         - Limited range DP display support
      
        nouveau:
         - CEC over DP AUX support
         - Initial HDMI 2.0 support
      
        virtio-gpu:
         - vmap support for PRIME objects
      
        tegra:
         - Initial Tegra194 support
         - DMA/IOMMU integration fixes
      
        msm:
         - a6xx perf improvements + clock prefix
         - GPU preemption optimisations
         - a6xx devfreq support
         - cursor support
      
        rockchip:
         - PX30 support
         - rgb output interface support
      
        mediatek:
         - HDMI output support on mt2701 and mt7623
      
        rcar-du:
         - Interlaced modes on Gen3
         - LVDS on R8A77980
         - D3 and E3 SoC support
      
        hisilicon:
         - misc fixes
      
        mxsfb:
         - runtime pm support
      
        sun4i:
         - R40 TCON support
         - Allwinner A64 support
         - R40 HDMI support
      
        omapdrm:
         - Driver rework changing display pipeline ordering to use common code
         - DMM memory barrier and irq fixes
         - Errata workarounds
      
        exynos:
         - out-bridge support for LVDS bridge driver
         - Samsung 16x16 tiled format support
         - Plane alpha and pixel blend mode support
      
        tilcdc:
         - suspend/resume update
      
        mali-dp:
         - misc updates"
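The bulk list update mentioned under "Core" can be pictured with a small user-space model. This is only a sketch, assuming a kernel-style circular doubly linked list; the names list_node, list_init, list_add_tail and bulk_move_tail are hypothetical stand-ins and are not taken from list.h itself. The point is that a whole run of already-linked entries moves to the tail with a constant number of pointer updates, instead of one move per entry.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list node (hypothetical, modelled on a kernel-style list). */
struct list_node {
	struct list_node *next, *prev;
};

/* An empty list is a head that points at itself. */
static void list_init(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

/* Link item in just before head, i.e. at the tail of the list. */
static void list_add_tail(struct list_node *item, struct list_node *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

/*
 * Move the run first..last (inclusive, already linked in list order)
 * to the tail of head. Six pointer writes, independent of run length.
 */
static void bulk_move_tail(struct list_node *head,
			   struct list_node *first,
			   struct list_node *last)
{
	assert(first != head && last != head);

	/* Unlink the run from its current position. */
	first->prev->next = last->next;
	last->next->prev = first->prev;

	/* Splice the run in just before head. */
	head->prev->next = first;
	first->prev = head->prev;
	last->next = head;
	head->prev = last;
}
```

With four nodes n0..n3 added in order, bulk_move_tail(&head, &n0, &n1) leaves the list as n2, n3, n0, n1.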
      
      * tag 'drm-next-2018-10-24' of git://anongit.freedesktop.org/drm/drm: (1382 commits)
        firmware/dmc/icl: Add missing MODULE_FIRMWARE() for Icelake.
        drm/i915/icl: Fix signal_levels
        drm/i915/icl: Fix DDI/TC port clk_off bits
        drm/i915/icl: create function to identify combophy port
        drm/i915/gen9+: Fix initial readout for Y tiled framebuffers
        drm/i915: Large page offsets for pread/pwrite
        drm/i915/selftests: Disable shrinker across mmap-exhaustion
        drm/i915/dp: Link train Fallback on eDP only if fallback link BW can fit panel's native mode
        drm/i915: Fix intel_dp_mst_best_encoder()
        drm/i915: Skip vcpi allocation for MSTB ports that are gone
        drm/i915: Don't unset intel_connector->mst_port
        drm/i915: Only reset seqno if actually idle
        drm/i915: Use the correct crtc when sanitizing plane mapping
        drm/i915: Restore vblank interrupts earlier
        drm/i915: Check fb stride against plane max stride
        drm/amdgpu/vcn:Fix uninitialized symbol error
        drm: panel-orientation-quirks: Add quirk for Acer One 10 (S1003)
        drm/amd/amdgpu: Fix debugfs error handling
        drm/amdgpu: Update gc_9_0 golden settings.
        drm/amd/powerplay: update PPtable with DC BTC and Tvr SocLimit fields
        ...
    • Merge tag 'vla-v4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · 746bb4ed
      Linus Torvalds authored
      Pull VLA removal from Kees Cook:
       "Globally warn on VLA use.
      
        This turns on "-Wvla" globally now that the last few trees with their
        VLA removals have landed (crypto, block, net, and powerpc).
      
        Arnd mentioned that there may be a couple more VLAs hiding in
        hard-to-find randconfigs, but nothing big has shaken out in the last
        month or so in linux-next.
      
        We should be basically VLA-free now! Wheee. :)
      
        Summary:
      
         - Remove unused fallback for BUILD_BUG_ON (which technically contains
           a VLA)
      
         - Lift -Wvla to the top-level Makefile"
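For readers unfamiliar with what "-Wvla" flags, here is a minimal sketch of the usual VLA removal pattern: replace a stack array sized at run time with a fixed-size array plus a bounds check. The function reverse_bytes and the bound BUF_MAX are hypothetical and are not taken from any of the trees named above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUF_MAX 64	/* hypothetical compile-time upper bound */

/*
 * A VLA version would declare `unsigned char tmp[len];`, which -Wvla
 * reports because the stack usage depends on a run-time value.
 * This version uses a fixed-size buffer and rejects oversized input.
 */
static int reverse_bytes(unsigned char *out, const unsigned char *in, size_t len)
{
	unsigned char tmp[BUF_MAX];	/* size known at compile time: not a VLA */

	assert(out != NULL && in != NULL);
	if (len > sizeof(tmp))
		return -1;

	/* Copy through tmp so that out and in may alias. */
	for (size_t i = 0; i < len; i++)
		tmp[i] = in[len - 1 - i];
	memcpy(out, tmp, len);
	return 0;
}
```

The cost of the pattern is that callers must handle the "too large" error, which a VLA would have hidden until the stack overflowed.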
      
      * tag 'vla-v4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
        Makefile: Globally enable VLA warning
        compiler.h: give up __compiletime_assert_fallback()
    • Merge tag 'kbuild-v4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild · ac747c07
      Linus Torvalds authored
      Pull Kbuild updates from Masahiro Yamada:
      
       - optimize kallsyms slightly
      
       - remove check for old CFLAGS usage
      
       - add some compiler flags unconditionally instead of evaluating
         $(call cc-option,...)
      
       - fix variable shadowing in host tools
      
       - refactor scripts/mkmakefile
      
       - refactor various makefiles
      
      * tag 'kbuild-v4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
        modpost: Create macro to avoid variable shadowing
        ASN.1: Remove unnecessary shadowed local variable
        kbuild: use 'else ifeq' for checksrc to improve readability
        kbuild: remove unneeded link_multi_deps
        kbuild: add -Wno-unused-but-set-variable flag unconditionally
        kbuild: add -Wdeclaration-after-statement flag unconditionally
        kbuild: add -Wno-pointer-sign flag unconditionally
        modpost: remove leftover symbol prefix handling for module device table
        kbuild: simplify command line creation in scripts/mkmakefile
        kbuild: do not pass $(objtree) to scripts/mkmakefile
        kbuild: remove user ID check in scripts/mkmakefile
        kbuild: remove VERSION and PATCHLEVEL from $(objtree)/Makefile
        kbuild: add --include-dir flag only for out-of-tree build
        kbuild: remove dead code in cmd_files calculation in top Makefile
        kbuild: hide most of targets when running config or mixed targets
        kbuild: remove old check for CFLAGS use
        kbuild: prefix Makefile.dtbinst path with $(srctree) unconditionally
        kallsyms: remove left-over Blackfin code
        kallsyms: reduce size a little on 64-bit
    • Merge tag 'linux-kselftest-4.20-rc1' of... · f8cab69b
      Linus Torvalds authored
      Merge tag 'linux-kselftest-4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
      
      Pull kselftest updates from Shuah Khan:
       "This Kselftest update for Linux 4.20-rc1 consists of:
      
         - Improvements to ftrace test suite from Masami Hiramatsu.
      
         - Color coded ftrace PASS / FAIL results from Steven Rostedt (VMware)
           to improve readability of reports.
      
         - Watchdog fixes and an enhancement adding gettimeout and get|set
           pretimeout options from Jerry Hoemann.
      
         - Several fixes to warnings and spelling etc"
      
      * tag 'linux-kselftest-4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (40 commits)
        selftests/ftrace: Strip escape sequences for log file
        selftests/ftrace: Use colored output when available
        selftests: fix warning: "_GNU_SOURCE" redefined
        selftests: kvm: Fix -Wformat warnings
        selftests/ftrace: Add color to the PASS / FAIL results
        kvm: selftests: fix spelling mistake "Insufficent" -> "Insufficient"
        selftests: gpio: Fix OUTPUT directory in Makefile
        selftests: gpio: restructure Makefile
        selftests: watchdog: Fix ioctl SET* error paths to take oneshot exit path
        selftests: watchdog: Add gettimeout and get|set pretimeout
        selftests: watchdog: Fix error message.
        selftests: watchdog: fix message when /dev/watchdog open fails
        selftests/ftrace: Add ftrace cpumask testcase
        selftests/ftrace: Add wakeup_rt tracer testcase
        selftests/ftrace: Add wakeup tracer testcase
        selftests/ftrace: Add stacktrace ftrace filter command testcase
        selftests/ftrace: Add trace_pipe testcase
        selftests/ftrace: Add function filter on module testcase
        selftests/ftrace: Add max stack tracer testcase
        selftests/ftrace: Add function profiling stat testcase
        ...
    • Merge branch 'xarray' of git://git.infradead.org/users/willy/linux-dax · dad4f140
      Linus Torvalds authored
      Pull XArray conversion from Matthew Wilcox:
       "The XArray provides an improved interface to the radix tree data
        structure, providing locking as part of the API, specifying GFP flags
        at allocation time, eliminating preloading, less re-walking the tree,
        more efficient iterations and not exposing RCU-protected pointers to
        its users.
      
        This patch set
      
         1. Introduces the XArray implementation
      
         2. Converts the pagecache to use it
      
         3. Converts memremap to use it
      
        The page cache is the most complex and important user of the radix
        tree, so converting it was most important. Converting the memremap
        code removes the only other user of the multiorder code, which allows
        us to remove the radix tree code that supported it.
      
        I have 40+ followup patches to convert many other users of the radix
        tree over to the XArray, but I'd like to get this part in first. The
        other conversions haven't been in linux-next and aren't suitable for
        applying yet, but you can see them in the xarray-conv branch if you're
        interested"
      
      * 'xarray' of git://git.infradead.org/users/willy/linux-dax: (90 commits)
        radix tree: Remove multiorder support
        radix tree test: Convert multiorder tests to XArray
        radix tree tests: Convert item_delete_rcu to XArray
        radix tree tests: Convert item_kill_tree to XArray
        radix tree tests: Move item_insert_order
        radix tree test suite: Remove multiorder benchmarking
        radix tree test suite: Remove __item_insert
        memremap: Convert to XArray
        xarray: Add range store functionality
        xarray: Move multiorder_check to in-kernel tests
        xarray: Move multiorder_shrink to kernel tests
        xarray: Move multiorder account test in-kernel
        radix tree test suite: Convert iteration test to XArray
        radix tree test suite: Convert tag_tagged_items to XArray
        radix tree: Remove radix_tree_clear_tags
        radix tree: Remove radix_tree_maybe_preload_order
        radix tree: Remove split/join code
        radix tree: Remove radix_tree_update_node_t
        page cache: Finish XArray conversion
        dax: Convert page fault handlers to XArray
        ...
  2. Oct 28, 2018
    • modpost: Create macro to avoid variable shadowing · c2b1a922
      Leonardo Bras authored
      
      
      Create DEF_FIELD_ADDR_VAR as a more generic version of the DEF_FIELD_ADDR
      macro, allowing usage of a variable name other than the struct element name.
      Also, redefine DEF_FIELD_ADDR as a specific usage of DEF_FIELD_ADDR_VAR in
      which the var name is the same as the struct element name.
      Then, makes use of DEF_FIELD_ADDR_VAR to create a variable of another name,
      in order to avoid variable shadowing.
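The relationship between the two macros can be sketched in user space as follows. This is an illustration under assumptions, not the modpost source: it uses offsetof() and the GNU __typeof__ extension to locate the field, and struct demo_id and read_vendor() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct demo_id {	/* hypothetical device-table entry */
	unsigned int vendor;
	char name[16];
};

/*
 * Declare a pointer named `var` to member `field` of a struct `devid`
 * stored at raw address `base`. The variable name is chosen by the caller.
 */
#define DEF_FIELD_ADDR_VAR(base, devid, field, var) \
	__typeof__(((struct devid *)0)->field) *var = \
		(void *)((char *)(base) + offsetof(struct devid, field))

/* Special case: the variable is named after the struct member. */
#define DEF_FIELD_ADDR(base, devid, field) \
	DEF_FIELD_ADDR_VAR(base, devid, field, field)

static unsigned int read_vendor(void *symval)
{
	unsigned int vendor;	/* outer variable sharing the member's name */

	assert(symval != NULL);
	{
		/*
		 * DEF_FIELD_ADDR(symval, demo_id, vendor) would declare an
		 * inner `vendor` here and shadow the outer one. The _VAR
		 * form lets the pointer carry a different name instead.
		 */
		DEF_FIELD_ADDR_VAR(symval, demo_id, vendor, vendor_ptr);
		vendor = *vendor_ptr;
	}
	return vendor;
}
```

Compilers report the shadowing case with -Wshadow; giving the pointer its own name removes the warning without changing what is read.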
      
      Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
    • ASN.1: Remove unnecessary shadowed local variable · 9e1e8194
      Leonardo Bras authored
      
      
      Remove an unnecessary shadowed local variable (start).
      It was used only once, with the same value it was initialized to before
      the if block.
      
      Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
    • HID: we do not randomly make new drivers 'default y' · 69d5b97c
      Linus Torvalds authored
      
      
      .. even when that "default y" is hidden syntactically as a
      
      	default !EXPERT
      
      it's wrong.
      
      The only reason something should be 'default y' is if it used to be
      built-in, and it was made configurable, and the 'default y' is just
      retaining the status quo.
      
      Alternatively, the hardware for the driver has become _so_ common that
      it really makes sense for everybody to build it.  Finally, one possible
      reason for 'default y' is because the option is not enabling any new
      code at all, but is just enabling other options (the networking people
      do this for vendor options, for example, so that you can disable whole
      vendors at a time).
      
      Clearly, none of these cases hold for the BigBen Interactive Kids'
      gamepad, and HID_BIGBEN_FF should thus most definitely not default
      to on for everybody.
      
      Cc: Hanno Zulla <kontakt@hanno.de>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'linux-watchdog-4.20-rc1' of git://www.linux-watchdog.org/linux-watchdog · 5ecf3e11
      Linus Torvalds authored
      Pull watchdog updates from Wim Van Sebroeck:
      
       - Add Armada 37xx CPU watchdog
      
       - w83627hf_wdt: Add Support for NCT6796D, NCT6797D, NCT6798D
      
       - hpwdt: several improvements
      
       - renesas_wdt: SPDX identifiers, stop when unregistering, support for
         R7S9210
      
       - rza_wdt: SPDX identifiers, support longer timeouts
      
       - core: fix null pointer dereference when releasing cdev
      
       - iTCO_wdt: Drop option vendorsupport=2
      
       - sama5d4: fix timeout-sec usage
      
       - lantiq_wdt: convert to watchdog framework
      
       - several small fixes
      
      * tag 'linux-watchdog-4.20-rc1' of git://www.linux-watchdog.org/linux-watchdog: (30 commits)
        watchdog: ts4800: release syscon device node in ts4800_wdt_probe()
        watchdog: armada_37xx_wdt: use do_div for u64 division
        documentation: watchdog: add documentation for armada-37xx-wdt
        dt-bindings: watchdog: Document armada-37xx-wdt binding
        watchdog: Add support for Armada 37xx CPU watchdog
        dt-bindings: watchdog: add mpc8xxx-wdt support
        watchdog: mpc8xxx: provide boot status
        MAINTAINERS: Fix file pattern for MEN Z069 watchdog driver
        dt-bindings: watchdog: renesas-wdt: Add support for R7S9210
        watchdog: rza_wdt: Support longer timeouts
        watchdog: hpwdt: Disable PreTimeout when Timeout is smaller
        watchdog: w83627hf_wdt: Support NCT6796D, NCT6797D, NCT6798D
        watchdog: mpc8xxx: use dev_xxxx() instead of pr_xxxx()
        watchdog: lantiq: add get_timeleft callback
        watchdog: lantiq: Convert to watchdog_device
        watchdog: lantiq: update register names to better match spec
        watchdog: sama5d4: fix timeout-sec usage
        watchdog: fix a small number of "watchog" typos in comments
        watchdog: rza_wdt: convert to SPDX identifiers
        watchdog: iTCO_wdt: Remove unused hooks
        ...
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · ed3f4e23
      Linus Torvalds authored
      Pull input updates from Dmitry Torokhov:
       "Just random driver fixups, nothing exciting"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
        Input: synaptics - avoid using uninitialized variable when probing
        Input: xen-kbdfront - mark expected switch fall-through
        Input: atmel_mxt_ts - mark expected switch fall-through
        Input: cyapa - mark expected switch fall-throughs
        Input: wm97xx-ts - fix exit path
        Input: of_touchscreen - add support for touchscreen-min-x|y
        Input: Fix DIR-685 touchkeys MAINTAINERS entry
        Input: elants_i2c - use DMA safe i2c when possible
        Input: silead - try firmware reload after unsuccessful resume
        Input: st1232 - set INPUT_PROP_DIRECT property
        Input: xilinx_ps2 - convert to using %pOFn instead of device_node.name
        Input: atmel_mxt_ts - fix multiple <linux/property.h> includes
        Input: sun4i-lradc - convert to using %pOFn instead of device_node.name
        Input: pwm-vibrator - correct pwms in DT binding example
    • Merge tag 'rtc-4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux · c7b7eefa
      Linus Torvalds authored
      Pull RTC updates from Alexandre Belloni:
       "This cycle, there were mostly non urgent fixes in drivers. I also
        finally unexported the non managed registration.
      
        Subsystem:
      
         - non devm managed registration is now removed from the driver API
      
         - all the unnecessary rtc_valid_tm() calls have been removed
      
        Drivers:
      
         - abx80X: watchdog support
      
         - cmos: fix non ACPI support
      
         - sc27xx: fix alarm support
      
         - Remove a possible sysfs race condition for ab8500, ds1307, ds1685,
           isl1208
      
         - Fix a possible race condition where an irq handler may be called
           before the rtc_device struct is allocated for mt6397, pl030,
           menelaus, armada38x"
      
      * tag 'rtc-4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (54 commits)
        rtc: sc27xx: Always read normal alarm when registering RTC device
        rtc: sc27xx: Add check to see if need to enable the alarm interrupt
        rtc: sc27xx: Remove interrupts disable and clear in probe()
        rtc: sc27xx: Clear SPG value update interrupt status
        rtc: sc27xx: Set wakeup capability before registering rtc device
        rtc: s35390a: Change buf's type to u8 in s35390a_init
        rtc: ds1307: fix ds1339 wakealarm support
        rtc: ds1685: simplify getting .driver_data
        rtc: m41t80: mark expected switch fall-through
        rtc: tegra: Propagate errors from platform_get_irq()
        rtc: cmos: Remove the `use_acpi_alarm' module parameter for !ACPI
        rtc: cmos: Fix non-ACPI undefined reference to `hpet_rtc_interrupt'
        rtc: mv: let the core handle invalid alarms
        rtc: vr41xx: switch to rtc_time64_to_tm/rtc_tm_to_time64
        rtc: ab8500: remove useless check
        rtc: ab8500: let the core handle range
        rtc: ab8500: use rtc_add_group
        rtc: rs5c348: report error when time is invalid
        rtc: rs5c348: remove forward declaration
        rtc: rs5c348: remove useless label
        ...
    • Merge tag 'led-fix-for-4.20-rc1' of... · e5585453
      Linus Torvalds authored
      Merge tag 'led-fix-for-4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds
      
      Pull LED fix from Jacek Anaszewski.
      
      * tag 'led-fix-for-4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds:
        leds: gpio: set led_dat->gpiod pointer for OF defined GPIO leds
    • i2c-hid: properly terminate i2c_hid_dmi_desc_override_table[] array · b59dfdae
      Linus Torvalds authored
      Commit 9ee3e066 ("HID: i2c-hid: override HID descriptors for certain
      devices") added a new dmi_system_id quirk table to override certain HID
      report descriptors for some systems that lack them.
      
      But the table wasn't properly terminated, causing the dmi matching to
      walk off into la-la-land, and starting to treat random data as dmi
      descriptor pointers, causing boot-time oopses if you were at all
      unlucky.
      
      Terminate the array.
      
      We really should have some way to just statically check that arrays that
      should be terminated by an empty entry actually are so.  But the HID
      people really should have caught this themselves, rather than have me
      deal with an oops during the merge window.  Tssk, tssk.
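The failure mode described above is generic to sentinel-terminated tables. A minimal user-space sketch follows, with a hypothetical struct quirk_entry standing in for the real dmi_system_id entries: the walker stops only at an all-empty entry, so omitting that entry lets it read past the end of the array. The helper table_is_terminated() is an assumption of this sketch (a run-time check, not the static check the message wishes for).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

struct quirk_entry {	/* hypothetical stand-in for a quirk table entry */
	const char *ident;
	int quirk;
};

static const struct quirk_entry quirk_table[] = {
	{ "Vendor A laptop", 1 },
	{ "Vendor B tablet", 2 },
	{ NULL, 0 }	/* terminating entry: the walker stops here */
};

/* Walk the table until the empty sentinel entry; no length is passed in. */
static const struct quirk_entry *find_quirk(const struct quirk_entry *table,
					    const char *ident)
{
	assert(table != NULL && ident != NULL);
	for (const struct quirk_entry *e = table; e->ident != NULL; e++)
		if (strcmp(e->ident, ident) == 0)
			return e;
	return NULL;
}

/* Run-time sanity check: is the last of n entries the empty sentinel? */
static int table_is_terminated(const struct quirk_entry *table, size_t n)
{
	return n > 0 && table[n - 1].ident == NULL;
}
```

A lookup for an ident that is not in the table is exactly the case that walks the whole array, which is why a missing sentinel tends to show up as a crash on unaffected machines.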
      
      Cc: Julian Sax <jsbc@gmx.de>
      Cc: Benjamin Tissoires <benjamin.tissoires@redhat.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. Oct 27, 2018
    • Merge branch 'akpm' (patches from Andrew) · 345671ea
      Linus Torvalds authored
      Merge updates from Andrew Morton:
      
       - a few misc things
      
       - ocfs2 updates
      
       - most of MM
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (132 commits)
        hugetlbfs: dirty pages as they are added to pagecache
        mm: export add_swap_extent()
        mm: split SWP_FILE into SWP_ACTIVATED and SWP_FS
        tools/testing/selftests/vm/map_fixed_noreplace.c: add test for MAP_FIXED_NOREPLACE
        mm: thp: relocate flush_cache_range() in migrate_misplaced_transhuge_page()
        mm: thp: fix mmu_notifier in migrate_misplaced_transhuge_page()
        mm: thp: fix MADV_DONTNEED vs migrate_misplaced_transhuge_page race condition
        mm/kasan/quarantine.c: make quarantine_lock a raw_spinlock_t
        mm/gup: cache dev_pagemap while pinning pages
        Revert "x86/e820: put !E820_TYPE_RAM regions into memblock.reserved"
        mm: return zero_resv_unavail optimization
        mm: zero remaining unavailable struct pages
        tools/testing/selftests/vm/gup_benchmark.c: add MAP_HUGETLB option
        tools/testing/selftests/vm/gup_benchmark.c: add MAP_SHARED option
        tools/testing/selftests/vm/gup_benchmark.c: allow user specified file
        tools/testing/selftests/vm/gup_benchmark.c: fix 'write' flag usage
        mm/gup_benchmark.c: add additional pinning methods
        mm/gup_benchmark.c: time put_page()
        mm: don't raise MEMCG_OOM event due to failed high-order allocation
        mm/page-writeback.c: fix range_cyclic writeback vs writepages deadlock
        ...
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 49040081
      Linus Torvalds authored
      Pull networking fixes from David Miller:
       "What better way to start off a weekend than with some networking bug
        fixes:
      
        1) net namespace leak in dump filtering code of ipv4 and ipv6, fixed
           by David Ahern and Bjørn Mork.
      
        2) Handle bad checksums from hardware when using CHECKSUM_COMPLETE
           properly in UDP, from Sean Tranchetti.
      
        3) Remove TCA_OPTIONS from policy validation, it turns out we don't
           consistently use nested attributes for this across all packet
           schedulers. From David Ahern.
      
        4) Fix SKB corruption in cadence driver, from Tristram Ha.
      
        5) Fix broken WoL handling in r8169 driver, from Heiner Kallweit.
      
        6) Fix OOPS in pneigh_dump_table(), from Eric Dumazet"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (28 commits)
        net/neigh: fix NULL deref in pneigh_dump_table()
        net: allow traceroute with a specified interface in a vrf
        bridge: do not add port to router list when receives query with source 0.0.0.0
        net/smc: fix smc_buf_unuse to use the lgr pointer
        ipv6/ndisc: Preserve IPv6 control buffer if protocol error handlers are called
        net/{ipv4,ipv6}: Do not put target net if input nsid is invalid
        lan743x: Remove SPI dependency from Microchip group.
        drivers: net: remove <net/busy_poll.h> inclusion when not needed
        net: phy: genphy_10g_driver: Avoid NULL pointer dereference
        r8169: fix broken Wake-on-LAN from S5 (poweroff)
        octeontx2-af: Use GFP_ATOMIC under spin lock
        net: ethernet: cadence: fix socket buffer corruption problem
        net/ipv6: Allow onlink routes to have a device mismatch if it is the default route
        net: sched: Remove TCA_OPTIONS from policy
        ice: Poll for link status change
        ice: Allocate VF interrupts and set queue map
        ice: Introduce ice_dev_onetime_setup
        net: hns3: Fix for warning uninitialized symbol hw_err_lst3
        octeontx2-af: Copy the right amount of memory
        net: udp: fix handling of CHECKSUM_COMPLETE packets
        ...
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc · a45dcff7
      Linus Torvalds authored
      Pull sparc fixes from David Miller:
       "Some more sparc fixups, mostly aimed at getting the allmodconfig build
        up and clean again"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
        sparc64: Rework xchg() definition to avoid warnings.
        sparc64: Export __node_distance.
        sparc64: Make corrupted user stacks more debuggable.
    • hugetlbfs: dirty pages as they are added to pagecache · 22146c3c
      Mike Kravetz authored
      Some test systems were experiencing negative huge page reserve counts and
      incorrect file block counts.  This was traced to /proc/sys/vm/drop_caches
      removing clean pages from hugetlbfs file pagecaches.  When non-hugetlbfs
      explicit code removes the pages, the appropriate accounting is not
      performed.
      
      This can be recreated as follows:
       fallocate -l 2M /dev/hugepages/foo
       echo 1 > /proc/sys/vm/drop_caches
       fallocate -l 2M /dev/hugepages/foo
       grep -i huge /proc/meminfo
         AnonHugePages:         0 kB
         ShmemHugePages:        0 kB
         HugePages_Total:    2048
         HugePages_Free:     2047
         HugePages_Rsvd:    18446744073709551615
         HugePages_Surp:        0
         Hugepagesize:       2048 kB
         Hugetlb:         4194304 kB
       ls -lsh /dev/hugepages/foo
         4.0M -rw-r--r--. 1 root root 2.0M Oct 17 20:05 /dev/hugepages/foo
      
      To address this issue, dirty pages as they are added to pagecache.  This
      can easily be reproduced with fallocate as shown above.  Read faulted
      pages will eventually end up being marked dirty.  But there is a window
      where they are clean and could be impacted by code such as drop_caches.
      So, just dirty them all as they are added to the pagecache.
      
      Link: http://lkml.kernel.org/r/b5be45b8-5afe-56cd-9482-28384699a049@oracle.com
      Fixes: 6bda666a ("hugepages: fold find_or_alloc_pages into huge_no_page()")
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: export add_swap_extent() · aa8aa8a3
      Omar Sandoval authored
      Btrfs currently does not support swap files because swap's use of bmap
      does not work with copy-on-write and multiple devices.  See 35054394
      ("Btrfs: stop providing a bmap operation to avoid swapfile corruptions").
      
      However, the swap code has a mechanism for the filesystem to manually add
      swap extents using add_swap_extent() from the ->swap_activate() aop.
      iomap has done this since 67482129 ("iomap: add a swapfile activation
      function").  Btrfs will do the same in a later patch, so export
      add_swap_extent().
      
      Link: http://lkml.kernel.org/r/bb1208575e02829aae51b538709476964f97b1ea.1536704650.git.osandov@fb.com
      
      
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: David Sterba <dsterba@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: split SWP_FILE into SWP_ACTIVATED and SWP_FS · bc4ae27d
      Omar Sandoval authored
      The SWP_FILE flag serves two purposes: to make swap_{read,write}page() go
      through the filesystem, and to make swapoff() call ->swap_deactivate().
      For Btrfs, we want the latter but not the former, so split this flag into
      two.  This makes us always call ->swap_deactivate() if ->swap_activate()
      succeeded, not just if it didn't add any swap extents itself.
      
      This also resolves the issue of the very misleading name of SWP_FILE,
      which is only used for swap files over NFS.
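The effect of the split can be modelled in a few lines of user-space C. This is a toy model under assumptions, not the swap code: the bit values, struct swap_model and the model_* functions are all hypothetical. It only shows that "swapoff must call the deactivate hook" and "I/O goes through the filesystem" become two independent bits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit values; the real ones live in the kernel headers. */
enum {
	SWP_ACTIVATED = 1 << 0,	/* the activate hook succeeded */
	SWP_FS        = 1 << 1,	/* swap I/O goes through the filesystem */
};

struct swap_model {
	unsigned int flags;
	int deactivate_calls;	/* counts calls to the deactivate hook */
};

/*
 * Model of a successful activation. fs_added_extents is nonzero when
 * the filesystem registered its own swap extents during activation.
 */
static void model_activate(struct swap_model *s, int fs_added_extents)
{
	assert(s != NULL);
	s->flags |= SWP_ACTIVATED;
	if (!fs_added_extents)
		s->flags |= SWP_FS;
}

/* Does page I/O for this swap area go through the filesystem? */
static int model_io_via_fs(const struct swap_model *s)
{
	return (s->flags & SWP_FS) != 0;
}

/* Swapoff calls the deactivate hook whenever activation succeeded. */
static void model_swapoff(struct swap_model *s)
{
	if (s->flags & SWP_ACTIVATED) {
		s->deactivate_calls++;
		s->flags &= ~(unsigned int)(SWP_ACTIVATED | SWP_FS);
	}
}
```

With a single flag, a filesystem that added its own extents would never have its deactivate hook called; with two flags, both kinds of swap file are deactivated while only one routes I/O through the filesystem.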
      
      Link: http://lkml.kernel.org/r/6d63d8668c4287a4f6d203d65696e96f80abdfc7.1536704650.git.osandov@fb.com
      
      
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: David Sterba <dsterba@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • tools/testing/selftests/vm/map_fixed_noreplace.c: add test for MAP_FIXED_NOREPLACE · 91cbacc3
      Michael Ellerman authored
      Add a test for MAP_FIXED_NOREPLACE, based on some code originally by Jann
      Horn.  This would have caught the overlap bug reported by Daniel Micay.
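
The behaviour such a test exercises can be sketched in a few lines of userspace C. This is not the selftest itself: the function name and return convention are made up here, and the fallback value for MAP_FIXED_NOREPLACE is an assumption that holds on most architectures but not all.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_FIXED_NOREPLACE
/* Assumed value (used by most architectures) if the libc headers lack it. */
#define MAP_FIXED_NOREPLACE 0x100000
#endif

/*
 * Returns 1 if the kernel refused to map over an existing mapping (EEXIST),
 * 0 if the flag was not honoured and the address was treated as a hint
 * (kernels before 4.17), and -1 on any other outcome.
 */
static int noreplace_refuses_overlap(void)
{
	long page = sysconf(_SC_PAGESIZE);
	int result = -1;

	if (page <= 0)
		return -1;
	size_t len = (size_t)page;

	char *first = mmap(NULL, len, PROT_READ | PROT_WRITE,
			   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (first == MAP_FAILED)
		return -1;
	first[0] = 'x';

	/* Ask for exactly the same address; this must not clobber "first". */
	char *second = mmap(first, len, PROT_READ | PROT_WRITE,
			    MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE,
			    -1, 0);
	if (second == MAP_FAILED) {
		if (errno == EEXIST)
			result = 1;
	} else if (second != first) {
		result = 0;
		munmap(second, len);
	} else {
		/* The old mapping was replaced: exactly what the flag forbids. */
		munmap(first, len);
		return -1;
	}

	/* On every remaining path the original mapping is still intact. */
	assert(first[0] == 'x');
	munmap(first, len);
	return result;
}
```

Plain MAP_FIXED would have silently replaced the first mapping instead of failing.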
      
      I originally suggested to Michal that we create MAP_FIXED_NOREPLACE, but
      instead of writing a selftest I spent my time bike-shedding whether it
      should be called MAP_FIXED_SAFE/NOCLOBBER/WEAK/NEW ..  mea culpa.
      
      Link: http://lkml.kernel.org/r/20181013133929.28653-1-mpe@ellerman.id.au
      
      
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Florian Weimer <fweimer@redhat.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
      Cc: Joel Stanley <joel@jms.id.au>
      Cc: Jason Evans <jasone@google.com>
      Cc: David Goldblatt <davidtgoldblatt@gmail.com>
      Cc: Daniel Micay <danielmicay@gmail.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Andrea Arcangeli's avatar
      mm: thp: relocate flush_cache_range() in migrate_misplaced_transhuge_page() · 7eef5f97
      Andrea Arcangeli authored
      There should be no cache left by the time we overwrite the old transhuge
      pmd with the new one.  It's already too late to flush through the virtual
      address because we already copied the page data to the new physical
      address.
      
      So flush the cache before the data copy.
      
      Also delete the "end" variable to shut off an "unused variable" warning on
      x86 where flush_cache_range() is a noop.
      
      Link: http://lkml.kernel.org/r/20181015202311.7209-1-aarcange@redhat.com
      
      
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Andrea Arcangeli's avatar
      mm: thp: fix mmu_notifier in migrate_misplaced_transhuge_page() · 7066f0f9
      Andrea Arcangeli authored
      change_huge_pmd() after arming the numa/protnone pmd doesn't flush the TLB
      right away.  do_huge_pmd_numa_page() flushes the TLB before calling
      migrate_misplaced_transhuge_page().  By the time do_huge_pmd_numa_page()
      runs some CPU could still access the page through the TLB.
      
      change_huge_pmd() before arming the numa/protnone transhuge pmd calls
      mmu_notifier_invalidate_range_start().  So there's no need for the
      mmu_notifier_invalidate_range_start()/mmu_notifier_invalidate_range_only_end()
      sequence in migrate_misplaced_transhuge_page() either, because by the time
      migrate_misplaced_transhuge_page() runs, the pmd mapping has already been
      invalidated in the secondary MMUs.  It has to be: if a secondary MMU could
      still write to the page, migrate_page_copy() would lose data.
      
      However an explicit mmu_notifier_invalidate_range() is needed before
      migrate_misplaced_transhuge_page() starts copying the data of the
      transhuge page or the below can happen for MMU notifier users sharing the
      primary MMU pagetables and only implementing ->invalidate_range:
      
      CPU0		CPU1		GPU sharing linux pagetables using
                                      only ->invalidate_range
      -----------	------------	---------
      				GPU secondary MMU writes to the page
      				mapped by the transhuge pmd
      change_pmd_range()
      mmu..._range_start()
      ->invalidate_range_start() noop
      change_huge_pmd()
      set_pmd_at(numa/protnone)
      pmd_unlock()
      		do_huge_pmd_numa_page()
      		CPU TLB flush globally (1)
      		CPU cannot write to page
      		migrate_misplaced_transhuge_page()
      				GPU writes to the page...
      		migrate_page_copy()
      				...GPU stops writing to the page
      CPU TLB flush (2)
      mmu..._range_end() (3)
      ->invalidate_range_end() noop
      ->invalidate_range()
      				GPU secondary MMU is invalidated
      				and cannot write to the page anymore
      				(too late)
      
      Just like we need a CPU TLB flush (1) because the TLB flush (2) arrives
      too late, we also need a mmu_notifier_invalidate_range() before calling
      migrate_misplaced_transhuge_page(), because the ->invalidate_range() in
      (3) also arrives too late.
      
      This requirement is the result of the lazy optimization in
      change_huge_pmd() that releases the pmd_lock without first flushing the
      TLB and without first calling mmu_notifier_invalidate_range().
      
      Even converting the removed mmu_notifier_invalidate_range_only_end() into
      a mmu_notifier_invalidate_range_end() would not have been enough to fix
      this, because it would have run after migrate_page_copy().
      
      After the hugepage data copy is done migrate_misplaced_transhuge_page()
      can proceed and call set_pmd_at without having to flush the TLB nor any
      secondary MMUs because the secondary MMU invalidate, just like the CPU TLB
      flush, has to happen before the migrate_page_copy() is called or it would
      be a bug in the first place (and it was for drivers using
      ->invalidate_range()).
      
      KVM is unaffected because it doesn't implement ->invalidate_range().
      
      The standard PAGE_SIZEd migrate_misplaced_page is less accelerated and
      uses the generic migrate_pages which transitions the pte from
      numa/protnone to a migration entry in try_to_unmap_one() and flushes TLBs
      and all mmu notifiers there before copying the page.
      
      Link: http://lkml.kernel.org/r/20181013002430.698-3-aarcange@redhat.com
      
      
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reviewed-by: Aaron Tomlin <atomlin@redhat.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Andrea Arcangeli's avatar
      mm: thp: fix MADV_DONTNEED vs migrate_misplaced_transhuge_page race condition · d7c33934
      Andrea Arcangeli authored
      Patch series "migrate_misplaced_transhuge_page race conditions".
      
      Aaron found a new instance of the THP MADV_DONTNEED race against
      pmdp_clear_flush* variants, that was apparently left unfixed.
      
      While looking into the race found by Aaron, I may have found two more
      issues in migrate_misplaced_transhuge_page.
      
      These race conditions would not cause kernel instability, but they'd
      corrupt userland data or leave data non zero after MADV_DONTNEED.
      
      I did only minor testing, and I don't expect to be able to reproduce this
      (especially the lack of ->invalidate_range before migrate_page_copy, which
      requires the latest iommu hardware or infiniband to reproduce).  The last
      patch is a noop for x86 and it needs further review from maintainers of
      archs that implement flush_cache_range() (not in CC yet).
      
      To avoid confusion: the first patch does not introduce the bug fixed in
      the second patch.  Even before pmdp_huge_clear_flush_notify was removed,
      the notification implied by the _notify suffix was issued after
      migrate_page_copy had already run.
      
      This patch (of 3):
      
      This is a corollary of ced10803 ("thp: fix MADV_DONTNEED vs.  numa
      balancing race"), 58ceeb6b ("thp: fix MADV_DONTNEED vs.  MADV_FREE
      race") and 5b7abeae ("thp: fix MADV_DONTNEED vs clear soft dirty
      race").
      
      When the above three fixes were posted, Dave asked
      https://lkml.kernel.org/r/929b3844-aec2-0111-fef7-8002f9d4e2b9@intel.com
      but apparently this was missed.
      
      The pmdp_clear_flush* in migrate_misplaced_transhuge_page() was introduced
      in a54a407f ("mm: Close races between THP migration and PMD numa
      clearing").
      
      The important part of such commit is only the part where the page lock is
      not released until the first do_huge_pmd_numa_page() finished disarming
      the pagenuma/protnone.
      
      The addition of pmdp_clear_flush() wasn't beneficial to such commit and
      there's no commentary about such an addition either.
      
      I guess the pmdp_clear_flush() in such commit was added just in case for
      safety, but it ended up introducing the MADV_DONTNEED race condition found
      by Aaron.
      
      At that point in time nobody thought of such kind of MADV_DONTNEED race
      conditions yet (they were fixed later) so the code may have looked more
      robust by adding the pmdp_clear_flush().
      
      This specific race condition won't destabilize the kernel, but it can
      confuse userland because after MADV_DONTNEED the memory won't be zeroed
      out.
      
      This also optimizes the code and removes a superfluous TLB flush.
      
      [akpm@linux-foundation.org: reflow comment to 80 cols, fix grammar and typo (beacuse)]
      Link: http://lkml.kernel.org/r/20181013002430.698-2-aarcange@redhat.com
      
      
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Reported-by: Aaron Tomlin <atomlin@redhat.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Clark Williams's avatar
      mm/kasan/quarantine.c: make quarantine_lock a raw_spinlock_t · 026d1eaf
      Clark Williams authored
      The static lock quarantine_lock is used in quarantine.c to protect the
      quarantine queue data structures.  It is taken inside quarantine queue
      manipulation routines (quarantine_put(), quarantine_reduce() and
      quarantine_remove_cache()), with IRQs disabled.  This is not a problem on
      a stock kernel but is problematic on an RT kernel, where spinlocks are
      sleeping locks and therefore cannot be acquired with interrupts
      disabled.
      
      Convert the quarantine_lock to a raw_spinlock_t.  The usage of
      quarantine_lock is confined to quarantine.c and the work performed while
      the lock is held is only for debugging purposes.
      
      [bigeasy@linutronix.de: slightly altered the commit message]
      Link: http://lkml.kernel.org/r/20181010214945.5owshc3mlrh74z4b@linutronix.de
      
      
      Signed-off-by: Clark Williams <williams@redhat.com>
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Acked-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Keith Busch's avatar
      mm/gup: cache dev_pagemap while pinning pages · df06b37f
      Keith Busch authored
      Getting pages from ZONE_DEVICE memory needs to check the backing device's
      live-ness, which is tracked in the device's dev_pagemap metadata.  This
      metadata is stored in a radix tree and looking it up adds measurable
      software overhead.
      
      This patch avoids repeating this relatively costly operation when
      dev_pagemap is used by caching the last dev_pagemap while getting user
      pages.  The gup_benchmark kernel self test reports this reduces time to
      get user pages to as low as 1/3 of the previous time.
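
The caching idea can be modelled in userspace roughly as follows. All names here (struct pagemap, slow_lookup(), lookup_cached(), walk_range()) are hypothetical stand-ins, and a linear scan takes the place of the radix tree lookup; the point is only that a pinning loop carries the last result along and skips the slow path while it still covers the current pfn.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a device's pagemap: a pfn range [start, end). */
struct pagemap {
	unsigned long start_pfn;
	unsigned long end_pfn;
};

static struct pagemap device_maps[] = {
	{ 0x1000, 0x2000 },
	{ 0x8000, 0x9000 },
};

static unsigned long slow_lookups; /* counts trips through the slow path */

/* Models the costly lookup (a radix tree walk in the kernel). */
static struct pagemap *slow_lookup(unsigned long pfn)
{
	slow_lookups++;
	for (size_t i = 0; i < sizeof(device_maps) / sizeof(device_maps[0]); i++)
		if (pfn >= device_maps[i].start_pfn && pfn < device_maps[i].end_pfn)
			return &device_maps[i];
	return NULL;
}

/* Reuse the caller's cached pagemap when it still covers pfn. */
static struct pagemap *lookup_cached(unsigned long pfn, struct pagemap *cached)
{
	if (cached && pfn >= cached->start_pfn && pfn < cached->end_pfn)
		return cached;
	return slow_lookup(pfn);
}

/* Walk a pfn range the way a pinning loop would, carrying the cache along. */
static unsigned long walk_range(unsigned long first, unsigned long count)
{
	struct pagemap *pgmap = NULL;
	unsigned long found = 0;

	for (unsigned long pfn = first; pfn < first + count; pfn++) {
		pgmap = lookup_cached(pfn, pgmap);
		/* Whatever came back must actually cover this pfn. */
		assert(!pgmap || (pfn >= pgmap->start_pfn && pfn < pgmap->end_pfn));
		if (pgmap)
			found++;
	}
	return found;
}
```

Walking all 0x1000 pfns of the first device range takes the slow path once instead of 0x1000 times.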
      
      Link: http://lkml.kernel.org/r/20181012173040.15669-1-keith.busch@intel.com
      
      
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Masayoshi Mizuma's avatar
      Revert "x86/e820: put !E820_TYPE_RAM regions into memblock.reserved" · 9fd61bc9
      Masayoshi Mizuma authored
      commit 124049de ("x86/e820: put !E820_TYPE_RAM regions into
      memblock.reserved") breaks movable_node kernel option because it changed
      the memory gap range to reserved memblock.  So, the node is marked as
      Normal zone even if the SRAT has Hot pluggable affinity.
      
          =====================================================================
          kernel: BIOS-e820: [mem 0x0000180000000000-0x0000180fffffffff] usable
          kernel: BIOS-e820: [mem 0x00001c0000000000-0x00001c0fffffffff] usable
          ...
          kernel: reserved[0x12]#011[0x0000181000000000-0x00001bffffffffff], 0x000003f000000000 bytes flags: 0x0
          ...
          kernel: ACPI: SRAT: Node 2 PXM 6 [mem 0x180000000000-0x1bffffffffff] hotplug
          kernel: ACPI: SRAT: Node 3 PXM 7 [mem 0x1c0000000000-0x1fffffffffff] hotplug
          ...
          kernel: Movable zone start for each node
          kernel:  Node 3: 0x00001c0000000000
          kernel: Early memory node ranges
          ...
          =====================================================================
      
      The original issue is fixed by the former patches, so let's revert commit
      124049de ("x86/e820: put !E820_TYPE_RAM regions into
      memblock.reserved").
      
      Link: http://lkml.kernel.org/r/20181002143821.5112-4-msys.mizuma@gmail.com
      
      
      Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
      Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Oscar Salvador <osalvador@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Pavel Tatashin's avatar
      mm: return zero_resv_unavail optimization · ec393a0f
      Pavel Tatashin authored
      When checking for valid pfns in zero_resv_unavail(), it is not necessary
      to verify that pfns within pageblock_nr_pages ranges are valid, only the
      first one needs to be checked.  This is because memory for struct pages is
      allocated in contiguous chunks that contain pageblock_nr_pages struct
      pages.
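
A userspace sketch of the idea, assuming a block size of 512 pages and a made-up per-block validity table. pfn_is_valid() and zero_range() are hypothetical names, and the loop is structured for clarity rather than copied from the kernel.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGEBLOCK_NR_PAGES 512UL /* assumed block size for this sketch */
#define NR_BLOCKS 8UL

/* Hypothetical validity map: one flag per pageblock. */
static const bool block_valid[NR_BLOCKS] = {
	true, false, false, true, true, false, true, true,
};

static unsigned long validity_checks; /* how often pfn_is_valid() ran */

static bool pfn_is_valid(unsigned long pfn)
{
	validity_checks++;
	return block_valid[pfn / PAGEBLOCK_NR_PAGES];
}

/* Count the pfns in [start, end) that would be zeroed. */
static unsigned long zero_range(unsigned long start, unsigned long end)
{
	unsigned long zeroed = 0;
	unsigned long pfn = start;

	assert(end <= NR_BLOCKS * PAGEBLOCK_NR_PAGES);
	while (pfn < end) {
		unsigned long block_start = pfn - pfn % PAGEBLOCK_NR_PAGES;
		unsigned long block_end = block_start + PAGEBLOCK_NR_PAGES;
		unsigned long stop = block_end < end ? block_end : end;

		/* One check on the block's first pfn covers the whole block. */
		if (pfn_is_valid(block_start))
			zeroed += stop - pfn;
		pfn = stop;
	}
	return zeroed;
}
```

Covering all eight blocks costs eight validity checks rather than 4096.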
      
      Link: http://lkml.kernel.org/r/20181002143821.5112-3-msys.mizuma@gmail.com
      
      
      Signed-off-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
      Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
      Reviewed-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
      Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Naoya Horiguchi's avatar
      mm: zero remaining unavailable struct pages · 907ec5fc
      Naoya Horiguchi authored
      Patch series "mm: Fix for movable_node boot option", v3.
      
      This patch series contains a fix for the movable_node boot option issue
      which was introduced by commit 124049de ("x86/e820: put !E820_TYPE_RAM
      regions into memblock.reserved").
      
      The commit breaks the option because it changed the memory gap range to
      reserved memblock.  So, the node is marked as Normal zone even if the SRAT
      has Hot pluggable affinity.
      
      The first and second patches fix the original issue which the commit
      tried to fix; the third then reverts the commit.
      
      This patch (of 3):
      
      There is a kernel panic that is triggered when reading /proc/kpageflags on
      the kernel booted with kernel parameter 'memmap=nn[KMG]!ss[KMG]':
      
        BUG: unable to handle kernel paging request at fffffffffffffffe
        PGD 9b20e067 P4D 9b20e067 PUD 9b210067 PMD 0
        Oops: 0000 [#1] SMP PTI
        CPU: 2 PID: 1728 Comm: page-types Not tainted 4.17.0-rc6-mm1-v4.17-rc6-180605-0816-00236-g2dfb086ef02c+ #160
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.fc28 04/01/2014
        RIP: 0010:stable_page_flags+0x27/0x3c0
        Code: 00 00 00 0f 1f 44 00 00 48 85 ff 0f 84 a0 03 00 00 41 54 55 49 89 fc 53 48 8b 57 08 48 8b 2f 48 8d 42 ff 83 e2 01 48 0f 44 c7 <48> 8b 00 f6 c4 01 0f 84 10 03 00 00 31 db 49 8b 54 24 08 4c 89 e7
        RSP: 0018:ffffbbd44111fde0 EFLAGS: 00010202
        RAX: fffffffffffffffe RBX: 00007fffffffeff9 RCX: 0000000000000000
        RDX: 0000000000000001 RSI: 0000000000000202 RDI: ffffed1182fff5c0
        RBP: ffffffffffffffff R08: 0000000000000001 R09: 0000000000000001
        R10: ffffbbd44111fed8 R11: 0000000000000000 R12: ffffed1182fff5c0
        R13: 00000000000bffd7 R14: 0000000002fff5c0 R15: ffffbbd44111ff10
        FS:  00007efc4335a500(0000) GS:ffff93a5bfc00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: fffffffffffffffe CR3: 00000000b2a58000 CR4: 00000000001406e0
        Call Trace:
         kpageflags_read+0xc7/0x120
         proc_reg_read+0x3c/0x60
         __vfs_read+0x36/0x170
         vfs_read+0x89/0x130
         ksys_pread64+0x71/0x90
         do_syscall_64+0x5b/0x160
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7efc42e75e23
        Code: 09 00 ba 9f 01 00 00 e8 ab 81 f4 ff 66 2e 0f 1f 84 00 00 00 00 00 90 83 3d 29 0a 2d 00 00 75 13 49 89 ca b8 11 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 34 c3 48 83 ec 08 e8 db d3 01 00 48 89 04 24
      
      According to kernel bisection, this problem became visible due to commit
      f7f99100 which changes how struct pages are initialized.
      
      Memblock layout affects the pfn ranges covered by node/zone.  Consider
      that we have a VM with 2 NUMA nodes and each node has 4GB memory, and the
      default (no memmap= given) memblock layout is like below:
      
        MEMBLOCK configuration:
         memory size = 0x00000001fff75c00 reserved size = 0x000000000300c000
         memory.cnt  = 0x4
         memory[0x0]     [0x0000000000001000-0x000000000009efff], 0x000000000009e000 bytes on node 0 flags: 0x0
         memory[0x1]     [0x0000000000100000-0x00000000bffd6fff], 0x00000000bfed7000 bytes on node 0 flags: 0x0
         memory[0x2]     [0x0000000100000000-0x000000013fffffff], 0x0000000040000000 bytes on node 0 flags: 0x0
         memory[0x3]     [0x0000000140000000-0x000000023fffffff], 0x0000000100000000 bytes on node 1 flags: 0x0
         ...
      
      If you give memmap=1G!4G (so it just covers memory[0x2]),
      the range [0x100000000-0x13fffffff] is gone:
      
        MEMBLOCK configuration:
         memory size = 0x00000001bff75c00 reserved size = 0x000000000300c000
         memory.cnt  = 0x3
         memory[0x0]     [0x0000000000001000-0x000000000009efff], 0x000000000009e000 bytes on node 0 flags: 0x0
         memory[0x1]     [0x0000000000100000-0x00000000bffd6fff], 0x00000000bfed7000 bytes on node 0 flags: 0x0
         memory[0x2]     [0x0000000140000000-0x000000023fffffff], 0x0000000100000000 bytes on node 1 flags: 0x0
         ...
      
      This shrinks node 0's pfn range because the range is calculated from the
      address range of memblock.memory.  So some of the struct pages in the gap
      range are left uninitialized.
      
      We have a function zero_resv_unavail() which zeroes the struct pages
      outside memblock.memory, but currently it covers only the reserved
      unavailable range (i.e.  memblock.memory && !memblock.reserved).  This
      patch extends it to cover all unavailable ranges, which fixes the reported
      issue.
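
To make "all unavailable ranges" concrete, here is a userspace sketch that lists the holes between sorted memory ranges. struct range and find_gaps() are hypothetical, range ends are exclusive (the dumps above print inclusive ends), and only holes before and between ranges are reported.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A physical address range [start, end); end is exclusive. */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Write the holes before and between sorted, non-overlapping memory ranges
 * into gaps[] and return how many were found.  gaps[] needs room for nr
 * entries.
 */
static size_t find_gaps(const struct range *mem, size_t nr, struct range *gaps)
{
	size_t n = 0;
	uint64_t prev_end = 0;

	for (size_t i = 0; i < nr; i++) {
		assert(mem[i].start >= prev_end); /* input must be sorted */
		if (mem[i].start > prev_end)
			gaps[n++] = (struct range){ prev_end, mem[i].start };
		prev_end = mem[i].end;
	}
	return n;
}
```

Fed the memmap=1G!4G layout shown above, the third hole is [0xbffd7000, 0x140000000), which begins at pfn 0xbffd7, the value in R13 in the oops.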
      
      Link: http://lkml.kernel.org/r/20181002143821.5112-2-msys.mizuma@gmail.com
      Fixes: f7f99100 ("mm: stop zeroing memory during allocation in vmemmap")
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
      Tested-by: Oscar Salvador <osalvador@suse.de>
      Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
      Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Keith Busch's avatar
      tools/testing/selftests/vm/gup_benchmark.c: add MAP_HUGETLB option · 3821b76c
      Keith Busch authored
      Add a new option '-H' to the gup benchmark to help understand how hugetlb
      mapping pages compare with the default.
      
      Link: http://lkml.kernel.org/r/20181010195605.10689-6-keith.busch@intel.com
      
      
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Keith Busch's avatar
      tools/testing/selftests/vm/gup_benchmark.c: add MAP_SHARED option · 0dd8666a
      Keith Busch authored
      Add a new benchmark option, -S, to request MAP_SHARED.  This can be used
      to compare with MAP_PRIVATE, or for files that require this option, like
      dax.
      
      Link: http://lkml.kernel.org/r/20181010195605.10689-5-keith.busch@intel.com
      
      
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Keith Busch's avatar
      tools/testing/selftests/vm/gup_benchmark.c: allow user specified file · aeb85ed4
      Keith Busch authored
      Allow a user to specify a file to map by adding a new option, '-f',
      providing a means to test various file backings.
      
      If not specified, the benchmark will use a private mapping of /dev/zero,
      which produces an anonymous mapping as before.
      
      [akpm@linux-foundation.org: avoid using comma operator]
      Link: http://lkml.kernel.org/r/20181010195605.10689-4-keith.busch@intel.com
      
      
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Keith Busch's avatar
      tools/testing/selftests/vm/gup_benchmark.c: fix 'write' flag usage · 319e0bec
      Keith Busch authored
      If the '-w' parameter was provided, the benchmark would exit due to a
      missing 'break'.
      
      Link: http://lkml.kernel.org/r/20181010195605.10689-3-keith.busch@intel.com
      
      
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Keith Busch's avatar
      mm/gup_benchmark.c: add additional pinning methods · 714a3a1e
      Keith Busch authored
      Provide new gup benchmark ioctl commands to run different user page
      pinning methods, get_user_pages_longterm() and get_user_pages(), in
      addition to the existing get_user_pages_fast().
      
      Link: http://lkml.kernel.org/r/20181010195605.10689-2-keith.busch@intel.com
      
      
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Keith Busch's avatar
      mm/gup_benchmark.c: time put_page() · 26db3d09
      Keith Busch authored
      We'd like to measure time to unpin user pages, so this adds a second
      benchmark timer on put_page, separate from get_page.
      
      Adding the field breaks this ioctl ABI, but should be okay since this is an
      in-tree kernel selftest.
      
      [akpm@linux-foundation.org: add expansion to struct gup_benchmark for future use]
      Link: http://lkml.kernel.org/r/20181010195605.10689-1-keith.busch@intel.com
      
      
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Roman Gushchin's avatar
      mm: don't raise MEMCG_OOM event due to failed high-order allocation · 7a1adfdd
      Roman Gushchin authored
      It was reported that on some of our machines containers were restarted
      with OOM symptoms without an obvious reason.  Although there was almost no
      memory pressure and plenty of page cache, the MEMCG_OOM event was raised
      occasionally, causing the container management software to think that an
      OOM had happened.  However, no tasks had been killed.
      
      Further investigation showed that the problem is caused by a failing
      attempt to charge a high-order page.  In that case, the OOM killer is
      never invoked.  As shown below, it can happen under conditions which are
      very far from a real OOM: e.g.  there is plenty of clean page cache and no
      memory pressure.
      
      There is no sense in raising an OOM event in this case, as it might
      confuse a user and lead to wrong and excessive actions (e.g.  restart the
      workload, as in my case).
      
      Let's look at the charging path in try_charge().  If the memory usage is
      close to memory.max, which is absolutely natural for most memory cgroups,
      we try to reclaim some pages.  Even if we were able to reclaim enough
      memory for the allocation, the following check can fail due to a race with
      another concurrent allocation:
      
          if (mem_cgroup_margin(mem_over_limit) >= nr_pages)
              goto retry;
      
      For regular pages the following condition will save us from triggering
      the OOM:
      
         if (nr_reclaimed && nr_pages <= (1 << PAGE_ALLOC_COSTLY_ORDER))
             goto retry;
      
      But for high-order allocation this condition will intentionally fail.  The
      reasoning is that we'll likely fall back to regular pages anyway, so it's
      ok and even preferred to return ENOMEM.
      
      In this case the idea of raising MEMCG_OOM looks dubious.
      
      Fix this by moving MEMCG_OOM raising to mem_cgroup_oom() after allocation
      order check, so that the event won't be raised for high order allocations.
      This change doesn't affect regular pages allocation and charging.
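
A rough userspace model of where the event ends up after the move. struct memcg_events and charge_failed() are invented for this sketch; PAGE_ALLOC_COSTLY_ORDER is given the kernel's value of 3, and the parameter is an allocation order as in the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_ALLOC_COSTLY_ORDER 3 /* same value the kernel uses */

/* Hypothetical stand-in for the per-cgroup event counters. */
struct memcg_events {
	unsigned long oom; /* models the MEMCG_OOM event count */
};

/*
 * Model of the out-of-memory path after the fix: costly (high-order)
 * charges bail out before the event is counted.  Returns true if the OOM
 * path would go on to handle the failure.
 */
static bool charge_failed(struct memcg_events *ev, unsigned int order)
{
	assert(ev != NULL);
	if (order > PAGE_ALLOC_COSTLY_ORDER)
		return false; /* caller sees ENOMEM, no event is raised */
	ev->oom++;
	return true;
}
```

In this model a failed order-9 charge leaves the counter untouched, while a failed order-0 charge still increments it.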
      
      Link: http://lkml.kernel.org/r/20181004214050.7417-1-guro@fb.com
      
      
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Acked-by: Michal Hocko <mhocko@kernel.org>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Dave Chinner's avatar
      mm/page-writeback.c: fix range_cyclic writeback vs writepages deadlock · 64081362
      Dave Chinner authored
      We've recently seen a workload on XFS filesystems with a repeatable
      deadlock between background writeback and a multi-process application
      doing concurrent writes and fsyncs to a small range of a file.
      
      range_cyclic
      writeback		Process 1		Process 2
      
      xfs_vm_writepages
        write_cache_pages
          writeback_index = 2
          cycled = 0
          ....
          find page 2 dirty
          lock Page 2
          ->writepage
            page 2 writeback
            page 2 clean
            page 2 added to bio
          no more pages
      			write()
      			locks page 1
      			dirties page 1
      			locks page 2
      			dirties page 2
      			fsync()
      			....
      			xfs_vm_writepages
      			write_cache_pages
      			  start index 0
      			  find page 1 towrite
      			  lock Page 1
      			  ->writepage
      			    page 1 writeback
      			    page 1 clean
      			    page 1 added to bio
      			  find page 2 towrite
      			  lock Page 2
      			  page 2 is writeback
      			  <blocks>
      						write()
      						locks page 1
      						dirties page 1
      						fsync()
      						....
      						xfs_vm_writepages
      						write_cache_pages
      						  start index 0
      
          !done && !cycled
            sets index to 0, restarts lookup
          find page 1 dirty
      						  find page 1 towrite
      						  lock Page 1
      						  page 1 is writeback
      						  <blocks>
      
          lock Page 1
          <blocks>
      
      DEADLOCK because:
      
      	- process 1 needs page 2 writeback to complete to make
      	  enough progress to issue IO pending for page 1
      	- writeback needs page 1 writeback to complete so process 2
      	  can progress and unlock the page it is blocked on, then it
      	  can issue the IO pending for page 2
      	- process 2 can't make progress until process 1 issues IO
      	  for page 1
      
      The underlying cause of the problem here is that range_cyclic writeback is
      processing pages in descending index order as we hold higher index pages
      in a structure controlled from above write_cache_pages().  The
      write_cache_pages() caller needs to be able to submit these pages for IO
      before write_cache_pages restarts writeback at mapping index 0 to avoid
      wcp inverting the page lock/writeback wait order.
      
      generic_writepages() is not susceptible to this bug as it has no private
      context held across write_cache_pages() - filesystems using this
      infrastructure always submit pages in ->writepage immediately and so there
      is no problem with range_cyclic going back to mapping index 0.
      
      However:
      	mpage_writepages() has a private bio context,
      	exofs_writepages() has page_collect
      	fuse_writepages() has fuse_fill_wb_data
      	nfs_writepages() has nfs_pageio_descriptor
      	xfs_vm_writepages() has xfs_writepage_ctx
      
      All of these ->writepages implementations can hold pages under writeback
      in their private structures until write_cache_pages() returns, and hence
      they are all susceptible to this deadlock.
      
      Also worth noting is that ext4 has its own bastardised version of
      write_cache_pages() and so it /may/ have an equivalent deadlock.  I looked
      at the code long enough to understand that it has a similar retry loop for
      range_cyclic writeback reaching the end of the file and then promptly ran
      away before my eyes bled too much.  I'll leave it for the ext4 developers
      to determine if their code actually has this deadlock and how to fix it
      if it has.
      
      There are a few ways I can see to avoid this deadlock.  There are probably
      more, but these are the first I've thought of:
      
      1. get rid of range_cyclic altogether
      
      2. range_cyclic always stops at EOF, and we start again from
      writeback index 0 on the next call into write_cache_pages()
      
      2a. wcp also returns EAGAIN to ->writepages implementations to
      indicate range cyclic has hit EOF. writepages implementations can
      then flush the current context and call wcp again to continue. i.e.
      lift the retry into the ->writepages implementation
      
      3. range_cyclic uses trylock_page() rather than lock_page(), and it
      skips pages it can't lock without blocking. It will already do this
      for pages under writeback, so this seems like a no-brainer
      
      3a. all non-WB_SYNC_ALL writeback uses trylock_page() to avoid
      blocking as per pages under writeback.
      
      I don't think #1 is an option - range_cyclic prevents frequently
      dirtied lower file offsets from starving background writeback of
      rarely touched higher file offsets.
      
      #2 is simple, and I don't think it will have any impact on
      performance as going back to the start of the file implies an
      immediate seek. We'll have exactly the same number of seeks if we
      switch writeback to another inode, and then come back to this one
      later and restart from index 0.
      
      #2a is pretty much "status quo without the deadlock". Moving the
      retry loop up into the wcp caller means we can issue IO on the
      pending pages before calling wcp again, and so avoid locking or
      waiting on pages in the wrong order. I'm not convinced we need to do
      this given that we get the same thing from #2 on the next writeback
      call from the writeback infrastructure.
      
      #3 is really just a band-aid - it doesn't fix the access/wait
      inversion problem, just prevents it from becoming a deadlock
      situation. I'd prefer we fix the inversion, not sweep it under the
      carpet like this.
      
      #3a is really an optimisation that just so happens to include the
      band-aid fix of #3.
      
      So it seems that the simplest way to fix this issue is to implement
      solution #2.
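
Solution #2 can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `struct mapping_model`, `wcp_model()` and the `dirty[]` array are hypothetical stand-ins and not the kernel's write_cache_pages(). It shows one pass that runs from the saved writeback index to EOF and, instead of wrapping around within the same call, records index 0 so that the next call starts from the beginning of the file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_PAGES 4

/* Hypothetical stand-in for the per-file writeback state. */
struct mapping_model {
	bool dirty[NR_PAGES];		/* which pages need writing */
	size_t writeback_index;		/* where the next cyclic pass starts */
};

/*
 * One range_cyclic pass: write dirty pages from writeback_index to EOF,
 * then stop. There is no in-call restart from index 0; the wrap only
 * takes effect on the next call, after the caller has had a chance to
 * submit whatever it batched up during this pass.
 * Returns the number of pages "written".
 */
static int wcp_model(struct mapping_model *m, int nr_to_write)
{
	assert(m != NULL && nr_to_write > 0);

	size_t index = m->writeback_index;
	size_t done_index = index;
	bool done = false;
	int written = 0;

	while (!done && index < NR_PAGES) {
		if (m->dirty[index]) {
			m->dirty[index] = false;
			written++;
			if (written >= nr_to_write)
				done = true;	/* budget exhausted */
		}
		done_index = ++index;
	}

	/* Hit EOF with budget left: restart from 0 on the *next* call. */
	if (!done)
		done_index = 0;
	m->writeback_index = done_index;
	return written;
}
```

In this model a pass that starts at index 2 never touches pages 0 and 1; they are picked up by the following call, which is the ordering property solution #2 relies on.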
      
      Link: http://lkml.kernel.org/r/20181005054526.21507-1-david@fromorbit.com
      
      
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.de>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      64081362
    • Pavel Tatashin's avatar
      mm: move mirrored memory specific code outside of memmap_init_zone · a9a9e77f
      Pavel Tatashin authored
      memmap_init_zone is getting complex because it is called from different
      contexts (hotplug and boot), and also because it must handle some
      architecture quirks.  One of them is mirrored memory.
      
      Move the code that decides whether to skip mirrored memory outside of
      memmap_init_zone, into a separate function.
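
The refactoring described above can be sketched in user space. This is a simplified model under stated assumptions: `struct region_model`, `overlap_model()` and `init_range_model()` are hypothetical names, and the model ignores the zone and configuration checks that the real code would need. It only shows the shape of the change: the skip decision lives in its own helper, and the init loop just asks it whether to jump ahead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory region descriptor. */
struct region_model {
	unsigned long base_pfn;	/* first pfn in the region */
	unsigned long end_pfn;	/* one past the last pfn */
	bool mirrored;
};

/*
 * Decide whether *pfn lies in a mirrored region that the caller should
 * skip. If so, advance *pfn to the end of that region and return true.
 * Keeping this out of the init loop leaves the loop body free of the
 * mirrored-memory special case.
 */
static bool overlap_model(const struct region_model *regions, size_t nr,
			  unsigned long *pfn)
{
	assert(regions != NULL && pfn != NULL);

	for (size_t i = 0; i < nr; i++) {
		const struct region_model *r = &regions[i];

		if (*pfn >= r->base_pfn && *pfn < r->end_pfn && r->mirrored) {
			*pfn = r->end_pfn;
			return true;
		}
	}
	return false;
}

/* Count the pfns in [start, end) that the init loop would actually touch. */
static unsigned long init_range_model(const struct region_model *regions,
				      size_t nr, unsigned long start,
				      unsigned long end)
{
	unsigned long initialised = 0;
	unsigned long pfn = start;

	while (pfn < end) {
		if (overlap_model(regions, nr, &pfn))
			continue;	/* pfn already moved past the region */
		initialised++;
		pfn++;
	}
	return initialised;
}
```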
      
      [pasha.tatashin@oracle.com: uninline overlap_memmap_init()]
        Link: http://lkml.kernel.org/r/20180726193509.3326-4-pasha.tatashin@oracle.com
      Link: http://lkml.kernel.org/r/20180724235520.10200-4-pasha.tatashin@oracle.com
      
      
      Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
      Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Souptick Joarder <jrdr.linux@gmail.com>
      Cc: Steven Sistare <steven.sistare@oracle.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a9a9e77f
    • Pavel Tatashin's avatar
      mm: calculate deferred pages after skipping mirrored memory · d3035be4
      Pavel Tatashin authored
      update_defer_init() should be called only when a struct page is about to
      be initialized, because it counts the number of initialized struct pages.
      However, we may skip struct pages if there is some mirrored memory.
      
      So move update_defer_init() to after the check for mirrored memory.
      
      Also, rename update_defer_init() to defer_init() and reverse the returned
      boolean to emphasize that this is a boolean function that tells whether
      the rest of memmap initialization should be deferred.
      
      Make this function self-contained: use static counters instead of passing
      in the number of already initialized pages in this zone.
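
The ordering issue and the static-counter approach can be sketched with a user-space model. Everything here is an assumption made for illustration: `defer_init_model()`, `early_init_model()`, the section size and the early-init budget are hypothetical, and the model leaves out details the real function would need, such as per-node state. The point it shows is that pages skipped as mirrored never reach the counter, so they do not eat into the early-init budget.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGES_PER_SECTION_MODEL 8UL	/* hypothetical section size */
#define STATIC_INIT_PGCNT_MODEL 10UL	/* hypothetical early-init budget */

/*
 * Returns true when initialisation of the rest of the zone should be
 * deferred. The counters are static, so callers pass no running totals;
 * the count resets whenever a new zone (identified by end_pfn) is seen.
 * Like any function with static state, this is not safe to call
 * concurrently.
 */
static bool defer_init_model(unsigned long pfn, unsigned long end_pfn)
{
	static unsigned long prev_end_pfn, nr_initialised;

	assert(pfn < end_pfn);

	if (prev_end_pfn != end_pfn) {
		prev_end_pfn = end_pfn;
		nr_initialised = 0;
	}

	nr_initialised++;
	return nr_initialised > STATIC_INIT_PGCNT_MODEL &&
	       (pfn & (PAGES_PER_SECTION_MODEL - 1)) == 0;
}

/*
 * Walk [start, end), skipping the hypothetical mirrored range
 * [skip_start, skip_end) *before* consulting defer_init_model(), so that
 * skipped pages never count against the budget. Returns the number of
 * pages initialised early.
 */
static unsigned long early_init_model(unsigned long start, unsigned long end,
				      unsigned long skip_start,
				      unsigned long skip_end)
{
	unsigned long initialised = 0;

	for (unsigned long pfn = start; pfn < end; pfn++) {
		if (pfn >= skip_start && pfn < skip_end)
			continue;
		if (defer_init_model(pfn, end))
			break;
		initialised++;
	}
	return initialised;
}
```

With the skip check first, a zone that begins with eight mirrored pages still gets the same number of pages initialised early as a zone with none.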
      
      I found this bug by reading the code.  The effect is that fewer than
      expected struct pages are initialized early in boot, and it is possible
      that in some corner cases we may fail to boot when mirrored pages are
      used.  The deferred on demand code should somewhat mitigate this.  But
      this still brings some inconsistencies compared to when booting without
      mirrored pages, so it is better to fix.
      
      [pasha.tatashin@oracle.com: add comment about defer_init's lack of locking]
        Link: http://lkml.kernel.org/r/20180726193509.3326-3-pasha.tatashin@oracle.com
      [akpm@linux-foundation.org: make defer_init non-inline, __meminit]
      Link: http://lkml.kernel.org/r/20180724235520.10200-3-pasha.tatashin@oracle.com
      
      
      Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Souptick Joarder <jrdr.linux@gmail.com>
      Cc: Steven Sistare <steven.sistare@oracle.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d3035be4
    • Pavel Tatashin's avatar
      mm: make memmap_init a proper function · dfb3ccd0
      Pavel Tatashin authored
      memmap_init is sometimes a macro and sometimes a function, depending on
      __HAVE_ARCH_MEMMAP_INIT.  It is only a function on ia64.  Make memmap_init
      a weak function instead, and let ia64 redefine it.
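
The weak-function pattern can be sketched as follows. This is a minimal illustration and not the kernel code: `memmap_init_model()` and `zone_init_model()` are hypothetical names, and the example uses the GCC/Clang `__attribute__((weak))` extension directly. A generic default is marked weak, and any other object file that provides a normal definition of the same symbol replaces it at link time, with no macro or #ifdef needed at the call site.

```c
#include <assert.h>

/*
 * Generic default. The weak attribute (a GCC/Clang extension) lets another
 * object file supply a strong definition with the same name, which the
 * linker then prefers over this one.
 */
__attribute__((weak)) int memmap_init_model(unsigned long size)
{
	(void)size;
	return 0;	/* 0: the generic path was used */
}

/*
 * An architecture that needs special handling would define, in its own
 * source file and without the weak attribute:
 *
 *	int memmap_init_model(unsigned long size) { ...; return 1; }
 *
 * The caller below stays the same either way.
 */
static int zone_init_model(unsigned long size)
{
	assert(size > 0);
	return memmap_init_model(size);
}
```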
      
      Link: http://lkml.kernel.org/r/20180724235520.10200-2-pasha.tatashin@oracle.com
      
      
      Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Cc: Steven Sistare <steven.sistare@oracle.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Souptick Joarder <jrdr.linux@gmail.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      dfb3ccd0
    • Kirill Tkhai's avatar
      mm/memcontrol.c: convert mem_cgroup_id::ref to refcount_t type · 1c2d479a
      Kirill Tkhai authored
      This allows the generic refcount_t interfaces to check for counter
      overflow instead of the existing VM_BUG_ON().  The only difference
      after the patch is that VM_BUG_ON() may cause BUG(), while refcount_t
      fires a WARN().  But this does not seem significant here, since such
      problems are usually caught by syzbot with panic-on-warn enabled.
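
The difference between a BUG-style assertion and a warn-and-continue counter can be sketched in user space with C11 atomics. This is a loose model of the idea and not the kernel's refcount_t implementation: `struct refcount_model`, `warn_model()` and the two helpers are hypothetical, and the exact saturation rules are an assumption of this sketch. Misuse is reported with a warning and refused, instead of stopping the program.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a checked reference counter. */
struct refcount_model {
	atomic_uint refs;
};

/* Warn and carry on, rather than halting as a BUG-style check would. */
static void warn_model(const char *msg)
{
	fprintf(stderr, "WARNING: %s\n", msg);
}

/*
 * Take a reference. Incrementing from zero (use after free) or past
 * UINT_MAX (overflow) is refused with a warning; in the overflow case the
 * counter stays pinned at UINT_MAX so it can never wrap back to zero.
 */
static bool refcount_inc_model(struct refcount_model *r)
{
	assert(r != NULL);

	unsigned int old = atomic_load(&r->refs);

	do {
		if (old == 0) {
			warn_model("increment on zero refcount");
			return false;
		}
		if (old == UINT_MAX) {
			warn_model("refcount saturated");
			return false;
		}
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old + 1));

	return true;
}

/* Drop a reference; returns true when the last reference went away. */
static bool refcount_dec_and_test_model(struct refcount_model *r)
{
	assert(r != NULL);

	unsigned int old = atomic_load(&r->refs);

	do {
		if (old == 0) {
			warn_model("decrement on zero refcount");
			return false;
		}
		if (old == UINT_MAX)
			return false;	/* saturated counters are never freed */
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old - 1));

	return old == 1;
}
```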
      
      Link: http://lkml.kernel.org/r/153910718919.7006.13400779039257185427.stgit@localhost.localdomain
      
      
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Andrea Parri <andrea.parri@amarulasolutions.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1c2d479a