  1. Oct 05, 2017
    • Linus Torvalds's avatar
      Merge branch 'akpm' (patches from Andrew) · b7e14164
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "A lot of stuff, sorry about that. A week on a beach, then a bunch of
        time catching up then more time letting it bake in -next. Shan't do
        that again!"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (51 commits)
        include/linux/fs.h: fix comment about struct address_space
        checkpatch: fix ignoring cover-letter logic
        m32r: fix build failure
        lib/ratelimit.c: use deferred printk() version
        kernel/params.c: improve STANDARD_PARAM_DEF readability
        kernel/params.c: fix an overflow in param_attr_show
        kernel/params.c: fix the maximum length in param_get_string
        mm/memory_hotplug: define find_{smallest|biggest}_section_pfn as unsigned long
        mm/memory_hotplug: change pfn_to_section_nr/section_nr_to_pfn macro to inline function
        kernel/kcmp.c: drop branch leftover typo
        memremap: add scheduling point to devm_memremap_pages
        mm, page_alloc: add scheduling point to memmap_init_zone
        mm, memory_hotplug: add scheduling point to __add_pages
        lib/idr.c: fix comment for idr_replace()
        mm: memcontrol: use vmalloc fallback for large kmem memcg arrays
        kernel/sysctl.c: remove duplicate UINT_MAX check on do_proc_douintvec_conv()
        include/linux/bitfield.h: remove 32bit from FIELD_GET comment block
        lib/lz4: make arrays static const, reduces object code size
        exec: binfmt_misc: kill the onstack iname[BINPRM_BUF_SIZE] array
        exec: binfmt_misc: fix race between load_misc_binary() and kill_node()
        ...
      b7e14164
    • Linus Torvalds's avatar
      Merge branch 'fixes-v4.14-rc4' of... · 6c795b30
      Linus Torvalds authored
      Merge branch 'fixes-v4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
      
      Pull smack fix from James Morris:
       "It fixes a bug in xattr_getsecurity() where security_release_secctx()
        was being called instead of kfree(), which leads to a memory leak in
        the capabilities code. smack_inode_getsecurity is also fixed to behave
        correctly when called from there"
      
      * 'fixes-v4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
        lsm: fix smack_inode_removexattr and xattr_getsecurity memleak
      6c795b30
  2. Oct 04, 2017
    • Linus Torvalds's avatar
      Merge tag 'trace-v4.14-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · 013a8ee6
      Linus Torvalds authored
      Pull tracing fixlets from Steven Rostedt:
       "Two updates:
      
         - A memory fix with left over code from splitting out ftrace_ops and
           function graph tracer, where the function graph tracer could reset
           the trampoline pointer, leaving the old trampoline not to be freed
           (memory leak).
      
         - The update to Paul's patch that added the unnecessary READ_ONCE().
           This removes the unnecessary READ_ONCE() instead of having to
           rebase the branch to update the patch that added it"
      
      * tag 'trace-v4.14-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        rcu: Remove extraneous READ_ONCE()s from rcu_irq_{enter,exit}()
        ftrace: Fix kmemleak in unregister_ftrace_graph
      013a8ee6
    • Casey Schaufler's avatar
      lsm: fix smack_inode_removexattr and xattr_getsecurity memleak · 57e7ba04
      Casey Schaufler authored
      
      
      security_inode_getsecurity() provides the text string value
      of a security attribute. It does not provide a "secctx".
      The code in xattr_getsecurity() that calls security_inode_getsecurity()
      and then calls security_release_secctx() happened to work because
      SELinux and Smack treat the attribute and the secctx the same way.
      It fails for cap_inode_getsecurity(), because that module has no
      secctx that ever needs releasing. It turns out that Smack is the
      one that's doing things wrong by not allocating memory when instructed
      to do so by the "alloc" parameter.
      
      The fix is simple enough. Change the security_release_secctx() to
      kfree() because it isn't a secctx being returned by
      security_inode_getsecurity(). Change Smack to allocate the string when
      told to do so.
      
      Note: this also fixes memory leaks for LSMs which implement
      inode_getsecurity but not release_secctx, such as capabilities.
      
      Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
      Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: stable@vger.kernel.org
      Signed-off-by: James Morris <james.l.morris@oracle.com>
      57e7ba04
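The "alloc" contract described in the commit above can be sketched in userspace C. This is only an illustration under stated assumptions: `demo_getsecurity` and `demo_xattr_getsecurity` are hypothetical names, `strdup()`/`free()` stand in for the kernel's allocation and `kfree()`, and the real hook signatures are not reproduced here.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an inode_getsecurity-style hook.
 * Returns the attribute length, or -1 if the allocation fails.
 * When alloc is true, *buffer receives a heap copy owned by the caller. */
static int demo_getsecurity(const char *label, char **buffer, bool alloc)
{
    if (alloc) {
        *buffer = strdup(label);
        if (*buffer == NULL)
            return -1;
    }
    return (int)strlen(label);
}

/* Caller side: the value is a plain string, so it is released with free()
 * (kfree() in the kernel), not with a secctx-specific release hook. */
static int demo_xattr_getsecurity(const char *label, char *out, size_t size)
{
    char *value = NULL;
    int len = demo_getsecurity(label, &value, true);

    if (len < 0)
        return len;
    if ((size_t)len >= size) {
        free(value);
        return -1;
    }
    memcpy(out, value, (size_t)len + 1);  /* copy including the '\0' */
    free(value);
    return len;
}
```

Because every provider hands back an ordinary allocation when asked to, the caller can use one release path regardless of which module supplied the value.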
    • Mike Rapoport's avatar
      include/linux/fs.h: fix comment about struct address_space · 32e57c29
      Mike Rapoport authored
      Before commit 9c5d760b ("mm: split gfp_mask and mapping flags into
      separate fields") the private_* fields of struct address_space were
      grouped together and using "ditto" in comments describing the last
      fields was correct.
      
      With introduction of gfp_mask between private_lock and private_list
      "ditto" references the wrong description.
      
      Fix it by using the elaborate description.
      
      Link: http://lkml.kernel.org/r/1507009987-8746-1-git-send-email-rppt@linux.vnet.ibm.com
      Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      32e57c29
    • Stafford Horne's avatar
      checkpatch: fix ignoring cover-letter logic · a08ffbef
      Stafford Horne authored
      
      
      Currently running checkpatch on a directory with a cover-letter.patch
      file reports the following error:
      
        -----------------------------------------
        patches/smp-v2/v2-0000-cover-letter.patch
        -----------------------------------------
      
        ERROR: Does not appear to be a unified-diff format patch
      
      The logic to suppress the unified-diff check for cover letters is there
      but is checking $file instead of $filename.  Fix the variable to use the
      correct one.
      
      Link: http://lkml.kernel.org/r/20170909090406.31523-1-shorne@gmail.com
      Signed-off-by: Stafford Horne <shorne@gmail.com>
      Acked-by: Joe Perches <joe@perches.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a08ffbef
    • Sudip Mukherjee's avatar
      m32r: fix build failure · d22e3d69
      Sudip Mukherjee authored
      
      
      The allmodconfig build of m32r is failing with the error:
      
        lib/mpi/mpih-div.o: In function 'mpihelp_divrem':
        mpih-div.c:(.text+0x40): undefined reference to 'abort'
        mpih-div.c:(.text+0x40): relocation truncated to fit:
      	R_M32R_26_PCREL_RELA against undefined symbol 'abort'
      
      The function 'abort' was never defined for the m32r architecture.
      
      Create 'abort' as is done in other arch like 'arm' and 'unicore32'.
      
      Link: http://lkml.kernel.org/r/1506727220-6108-1-git-send-email-sudip.mukherjee@codethink.co.uk
      Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d22e3d69
    • Sergey Senozhatsky's avatar
      lib/ratelimit.c: use deferred printk() version · 656d61ce
      Sergey Senozhatsky authored
      printk_ratelimit() invokes ___ratelimit() which may invoke a normal
      printk() (pr_warn() in this particular case) to warn about suppressed
      output.  Given that printk_ratelimit() may be called from anywhere, that
      pr_warn() is dangerous - it may end up deadlocking the system.  Fix
      ___ratelimit() by using deferred printk().
      
      Sasha reported the following lockdep error:
      
       : Unregister pv shared memory for cpu 8
       : select_fallback_rq: 3 callbacks suppressed
       : process 8583 (trinity-c78) no longer affine to cpu8
       :
       : ======================================================
       : WARNING: possible circular locking dependency detected
       : 4.14.0-rc2-next-20170927+ #252 Not tainted
       : ------------------------------------------------------
       : migration/8/62 is trying to acquire lock:
       : (&port_lock_key){-.-.}, at: serial8250_console_write()
       :
       : but task is already holding lock:
       : (&rq->lock){-.-.}, at: sched_cpu_dying()
       :
       : which lock already depends on the new lock.
       :
       :
       : the existing dependency chain (in reverse order) is:
       :
       : -> #3 (&rq->lock){-.-.}:
       : __lock_acquire()
       : lock_acquire()
       : _raw_spin_lock()
       : task_fork_fair()
       : sched_fork()
       : copy_process.part.31()
       : _do_fork()
       : kernel_thread()
       : rest_init()
       : start_kernel()
       : x86_64_start_reservations()
       : x86_64_start_kernel()
       : verify_cpu()
       :
       : -> #2 (&p->pi_lock){-.-.}:
       : __lock_acquire()
       : lock_acquire()
       : _raw_spin_lock_irqsave()
       : try_to_wake_up()
       : default_wake_function()
       : woken_wake_function()
       : __wake_up_common()
       : __wake_up_common_lock()
       : __wake_up()
       : tty_wakeup()
       : tty_port_default_wakeup()
       : tty_port_tty_wakeup()
       : uart_write_wakeup()
       : serial8250_tx_chars()
       : serial8250_handle_irq.part.25()
       : serial8250_default_handle_irq()
       : serial8250_interrupt()
       : __handle_irq_event_percpu()
       : handle_irq_event_percpu()
       : handle_irq_event()
       : handle_level_irq()
       : handle_irq()
       : do_IRQ()
       : ret_from_intr()
       : native_safe_halt()
       : default_idle()
       : arch_cpu_idle()
       : default_idle_call()
       : do_idle()
       : cpu_startup_entry()
       : rest_init()
       : start_kernel()
       : x86_64_start_reservations()
       : x86_64_start_kernel()
       : verify_cpu()
       :
       : -> #1 (&tty->write_wait){-.-.}:
       : __lock_acquire()
       : lock_acquire()
       : _raw_spin_lock_irqsave()
       : __wake_up_common_lock()
       : __wake_up()
       : tty_wakeup()
       : tty_port_default_wakeup()
       : tty_port_tty_wakeup()
       : uart_write_wakeup()
       : serial8250_tx_chars()
       : serial8250_handle_irq.part.25()
       : serial8250_default_handle_irq()
       : serial8250_interrupt()
       : __handle_irq_event_percpu()
       : handle_irq_event_percpu()
       : handle_irq_event()
       : handle_level_irq()
       : handle_irq()
       : do_IRQ()
       : ret_from_intr()
       : native_safe_halt()
       : default_idle()
       : arch_cpu_idle()
       : default_idle_call()
       : do_idle()
       : cpu_startup_entry()
       : rest_init()
       : start_kernel()
       : x86_64_start_reservations()
       : x86_64_start_kernel()
       : verify_cpu()
       :
       : -> #0 (&port_lock_key){-.-.}:
       : check_prev_add()
       : __lock_acquire()
       : lock_acquire()
       : _raw_spin_lock_irqsave()
       : serial8250_console_write()
       : univ8250_console_write()
       : console_unlock()
       : vprintk_emit()
       : vprintk_default()
       : vprintk_func()
       : printk()
       : ___ratelimit()
       : __printk_ratelimit()
       : select_fallback_rq()
       : sched_cpu_dying()
       : cpuhp_invoke_callback()
       : take_cpu_down()
       : multi_cpu_stop()
       : cpu_stopper_thread()
       : smpboot_thread_fn()
       : kthread()
       : ret_from_fork()
       :
       : other info that might help us debug this:
       :
       : Chain exists of:
       :   &port_lock_key --> &p->pi_lock --> &rq->lock
       :
       :  Possible unsafe locking scenario:
       :
       :        CPU0                    CPU1
       :        ----                    ----
       :   lock(&rq->lock);
       :                                lock(&p->pi_lock);
       :                                lock(&rq->lock);
       :   lock(&port_lock_key);
       :
       :  *** DEADLOCK ***
       :
       : 4 locks held by migration/8/62:
       : #0: (&p->pi_lock){-.-.}, at: sched_cpu_dying()
       : #1: (&rq->lock){-.-.}, at: sched_cpu_dying()
       : #2: (printk_ratelimit_state.lock){....}, at: ___ratelimit()
       : #3: (console_lock){+.+.}, at: vprintk_emit()
       :
       : stack backtrace:
       : CPU: 8 PID: 62 Comm: migration/8 Not tainted 4.14.0-rc2-next-20170927+ #252
       : Call Trace:
       : dump_stack()
       : print_circular_bug()
       : check_prev_add()
       : ? add_lock_to_list.isra.26()
       : ? check_usage()
       : ? kvm_clock_read()
       : ? kvm_sched_clock_read()
       : ? sched_clock()
       : ? check_preemption_disabled()
       : __lock_acquire()
       : ? __lock_acquire()
       : ? add_lock_to_list.isra.26()
       : ? debug_check_no_locks_freed()
       : ? memcpy()
       : lock_acquire()
       : ? serial8250_console_write()
       : _raw_spin_lock_irqsave()
       : ? serial8250_console_write()
       : serial8250_console_write()
       : ? serial8250_start_tx()
       : ? lock_acquire()
       : ? memcpy()
       : univ8250_console_write()
       : console_unlock()
       : ? __down_trylock_console_sem()
       : vprintk_emit()
       : vprintk_default()
       : vprintk_func()
       : printk()
       : ? show_regs_print_info()
       : ? lock_acquire()
       : ___ratelimit()
       : __printk_ratelimit()
       : select_fallback_rq()
       : sched_cpu_dying()
       : ? sched_cpu_starting()
       : ? rcutree_dying_cpu()
       : ? sched_cpu_starting()
       : cpuhp_invoke_callback()
       : ? cpu_disable_common()
       : take_cpu_down()
       : ? trace_hardirqs_off_caller()
       : ? cpuhp_invoke_callback()
       : multi_cpu_stop()
       : ? __this_cpu_preempt_check()
       : ? cpu_stop_queue_work()
       : cpu_stopper_thread()
       : ? cpu_stop_create()
       : smpboot_thread_fn()
       : ? sort_range()
       : ? schedule()
       : ? __kthread_parkme()
       : kthread()
       : ? sort_range()
       : ? kthread_create_on_node()
       : ret_from_fork()
       : process 9121 (trinity-c78) no longer affine to cpu8
       : smpboot: CPU 8 is now offline
      
      Link: http://lkml.kernel.org/r/20170928120405.18273-1-sergey.senozhatsky@gmail.com
      Fixes: 6b1d174b ("ratelimit: extend to print suppressed messages on release")
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Reported-by: Sasha Levin <levinsasha928@gmail.com>
      Reviewed-by: Petr Mladek <pmladek@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      656d61ce
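The idea behind the ratelimit fix above, recording the message while locks are held and emitting it only later from a safe context, can be loosely sketched in userspace C. This is an analogy, not the kernel's printk_deferred() implementation: the names `demo_log_deferred` and `demo_flush`, the fixed-size buffer, and the single-threaded setup are all assumptions made for illustration.

```c
#include <stdio.h>
#include <string.h>

#define DEMO_PENDING_MAX 256  /* hypothetical capacity of the pending buffer */

static char demo_pending[DEMO_PENDING_MAX];
static size_t demo_pending_len;

/* Safe to call while "locks" are held: it only copies bytes into memory
 * and never touches the output device. Excess text is dropped. */
static void demo_log_deferred(const char *msg)
{
    size_t room = DEMO_PENDING_MAX - 1 - demo_pending_len;
    size_t n = strlen(msg);

    if (n > room)
        n = room;
    memcpy(demo_pending + demo_pending_len, msg, n);
    demo_pending_len += n;
    demo_pending[demo_pending_len] = '\0';
}

/* Called later, outside the critical section. Returns the bytes flushed. */
static size_t demo_flush(FILE *out)
{
    size_t n = demo_pending_len;

    fwrite(demo_pending, 1, n, out);
    demo_pending_len = 0;
    demo_pending[0] = '\0';
    return n;
}
```

The point of the split is that the code path running under the scheduler lock never needs the console lock, which removes the dependency cycle lockdep reported.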
    • Jean Delvare's avatar
      kernel/params.c: improve STANDARD_PARAM_DEF readability · e0596c80
      Jean Delvare authored
      
      
      Align the parameters passed to STANDARD_PARAM_DEF for clarity.
      
      Link: http://lkml.kernel.org/r/20170928162728.756143cc@endymion
      Signed-off-by: Jean Delvare <jdelvare@suse.de>
      Suggested-by: Ingo Molnar <mingo@kernel.org>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e0596c80
    • Jean Delvare's avatar
      kernel/params.c: fix an overflow in param_attr_show · 96802e6b
      Jean Delvare authored
      
      
      Function param_attr_show could overflow the buffer it is operating on.
      
      The buffer size is PAGE_SIZE, and the string returned by
      attribute->param->ops->get is generated by scnprintf(buffer, PAGE_SIZE,
      ...) so it could be PAGE_SIZE - 1 long, with the terminating '\0' at the
      very end of the buffer.  Calling strcat(..., "\n") on this isn't safe, as
      the '\0' will be replaced by '\n' (OK) and then another '\0' will be added
      past the end of the buffer (not OK.)
      
      Simply add the trailing '\n' when writing the attribute contents to the
      buffer originally.  This is safe, and also faster.
      
      Credits to Teradata for discovering this issue.
      
      Link: http://lkml.kernel.org/r/20170928162602.60c379c7@endymion
      Signed-off-by: Jean Delvare <jdelvare@suse.de>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      96802e6b
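The off-by-one described in the param_attr_show commit above can be illustrated with a small userspace sketch. It assumes a tiny `DEMO_BUF_SIZE` standing in for PAGE_SIZE and uses standard `snprintf()` clamped to mimic the kernel's scnprintf() return value; the function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

#define DEMO_BUF_SIZE 8  /* stand-in for PAGE_SIZE */

/* Fixed shape: the '\n' is part of the bounded format, so nothing is ever
 * written past the buffer. Returns the characters stored, excluding '\0'. */
static size_t demo_get(char *buffer, const char *value)
{
    int n = snprintf(buffer, DEMO_BUF_SIZE, "%s\n", value);

    if (n < 0)
        return 0;
    return (size_t)n < DEMO_BUF_SIZE ? (size_t)n : DEMO_BUF_SIZE - 1;
}

/* Old shape, computed rather than executed: how many bytes the buffer
 * would need if "\n" were appended with strcat() after a bounded write. */
static size_t demo_strcat_need(const char *value)
{
    char buffer[DEMO_BUF_SIZE];
    int n = snprintf(buffer, DEMO_BUF_SIZE, "%s", value);
    size_t len = 0;

    if (n > 0)
        len = (size_t)n < DEMO_BUF_SIZE ? (size_t)n : DEMO_BUF_SIZE - 1;
    return len + 2;  /* one byte for '\n', one for the new '\0' */
}
```

For a value that fills the buffer, `demo_strcat_need()` reports `DEMO_BUF_SIZE + 1` bytes, one more than is available, which is the overflow the commit removes. The trade-off of the fixed shape is that an over-long value loses its trailing newline to truncation instead of corrupting memory.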
    • Jean Delvare's avatar
      kernel/params.c: fix the maximum length in param_get_string · 90ceb2a3
      Jean Delvare authored
      
      
      The length parameter of strlcpy() is supposed to reflect the size of the
      target buffer, not of the source string.  Harmless in this case as the
      buffer is PAGE_SIZE long and the source string is always much shorter than
      this, but conceptually wrong, so let's fix it.
      
      Link: http://lkml.kernel.org/r/20170928162515.24846b4f@endymion
      Signed-off-by: Jean Delvare <jdelvare@suse.de>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      90ceb2a3
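To make the point of the param_get_string commit above concrete, here is a minimal strlcpy-style helper in portable C. `demo_strlcpy` is a hypothetical name written for this sketch (strlcpy() is not available in every C library); it follows the usual convention that the size argument describes the destination.

```c
#include <stddef.h>
#include <string.h>

/* Copies at most size - 1 bytes of src into dst and always terminates dst
 * when size is nonzero. Returns strlen(src), so callers can detect
 * truncation by comparing the result against size. */
static size_t demo_strlcpy(char *dst, const char *src, size_t size)
{
    size_t len = strlen(src);

    if (size != 0) {
        size_t n = len < size - 1 ? len : size - 1;

        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return len;
}
```

Passing `sizeof(dst)` bounds the write by what the target can hold. Passing the source's capacity instead only happens to be safe when the destination is at least as large, which is the situation the commit describes as harmless but conceptually wrong.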
    • YASUAKI ISHIMATSU's avatar
      mm/memory_hotplug: define find_{smallest|biggest}_section_pfn as unsigned long · d09b0137
      YASUAKI ISHIMATSU authored
      find_{smallest|biggest}_section_pfn() find the smallest/biggest section
      and return the pfn of the section.  But the functions are defined as int,
      so they can only return values in the range 0x00000000 - 0xffffffff.  It
      means that if the memory address is over 16TB, the functions do not work
      correctly.
      
      To handle 64 bit value, the patch defines
      find_{smallest|biggest}_section_pfn() as unsigned long.
      
      Fixes: 815121d2 ("memory_hotplug: clear zone when removing the memory")
      Link: http://lkml.kernel.org/r/d9d5593a-d0a4-c4be-ab08-493df59a85c6@gmail.com
      Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d09b0137
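The 16TB threshold in the commit above follows from simple arithmetic, assuming 4 KiB pages: 16 TiB is 2^44 bytes, so the matching page frame number is 2^44 / 2^12 = 2^32, the first value that no longer fits in 32 bits. The sketch below shows the truncation in userspace; the function names and `DEMO_PAGE_SHIFT` are hypothetical, and `uint32_t` is used in place of `int` so the narrowing is well defined in C.

```c
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12  /* assumption: 4 KiB pages */

/* Old shape: the pfn is squeezed through a 32-bit return type. */
static uint32_t demo_pfn_narrow(uint64_t phys_addr)
{
    return (uint32_t)(phys_addr >> DEMO_PAGE_SHIFT);
}

/* Fixed shape: the return type is wide enough for any pfn. */
static uint64_t demo_pfn_wide(uint64_t phys_addr)
{
    return phys_addr >> DEMO_PAGE_SHIFT;
}
```

For an address of exactly 16 TiB the narrow version yields 0 while the wide version yields 2^32, so a caller comparing section boundaries would be looking at the wrong pages.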
    • YASUAKI ISHIMATSU's avatar
      mm/memory_hotplug: change pfn_to_section_nr/section_nr_to_pfn macro to inline function · 1dd2bfc8
      YASUAKI ISHIMATSU authored
      pfn_to_section_nr() and section_nr_to_pfn() are defined as macros.
      pfn_to_section_nr() has no issue even if it is defined as a macro.  But
      section_nr_to_pfn() has an overflow issue if sec is defined as int.
      
      section_nr_to_pfn() just shifts sec by PFN_SECTION_SHIFT.  If sec is
      defined as unsigned long, section_nr_to_pfn() returns pfn as a 64 bit
      value.  But if sec is defined as int, section_nr_to_pfn() returns pfn as
      a 32 bit value.
      
      __remove_section() calculates start_pfn using section_nr_to_pfn() and
      scn_nr defined as int.  So if the hot-removed memory address is over
      16TB, an overflow occurs and section_nr_to_pfn() does not calculate the
      correct pfn.
      
      To make callers use proper arg, the patch changes the macros to inline
      functions.
      
      Fixes: 815121d2 ("memory_hotplug: clear zone when removing the memory")
      Link: http://lkml.kernel.org/r/e643a387-e573-6bbf-d418-c60c8ee3d15e@gmail.com
      Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1dd2bfc8
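Why a macro inherits the width of its argument while an inline function does not can be shown with a short userspace sketch of the commit above. The shift value of 15 is an assumption (128 MiB sections with 4 KiB pages), the `DEMO_` names are hypothetical, and `uint32_t` replaces `int` because overflowing a signed shift is undefined behaviour in C.

```c
#include <stdint.h>

#define DEMO_PFN_SECTION_SHIFT 15  /* assumption: 128 MiB sections, 4 KiB pages */

/* Macro form: the shift is carried out in the type of the argument,
 * so a 32-bit section number produces a 32-bit result. */
#define DEMO_SECTION_NR_TO_PFN(sec) ((sec) << DEMO_PFN_SECTION_SHIFT)

/* Inline-function form: the argument is converted to the 64-bit
 * parameter type before the shift, whatever the caller passed in. */
static inline uint64_t demo_section_nr_to_pfn(uint64_t sec)
{
    return sec << DEMO_PFN_SECTION_SHIFT;
}
```

With a 32-bit section number of 2^17 (which corresponds to 16 TiB under these assumptions), the macro wraps to 0 while the function returns 2^32. The function signature does the widening once, so individual callers no longer have to pick the right variable type.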
    • Cyrill Gorcunov's avatar
      kernel/kcmp.c: drop branch leftover typo · c9653850
      Cyrill Gorcunov authored
      The else branch has been left over and escaped the source code refresh.
      Not a problem, but better to clean it up.
      
      Fixes: 0791e364 ("kcmp: add KCMP_EPOLL_TFD mode to compare epoll target files")
      Link: http://lkml.kernel.org/r/20170917165838.GA1887@uranus.lan
      Reported-by: Eugene Syromiatnikov <esyr@redhat.com>
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Acked-by: Andrei Vagin <avagin@virtuozzo.com>
      Cc: Pavel Emelyanov <xemul@virtuozzo.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c9653850
    • Michal Hocko's avatar
      memremap: add scheduling point to devm_memremap_pages · 1fdcce6e
      Michal Hocko authored
      devm_memremap_pages is initializing struct pages in for_each_device_pfn
      and that can take quite some time.  We have even seen a soft lockup
      triggering on a non preemptive kernel
      
        NMI watchdog: BUG: soft lockup - CPU#61 stuck for 22s! [kworker/u641:11:1808]
        [...]
        RIP: 0010:[<ffffffff8118b6b7>]  [<ffffffff8118b6b7>] devm_memremap_pages+0x327/0x430
        [...]
        Call Trace:
          pmem_attach_disk+0x2fd/0x3f0 [nd_pmem]
          nvdimm_bus_probe+0x64/0x110 [libnvdimm]
          driver_probe_device+0x1f7/0x420
          bus_for_each_drv+0x52/0x80
          __device_attach+0xb0/0x130
          bus_probe_device+0x87/0xa0
          device_add+0x3fc/0x5f0
          nd_async_device_register+0xe/0x40 [libnvdimm]
          async_run_entry_fn+0x43/0x150
          process_one_work+0x14e/0x410
          worker_thread+0x116/0x490
          kthread+0xc7/0xe0
          ret_from_fork+0x3f/0x70
      
      Fix this by adding cond_resched every 1024 pages.
      
      Link: http://lkml.kernel.org/r/20170918121410.24466-4-mhocko@kernel.org
      Signed-off-by: Michal Hock...
      1fdcce6e
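The pattern used by the scheduling-point commit above, yielding periodically inside a long loop, can be sketched in userspace C. This is an analogy only: `sched_yield()` stands in for the kernel's cond_resched(), the byte array stands in for struct page initialisation, and `demo_init_pages` is a hypothetical name.

```c
#include <sched.h>
#include <stddef.h>

/* Initialises nr_pages entries and offers the CPU to other tasks after
 * every 1024 entries. Returns how many scheduling points were taken. */
static size_t demo_init_pages(unsigned char *pages, size_t nr_pages)
{
    size_t yields = 0;

    for (size_t i = 0; i < nr_pages; i++) {
        pages[i] = 1;                /* stand-in for per-page setup */
        if ((i + 1) % 1024 == 0) {
            sched_yield();           /* userspace analogue of cond_resched() */
            yields++;
        }
    }
    return yields;
}
```

Checking a counter is cheap compared with the per-page work, so the interval mostly trades latency against a small amount of overhead. The same idea applies at a coarser granularity, such as once per page block or once per memory section.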
    • Michal Hocko's avatar
      mm, page_alloc: add scheduling point to memmap_init_zone · 9b6e63cb
      Michal Hocko authored
      
      
      memmap_init_zone gets a pfn range to initialize and it can be really
      large resulting in a soft lockup on non-preemptible kernels
      
        NMI watchdog: BUG: soft lockup - CPU#31 stuck for 23s! [kworker/u642:5:1720]
        [...]
        task: ffff88ecd7e902c0 ti: ffff88eca4e50000 task.ti: ffff88eca4e50000
        RIP: move_pfn_range_to_zone+0x185/0x1d0
        [...]
        Call Trace:
          devm_memremap_pages+0x2c7/0x430
          pmem_attach_disk+0x2fd/0x3f0 [nd_pmem]
          nvdimm_bus_probe+0x64/0x110 [libnvdimm]
          driver_probe_device+0x1f7/0x420
          bus_for_each_drv+0x52/0x80
          __device_attach+0xb0/0x130
          bus_probe_device+0x87/0xa0
          device_add+0x3fc/0x5f0
          nd_async_device_register+0xe/0x40 [libnvdimm]
          async_run_entry_fn+0x43/0x150
          process_one_work+0x14e/0x410
          worker_thread+0x116/0x490
          kthread+0xc7/0xe0
          ret_from_fork+0x3f/0x70
      
      Fix this by adding a scheduling point once per page block.
      
      Link: http://lkml.kernel.org/r/20170918121410.24466-3-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Reported-by: Johannes Thumshirn <jthumshirn@suse.de>
      Tested-by: Johannes Thumshirn <jthumshirn@suse.de>
      Cc: Dan Williams <dan.j.williams@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9b6e63cb
    • Michal Hocko's avatar
      mm, memory_hotplug: add scheduling point to __add_pages · f64ac5e6
      Michal Hocko authored
      
      
      Patch series "mm, memory_hotplug: fix few soft lockups in memory
      hotadd".
      
      Johannes has noticed a few soft lockups when adding a large nvdimm
      device.  All of them were caused by a long loop without any explicit
      cond_resched, which is a problem for !PREEMPT kernels.
      
      The fix is quite straightforward.  Just make sure that cond_resched gets
      called from time to time.
      
      This patch (of 3):
      
      __add_pages gets a pfn range to add and there is no upper bound for a
      single call.  This is usually a memory block aligned size for the
      regular memory hotplug - smaller sizes are usual for memory ballooning
      drivers, or the whole NUMA node for physical memory online.  There is no
      explicit scheduling point in that code path though.
      
      This can lead to long latencies while __add_pages is executed and we
      have even seen a soft lockup report during nvdimm initialization with
      a !PREEMPT kernel:
      
        NMI watchdog: BUG: soft lockup - CPU#11 stuck for 23s! [kworker/u641:3:832]
        [...]
        Workqueue: events_unbound async_run_entry_fn
        task: ffff881809270f40 ti: ffff881809274000 task.ti: ffff881809274000
        RIP: _raw_spin_unlock_irqrestore+0x11/0x20
        RSP: 0018:ffff881809277b10  EFLAGS: 00000286
        [...]
        Call Trace:
          sparse_add_one_section+0x13d/0x18e
          __add_pages+0x10a/0x1d0
          arch_add_memory+0x4a/0xc0
          devm_memremap_pages+0x29d/0x430
          pmem_attach_disk+0x2fd/0x3f0 [nd_pmem]
          nvdimm_bus_probe+0x64/0x110 [libnvdimm]
          driver_probe_device+0x1f7/0x420
          bus_for_each_drv+0x52/0x80
          __device_attach+0xb0/0x130
          bus_probe_device+0x87/0xa0
          device_add+0x3fc/0x5f0
          nd_async_device_register+0xe/0x40 [libnvdimm]
          async_run_entry_fn+0x43/0x150
          process_one_work+0x14e/0x410
          worker_thread+0x116/0x490
          kthread+0xc7/0xe0
          ret_from_fork+0x3f/0x70
        DWARF2 unwinder stuck at ret_from_fork+0x3f/0x70
      
      Fix this by adding cond_resched once per memory section in the given
      pfn range.  Each section is a constant amount of work which itself is
      not too expensive, but many of them add up.
      
      Link: http://lkml.kernel.org/r/20170918121410.24466-2-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Reported-by: Johannes Thumshirn <jthumshirn@suse.de>
      Tested-by: Johannes Thumshirn <jthumshirn@suse.de>
      Cc: Dan Williams <dan.j.williams@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f64ac5e6
    • Eric Biggers's avatar
      lib/idr.c: fix comment for idr_replace() · a70e43a5
      Eric Biggers authored
      
      
      idr_replace() returns the old value on success, not 0.
      
      Link: http://lkml.kernel.org/r/20170918162642.37511-1-ebiggers3@gmail.com
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a70e43a5
    • Johannes Weiner's avatar
      mm: memcontrol: use vmalloc fallback for large kmem memcg arrays · f80c7dab
      Johannes Weiner authored
      For quick per-memcg indexing, slab caches and list_lru structures
      maintain linear arrays of descriptors.  As the number of concurrent
      memory cgroups in the system goes up, this requires large contiguous
      allocations (8k cgroups = order-5, 16k cgroups = order-6 etc.) for every
      existing slab cache and list_lru, which can easily fail on loaded
      systems.  E.g.:
      
        mkdir: page allocation failure: order:5, mode:0x14040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null)
        CPU: 1 PID: 6399 Comm: mkdir Not tainted 4.13.0-mm1-00065-g720bbe532b7c-dirty #481
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-20170228_101828-anatol 04/01/2014
        Call Trace:
         ? __alloc_pages_direct_compact+0x4c/0x110
         __alloc_pages_nodemask+0xf50/0x1430
         alloc_pages_current+0x60/0xc0
         kmalloc_order_trace+0x29/0x1b0
         __kmalloc+0x1f4/0x320
         memcg_update_all_list_lrus+0xca/0x2e0
         mem_cgroup_css_alloc+0x612/0x670
         cgroup_apply_control_enable+0x19e/0x360
      ...
      f80c7dab
    • Luis R. Rodriguez's avatar
      kernel/sysctl.c: remove duplicate UINT_MAX check on do_proc_douintvec_conv() · 3181c38e
      Luis R. Rodriguez authored
      do_proc_douintvec_conv() has two UINT_MAX checks; we can remove one.
      This has no functional changes other than fixing a static analysis warning:
      
        kernel/sysctl.c:2190]: (warning) Identical condition '*lvalp>UINT_MAX', second condition is always false
      
      Fixes: 4f2fec00 ("sysctl: simplify unsigned int support")
      Link: http://lkml.kernel.org/r/20170919072918.12066-1-mcgrof@kernel.org
      Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
      Reported-by: David Binderman <dcb314@hotmail.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3181c38e
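The shape of a range-checked conversion with a single UINT_MAX test, as the commit above leaves it, might look like the following userspace sketch. The function name and signature are hypothetical and do not reproduce do_proc_douintvec_conv(); only the idea of one guard before the narrowing assignment is taken from the commit.

```c
#include <errno.h>
#include <limits.h>

/* Narrows an unsigned long to unsigned int. One range check is enough:
 * a second identical test after this one could never be true. */
static int demo_ulong_to_uint(unsigned long val, unsigned int *out)
{
    if (val > UINT_MAX)
        return -EINVAL;
    *out = (unsigned int)val;
    return 0;
}
```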
    • Masahiro Yamada's avatar
      include/linux/bitfield.h: remove 32bit from FIELD_GET comment block · 72407674
      Masahiro Yamada authored
      
      
      I do not see anything that restricts this macro to 32 bit width.
      
      Link: http://lkml.kernel.org/r/1505921975-23379-1-git-send-email-yamada.masahiro@socionext.com
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      72407674
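The observation in the commit above, that nothing ties a mask-based field extraction to 32 bits, can be illustrated with a simplified macro. `DEMO_FIELD_GET` is a hypothetical stand-in, not the kernel's FIELD_GET: it omits the compile-time checks, assumes a nonzero contiguous mask, and relies on the GCC/Clang builtin `__builtin_ctzll`.

```c
#include <stdint.h>

/* Extracts the field selected by mask from reg: mask the value, then shift
 * right by the position of the mask's lowest set bit. All arithmetic is
 * done in 64 bits, so narrower masks and registers work unchanged. */
#define DEMO_FIELD_GET(mask, reg) \
    ((uint64_t)(((uint64_t)(reg) & (uint64_t)(mask)) >> \
                __builtin_ctzll((uint64_t)(mask))))
```

For example, with the mask `0x0000FF0000000000` and the register value `0x0000AB0000001234`, the macro yields `0xAB`; with the 16-bit style mask `0x00F0` and value `0x0A50` it yields `5`.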
    • Colin Ian King's avatar
      lib/lz4: make arrays static const, reduces object code size · 8cb5d748
      Colin Ian King authored
      
      
      Don't populate the read-only arrays dec32table and dec64table on the
      stack, instead make them both static const.  Makes the object code
      smaller by over 10K bytes:
      
        Before:
           text	   data	    bss	    dec	    hex	filename
          31500	      0	      0	  31500	   7b0c	lib/lz4/lz4_decompress.o
      
        After:
           text	   data	    bss	    dec	    hex	filename
          20237	    176	      0	  20413	   4fbd	lib/lz4/lz4_decompress.o
      
      (gcc version 7.2.0 x86_64)
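
      The difference can be sketched in plain C. The table contents and
      function names below are placeholders for illustration, not the actual
      lz4 code.

```c
#include <assert.h>
#include <stddef.h>

/* Non-static local array: the compiler may emit code to build it per call. */
static unsigned int step_on_stack(size_t i)
{
	const unsigned int table[8] = {0, 1, 2, 1, 4, 4, 4, 4};

	return table[i & 7];
}

/* static const: a single read-only copy, no per-call initialization. */
static unsigned int step_static(size_t i)
{
	static const unsigned int table[8] = {0, 1, 2, 1, 4, 4, 4, 4};

	return table[i & 7];
}
```

      Both functions return the same values; only where the table lives, and
      whether it is rebuilt on entry, differs. The actual size saving depends
      on the compiler and on how often the function is inlined.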
      
      Link: http://lkml.kernel.org/r/20170921221939.20820-1-colin.king@canonical.com
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Cc: Sven Schmidt <4sschmid@informatik.uni-hamburg.de>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Joe Perches <joe@perches.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8cb5d748
    • Oleg Nesterov's avatar
      exec: binfmt_misc: kill the onstack iname[BINPRM_BUF_SIZE] array · 50097f74
      Oleg Nesterov authored
      
      
      After the previous change "fmt" can't go away, so we can kill
      iname/iname_addr and use fmt->interpreter.
      
      Link: http://lkml.kernel.org/r/20170922143653.GA17232@redhat.com
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Ben Woodard <woodard@redhat.com>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Jim Foraker <foraker1@llnl.gov>
      Cc: <tdhooge@llnl.gov>
      Cc: Travis Gummels <tgummels@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      50097f74
    • Oleg Nesterov's avatar
      exec: binfmt_misc: fix race between load_misc_binary() and kill_node() · 43a4f261
      Oleg Nesterov authored
      
      
      load_misc_binary() makes a local copy of fmt->interpreter under
      entries_lock to avoid the race with kill_node() but this is not enough;
      the whole Node can be freed after we drop entries_lock, not only the
      ->interpreter string.
      
      Add dget/dput(fmt->dentry) to ensure bm_evict_inode() can't destroy/free
      this Node.
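
      The pinning idea can be sketched in userspace with a reference count.
      Everything here (struct node, node_get(), node_put()) is a hypothetical
      stand-in for the dentry reference taken by dget()/dput(); it is not the
      binfmt_misc code, and it marks the node destroyed instead of freeing it
      so the effect is observable.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a Node pinned through its dentry. */
struct node {
	atomic_int refs;          /* plays the role of the dentry refcount */
	bool destroyed;           /* set where the real code would free the Node */
	const char *interpreter;
};

static void node_init(struct node *n, const char *interp)
{
	atomic_init(&n->refs, 1); /* the registration itself holds one reference */
	n->destroyed = false;
	n->interpreter = interp;
}

/* Analogous to dget(): must be called while the lookup lock is still held. */
static void node_get(struct node *n)
{
	atomic_fetch_add(&n->refs, 1);
}

/* Analogous to dput(): dropping the last reference destroys the node. */
static void node_put(struct node *n)
{
	if (atomic_fetch_sub(&n->refs, 1) == 1)
		n->destroyed = true;
}
```

      A reader that calls node_get() before dropping the lock keeps the whole
      node alive, not just a copied string, even if the remover drops the
      registration reference in the meantime.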
      
      Link: http://lkml.kernel.org/r/20170922143650.GA17227@redhat.com
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Ben Woodard <woodard@redhat.com>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Jim Foraker <foraker1@llnl.gov>
      Cc: Travis Gummels <tgummels@redhat.com>
      Cc: <tdhooge@llnl.gov>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      43a4f261
    • Oleg Nesterov's avatar
      exec: binfmt_misc: remove the confusing e->interp_file != NULL checks · eb23aa03
      Oleg Nesterov authored
      
      
      If the MISC_FMT_OPEN_FILE flag is set, e->interp_file must be valid;
      otherwise we have a bug which should not be silently ignored.
      
      Link: http://lkml.kernel.org/r/20170922143647.GA17222@redhat.com
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Ben Woodard <woodard@redhat.com>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Jim Foraker <foraker1@llnl.gov>
      Cc: <tdhooge@llnl.gov>
      Cc: Travis Gummels <tgummels@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      eb23aa03
    • Oleg Nesterov's avatar
      exec: binfmt_misc: shift filp_close(interp_file) from kill_node() to bm_evict_inode() · 83f91827
      Oleg Nesterov authored
      
      
      This ensures that load_misc_binary() can't use a partially destroyed
      Node; see also the next patch.
      
      The current logic looks wrong in any case: once we close interp_file it
      doesn't make any sense to delay kfree(inode->i_private), because this
      Node is no longer valid.  Even if the MISC_FMT_OPEN_FILE/interp_file
      checks were not racy (they are), load_misc_binary() should not try to
      reopen ->interpreter if MISC_FMT_OPEN_FILE is set but ->interp_file is
      NULL.
      
      And I can't understand why we use filp_close(), not fput().
      
      Link: http://lkml.kernel.org/r/20170922143644.GA17216@redhat.com
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Ben Woodard <woodard@redhat.com>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Jim Foraker <foraker1@llnl.gov>
      Cc: <tdhooge@llnl.gov>
      Cc: Travis Gummels <tgummels@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      83f91827
    • Oleg Nesterov's avatar
      exec: binfmt_misc: don't nullify Node->dentry in kill_node() · baba1b29
      Oleg Nesterov authored
      
      
      kill_node() nullifies/checks Node->dentry to avoid double free.  This
      complicates the next changes and this is very confusing:
      
       - we do not need to check dentry != NULL under entries_lock,
         kill_node() is always called under inode_lock(d_inode(root)) and we
         rely on this inode_lock() anyway, without this lock the
         MISC_FMT_OPEN_FILE cleanup could race with itself.
      
       - if kill_node() was already called and ->dentry == NULL we should not
         even try to close e->interp_file.
      
      We can change bm_entry_write() to simply check !list_empty(list) before
      kill_node.  Again, we rely on inode_lock(), in particular it saves us
      from the race with bm_status_write(), another caller of kill_node().
      
      Link: http://lkml.kernel.org/r/20170922143641.GA17210@redhat.com
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Ben Woodard <woodard@redhat.com>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Jim Foraker <foraker1@llnl.gov>
      Cc: <tdhooge@llnl.gov>
      Cc: Travis Gummels <tgummels@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      baba1b29
    • Oleg Nesterov's avatar
      exec: load_script: kill the onstack interp[BINPRM_BUF_SIZE] array · c2315c18
      Oleg Nesterov authored
      Patch series "exec: binfmt_misc: fix use-after-free, kill
      iname[BINPRM_BUF_SIZE]".
      
      It looks like this code was always wrong, then commit 948b701a
      ("binfmt_misc: add persistent opened binary handler for containers")
      added more problems.
      
      This patch (of 6):
      
      load_script() can simply use i_name instead; it points into bprm->buf[]
      and nobody can change this memory until we call prepare_binprm().
      
      The only complication is that we need to also change the signature of
      bprm_change_interp() but this change looks good too.
      
      While at it, do whitespace/style cleanups.
      
      NOTE: the real motivation for this change is that people want to
      increase BINPRM_BUF_SIZE, we need to change load_misc_binary() too but
      this looks more complicated because afaics it is very buggy.
      
      Link: http://lkml.kernel.org/r/20170918163446.GA26793@redhat.com
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      Cc: Travis Gummels <tgummels@re...
      c2315c18
    • Andrea Arcangeli's avatar
      userfaultfd: non-cooperative: fix fork use after free · 384632e6
      Andrea Arcangeli authored
      When reading the event from the uffd, we put it on a temporary
      fork_event list to detect if we can still access it after releasing and
      retaking the event_wqh.lock.
      
      If fork aborts and removes the event from the fork_event all is fine as
      long as we're still in the userfault read context and fork_event head is
      still alive.
      
      We have to move the event, which is allocated on the fork kernel stack,
      back from the fork_event list head to the event_wqh head before
      returning from userfaultfd_ctx_read, because the lifetime of the
      fork_event head is limited to the userfaultfd_ctx_read stack lifetime.
      
      Forgetting to move the event back to event_wqh means that
      __remove_wait_queue(&ctx->event_wqh, &ewq->wq) in
      userfaultfd_event_wait_completion removes it from a head that has
      already been freed along with the reader stack.
      
      This could only happen if resolve_userfault_fork failed (for example if
      there are no file descriptors available to allocate the fork uffd).  If
      it succeeded it was put back correctly.
      
      Furthermore, after find_userfault_evt receives a fork event, the forked
      userfault context in fork_nctx and uwq->msg.arg.reserved.reserved1 can
      be released by the fork thread as soon as the event_wqh.lock is
      released.  Taking a reference on the fork_nctx before dropping the lock
      prevents a use-after-free in resolve_userfault_fork().
      
      If the fork side aborted and it already released everything, we still
      try to succeed resolve_userfault_fork(), if possible.
      
      Fixes: 893e26e6 ("userfaultfd: non-cooperative: Add fork() event")
      Link: http://lkml.kernel.org/r/20170920180413.26713-1-aarcange@redhat.com
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Reported-by: Mark Rutland <mark.rutland@arm.com>
      Tested-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Pavel Emelyanov <xemul@virtuozzo.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      384632e6
    • Reza Arbab's avatar
      mm/device-public-memory: fix edge case in _vm_normal_page() · 7d790d2d
      Reza Arbab authored
      With device public pages at the end of my memory space, I'm getting
      output from _vm_normal_page():
      
        BUG: Bad page map in process migrate_pages  pte:c0800001ffff0d06 pmd:f95d3000
        addr:00007fff89330000 vm_flags:00100073 anon_vma:c0000000fa899320 mapping:          (null) index:7fff8933
        file:          (null) fault:          (null) mmap:          (null) readpage:          (null)
        CPU: 0 PID: 13963 Comm: migrate_pages Tainted: P    B      OE 4.14.0-rc1-wip #155
        Call Trace:
           dump_stack+0xb0/0xf4 (unreliable)
           print_bad_pte+0x28c/0x340
           _vm_normal_page+0xc0/0x140
           zap_pte_range+0x664/0xc10
           unmap_page_range+0x318/0x670
           unmap_vmas+0x74/0xe0
           exit_mmap+0xe8/0x1f0
           mmput+0xac/0x1f0
           do_exit+0x348/0xcd0
           do_group_exit+0x5c/0xf0
           SyS_exit_group+0x1c/0x20
           system_call+0x58/0x6c
      
      The pfn causing this is the very last one.  Correct the bounds check
      accordingly.
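
      The kind of off-by-one described here can be sketched as follows. This
      assumes the faulty check compared the pfn against the highest valid pfn
      with an exclusive bound; the function and parameter names are
      hypothetical and the actual diff is not shown in this log.

```c
#include <assert.h>
#include <stdbool.h>

/* Exclusive bound: wrongly rejects the very last valid pfn. */
static bool pfn_in_range_exclusive(unsigned long pfn, unsigned long highest_pfn)
{
	return pfn < highest_pfn;
}

/* Inclusive bound: accepts pfn == highest_pfn as valid. */
static bool pfn_in_range_inclusive(unsigned long pfn, unsigned long highest_pfn)
{
	return pfn <= highest_pfn;
}
```

      The two checks agree everywhere except at pfn == highest_pfn, which is
      exactly the case that only shows up when device pages sit at the end of
      the memory space.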
      
      Fixes: df6ad698 ("mm/device-public-memo...
      7d790d2d
    • Shaohua Li's avatar
      mm: fix data corruption caused by lazyfree page · 9625456c
      Shaohua Li authored
      MADV_FREE clears the pte dirty bit and then marks the page lazyfree
      (clears SwapBacked).  There is no lock to prevent page reclaim from
      adding the page to the swap cache between these two steps.  If page
      reclaim finds such a page, it simply adds the page to the swap cache
      without paging it out to swap, because the page is marked clean.  The
      next page fault then reads data from the swap slot, which doesn't hold
      the original data, so we have data corruption.  To fix the issue, we
      mark the page dirty and page it out.
      
      However, we shouldn't dirty every page that is clean and in the swap
      cache; a swapin page is in the swap cache and clean too.  So we only
      dirty a page that is added to the swap cache by page reclaim, which
      cannot be a swapin page.  As Minchan suggested, simply dirtying the page
      in add_to_swap does the job.
      
      Fixes: 802a3a92 ("mm: reclaim MADV_FREE pages")
      Link: http://lkml.kernel.org/r/08c84256b007bf3f63c91d94383bd9eb6fee2daa.1506446061.git.shli@fb.com
      Signed-off-by: Shaohua Li <shli@fb.com>
      Reported-by: Artem Savkov <asavkov@redhat.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: <stable@vger.kernel.org>	[4.12+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9625456c
    • Shaohua Li's avatar
      mm: avoid marking swap cached page as lazyfree · 24c92eb7
      Shaohua Li authored
      MADV_FREE clears the pte dirty bit and then marks the page lazyfree
      (clears SwapBacked).  There is no lock to prevent page reclaim from
      adding the page to the swap cache between these two steps.  Page reclaim
      could add the page to the swap cache and unmap the page.  After page
      reclaim, the page is added back to the lru.  At that time, we probably
      start draining the per-cpu pagevec and mark the page lazyfree.  So the
      page could be in a state with SwapBacked cleared and PG_swapcache set.
      The next time there is a refault at the virtual address, do_swap_page can
      find the page in the swap cache, but PageSwapCache is false for the page
      because SwapBacked isn't set, so do_swap_page bails out and does nothing.
      The task will keep running into the fault handler.
      
      Fixes: 802a3a92 ("mm: reclaim MADV_FREE pages")
      Link: http://lkml.kernel.org/r/6537ef3814398c0073630b03f176263bc81f0902.1506446061.git.shli@fb.com
      Signed-off-by: Shaohua Li <shli@fb.com>
      Reported-by: Artem Savkov <asavkov@redhat.com>
      Te...
      24c92eb7
    • Jeff Layton's avatar
      mm: have filemap_check_and_advance_wb_err clear AS_EIO/AS_ENOSPC · f4e222c5
      Jeff Layton authored
      
      
      Eryu noticed that he could sometimes get a leftover error reported when
      it shouldn't be on fsync with ext2 and non-journalled ext4.
      
      The problem is that writeback_single_inode still uses filemap_fdatawait.
      That picks up a previously set AS_EIO flag, which would ordinarily have
      been cleared before.
      
      Since we're mostly using this function as a replacement for
      filemap_check_errors, have filemap_check_and_advance_wb_err clear AS_EIO
      and AS_ENOSPC when reporting an error.  That should allow the new
      function to better emulate the behavior of the old with respect to these
      flags.
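
      A simplified model of the intended behavior. The struct, flag bits and
      function names are hypothetical, and the real error tracking is more
      involved than a single integer; the sketch only shows the new-style check
      clearing the legacy sticky flags when it reports an error.

```c
#include <assert.h>
#include <errno.h>

#define FLAG_EIO    (1 << 0)  /* hypothetical stand-in for AS_EIO */
#define FLAG_ENOSPC (1 << 1)  /* hypothetical stand-in for AS_ENOSPC */

struct mapping_state {
	unsigned int flags;   /* legacy sticky error flags */
	int wb_err;           /* latest recorded writeback error, 0 if none */
};

/* New-style check: report an unseen error and clear the legacy flags too. */
static int check_and_advance(struct mapping_state *m, int *since)
{
	int err = 0;

	if (m->wb_err != *since) {
		err = m->wb_err;
		*since = m->wb_err;
		m->flags &= ~(FLAG_EIO | FLAG_ENOSPC);
	}
	return err;
}

/* Old-style check: test and clear the sticky flags. */
static int legacy_check(struct mapping_state *m)
{
	int err = 0;

	if (m->flags & FLAG_ENOSPC)
		err = -ENOSPC;
	if (m->flags & FLAG_EIO)
		err = -EIO;
	m->flags &= ~(FLAG_EIO | FLAG_ENOSPC);
	return err;
}
```

      Without the flag clearing in check_and_advance(), a later legacy_check()
      would report the same error a second time, which matches the leftover
      error described above.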
      
      Link: http://lkml.kernel.org/r/20170922133331.28812-1-jlayton@kernel.org
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Reported-by: Eryu Guan <eguan@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f4e222c5
    • Sudip Mukherjee's avatar
      m32r: define CPU_BIG_ENDIAN · 5bdfca64
      Sudip Mukherjee authored
      
      
      The build of m32r allmodconfig is giving lots of build warnings about:
      
        include/linux/byteorder/big_endian.h:7:2:
      	warning: #warning inconsistent configuration,
      		needs CONFIG_CPU_BIG_ENDIAN [-Wcpp]
      	#warning inconsistent configuration, needs CONFIG_CPU_BIG_ENDIAN
      
      Define CPU_BIG_ENDIAN like the way CPU_LITTLE_ENDIAN is defined.
      
      Link: http://lkml.kernel.org/r/1505678083-10320-1-git-send-email-sudipm.mukherjee@gmail.com
      Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5bdfca64
    • Minchan Kim's avatar
      zram: fix null dereference of handle · ae94264e
      Minchan Kim authored
      In testing I found that the handle passed to zs_map_object in
      __zram_bvec_read is NULL, so the kernel oopses in pin_object().
      
      The reason is that nothing checks whether the slot has been freed after
      taking the slot's lock.  This patch fixes it.
      
      [minchan@kernel.org: v2]
        Link: http://lkml.kernel.org/r/1505887347-10881-1-git-send-email-minchan@kernel.org
      Link: http://lkml.kernel.org/r/1505788488-26723-1-git-send-email-minchan@kernel.org
      Fixes: 1f7319c7 ("zram: partial IO refactoring")
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ae94264e
    • Christophe Leroy's avatar
      mm: fix RODATA_TEST failure "rodata_test: test data was not read only" · a872eb21
      Christophe Leroy authored
      On powerpc, RODATA_TEST fails with the following messages:
      
        Freeing unused kernel memory: 528K
        rodata_test: test data was not read only
      
      This is because GCC allocates it to .data section:
      
        c0695034 g     O .data	00000004 rodata_test_data
      
      Since commit 056b9d8a ("mm: remove rodata_test_data export, add
      pr_fmt"), rodata_test_data is used only inside rodata_test.c.  By
      declaring it static, it gets properly allocated into the .rodata section
      instead of .data:
      
        c04df710 l     O .rodata	00000004 rodata_test_data
      
      Fixes: 056b9d8a ("mm: remove rodata_test_data export, add pr_fmt")
      Link: http://lkml.kernel.org/r/20170921093729.1080368AC1@po15668-vm-win7.idsi0.si.c-s.fr
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Jinbum Park <jinb.park7@gmail.com>
      Cc: Segher Boessenkool <segher@kernel.crashing.org>
      Cc: David Laight <David.Laight@ACULAB.COM>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a872eb21
    • Ioan Nicu's avatar
      rapidio: remove global irq spinlocks from the subsystem · 31d1e130
      Ioan Nicu authored
      
      
      Locking of config and doorbell operations should be done only if the
      underlying hardware requires it.
      
      This patch removes the global spinlocks from the rapidio subsystem and
      moves them to the mport drivers (fsl_rio and tsi721), only to the
      necessary places.  For example, local config space read and write
      operations (lcread/lcwrite) are atomic in all existing drivers, so there
      should be no need for locking, while the cread/cwrite operations which
      generate maintenance transactions need to be synchronized with a lock.
      
      Later, each driver could choose to use a per-port lock instead of a
      global one, or even more granular locking.
      
      Link: http://lkml.kernel.org/r/20170824113023.GD50104@nokia.com
      Signed-off-by: Ioan Nicu <ioan.nicu.ext@nokia.com>
      Signed-off-by: Frank Kunz <frank.kunz@nokia.com>
      Acked-by: Alexandre Bounine <alexandre.bounine@idt.com>
      Cc: Matt Porter <mporter@kernel.crashing.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      31d1e130
    • Arnd Bergmann's avatar
      mm: meminit: mark init_reserved_page as __meminit · 57148a64
      Arnd Bergmann authored
      The function is called from __meminit context and calls other __meminit
      functions but isn't itself marked as such today:
      
        WARNING: vmlinux.o(.text.unlikely+0x4516): Section mismatch in reference from the function init_reserved_page() to the function .meminit.text:early_pfn_to_nid()
        The function init_reserved_page() references the function __meminit early_pfn_to_nid().
        This is often because init_reserved_page lacks a __meminit annotation or the annotation of early_pfn_to_nid is wrong.
      
      On most compilers, we don't notice this because the function gets
      inlined all the time.  Adding __meminit here fixes the harmless warning
      for the old versions and is generally the correct annotation.
      
      Link: http://lkml.kernel.org/r/20170915193149.901180-1-arnd@arndb.de
      Fixes: 7e18adb4 ("mm: meminit: initialise remaining struct pages in parallel with kswapd")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      57148a64
    • Vitaly Wool's avatar
      z3fold: fix stale list handling · 35529357
      Vitaly Wool authored
      
      
      Fix the situation when clear_bit() is called for page->private before
      the page pointer is actually assigned.  While at it, remove the
      work_busy() check because it is costly and does not give a 100%
      guarantee anyway.
      
      Signed-off-by: Vitaly Wool <vitalywool@gmail.com>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: <Oleksiy.Avramchenko@sony.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      35529357
    • Davidlohr Bueso's avatar
      mm,compaction: serialize waitqueue_active() checks (for real) · 6818600f
      Davidlohr Bueso authored
      Andrea brought to my attention that the L->{L,S} guarantees are
      completely bogus for this case.  I was looking at the diagram from the
      offending commit, but that _is_ the race: the load had already been
      reordered.
      
      What we need is at least S->L semantics, thus simply use
      wq_has_sleeper() to serialize the call for good.
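
      The ordering requirement can be sketched with C11 atomics. The types and
      function names below are hypothetical userspace analogs; the fence stands
      in for the full barrier that an ordered waiter check places between the
      caller's earlier store and the load of the waiter state.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical wait queue reduced to a waiter count. */
struct waitq {
	atomic_int nr_waiters;
};

/* Unordered check: an earlier store by the caller may be reordered after it. */
static bool waitq_active(struct waitq *wq)
{
	return atomic_load_explicit(&wq->nr_waiters, memory_order_relaxed) > 0;
}

/* Ordered check: a full fence gives the store->load (S->L) ordering. */
static bool waitq_has_sleeper(struct waitq *wq)
{
	atomic_thread_fence(memory_order_seq_cst);
	return waitq_active(wq);
}
```

      A waker would store its wake-up condition first and then call
      waitq_has_sleeper(); the matching waiter side must order its own
      enqueue before re-checking the condition for the pairing to hold.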
      
      Link: http://lkml.kernel.org/r/20170914175313.GB811@linux-80c1.suse
      Fixes: 46acef04 ("mm,compaction: serialize waitqueue_active() checks")
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Reported-by: Andrea Parri <parri.andrea@gmail.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6818600f