  Dec 09, 2011
    • sys_getppid: add missing rcu_dereference · 031af165
      Mandeep Singh Baines authored
      
      
      In order to safely dereference current->real_parent inside an
      rcu_read_lock() critical section, we need an rcu_dereference().
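
      A minimal sketch of the resulting pattern (hedged; this mirrors the
      syscall as it looks after the fix, not the literal diff):

        SYSCALL_DEFINE0(getppid)
        {
                int pid;

                rcu_read_lock();
                /* rcu_dereference() supplies the annotation and dependency
                 * ordering that a plain load of ->real_parent would lack */
                pid = task_tgid_vnr(rcu_dereference(current->real_parent));
                rcu_read_unlock();

                return pid;
        }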
      
      Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • rapidio/tsi721: modify PCIe capability settings · 1cee22b7
      Alexandre Bounine authored
      
      
      Modify the initialization of PCIe capability registers in the Tsi721
      mport driver:
       - change the Completion Timeout value to avoid unexpected data transfer
         aborts during intensive traffic.
       - replace the hardcoded offset of the PCIe capability block with a
         lookup via the common helper function.
      
      This patch is applicable to kernel versions starting from 3.2-rc1.
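
      Roughly, the change looks like the following hedged sketch (the function
      name tsi721_fixup_pcie and the timeout range value 0x2 are illustrative,
      not taken from the actual patch):

        static void tsi721_fixup_pcie(struct pci_dev *pdev)
        {
                /* common helper instead of a hardcoded config-space offset */
                int pcie_cap = pci_pcie_cap(pdev);
                u32 devctl2;

                /* the Completion Timeout field lives in Device Control 2 */
                pci_read_config_dword(pdev, pcie_cap + PCI_EXP_DEVCTL2, &devctl2);
                devctl2 &= ~0xf;        /* clear Completion Timeout value field */
                devctl2 |= 0x2;         /* illustrative longer-timeout range */
                pci_write_config_dword(pdev, pcie_cap + PCI_EXP_DEVCTL2, devctl2);
        }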
      
      Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
      Cc: Matt Porter <mporter@kernel.crashing.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • rapidio/tsi721: fix mailbox resource reporting · b439e66f
      Alexandre Bounine authored
      
      
      Bug fix for the Tsi721 RapidIO mport driver: Tsi721 supports four RapidIO
      mailboxes (MBOX0 - MBOX3) as defined by the RapidIO specification.
      Mailbox resources have to be properly reported to allow use of all
      available mailboxes (the initial version reported only MBOX0).
      
      This patch is applicable to kernel versions starting from 3.2-rc1.
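
      The reporting change amounts to widening the mailbox resource range; a
      hedged sketch (rio_init_mbox_res() is the helper from include/linux/rio.h;
      the surrounding mport setup code is omitted):

        /* report all four mailboxes (MBOX0..MBOX3), not just MBOX0:
         * the last argument is the highest mailbox number exposed */
        rio_init_mbox_res(&mport->riores[RIO_INB_MBOX_RESOURCE], 0, 3);
        rio_init_mbox_res(&mport->riores[RIO_OUTB_MBOX_RESOURCE], 0, 3);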
      
      Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
      Cc: Matt Porter <mporter@kernel.crashing.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • rapidio/tsi721: switch to dma_zalloc_coherent · ceb96398
      Alexandre Bounine authored
      
      
      Replace the dma_alloc_coherent()+memset() pair with the new
      dma_zalloc_coherent() added by Andrew Morton for kernel version 3.2.
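
      A hedged before/after sketch of the pattern being replaced (dev, size
      and dma_handle stand in for the driver's actual arguments):

        /* before: allocate coherent memory, then clear it by hand */
        buf = dma_alloc_coherent(dev, size, &dma_handle, GFP_KERNEL);
        if (buf)
                memset(buf, 0, size);

        /* after: one call returns already-zeroed coherent memory */
        buf = dma_zalloc_coherent(dev, size, &dma_handle, GFP_KERNEL);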
      
      Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • procfs: do not overflow get_{idle,iowait}_time for nohz · 2a95ea6c
      Michal Hocko authored
      Since commit a25cac51 ("proc: Consider NO_HZ when printing idle and
      iowait times") we have been reporting idle/io_wait time also while a CPU
      is tickless.  We rely on the get_{idle,iowait}_time functions to
      retrieve proper data.
      
      These functions, however, use usecs_to_cputime to translate microseconds
      to cputime64_t.  This is just an alias for usecs_to_jiffies, which
      reduces the data type from u64 to unsigned int and also checks whether
      the given parameter overflows jiffies_to_usecs(MAX_JIFFY_OFFSET),
      returning MAX_JIFFY_OFFSET in that case.
      
      When we overflow depends on CONFIG_HZ, but for CONFIG_HZ_300 in
      particular the threshold is quite low (1431649781), so we get
      MAX_JIFFY_OFFSET for >3000s until the unsigned int itself overflows.
      Just for reference, CONFIG_HZ_100 has an overflow window of around 20s,
      CONFIG_HZ_250 ~8s and CONFIG_HZ_1000 ~2s.
      
      This resulted in a bug where people saw [h]top go mad, reporting 100%
      CPU usage even though there was basically no CPU load.  The reason was
      simply that /proc/stat stopped reporting idle/io_wait changes (it
      reported MAX_JIFFY_OFFSET), so the only changes happening were for
      user/system time.
      
      Let's use nsecs_to_jiffies64 instead, which does not reduce the
      precision to a 32-bit type and is much more appropriate for cumulative
      time values (unlike usecs_to_jiffies, which is intended for timeout
      calculations).
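
      A hedged sketch of the conversion change (get_cpu_idle_time_us() returns
      microseconds as a u64; the exact helper wrappers in the patch may
      differ):

        /* before: truncates to unsigned int and clamps at MAX_JIFFY_OFFSET */
        idle = usecs_to_cputime(get_cpu_idle_time_us(cpu, NULL));

        /* after: keep the full 64-bit range for a cumulative counter */
        idle = nsecs_to_jiffies64(get_cpu_idle_time_us(cpu, NULL) *
                                  NSEC_PER_USEC);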
      
      Signed-off-by: Michal Hocko <mhocko@suse.cz>
      Tested-by: Artem S. Tashkinov <t.artem@mailcity.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: vmalloc: check for page allocation failure before vmlist insertion · 1368edf0
      Mel Gorman authored
      Commit f5252e00 ("mm: avoid null pointer access in vm_struct via
      /proc/vmallocinfo") adds newly allocated vm_structs to the vmlist after
      they are fully initialised.  Unfortunately, it did not check that
      __vmalloc_area_node() successfully populated the area.  In the event of
      allocation failure, the vmalloc area is freed but the pointer to the
      freed memory is inserted into the vmlist, leading to a crash later in
      get_vmalloc_info().
      
      This patch adds a check for __vmalloc_area_node() failure within
      __vmalloc_node_range().  It does not use "goto fail" as in the previous
      error path because a warning was already displayed by
      __vmalloc_area_node() before it called vfree() in its failure path.
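
      A hedged sketch of the added check (paraphrased, not the literal diff;
      insert_vmalloc_vmlist() is the helper that publishes the area):

        addr = __vmalloc_area_node(area, gfp_mask, prot, node, caller);
        if (!addr)
                /* the area was already vfree'd and a warning printed;
                 * do not insert the stale pointer into the vmlist */
                return NULL;

        /* only a fully populated area becomes visible to
         * get_vmalloc_info() and friends */
        insert_vmalloc_vmlist(area);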
      
      Credit goes to Luciano Chavez for doing all the real work of identifying
      exactly where the problem was.
      
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Reported-by: Luciano Chavez <lnx1138@linux.vnet.ibm.com>
      Tested-by: Luciano Chavez <lnx1138@linux.vnet.ibm.com>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: <stable@vger.kernel.org>		[3.1.x+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: Ensure that pfn_valid() is called once per pageblock when reserving pageblocks · d0215638
      Michal Hocko authored
      setup_zone_migrate_reserve() expects zone->start_pfn to start at a
      pageblock_nr_pages-aligned pfn; otherwise we could access beyond an
      existing memblock, resulting in the following panic if
      CONFIG_HOLES_IN_ZONE is not configured and we do not check pfn_valid:
      
        IP: [<c02d331d>] setup_zone_migrate_reserve+0xcd/0x180
        *pdpt = 0000000000000000 *pde = f000ff53f000ff53
        Oops: 0000 [#1] SMP
        Pid: 1, comm: swapper Not tainted 3.0.7-0.7-pae #1 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
        EIP: 0060:[<c02d331d>] EFLAGS: 00010006 CPU: 0
        EIP is at setup_zone_migrate_reserve+0xcd/0x180
        EAX: 000c0000 EBX: f5801fc0 ECX: 000c0000 EDX: 00000000
        ESI: 000c01fe EDI: 000c01fe EBP: 00140000 ESP: f2475f58
        DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
        Process swapper (pid: 1, ti=f2474000 task=f2472cd0 task.ti=f2474000)
        Call Trace:
        [<c02d389c>] __setup_per_zone_wmarks+0xec/0x160
        [<c02d3a1f>] setup_per_zone_wmarks+0xf/0x20
        [<c08a771c>] init_per_zone_wmark_min+0x27/0x86
        [<c020111b>] do_one_initcall+0x2b/0x160
        [<c086639d>] kernel_init+0xbe/0x157
        [<c05cae26>] kernel_thread_helper+0x6/0xd
        Code: a5 39 f5 89 f7 0f 46 fd 39 cf 76 40 8b 03 f6 c4 08 74 32 eb 91 90 89 c8 c1 e8 0e 0f be 80 80 2f 86 c0 8b 14 85 60 2f 86 c0 89 c8 <2b> 82 b4 12 00 00 c1 e0 05 03 82 ac 12 00 00 8b 00 f6 c4 08 0f
        EIP: [<c02d331d>] setup_zone_migrate_reserve+0xcd/0x180 SS:ESP 0068:f2475f58
        CR2: 00000000000012b4
      
      We crashed in pageblock_is_reserved() when accessing pfn 0xc0000 because
      highstart_pfn = 0x36ffe.
      
      The issue was introduced in 3.0-rc1 by commit 6d3163ce ("mm: check if
      any page in a pageblock is reserved before marking it MIGRATE_RESERVE").
      
      Make sure that start_pfn is always aligned to pageblock_nr_pages to
      ensure that pfn_valid() is always called at the start of each pageblock.
      Architectures with holes in pageblocks will be correctly handled by
      pfn_valid_within() in pageblock_is_reserved().
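
      A hedged sketch of the alignment fix in setup_zone_migrate_reserve()
      (paraphrased; roundup() is the generic kernel macro):

        unsigned long start_pfn = zone->zone_start_pfn;
        unsigned long end_pfn = start_pfn + zone->spanned_pages;
        unsigned long pfn;

        /* round up to a pageblock boundary so the pfn_valid() check
         * below really is done once per pageblock */
        start_pfn = roundup(start_pfn, pageblock_nr_pages);

        for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
                if (!pfn_valid(pfn))
                        continue;
                /* ... inspect/reserve this pageblock ... */
        }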
      
      Signed-off-by: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Tested-by: Dang Bo <bdang@vmware.com>
      Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Dave Hansen <dave@linux.vnet.ibm.com>
      Cc: <stable@vger.kernel.org>	[3.0+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/migrate.c: pair unlock_page() and lock_page() when migrating huge pages · 09761333
      Hillf Danton authored
      
      
      Avoid unlocking an unlocked page if we failed to lock it.
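
      A hedged sketch of the pairing in unmap_and_move_huge_page()
      (paraphrased from the surrounding 3.2-era code, not the literal diff):

        rc = -EAGAIN;

        if (!trylock_page(hpage)) {
                if (!force)
                        goto out;       /* never locked: must not unlock */
                lock_page(hpage);
        }

        /* ... attempt the huge page migration ... */

        unlock_page(hpage);     /* pairs with the successful lock above */
        out:
                return rc;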
      
      Signed-off-by: Hillf Danton <dhillf@gmail.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • thp: set compound tail page _count to zero · 58a84aa9
      Youquan Song authored
      Commit 70b50f94 ("mm: thp: tail page refcounting fix") keeps all
      page_tail->_count zero at all times.  But the current kernel does not
      set page_tail->_count to zero if a 1GB page is utilized.  So when an
      IOMMU 1GB page is used by KVM, it will result in a kernel oops because
      a tail page's _count does not equal zero.
      
        kernel BUG at include/linux/mm.h:386!
        invalid opcode: 0000 [#1] SMP
        Call Trace:
          gup_pud_range+0xb8/0x19d
          get_user_pages_fast+0xcb/0x192
          ? trace_hardirqs_off+0xd/0xf
          hva_to_pfn+0x119/0x2f2
          gfn_to_pfn_memslot+0x2c/0x2e
          kvm_iommu_map_pages+0xfd/0x1c1
          kvm_iommu_map_memslots+0x7c/0xbd
          kvm_iommu_map_guest+0xaa/0xbf
          kvm_vm_ioctl_assigned_device+0x2ef/0xa47
          kvm_vm_ioctl+0x36c/0x3a2
          do_vfs_ioctl+0x49e/0x4e4
          sys_ioctl+0x5a/0x7c
          system_call_fastpath+0x16/0x1b
        RIP  gup_huge_pud+0xf2/0x159
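
      A hedged sketch of the gigantic-page fix (modelled on
      prep_compound_gigantic_page() in mm/hugetlb.c; not the literal diff):

        for (i = 1; i < nr_pages; i++) {
                struct page *p = page + i;

                __SetPageTail(p);
                set_page_count(p, 0);   /* tail _count must be zero */
                p->first_page = page;
        }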
      
      Signed-off-by: Youquan Song <youquan.song@intel.com>
      Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • thp: add compound tail page _mapcount when mapped · b6999b19
      Youquan Song authored
      
      
      With the 3.2-rc kernel, IOMMU 2M pages in KVM work.  But when I tried
      to use IOMMU 1GB pages in KVM, I encountered an oops and the 1GB page
      failed to be used.

      The root cause is that 1GB page allocation calls gup_huge_pud() while
      the 2M page path calls gup_huge_pmd().  If compound pages are used and
      the page is a tail page, gup_huge_pmd() increases _mapcount to record
      that the tail page is mapped, while gup_huge_pud() does not do that.

      So when the mapped page is released, it results in a kernel oops
      because the page is not marked as mapped.

      This patch adds the tail-page processing for compound pages to the 1GB
      huge page path, keeping it consistent with the 2M page path.
      
      To reproduce:
      1. Add grub boot option: hugepagesz=1G hugepages=8
      2. mount -t hugetlbfs -o pagesize=1G hugetlbfs /dev/hugepages
      3. qemu-kvm -m 2048 -hda os-kvm.img -cpu kvm64 -smp 4 -mem-path /dev/hugepages
      	-net none -device pci-assign,host=07:00.1
      
        kernel BUG at mm/swap.c:114!
        invalid opcode: 0000 [#1] SMP
        Call Trace:
          put_page+0x15/0x37
          kvm_release_pfn_clean+0x31/0x36
          kvm_iommu_put_pages+0x94/0xb1
          kvm_iommu_unmap_memslots+0x80/0xb6
          kvm_assign_device+0xba/0x117
          kvm_vm_ioctl_assigned_device+0x301/0xa47
          kvm_vm_ioctl+0x36c/0x3a2
          do_vfs_ioctl+0x49e/0x4e4
          sys_ioctl+0x5a/0x7c
          system_call_fastpath+0x16/0x1b
        RIP  put_compound_page+0xd4/0x168
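
      A hedged sketch of the pud-path change, mirroring what gup_huge_pmd()
      already did (paraphrased from arch/x86/mm/gup.c of that era):

        do {
                VM_BUG_ON(compound_head(page) != head);
                pages[*nr] = page;
                if (PageTail(page))
                        /* bump the tail's _mapcount, as the 2M path does */
                        get_huge_page_tail(page);
                (*nr)++;
                page++;
                refs++;
        } while (addr += PAGE_SIZE, addr != end);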
      
      Signed-off-by: Youquan Song <youquan.song@intel.com>
      Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>