  1. Nov 27, 2014
  2. Nov 25, 2014
  3. Nov 11, 2014
  4. Nov 08, 2014
    • drivers: base: support cpu cache information interface to userspace via sysfs · 246246cb
      Sudeep Holla authored
      
      
      This patch adds initial support for exposing processor cache information
      to userspace through a sysfs interface. It is based on the existing
      implementations (x86, ia64, s390 and powerpc), and the interface is
      intended to be fully compatible with them.
      
      The main purpose of this generic support is to avoid further code
      duplication to support new architectures and also to unify all the existing
      different implementations.
      
      This implementation maintains a hierarchy of cache objects that reflects
      the system's cache topology. Cache devices are instantiated as needed when
      CPUs come online. The cache information is replicated per CPU even when a
      cache is shared. A per-CPU array of cache information is maintained, mainly
      for sysfs-related bookkeeping.
      
      It also implements the shared_cpu_map attribute, which is essential for
      enabling both kernel and user-space to discover the system's overall cache
      topology.
      
      This patch also adds the missing ABI documentation for the cacheinfo sysfs
      interface, which is already well defined and widely used.
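
A minimal userspace sketch of how the shared_cpu_map attribute might be consumed. It assumes the usual sysfs cpumask text format (comma-separated 32-bit hex groups, most significant group first) and the cache/indexN layout under /sys/devices/system/cpu; the function names are hypothetical and not part of the patch.

```python
def parse_cpu_mask(text):
    """Return the sorted CPU numbers set in a sysfs-style hex cpumask string."""
    # Groups are comma-separated 32-bit hex words, most significant first.
    mask = 0
    for group in text.strip().split(","):
        mask = (mask << 32) | int(group, 16)
    cpus = []
    cpu = 0
    while mask:
        if mask & 1:
            cpus.append(cpu)
        mask >>= 1
        cpu += 1
    return cpus


def read_shared_cpus(cpu=0, index=0, root="/sys/devices/system/cpu"):
    """Read shared_cpu_map for one cache leaf (path layout is an assumption)."""
    path = f"{root}/cpu{cpu}/cache/index{index}/shared_cpu_map"
    with open(path) as f:
        return parse_cpu_mask(f.read())
```

Calling parse_cpu_mask on the maps of every index directory of every CPU would let a tool group CPUs by the caches they share, which is the kind of topology discovery the commit message describes.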
      
      Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
      Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
      Tested-by: Stephen Boyd <sboyd@codeaurora.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: linux-api@vger.kernel.org
      Cc: linux390@de.ibm.com
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-ia64@vger.kernel.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: linux-s390@vger.kernel.org
      Cc: x86@kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • drivers: base: add cpu_device_create to support per-cpu devices · 3d52943b
      Sudeep Holla authored
      
      
      This patch adds a new function to create per-cpu devices.
      This helps in:
      1. reusing the device infrastructure to create any cpu related
         attributes and corresponding sysfs instead of creating and
         dealing with raw kobjects directly
      2. retaining the legacy path (/sys/devices/system/cpu/..) to support
         the existing sysfs ABI
      3. avoiding links in the bus directory that point to the device,
         since there would be a per-cpu instance of these devices with
         the same name; to that end, dev->bus is deliberately not set to
         cpu_sysbus
      
      Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
      Tested-by: Stephen Boyd <sboyd@codeaurora.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: David Herrmann <dh.herrmann@gmail.com>
      Cc: Kay Sievers <kay@vrfy.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • topology: replace custom attribute macros with standard DEVICE_ATTR* · d6ea8d01
      Sudeep Holla authored
      
      
      Currently a couple of custom macros are defined to declare the
      device attributes. However, device.h already defines standard
      macros that meet this need, so the custom macros can be removed.
      
      This patch replaces the custom attribute macros with the standard
      DEVICE_ATTR_RO macro.
      
      Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • cpumask: factor out show_cpumap into separate helper function · 5aaba363
      Sudeep Holla authored
      
      
      Many sysfs *_show functions use cpu{list,mask}_scnprintf to copy a cpumap
      into the PAGE_SIZE-aligned buffer, then append '\n' and '\0' so that the
      buffer returned is newline-terminated and NUL-terminated.
      
      This patch creates a new helper function cpumap_print_to_pagebuf in
      cpumask.h using newly added bitmap_print_to_pagebuf and consolidates
      most of those sysfs functions using the new helper function.
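
As a rough userspace model of what such a helper produces in its list form, the sketch below formats a set of CPU numbers as a range list, bounds it to a page, and appends the newline. The function name and the 4096-byte PAGE_SIZE are assumptions for illustration; this is not the kernel implementation.

```python
PAGE_SIZE = 4096  # assumed page size; the kernel uses the architecture's value


def cpulist_text(cpus):
    """Format CPU numbers as a range list such as '0-3,8' plus a newline."""
    cpus = sorted(set(cpus))
    parts = []
    i = 0
    while i < len(cpus):
        start = end = cpus[i]
        # Extend the current range while the next CPU number is consecutive.
        while i + 1 < len(cpus) and cpus[i + 1] == end + 1:
            i += 1
            end = cpus[i]
        parts.append(str(start) if start == end else f"{start}-{end}")
        i += 1
    text = ",".join(parts)
    # Leave room for the newline and the terminating NUL the kernel appends.
    return text[: PAGE_SIZE - 2] + "\n"
```

Centralizing the truncation and the trailing newline in one place is the point of the consolidation: each caller only supplies the map and the output style.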
      
      Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
      Suggested-by: Stephen Boyd <sboyd@codeaurora.org>
      Tested-by: Stephen Boyd <sboyd@codeaurora.org>
      Acked-by: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Acked-by: Bjorn Helgaas <bhelgaas@google.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: x86@kernel.org
      Cc: linux-acpi@vger.kernel.org
      Cc: linux-pci@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • driver core: Fix unbalanced device reference in drivers_probe · 0372ffb3
      Alex Williamson authored
      
      
      bus_find_device_by_name() acquires a device reference which is never
      released.  This results in an object leak, which on older kernels
      results in failure to release all resources of PCI devices.  libvirt
      uses drivers_probe to re-attach devices to the host after assignment
      and is therefore a common trigger for this leak.
      
      Example:
      
      # cd /sys/bus/pci/
      # dmesg -C
      # echo 1 > devices/0000\:01\:00.0/sriov_numvfs
      # echo 0 > devices/0000\:01\:00.0/sriov_numvfs
      # dmesg | grep 01:10
       pci 0000:01:10.0: [8086:10ca] type 00 class 0x020000
       kobject: '0000:01:10.0' (ffff8801d79cd0a8): kobject_add_internal: parent: '0000:00:01.0', set: 'devices'
       kobject: '0000:01:10.0' (ffff8801d79cd0a8): kobject_uevent_env
       kobject: '0000:01:10.0' (ffff8801d79cd0a8): fill_kobj_path: path = '/devices/pci0000:00/0000:00:01.0/0000:01:10.0'
       kobject: '0000:01:10.0' (ffff8801d79cd0a8): kobject_uevent_env
       kobject: '0000:01:10.0' (ffff8801d79cd0a8): fill_kobj_path: path = '/devices/pci0000:00/0000:00:01.0/0000:01:10.0'
       kobject: '0000:01:10.0' (ffff8801d79cd0a8): kobject_uevent_env
       kobject: '0000:01:10.0' (ffff8801d79cd0a8): fill_kobj_path: path = '/devices/pci0000:00/0000:00:01.0/0000:01:10.0'
       kobject: '0000:01:10.0' (ffff8801d79cd0a8): kobject_cleanup, parent           (null)
       kobject: '0000:01:10.0' (ffff8801d79cd0a8): calling ktype release
       kobject: '0000:01:10.0': free name
      
      [kobject freed as expected]
      
      # dmesg -C
      # echo 1 > devices/0000\:01\:00.0/sriov_numvfs
      # echo 0000:01:10.0 > drivers_probe
      # echo 0 > devices/0000\:01\:00.0/sriov_numvfs
      # dmesg | grep 01:10
       pci 0000:01:10.0: [8086:10ca] type 00 class 0x020000
       kobject: '0000:01:10.0' (ffff8801d79ce0a8): kobject_add_internal: parent: '0000:00:01.0', set: 'devices'
       kobject: '0000:01:10.0' (ffff8801d79ce0a8): kobject_uevent_env
       kobject: '0000:01:10.0' (ffff8801d79ce0a8): fill_kobj_path: path = '/devices/pci0000:00/0000:00:01.0/0000:01:10.0'
       kobject: '0000:01:10.0' (ffff8801d79ce0a8): kobject_uevent_env
       kobject: '0000:01:10.0' (ffff8801d79ce0a8): fill_kobj_path: path = '/devices/pci0000:00/0000:00:01.0/0000:01:10.0'
       kobject: '0000:01:10.0' (ffff8801d79ce0a8): kobject_uevent_env
       kobject: '0000:01:10.0' (ffff8801d79ce0a8): fill_kobj_path: path = '/devices/pci0000:00/0000:00:01.0/0000:01:10.0'
      
      [no free]
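
The imbalance shown in the two logs above can be illustrated with a toy reference-counting model. This is only a sketch: the Device class and the balanced flag are hypothetical stand-ins, not kernel code. The one property taken from the commit message is that the lookup hands back a device with an extra reference that the caller must drop.

```python
class Device:
    """Toy stand-in for struct device with a reference count."""

    def __init__(self, name):
        self.name = name
        self.refs = 1  # reference held while the device is registered
        self.freed = False

    def get(self):
        self.refs += 1
        return self

    def put(self):
        self.refs -= 1
        if self.refs == 0:
            self.freed = True  # models the ktype release call


def find_device_by_name(devices, name):
    """Like bus_find_device_by_name(): the match carries an extra reference."""
    for dev in devices:
        if dev.name == name:
            return dev.get()
    return None


def drivers_probe(devices, name, balanced=True):
    dev = find_device_by_name(devices, name)
    if dev is None:
        return False
    # ... the rescan/probe of the device would happen here ...
    if balanced:
        dev.put()  # drop the lookup reference; omitting this models the leak
    return True
```

With balanced=False the final put() at device removal leaves the count at one, which corresponds to the missing "calling ktype release" line in the second log.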
      
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • driver core: fix race with userland in device_add() · 0cd75047
      Sergey Klyaus authored
      
      
      bus_add_device() should be called before devtmpfs_create_node(), so that
      when a userland application opens the device from devtmpfs, it does not
      get ENODEV from the kernel because device_add() has not yet completed.
      
      Signed-off-by: Sergey Klyaus <Sergey.Klyaus@Tune-IT.Ru>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • sysfs/kernfs: make read requests on pre-alloc files use the buffer. · 4ef67a8c
      NeilBrown authored
      
      
      To match the previous patch which used the pre-alloc buffer for
      writes, this patch causes reads to use the same buffer.
      This is not strictly necessary as the current seq_read() will allocate
      on first read, so user-space can trigger the required pre-alloc.  But
      consistency is valuable.
      
      The read function is somewhat simpler than seq_read() and, for example,
      does not support reading from an offset into the file: reads must be
      at the start of the file.
      
      As seq_read() does not use the prealloc buffer, ->seq_show is
      incompatible with ->prealloc and causes an EINVAL return from open().
      sysfs code which calls into kernfs always chooses the correct function.
      
      As the buffer is shared with writes and other reads, the mutex is
      extended to cover the copy_to_user.
      
      Signed-off-by: NeilBrown <neilb@suse.de>
      Reviewed-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • sysfs/kernfs: allow attributes to request write buffer be pre-allocated. · 2b75869b
      NeilBrown authored
      
      
      md/raid allows metadata management to be performed in user-space.
      At various times, particularly on device failure, the metadata needs
      to be updated before further writes can be permitted.
      This means that the user-space program which updates metadata must
      not block on writeout, and so must not allocate memory.
      
      mlockall(MCL_CURRENT|MCL_FUTURE) and pre-allocation can avoid all
      memory allocation issues for user-memory, but that does not help
      kernel memory.
      Several kernel objects can be pre-allocated.  e.g. files opened before
      any writes to the array are permitted.
      However some kernel allocation happens in places that cannot be
      pre-allocated.
      In particular, writes to sysfs files (to tell md that it can now
      allow writes to the array) allocate a buffer using GFP_KERNEL.
      
      This patch allows attributes to be marked as "PREALLOC".  In that case
      the maximal buffer is allocated when the file is opened, and then used
      on each write instead of allocating a new buffer.
      
      As the same buffer is now shared for all writes on the same file
      description, the mutex is extended to cover full use of the buffer
      including the copy_from_user().
      
      The new __ATTR_PREALLOC() 'or's a new flag in to the 'mode', which is
      inspected by sysfs_add_file_mode_ns() to determine if the file should be
      marked as requiring prealloc.
      
      Despite the comment, we *do* use ->seq_show together with ->prealloc
      in this patch.  The next patch fixes that.
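
The buffer-sharing rule described above can be modelled in userspace as follows. This is only a sketch under stated assumptions: the PreallocFile class, its store callback, and the 4096-byte PAGE_SIZE are hypothetical and do not mirror the kernfs data structures.

```python
import threading

PAGE_SIZE = 4096  # assumed maximum attribute write size


class PreallocFile:
    """Toy model of an open file whose write buffer is allocated at open time."""

    def __init__(self, store):
        self.store = store                    # hypothetical attribute handler
        self.buf = bytearray(PAGE_SIZE + 1)   # allocated once, reused by writes
        self.lock = threading.Lock()

    def write(self, data):
        n = min(len(data), PAGE_SIZE)
        # The lock covers the copy as well as the handler call, because
        # concurrent writers on this open file share self.buf.
        with self.lock:
            self.buf[:n] = data[:n]
            self.buf[n] = 0  # terminating NUL
            self.store(bytes(self.buf[:n]))
        return n
```

Because no allocation happens inside write(), a metadata manager that has opened the file in advance can write to it even when memory cannot be allocated, which is the situation the commit message is concerned with.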
      
      Signed-off-by: NeilBrown <neilb@suse.de>
      Reviewed-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • fs: sysfs: return EGBIG on write if offset is larger than file size · 09368960
      Vladimir Zapolskiy authored
      
      
      Users expect common utilities such as dd, or the sh redirection
      operator >, to work correctly on binary files in sysfs. At the moment
      a write that extends past the end of the file never completes:
      
        write(1, "\0\0\0\0\0\0\0\0", 8)         = 4
        write(1, "\0\0\0\0", 4)                 = 0
        write(1, "\0\0\0\0", 4)                 = 0
        write(1, "\0\0\0\0", 4)                 = 0
        ...
      
      Fix the problem by returning EFBIG, as described in man 2 write.
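
The behaviour change can be sketched as a small model of the length check on a fixed-size binary attribute. The function name and the fixed flag are hypothetical; the numbers in the comments follow the strace excerpt above.

```python
import errno


def bin_write_len(size, pos, count, fixed=True):
    """Return how many bytes a write to a fixed-size binary file accepts."""
    if size:
        if pos >= size:
            if fixed:
                raise OSError(errno.EFBIG, "File too large")
            return 0  # old behaviour: the caller keeps retrying the write
        count = min(count, size - pos)  # e.g. an 8-byte write to 4 bytes -> 4
    return count
```

A zero return from write() is not an error, so a tool that loops until all data is written never terminates; an explicit EFBIG gives it a reason to stop.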
      
      Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • kobject: fix NULL pointer derefernce in kobj_child_ns_ops · 41fb96a4
      Pankaj Dubey authored
      
      
      We hit a NULL pointer dereference if platform_device_register_simple
      or platform_device_add is called at a very early stage. I observed the
      following crash when platform_device_add was called from the "init_irq"
      hook of machine_desc. This patch fixes the issue and lets the system
      handle this case gracefully instead of panicking.
      
      [0.000000] Unable to handle kernel NULL pointer dereference at virtual address 0000000c
      [0.000000] pgd = c0004000
      [0.000000] [0000000c] *pgd=00000000
      [0.000000] Internal error: Oops: 5 [#1] PREEMPT ARM
      [0.000000] Modules linked in:
      [0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G        W 3.17.0-rc6-00198-ga1603f1-dirty #319
      [0.000000] task: c05b23f0 ti: c05a8000 task.ti: c05a8000
      [0.000000] PC is at kobject_namespace+0x18/0x58
      [0.000000] LR is at kobject_add_internal+0x90/0x2ec
      [snip]
      [0.000000] [<c01b1df0>] (kobject_namespace) from [<c01b2338>] (kobject_add_internal+0x90/0x2ec)
      [0.000000] [<c01b2338>] (kobject_add_internal) from [<c01b2728>] (kobject_add+0x4c/0x98)
      [0.000000] [<c01b2728>] (kobject_add) from [<c0226274>] (device_add+0xe8/0x51c)
      [0.000000] [<c0226274>] (device_add) from [<c0229c70>] (platform_device_add+0xb4/0x214)
      [0.000000] [<c0229c70>] (platform_device_add) from [<c022a338>] (platform_device_register_full+0xb8/0xdc)
      [0.000000] [<c022a338>] (platform_device_register_full) from [<c0570214>] (exynos_init_irq+0x90/0x9c)
      [0.000000] [<c0570214>] (exynos_init_irq) from [<c056c18c>] (init_IRQ+0x2c/0x78)
      [0.000000] [<c056c18c>] (init_IRQ) from [<c0569a54>] (start_kernel+0x22c/0x378)
      [0.000000] [<c0569a54>] (start_kernel) from [<40008070>] (0x40008070)
      [0.000000] Code: e590000c e3500000 0a00000e e5903014 (e593300c)
      
      Signed-off-by: Pankaj Dubey <pankaj.dubey@samsung.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  5. Nov 07, 2014
  6. Nov 04, 2014
  7. Nov 03, 2014
  8. Nov 02, 2014
    • KVM: vmx: defer load of APIC access page address during reset · a73896cb
      Paolo Bonzini authored
      
      
      Most call paths to vmx_vcpu_reset do not hold the SRCU lock.  Defer loading
      the APIC access page to the next vmentry.
      
      This avoids the following lockdep splat:
      
      [ INFO: suspicious RCU usage. ]
      3.18.0-rc2-test2+ #70 Not tainted
      -------------------------------
      include/linux/kvm_host.h:474 suspicious rcu_dereference_check() usage!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 1, debug_locks = 0
      1 lock held by qemu-system-x86/2371:
       #0:  (&vcpu->mutex){+.+...}, at: [<ffffffffa037d800>] vcpu_load+0x20/0xd0 [kvm]
      
      stack backtrace:
      CPU: 4 PID: 2371 Comm: qemu-system-x86 Not tainted 3.18.0-rc2-test2+ #70
      Hardware name: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A12 01/10/2013
       0000000000000001 ffff880209983ca8 ffffffff816f514f 0000000000000000
       ffff8802099b8990 ffff880209983cd8 ffffffff810bd687 00000000000fee00
       ffff880208a2c000 ffff880208a10000 ffff88020ef50040 ffff880209983d08
      Call Trace:
       [<ffffffff816f514f>] dump_stack+0x4e/0x71
       [<ffffffff810bd687>] lockdep_rcu_suspicious+0xe7/0x120
       [<ffffffffa037d055>] gfn_to_memslot+0xd5/0xe0 [kvm]
       [<ffffffffa03807d3>] __gfn_to_pfn+0x33/0x60 [kvm]
       [<ffffffffa0380885>] gfn_to_page+0x25/0x90 [kvm]
       [<ffffffffa038aeec>] kvm_vcpu_reload_apic_access_page+0x3c/0x80 [kvm]
       [<ffffffffa08f0a9c>] vmx_vcpu_reset+0x20c/0x460 [kvm_intel]
       [<ffffffffa039ab8e>] kvm_vcpu_reset+0x15e/0x1b0 [kvm]
       [<ffffffffa039ac0c>] kvm_arch_vcpu_setup+0x2c/0x50 [kvm]
       [<ffffffffa037f7e0>] kvm_vm_ioctl+0x1d0/0x780 [kvm]
       [<ffffffff810bc664>] ? __lock_is_held+0x54/0x80
       [<ffffffff812231f0>] do_vfs_ioctl+0x300/0x520
       [<ffffffff8122ee45>] ? __fget+0x5/0x250
       [<ffffffff8122f0fa>] ? __fget_light+0x2a/0xe0
       [<ffffffff81223491>] SyS_ioctl+0x81/0xa0
       [<ffffffff816fed6d>] system_call_fastpath+0x16/0x1b
      
      Reported-by: Takashi Iwai <tiwai@suse.de>
      Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
      Reviewed-by: Wanpeng Li <wanpeng.li@linux.intel.com>
      Tested-by: Wanpeng Li <wanpeng.li@linux.intel.com>
      Fixes: 38b99173
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: Disable preemption while reading from shadow VMCS · 282da870
      Jan Kiszka authored
      
      
      In order to access the shadow VMCS, we need to load it. At this point,
      vmx->loaded_vmcs->vmcs and the actually loaded one start to differ. If
      we now get preempted by Linux, vmx_vcpu_put and, on return, the
      vmx_vcpu_load will work against the wrong vmcs. That can cause
      copy_shadow_to_vmcs12 to corrupt the vmcs12 state.
      
      Fix the issue by disabling preemption during the copy operation.
      copy_vmcs12_to_shadow is safe from this issue as it is executed by
      vmx_vcpu_run when preemption is already disabled before vmentry.
      
      This bug is exposed by running Jailhouse within KVM on CPUs with
      shadow VMCS support.  Jailhouse never expects an interrupt pending
      vmexit, but the bug can cause it if, after copy_shadow_to_vmcs12
      is preempted, the active VMCS happens to have the virtual interrupt
      pending flag set in the CPU-based execution controls.
      
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Fix far-jump to non-canonical check · 7e46dddd
      Nadav Amit authored
      
      
      Commit d1442d85 ("KVM: x86: Handle errors when RIP is set during far
      jumps") introduced a bug that caused the fix to be incomplete.  Due to
      incorrect evaluation, a far jump to a segment with the L bit cleared (i.e.,
      a 32-bit segment) and a RIP with any of the high bits set (i.e.,
      RIP[63:32] != 0) may not trigger #GP.  As we know, this poses a security
      problem.
      
      In addition, the condition for two warnings was incorrect.
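
The intended check can be sketched as follows. This is a simplified model and not the emulator code: the function names are hypothetical, 48-bit virtual addresses are assumed, and the canonical test for 64-bit code segments is inferred from the commit title.

```python
def is_canonical(addr, va_bits=48):
    """True if bits 63..(va_bits - 1) of a 64-bit address are all equal."""
    top = addr >> (va_bits - 1)
    return top == 0 or top == (1 << (64 - va_bits + 1)) - 1


def far_jump_target_ok(rip, cs_l):
    """Return False where the jump target should raise #GP (simplified)."""
    if cs_l:
        return is_canonical(rip)
    return rip >> 32 == 0  # 32-bit code segment: RIP[63:32] must be zero
```

For example, far_jump_target_ok(0x1_0000_0000, cs_l=False) is False, which is the case the commit message says could previously slip through without a #GP.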
      
      Fixes: d1442d85
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
      [Add #ifdef CONFIG_X86_64 to avoid complaints of undefined behavior. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • Merge branch 'vmwgfx-fixes-3.18' of git://people.freedesktop.org/~thomash/linux · 10a8fce8
      Dave Airlie authored
      A critical 3.18 regression fix from Rob (thanks!),
      a fix to avoid advertising modes we can't support from Sinclair
        (welcome Sinclair!),
      and a fix for an incorrect hash key computation from me that is
        completely harmless, but can wait 'til the next merge window if necessary.
        (I can't really bother stable with this one).
      
      * 'vmwgfx-fixes-3.18' of git://people.freedesktop.org/~thomash/linux:
        drm/vmwgfx: Filter out modes those cannot be supported by the current VRAM size.
        drm/vmwgfx: Fix hash key computation
        drm/vmwgfx: fix lock breakage