  1. Aug 14, 2018
  2. Aug 11, 2018
  3. Aug 10, 2018
  4. Aug 09, 2018
    • Andreas Gustafsson's avatar
      bf844d64
    • Greg Kroah-Hartman's avatar
      Linux 4.14.62 · 1aa1166e
      Greg Kroah-Hartman authored
      1aa1166e
    • Shankara Pailoor's avatar
      jfs: Fix inconsistency between memory allocation and ea_buf->max_size · 7d29fb53
      Shankara Pailoor authored
      commit 92d34134 upstream.
      
      The code is assuming the buffer is max_size length, but we weren't
      allocating enough space for it.
      
      Signed-off-by: Shankara Pailoor <shankarapailoor@gmail.com>
      Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      7d29fb53
    • Eric Sandeen's avatar
      xfs: don't call xfs_da_shrink_inode with NULL bp · 59f35b98
      Eric Sandeen authored
      commit bb3d48dc upstream.
      
      xfs_attr3_leaf_create may have errored out before instantiating a buffer,
      for example if the blkno is out of range.  In that case there is no work
      to do to remove it, and in fact xfs_da_shrink_inode will lead to an oops
      if we try.
      
      This also seems to fix a flaw where the original error from
      xfs_attr3_leaf_create gets overwritten in the cleanup case, and it
      removes a pointless assignment to bp which isn't used after this.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199969
      
      
      Reported-by: Xu, Wen <wen.xu@gatech.edu>
      Tested-by: Xu, Wen <wen.xu@gatech.edu>
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      59f35b98
    • Dave Chinner's avatar
      xfs: validate cached inodes are free when allocated · 6f021e4e
      Dave Chinner authored
      commit afca6c5b upstream.
      
      A recent fuzzed filesystem image caused random dcache corruption
      when the reproducer was run. This often showed up as panics in
      lookup_slow() on a null inode->i_ops pointer when doing pathwalks.
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
      ....
      Call Trace:
       lookup_slow+0x44/0x60
       walk_component+0x3dd/0x9f0
       link_path_walk+0x4a7/0x830
       path_lookupat+0xc1/0x470
       filename_lookup+0x129/0x270
       user_path_at_empty+0x36/0x40
       path_listxattr+0x98/0x110
       SyS_listxattr+0x13/0x20
       do_syscall_64+0xf5/0x280
       entry_SYSCALL_64_after_hwframe+0x42/0xb7
      
      but had many different failure modes including deadlocks trying to
      lock the inode that was just allocated or KASAN reports of
      use-after-free violations.
      
      The cause of the problem was a corrupt INOBT on a v4 fs where the
      root inode was marked as free in the inobt record. Hence when we
      allocated an inode, it chose the root inode to allocate, found it in
      the cache and re-initialised it.
      
      We recently fixed a similar inode allocation issue caused by inobt
      record corruption problem in xfs_iget_cache_miss() in commit
      ee457001 ("xfs: catch inode allocation state mismatch
      corruption"). This change adds similar checks to the cache-hit path
      to catch it, and turns the reproducer into a corruption shutdown
      situation.
      
      Reported-by: Wen Xu <wen.xu@gatech.edu>
      Signed-Off-By: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      [darrick: fix typos in comment]
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      6f021e4e
    • Dave Chinner's avatar
      xfs: catch inode allocation state mismatch corruption · 27c41b17
      Dave Chinner authored
      commit ee457001 upstream.
      
      We recently came across a V4 filesystem causing memory corruption
      due to a newly allocated inode being setup twice and being added to
      the superblock inode list twice. From code inspection, the only way
      this could happen is if a newly allocated inode was not marked as
      free on disk (i.e. di_mode wasn't zero).
      
      Running the metadump on an upstream debug kernel fails during inode
      allocation like so:
      
      XFS: Assertion failed: ip->i_d.di_nblocks == 0, file: fs/xfs/xfs_inode.c, line: 838
       ------------[ cut here ]------------
      kernel BUG at fs/xfs/xfs_message.c:114!
      invalid opcode: 0000 [#1] PREEMPT SMP
      CPU: 11 PID: 3496 Comm: mkdir Not tainted 4.16.0-rc5-dgc #442
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
      RIP: 0010:assfail+0x28/0x30
      RSP: 0018:ffffc9000236fc80 EFLAGS: 00010202
      RAX: 00000000ffffffea RBX: 0000000000004000 RCX: 0000000000000000
      RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffff8227211b
      RBP: ffffc9000236fce8 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000bec R11: f000000000000000 R12: ffffc9000236fd30
      R13: ffff8805c76bab80 R14: ffff8805c77ac800 R15: ffff88083fb12e10
      FS:  00007fac8cbff040(0000) GS:ffff88083fd00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007fffa6783ff8 CR3: 00000005c6e2b003 CR4: 00000000000606e0
      Call Trace:
       xfs_ialloc+0x383/0x570
       xfs_dir_ialloc+0x6a/0x2a0
       xfs_create+0x412/0x670
       xfs_generic_create+0x1f7/0x2c0
       ? capable_wrt_inode_uidgid+0x3f/0x50
       vfs_mkdir+0xfb/0x1b0
       SyS_mkdir+0xcf/0xf0
       do_syscall_64+0x73/0x1a0
       entry_SYSCALL_64_after_hwframe+0x42/0xb7
      
      Extracting the inode number we crashed on from an event trace and
      looking at it with xfs_db:
      
      xfs_db> inode 184452204
      xfs_db> p
      core.magic = 0x494e
      core.mode = 0100644
      core.version = 2
      core.format = 2 (extents)
      core.nlinkv2 = 1
      core.onlink = 0
      .....
      
      Confirms that it is not a free inode on disk. xfs_repair
      also trips over this inode:
      
      .....
      zero length extent (off = 0, fsbno = 0) in ino 184452204
      correcting nextents for inode 184452204
      bad attribute fork in inode 184452204, would clear attr fork
      bad nblocks 1 for inode 184452204, would reset to 0
      bad anextents 1 for inode 184452204, would reset to 0
      imap claims in-use inode 184452204 is free, would correct imap
      would have cleared inode 184452204
      .....
      disconnected inode 184452204, would move to lost+found
      
      And so we have a situation where the directory structure and the
      inobt thinks the inode is free, but the inode on disk thinks it is
      still in use. Where this corruption came from is not possible to
      diagnose, but we can detect it and prevent the kernel from oopsing
      on lookup. The reproducer now results in:
      
      $ sudo mkdir /mnt/scratch/{0,1,2,3,4,5}{0,1,2,3,4,5}
      mkdir: cannot create directory ‘/mnt/scratch/00’: File exists
      mkdir: cannot create directory ‘/mnt/scratch/01’: File exists
      mkdir: cannot create directory ‘/mnt/scratch/03’: Structure needs cleaning
      mkdir: cannot create directory ‘/mnt/scratch/04’: Input/output error
      mkdir: cannot create directory ‘/mnt/scratch/05’: Input/output error
      ....
      
      And this corruption shutdown:
      
      [   54.843517] XFS (loop0): Corruption detected! Free inode 0xafe846c not marked free on disk
      [   54.845885] XFS (loop0): Internal error xfs_trans_cancel at line 1023 of file fs/xfs/xfs_trans.c.  Caller xfs_create+0x425/0x670
      [   54.848994] CPU: 10 PID: 3541 Comm: mkdir Not tainted 4.16.0-rc5-dgc #443
      [   54.850753] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
      [   54.852859] Call Trace:
      [   54.853531]  dump_stack+0x85/0xc5
      [   54.854385]  xfs_trans_cancel+0x197/0x1c0
      [   54.855421]  xfs_create+0x425/0x670
      [   54.856314]  xfs_generic_create+0x1f7/0x2c0
      [   54.857390]  ? capable_wrt_inode_uidgid+0x3f/0x50
      [   54.858586]  vfs_mkdir+0xfb/0x1b0
      [   54.859458]  SyS_mkdir+0xcf/0xf0
      [   54.860254]  do_syscall_64+0x73/0x1a0
      [   54.861193]  entry_SYSCALL_64_after_hwframe+0x42/0xb7
      [   54.862492] RIP: 0033:0x7fb73bddf547
      [   54.863358] RSP: 002b:00007ffdaa553338 EFLAGS: 00000246 ORIG_RAX: 0000000000000053
      [   54.865133] RAX: ffffffffffffffda RBX: 00007ffdaa55449a RCX: 00007fb73bddf547
      [   54.866766] RDX: 0000000000000001 RSI: 00000000000001ff RDI: 00007ffdaa55449a
      [   54.868432] RBP: 00007ffdaa55449a R08: 00000000000001ff R09: 00005623a8670dd0
      [   54.870110] R10: 00007fb73be72d5b R11: 0000000000000246 R12: 00000000000001ff
      [   54.871752] R13: 00007ffdaa5534b0 R14: 0000000000000000 R15: 00007ffdaa553500
      [   54.873429] XFS (loop0): xfs_do_force_shutdown(0x8) called from line 1024 of file fs/xfs/xfs_trans.c.  Return address = ffffffff814cd050
      [   54.882790] XFS (loop0): Corruption of in-memory data detected.  Shutting down filesystem
      [   54.884597] XFS (loop0): Please umount the filesystem and rectify the problem(s)
      
      Note that this crash is only possible on v4 filesystems or v5
      filesystems mounted with the ikeep mount option. For all other V5
      filesystems, this problem cannot occur because we don't read inodes
      we are allocating from disk - we simply overwrite them with the new
      inode information.
      
      Signed-Off-By: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
      Tested-by: Carlos Maiolino <cmaiolino@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      27c41b17
    • Len Brown's avatar
      intel_idle: Graceful probe failure when MWAIT is disabled · a3439992
      Len Brown authored
      commit a4c44753 upstream.
      
      When MWAIT is disabled, intel_idle refuses to probe.
      But it may mislead the user by blaming this on the model number:
      
      intel_idle: does not run on family 6 model 79
      
      So defer the check for MWAIT until after the model# white-list check succeeds,
      and if the MWAIT check fails, tell the user how to fix it:
      
      intel_idle: Please enable MWAIT in BIOS SETUP
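
      The reordering described above can be sketched in user-space C. This
      is only an illustration of the check order, not the driver's code: the
      enum, the function name and both boolean inputs are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical outcomes of a probe attempt (names are illustrative only). */
enum probe_result {
    PROBE_OK,
    PROBE_UNSUPPORTED_MODEL, /* "does not run on family X model Y" */
    PROBE_MWAIT_DISABLED     /* "Please enable MWAIT in BIOS SETUP" */
};

/*
 * Check the model white-list first and MWAIT second, so that a supported
 * model with MWAIT disabled gets the actionable MWAIT message instead of
 * being blamed on its model number.
 */
static enum probe_result probe_order_sketch(bool model_listed, bool mwait_enabled)
{
    if (!model_listed)
        return PROBE_UNSUPPORTED_MODEL;
    if (!mwait_enabled)
        return PROBE_MWAIT_DISABLED;
    return PROBE_OK;
}
```

      With this order, a listed model that has MWAIT turned off yields
      PROBE_MWAIT_DISABLED rather than PROBE_UNSUPPORTED_MODEL.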
      
      Signed-off-by: Len Brown <len.brown@intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      a3439992
    • James Smart's avatar
      nvmet-fc: fix target sgl list on large transfers · d626ac96
      James Smart authored
      commit d082dc15 upstream.
      
      The existing code to carve up the sg list expected an sg element-per-page
      which can be very incorrect with iommu's remapping multiple memory pages
      to fewer bus addresses. To hit this error required a large io payload
      (greater than 256k) and a system that maps on a per-page basis. It's
      possible that large ios could get by fine if the system condensed the
      sgl list into the first 64 elements.
      
      This patch corrects the sg list handling by specifically walking the
      sg list element by element and attempting to divide the transfer up
      on a per-sg element boundary. While doing so, it still tries to keep
      sequences under 256k, but will exceed that rule if a single sg element
      is larger than 256k.
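
      A minimal sketch of that carving rule, assuming the sg list is reduced
      to an array of element lengths in bytes. The function name and the
      array-based interface are hypothetical and do not mirror the driver.

```c
#include <stddef.h>

#define SEQ_CAP (256u * 1024u) /* target upper bound per sequence, in bytes */

/*
 * Walk the element lengths one by one and group them into sequences.
 * A sequence is closed before an element that would push it past SEQ_CAP,
 * so sequences always end on an element boundary. A single element larger
 * than SEQ_CAP becomes a sequence of its own and exceeds the cap.
 *
 * seq_len must have room for n entries (worst case: one per element).
 * Returns the number of sequences written.
 */
static size_t carve_sequences(const size_t *sg_len, size_t n, size_t *seq_len)
{
    size_t nseq = 0;
    size_t cur = 0;

    for (size_t i = 0; i < n; i++) {
        if (cur != 0 && cur + sg_len[i] > SEQ_CAP) {
            seq_len[nseq++] = cur; /* close the current sequence */
            cur = 0;
        }
        cur += sg_len[i];
    }
    if (cur != 0)
        seq_len[nseq++] = cur;     /* flush the final sequence */
    return nseq;
}
```

      For element lengths of 128k, 128k, 4k, 512k and 4k this produces four
      sequences of 256k, 4k, 512k and 4k: only the one holding the oversized
      element goes past the cap.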
      
      Fixes: 48fa362b ("nvmet-fc: simplify sg list handling")
      Cc: <stable@vger.kernel.org> # 4.14
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      d626ac96
    • Keith Busch's avatar
      nvme-pci: Fix queue double allocations · 4af9c61a
      Keith Busch authored
      commit 62314e40 upstream.
      
      The queue count says the highest queue that's been allocated, so don't
      reallocate a queue lower than that.
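
      A small sketch of the guard, assuming queues are requested in
      increasing order so that queue_count is always one past the highest
      allocated queue id. The struct, the helper names and the fixed array
      size are hypothetical, not the driver's definitions.

```c
#include <stdlib.h>

#define MAX_QUEUES 8u

/* Hypothetical device model: only what the guard needs. */
struct dev_model {
    unsigned int queue_count;      /* one past the highest allocated qid */
    void *queues[MAX_QUEUES];
    unsigned int allocations;      /* how many times storage was allocated */
};

/* Returns 0 on success (including "already allocated"), -1 on failure. */
static int alloc_queue_sketch(struct dev_model *d, unsigned int qid)
{
    if (qid >= MAX_QUEUES || qid > d->queue_count)
        return -1;                 /* out of range, or would leave a gap */
    if (d->queue_count > qid)
        return 0;                  /* already allocated: do not reallocate */

    d->queues[qid] = malloc(64);   /* placeholder for the real queue memory */
    if (!d->queues[qid])
        return -1;
    d->allocations++;
    d->queue_count = qid + 1;
    return 0;
}

/* Release everything allocated so far. */
static void free_queues_sketch(struct dev_model *d)
{
    while (d->queue_count > 0)
        free(d->queues[--d->queue_count]);
}
```

      Requesting queue 1 again after queues 0 to 2 exist returns early, so
      the allocation counter stays at three.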
      
      Fixes: 147b27e4 ("nvme-pci: allocate device queues storage space at probe")
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      4af9c61a
    • Sagi Grimberg's avatar
      nvme-pci: allocate device queues storage space at probe · 12c058df
      Sagi Grimberg authored
      commit 147b27e4 upstream.
      
      Setting 'nvmeq' in nvme_init_request() can race, because .init_request
      is called while switching the I/O scheduler, which may happen while the
      NVMe device is being reset and its NVMe queues are being freed and
      re-created. There is no synchronization between the two paths.
      
      This patch changes the nvmeq allocation to occur at probe time so
      there is no way we can dereference it at init_request.
      
      [   93.268391] kernel BUG at drivers/nvme/host/pci.c:408!
      [   93.274146] invalid opcode: 0000 [#1] SMP
      [   93.278618] Modules linked in: nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss
      nfsv4 dns_resolver nfs lockd grace fscache sunrpc ipmi_ssif vfat fat
      intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel
      kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel iTCO_wdt
      intel_cstate ipmi_si iTCO_vendor_support intel_uncore mxm_wmi mei_me
      ipmi_devintf intel_rapl_perf pcspkr sg ipmi_msghandler lpc_ich dcdbas mei
      shpchp acpi_power_meter wmi dm_multipath ip_tables xfs libcrc32c sd_mod
      mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt
      fb_sys_fops ttm drm ahci libahci nvme libata crc32c_intel nvme_core tg3
      megaraid_sas ptp i2c_core pps_core dm_mirror dm_region_hash dm_log dm_mod
      [   93.349071] CPU: 5 PID: 1842 Comm: sh Not tainted 4.15.0-rc2.ming+ #4
      [   93.356256] Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.5.5 08/16/2017
      [   93.364801] task: 00000000fb8abf2a task.stack: 0000000028bd82d1
      [   93.371408] RIP: 0010:nvme_init_request+0x36/0x40 [nvme]
      [   93.377333] RSP: 0018:ffffc90002537ca8 EFLAGS: 00010246
      [   93.383161] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000008
      [   93.391122] RDX: 0000000000000000 RSI: ffff880276ae0000 RDI: ffff88047bae9008
      [   93.399084] RBP: ffff88047bae9008 R08: ffff88047bae9008 R09: 0000000009dabc00
      [   93.407045] R10: 0000000000000004 R11: 000000000000299c R12: ffff880186bc1f00
      [   93.415007] R13: ffff880276ae0000 R14: 0000000000000000 R15: 0000000000000071
      [   93.422969] FS:  00007f33cf288740(0000) GS:ffff88047ba80000(0000) knlGS:0000000000000000
      [   93.431996] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   93.438407] CR2: 00007f33cf28e000 CR3: 000000047e5bb006 CR4: 00000000001606e0
      [   93.446368] Call Trace:
      [   93.449103]  blk_mq_alloc_rqs+0x231/0x2a0
      [   93.453579]  blk_mq_sched_alloc_tags.isra.8+0x42/0x80
      [   93.459214]  blk_mq_init_sched+0x7e/0x140
      [   93.463687]  elevator_switch+0x5a/0x1f0
      [   93.467966]  ? elevator_get.isra.17+0x52/0xc0
      [   93.472826]  elv_iosched_store+0xde/0x150
      [   93.477299]  queue_attr_store+0x4e/0x90
      [   93.481580]  kernfs_fop_write+0xfa/0x180
      [   93.485958]  __vfs_write+0x33/0x170
      [   93.489851]  ? __inode_security_revalidate+0x4c/0x60
      [   93.495390]  ? selinux_file_permission+0xda/0x130
      [   93.500641]  ? _cond_resched+0x15/0x30
      [   93.504815]  vfs_write+0xad/0x1a0
      [   93.508512]  SyS_write+0x52/0xc0
      [   93.512113]  do_syscall_64+0x61/0x1a0
      [   93.516199]  entry_SYSCALL64_slow_path+0x25/0x25
      [   93.521351] RIP: 0033:0x7f33ce96aab0
      [   93.525337] RSP: 002b:00007ffe57570238 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      [   93.533785] RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 00007f33ce96aab0
      [   93.541746] RDX: 0000000000000006 RSI: 00007f33cf28e000 RDI: 0000000000000001
      [   93.549707] RBP: 00007f33cf28e000 R08: 000000000000000a R09: 00007f33cf288740
      [   93.557669] R10: 00007f33cf288740 R11: 0000000000000246 R12: 00007f33cec42400
      [   93.565630] R13: 0000000000000006 R14: 0000000000000001 R15: 0000000000000000
      [   93.573592] Code: 4c 8d 40 08 4c 39 c7 74 16 48 8b 00 48 8b 04 08 48 85 c0
      74 16 48 89 86 78 01 00 00 31 c0 c3 8d 4a 01 48 63 c9 48 c1 e1 03 eb de <0f>
      0b 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 85 f6 53 48 89
      [   93.594676] RIP: nvme_init_request+0x36/0x40 [nvme] RSP: ffffc90002537ca8
      [   93.602273] ---[ end trace 810dde3993e5f14e ]---
      
      Reported-by: Yi Zhang <yi.zhang@redhat.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      12c058df
    • Filipe Manana's avatar
      Btrfs: fix file data corruption after cloning a range and fsync · 0ea7fcfc
      Filipe Manana authored
      commit bd3599a0 upstream.
      
      When we clone a range into a file we can end up dropping existing
      extent maps (or trimming them) and replacing them with new ones if the
      range to be cloned overlaps with a range in the destination inode.
      When that happens we add the new extent maps to the list of modified
      extents in the inode's extent map tree, so that a "fast" fsync (the flag
      BTRFS_INODE_NEEDS_FULL_SYNC not set in the inode) will see the extent maps
      and log corresponding extent items. However, at the end of range cloning
      operation we do truncate all the pages in the affected range (in order to
      ensure future reads will not get stale data). Sometimes this truncation
      will release the corresponding extent maps besides the pages from the page
      cache. If this happens, then a "fast" fsync operation will miss logging
      some extent items, because it relies exclusively on the extent maps being
      present in the inode's extent tree, leading to data loss/corruption if
      the fsync ends up using the same transaction used by the clone operation
      (that transaction was not committed in the meanwhile). An extent map is
      released through the callback btrfs_invalidatepage(), which gets called by
      truncate_inode_pages_range(), and it calls __btrfs_releasepage(). The
      latter ends up calling try_release_extent_mapping(), which will release
      the extent map if some conditions are met: the file size is greater than
      16Mb, the gfp flags allow blocking, the range is not locked (which is
      the case during the clone operation) and the extent map is not flagged
      as pinned (also the case for cloning).
      
      The following example, turned into a test for fstests, reproduces the
      issue:
      
        $ mkfs.btrfs -f /dev/sdb
        $ mount /dev/sdb /mnt
      
        $ xfs_io -f -c "pwrite -S 0x18 9000K 6908K" /mnt/foo
        $ xfs_io -f -c "pwrite -S 0x20 2572K 156K" /mnt/bar
      
        $ xfs_io -c "fsync" /mnt/bar
        # reflink destination offset corresponds to the size of file bar,
        # 2728Kb minus 4Kb.
        $ xfs_io -c "reflink /mnt/foo 0 2724K 15908K" /mnt/bar
        $ xfs_io -c "fsync" /mnt/bar
      
        $ md5sum /mnt/bar
        95a95813a8c2abc9aa75a6c2914a077e  /mnt/bar
      
        <power fail>
      
        $ mount /dev/sdb /mnt
        $ md5sum /mnt/bar
        207fd8d0b161be8a84b945f0df8d5f8d  /mnt/bar
        # digest should be 95a95813a8c2abc9aa75a6c2914a077e like before the
        # power failure
      
      In the above example, the destination offset of the clone operation
      corresponds to the size of the "bar" file minus 4Kb. So during the clone
      operation, the extent map covering the range from 2572Kb to 2728Kb gets
      trimmed so that it ends at offset 2724Kb, and a new extent map covering
      the range from 2724Kb to 11724Kb is created. So at the end of the clone
      operation when we ask to truncate the pages in the range from 2724Kb to
      2724Kb + 15908Kb, the page invalidation callback ends up removing the new
      extent map (through try_release_extent_mapping()) when the page at offset
      2724Kb is passed to that callback.
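
      The offsets quoted above follow from the reproducer's arguments. The
      sketch below only redoes that arithmetic in KiB. Reading the 11724Kb
      end offset as "destination offset plus the 9000K hole at the start of
      foo" is an assumption, and all names are hypothetical.

```c
/* All values are in KiB and come from the xfs_io commands in the example. */
enum {
    BAR_WRITE_OFFSET = 2572, /* pwrite offset into bar */
    BAR_WRITE_LEN    = 156,  /* pwrite length into bar */
    FOO_WRITE_OFFSET = 9000, /* foo has no data before this offset */
    FOO_WRITE_LEN    = 6908, /* pwrite length into foo */
    BLOCK            = 4     /* the "minus 4Kb" in the example */
};

/* Size of bar after the first write: 2572 + 156. */
static long bar_size_kib(void) { return BAR_WRITE_OFFSET + BAR_WRITE_LEN; }

/* Clone destination offset: size of bar minus 4Kb. */
static long clone_dest_kib(void) { return bar_size_kib() - BLOCK; }

/* Length of the cloned range: the whole of foo. */
static long clone_len_kib(void) { return FOO_WRITE_OFFSET + FOO_WRITE_LEN; }

/* End of the new extent map quoted in the text (assumed: dest + hole). */
static long new_map_end_kib(void) { return clone_dest_kib() + FOO_WRITE_OFFSET; }

/* End of the range whose pages are truncated after the clone. */
static long truncate_end_kib(void) { return clone_dest_kib() + clone_len_kib(); }
```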
      
      Fix this by setting the bit BTRFS_INODE_NEEDS_FULL_SYNC whenever an extent
      map is removed at try_release_extent_mapping(), forcing the next fsync to
      search for modified extents in the fs/subvolume tree instead of relying on
      the presence of extent maps in memory. This way we can continue doing a
      "fast" fsync if the destination range of a clone operation does not
      overlap with an existing range or if any of the criteria necessary to
      remove an extent map at try_release_extent_mapping() is not met (file
      size not bigger than 16Mb or gfp flags do not allow blocking).
      
      CC: stable@vger.kernel.org # 3.16+
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      0ea7fcfc
    • Esben Haabendal's avatar
      i2c: imx: Fix reinit_completion() use · ea464580
      Esben Haabendal authored
      commit 9f9e3e0d upstream.
      
      Make sure to call reinit_completion() before dma is started to avoid race
      condition where reinit_completion() is called after complete() and before
      wait_for_completion_timeout().
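
      The ordering problem can be modelled in single-threaded user-space C.
      This is a toy model, not the kernel completion API: the toy_* names
      are hypothetical, and the "DMA" here completes immediately when
      started, which is the worst case for the race.

```c
#include <stdbool.h>

/* Toy stand-in for a completion: just a "done" flag. */
struct toy_completion { bool done; };

static void toy_reinit(struct toy_completion *c)   { c->done = false; }
static void toy_complete(struct toy_completion *c) { c->done = true; }

/* A wait succeeds only if the completion has been signalled. */
static bool toy_wait_succeeds(const struct toy_completion *c) { return c->done; }

/* Models a transfer so fast that it signals completion right away. */
static void toy_start_dma(struct toy_completion *c) { toy_complete(c); }

/* Buggy order: the re-init runs after completion and erases the signal. */
static bool transfer_buggy(struct toy_completion *c)
{
    toy_start_dma(c);
    toy_reinit(c);
    return toy_wait_succeeds(c); /* signal lost: the wait would time out */
}

/* Fixed order: re-init first, then start the transfer. */
static bool transfer_fixed(struct toy_completion *c)
{
    toy_reinit(c);
    toy_start_dma(c);
    return toy_wait_succeeds(c);
}
```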
      
      Signed-off-by: Esben Haabendal <eha@deif.com>
      Fixes: ce1a7884 ("i2c: imx: add DMA support for freescale i2c driver")
      Reviewed-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
      Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
      Cc: stable@kernel.org
      Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ea464580
    • Masami Hiramatsu's avatar
      ring_buffer: tracing: Inherit the tracing setting to next ring buffer · 60baabc3
      Masami Hiramatsu authored
      commit 73c8d894 upstream.
      
      Maintain the tracing on/off setting of the ring_buffer when switching
      to the trace buffer snapshot.
      
      Taking a snapshot is done by swapping the backup ring buffer
      (max_tr_buffer). But since the tracing on/off setting is defined
      by the ring buffer, when swapping it, the tracing on/off setting
      can also be changed. This causes a strange result like below:
      
        /sys/kernel/debug/tracing # cat tracing_on
        1
        /sys/kernel/debug/tracing # echo 0 > tracing_on
        /sys/kernel/debug/tracing # cat tracing_on
        0
        /sys/kernel/debug/tracing # echo 1 > snapshot
        /sys/kernel/debug/tracing # cat tracing_on
        1
        /sys/kernel/debug/tracing # echo 1 > snapshot
        /sys/kernel/debug/tracing # cat tracing_on
        0
      
      We don't touch tracing_on, but snapshot changes tracing_on
      setting each time. This is an anomaly, because user doesn't know
      that each "ring_buffer" stores its own tracing-enable state and
      the snapshot is done by swapping ring buffers.
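
      The flip-flop in the session above can be reproduced with a toy model
      in which each buffer carries its own on/off flag. The structs and
      function names are hypothetical, and the "inherit" variant is only a
      sketch of the idea in the subject line, not the kernel's code.

```c
#include <stdbool.h>

/* Toy ring buffer: the on/off setting lives in the buffer itself. */
struct toy_ring_buffer { bool recording; };

struct toy_tracer {
    struct toy_ring_buffer *live;  /* buffer currently traced into */
    struct toy_ring_buffer *spare; /* backup buffer used for snapshots */
};

/* What "cat tracing_on" would report in this model. */
static bool toy_tracing_on(const struct toy_tracer *t) { return t->live->recording; }

/* Naive snapshot: swap buffers, so the setting silently swaps too. */
static void snapshot_naive(struct toy_tracer *t)
{
    struct toy_ring_buffer *tmp = t->live;
    t->live = t->spare;
    t->spare = tmp;
}

/* Snapshot that carries the current setting over to the next live buffer. */
static void snapshot_inherit(struct toy_tracer *t)
{
    bool on = t->live->recording;
    snapshot_naive(t);
    t->live->recording = on;
}
```

      Starting with the live buffer off and the spare on, two naive
      snapshots report on and then off again, while the inheriting variant
      keeps reporting off.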
      
      Link: http://lkml.kernel.org/r/153149929558.11274.11730609978254724394.stgit@devbox
      
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
      Cc: Hiraku Toyooka <hiraku.toyooka@cybertrust.co.jp>
      Cc: stable@vger.kernel.org
      Fixes: debdd57f ("tracing: Make a snapshot feature available from userspace")
      Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
      [ Updated commit log and comment in the code ]
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      60baabc3
    • Vitaly Kuznetsov's avatar
      ACPI / PCI: Bail early in acpi_pci_add_bus() if there is no ACPI handle · ff28e5cc
      Vitaly Kuznetsov authored
      commit a0040c01 upstream.
      
      Hyper-V instances support PCI pass-through which is implemented through PV
      pci-hyperv driver. When a device is passed through, a new root PCI bus is
      created in the guest. The bus sits on top of VMBus and has no associated
      information in ACPI. acpi_pci_add_bus() in this case proceeds all the way
      to acpi_evaluate_dsm(), which reports
      
        ACPI: \: failed to evaluate _DSM (0x1001)
      
      While acpi_pci_slot_enumerate() and acpiphp_enumerate_slots() are protected
      against ACPI_HANDLE() being NULL and do nothing, acpi_evaluate_dsm() is not
      and gives us the error. It seems the correct fix is to not do anything in
      acpi_pci_add_bus() in such cases.
      
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Sinan Kaya <okaya@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ff28e5cc
    • Theodore Ts'o's avatar
      ext4: fix false negatives *and* false positives in ext4_check_descriptors() · dd69abac
      Theodore Ts'o authored
      commit 44de022c upstream.
      
      ext4_check_descriptors() was getting called before s_gdb_count was
      initialized.  So for file systems w/o the meta_bg feature, allocation
      bitmaps could overlap the block group descriptors and ext4 wouldn't
      notice.
      
      For file systems with the meta_bg feature enabled, there was a
      fencepost error which would cause the ext4_check_descriptors() to
      incorrectly believe that the block allocation bitmap overlaps with the
      block group descriptor blocks, and it would reject the mount.
      
      Fix both of these problems.
      
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      Signed-off-by: Benjamin Gilbert <bgilbert@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      dd69abac
    • Dmitry Safonov's avatar
      netlink: Don't shift on 64 for ngroups · 09901e57
      Dmitry Safonov authored
      commit 91874ecf upstream.
      
      It's legal to have 64 groups for netlink_sock.
      
      As user-supplied nladdr->nl_groups is __u32, it's possible to subscribe
      only to first 32 groups.
      
      The check for correctness of .bind() userspace supplied parameter
      is done by applying mask made from ngroups shift. Which broke Android
      as they have 64 groups and the shift for mask resulted in an overflow.
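
      A sketch of a mask computation that avoids the overflowing shift,
      assuming a 64-bit group word. Shifting a 64-bit value by 64 is
      undefined in C, so the full-width case is handled without shifting.
      The function name and the uint64_t interface are hypothetical.

```c
#include <stdint.h>

/*
 * Clamp a requested group bitmap to the groups that actually exist.
 * ngroups == 0 leaves nothing, ngroups < 64 masks off the upper bits,
 * and ngroups >= 64 means every bit of the word is a valid group, so
 * no mask (and no shift by 64) is needed.
 */
static uint64_t clamp_groups_sketch(uint64_t groups, unsigned int ngroups)
{
    if (ngroups == 0)
        return 0;
    if (ngroups < 64)
        return groups & ((UINT64_C(1) << ngroups) - 1);
    return groups;
}
```

      With 64 groups the requested bitmap passes through unchanged instead
      of going through a mask built from an out-of-range shift.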
      
      Fixes: 61f4b237 ("netlink: Don't shift with UB on nlk->ngroups")
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Cc: netdev@vger.kernel.org
      Cc: stable@vger.kernel.org
      Reported-and-Tested-by: Nathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: Dmitry Safonov <dima@arista.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      09901e57
    • Frederic Weisbecker's avatar
      nohz: Fix missing tick reprogram when interrupting an inline softirq · 2d898915
      Frederic Weisbecker authored
      commit 0a0e0829 upstream.
      
      The full nohz tick is reprogrammed in irq_exit() only if the exit is not in
      a nesting interrupt. This stands as an optimization: whether a hardirq or a
      softirq is interrupted, the tick is going to be reprogrammed when necessary
      at the end of the inner interrupt, with even potential new updates on the
      timer queue.
      
      When soft interrupts are interrupted, it's assumed that they are executing
      on the tail of an interrupt return. In that case tick_nohz_irq_exit() is
      called after softirq processing to take care of the tick reprogramming.
      
      But the assumption is wrong: softirqs can be processed inline as well, i.e.
      outside of an interrupt, like in a call to local_bh_enable() or from
      ksoftirqd.
      
      Inline softirqs don't reprogram the tick once they are done, as opposed to
      interrupt tail softirq processing. So if a tick interrupts an inline
      softirq processing, the next timer will neither be reprogrammed from the
      interrupting tick's irq_exit() nor after the interrupted softirq
      processing. This situation may leave the tick unprogrammed while timers are
      armed.
      
      To fix this, simply keep reprogramming the tick even if a softirq has been
      interrupted. That can be optimized further, but for now correctness is more
      important.
      
      Note that new timers enqueued in nohz_full mode after a softirq gets
      interrupted will still be handled just fine through self-IPIs triggered by
      the timer code.
      
      Reported-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Cc: stable@vger.kernel.org # 4.14+
      Link: https://lkml.kernel.org/r/1533303094-15855-1-git-send-email-frederic@kernel.org
      
      
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      2d898915
    • Anna-Maria Gleixner's avatar
      nohz: Fix local_timer_softirq_pending() · e5bcbeda
      Anna-Maria Gleixner authored
      commit 80d20d35 upstream.
      
      local_timer_softirq_pending() checks whether the timer softirq is
      pending with: local_softirq_pending() & TIMER_SOFTIRQ.
      
      This is wrong because TIMER_SOFTIRQ is the softirq number and not a
      bitmask. So the test checks for the wrong bit.
      
      Use BIT(TIMER_SOFTIRQ) instead.
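A minimal userspace sketch of the difference, assuming the softirq numbering where HI_SOFTIRQ is 0 and TIMER_SOFTIRQ is 1. The two helper functions are hypothetical names for illustration; the pending mask is passed in rather than read from per-CPU state.

```c
#include <stdbool.h>

/* Assumed numbering: softirq numbers are bit positions in the pending mask. */
enum { HI_SOFTIRQ = 0, TIMER_SOFTIRQ = 1 };

#define BIT(nr) (1UL << (nr))

/* Buggy test: TIMER_SOFTIRQ is 1, so this actually checks bit 0 (HI_SOFTIRQ). */
static bool timer_pending_buggy(unsigned long pending)
{
    return (pending & TIMER_SOFTIRQ) != 0;
}

/* Fixed test: convert the softirq number into its bitmask first. */
static bool timer_pending_fixed(unsigned long pending)
{
    return (pending & BIT(TIMER_SOFTIRQ)) != 0;
}
```

With only the timer softirq pending, the buggy variant reports false; with only the HI softirq pending, it wrongly reports true.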
      
      Fixes: 5d62c183 ("nohz: Prevent a timer interrupt storm in tick_nohz_stop_sched_tick()")
      Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
      Acked-by: Frederic Weisbecker <frederic@kernel.org>
      Cc: bigeasy@linutronix.de
      Cc: peterz@infradead.org
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20180731161358.29472-1-anna-maria@linutronix.de
      
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • genirq: Make force irq threading setup more robust · a6d9dacf
      Thomas Gleixner authored
      commit d1f0301b upstream.
      
      The support of force threading interrupts which are set up with both a
      primary and a threaded handler broke the setup of regular requested
      threaded interrupts (primary handler == NULL).
      
      The reason is that it does not check whether the primary handler is set to
      the default handler which wakes the handler thread. Instead it replaces the
      thread handler with the primary handler, as it would do with force threaded
      interrupts which have been requested via request_irq(). So the primary and
      the thread handler become the same, which then triggers the WARN_ON that
      the thread handler tries to wake up a secondary thread which is not
      configured.
      
      Fortunately this only happens when the driver omits the IRQF_ONESHOT flag
      when requesting the threaded interrupt, which is normally caught by the
      sanity checks when force irq threading is disabled.
      
      Fix it by skipping the force threading setup when a regular threaded
      interrupt is requested. As a consequence, an interrupt request which lacks
      the IRQF_ONESHOT flag is rejected correctly instead of being silently
      broken.
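The check described above can be modelled in userspace roughly as follows. This is a simplified sketch, not the kernel function: `struct action`, `default_primary()` and `setup_forced_threading()` are hypothetical stand-ins, and the secondary-thread case (both a primary and a thread handler supplied) is left out.

```c
#include <stddef.h>

typedef int (*handler_t)(void);

/* Hypothetical stand-in for an interrupt action. */
struct action {
    handler_t handler;    /* primary handler */
    handler_t thread_fn;  /* threaded handler */
    int forced;           /* set when the action was force threaded */
};

/* Stand-in for the default primary handler that only wakes the thread. */
static int default_primary(void) { return 1; }

/* Example driver handlers, used only for illustration. */
static int my_primary(void) { return 2; }
static int my_thread(void) { return 3; }

/* Returns 1 if the action was converted, 0 if it was left untouched. */
static int setup_forced_threading(struct action *a)
{
    /* Regular threaded request: it is threaded already, so skip the setup. */
    if (a->handler == default_primary)
        return 0;

    /* request_irq() style request: run the primary handler in the thread. */
    a->thread_fn = a->handler;
    a->handler = default_primary;
    a->forced = 1;
    return 1;
}
```

Without the early return, a regular threaded request would end up with `thread_fn` overwritten by the default primary handler, which is the breakage the message describes.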
      
      Fixes: 2a1d3ab8 ("genirq: Handle force threading of irqs with primary and thread handler")
      Reported-by: Kurt Kanzenbach <kurt.kanzenbach@linutronix.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Kurt Kanzenbach <kurt.kanzenbach@linutronix.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • scsi: qla2xxx: Return error when TMF returns · a96feef5
      Anil Gurumurthy authored
      commit b4146c49 upstream.
      
      Propagate the task management completion status properly to avoid
      unnecessary waits for commands to complete.
      
      Fixes: faef62d1 ("[SCSI] qla2xxx: Fix Task Management command asynchronous handling")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Anil Gurumurthy <anil.gurumurthy@cavium.com>
      Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • scsi: qla2xxx: Fix ISP recovery on unload · f70766f1
      Quinn Tran authored
      commit b08abbd9 upstream.
      
      During the unload process, the chip can encounter a problem that causes a
      FW dump to be captured. In this case, the full reset sequence that brings
      the chip back to a fully operational state is skipped.
      
      Fixes: e315cd28 ("[SCSI] qla2xxx: Code changes for qla data structure refactoring")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
      Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • scsi: qla2xxx: Fix NPIV deletion by calling wait_for_sess_deletion · 01cda405
      Quinn Tran authored
      commit efa93f48 upstream.
      
      Add wait for session deletion to finish before freeing an NPIV scsi host.
      
      Fixes: 726b8548 ("qla2xxx: Add framework for async fabric discovery")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
      Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • scsi: qla2xxx: Fix unintialized List head crash · 43d7c954
      Quinn Tran authored
      commit e3dde080 upstream.
      
      When the IOCB queue is full, or when memory is low and the driver
      receives a storm of RSCNs, a stale sp pointer can stay on gpnid_list,
      resulting in a page fault.
      
      This patch fixes this issue by initializing the sp->elem list head and
      removing sp->elem before memory is freed.
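The pattern can be illustrated with a small userspace re-implementation of a circular doubly linked list. This is a sketch under assumptions: the helpers below only mimic the shape of the kernel list API, and `struct srb` and `srb_release()` are hypothetical names, not the driver's actual structures or functions.

```c
#include <stdbool.h>

struct list_head { struct list_head *next, *prev; };

/* An initialized head points at itself, so an unqueued entry is unlinkable. */
static void init_list_head(struct list_head *h) { h->next = h; h->prev = h; }

static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Unlink the entry and leave it self-pointing again. */
static void list_del_init(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
    init_list_head(e);
}

/* Hypothetical request object carrying a list link. */
struct srb { struct list_head elem; };

/* Unlink before the memory would be freed, so no stale pointer stays queued. */
static void srb_release(struct srb *sp)
{
    list_del_init(&sp->elem);
    /* freeing sp would follow here */
}
```

Because `elem` is initialized up front, releasing an object that was never queued is a harmless no-op, and releasing a queued one removes it from the list before its memory goes away.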
      
      The following stack trace is seen:
      
       9 [ffff987b37d1bc60] page_fault at ffffffffad516768 [exception RIP: qla24xx_async_gpnid+496]
      10 [ffff987b37d1bd10] qla24xx_async_gpnid at ffffffffc039866d [qla2xxx]
      11 [ffff987b37d1bd80] qla2x00_do_work at ffffffffc036169c [qla2xxx]
      12 [ffff987b37d1be38] qla2x00_do_dpc_all_vps at ffffffffc03adfed [qla2xxx]
      13 [ffff987b37d1be78] qla2x00_do_dpc at ffffffffc036458a [qla2xxx]
      14 [ffff987b37d1bec8] kthread at ffffffffacebae31
      
      Fixes: 2d73ac61 ("scsi: qla2xxx: Serialize GPNID for multiple RSCN")
      Cc: <stable@vger.kernel.org> # v4.17+
      Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
      Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  5. Aug 07, 2018
  6. Aug 06, 2018