Skip to content
  1. Feb 09, 2018
    • Chuck Lever's avatar
      svcrdma: Fix Read chunk round-up · 175e0310
      Chuck Lever authored
      
      
      A single NFSv4 WRITE compound can often have three operations:
      PUTFH, WRITE, then GETATTR.
      
      When the WRITE payload is sent in a Read chunk, the client places
      the GETATTR in the inline part of the RPC/RDMA message, just after
      the WRITE operation (sans payload). The position value in the Read
      chunk enables the receiver to insert the Read chunk at the correct
      place in the received XDR stream; that is between the WRITE and
      GETATTR.
      
      According to RFC 8166, an NFS/RDMA client does not have to add XDR
      round-up to the Read chunk that carries the WRITE payload. The
      receiver adds XDR round-up padding if it is absent and the
      receiver's XDR decoder requires it to be present.
      
      Commit 193bcb7b ("svcrdma: Populate tail iovec when receiving")
      attempted to add support for receiving such a compound so that just
      the WRITE payload appears in rq_arg's page list, and the trailing
      GETATTR is placed in rq_arg's tail iovec. (TCP just strings the
      whole compound into the head iovec and page list, without regard
      to the alignment of the WRITE payload).
      
      The server transport logic also had to accommodate the optional XDR
      round-up of the Read chunk, which it did simply by lengthening the
      tail iovec when round-up was needed. This approach is adequate for
      the NFSv2 and NFSv3 WRITE decoders.
      
      Unfortunately it is not sufficient for nfsd4_decode_write. When the
      Read chunk length is a couple of bytes less than PAGE_SIZE, the
      computation at the end of nfsd4_decode_write allows argp->pagelen to
      go negative, which breaks the logic in read_buf that looks for the
      tail iovec.
      
      The result is that a WRITE operation whose payload length is just
      less than a multiple of a page succeeds, but the subsequent GETATTR
      in the same compound fails with NFS4ERR_OP_ILLEGAL because the XDR
      decoder can't find it. Clients ignore the error, but they must
      update their attribute cache via a separate round trip.
      
      As nfsd4_decode_write appears to expect the payload itself to always
      have appropriate XDR round-up, have svc_rdma_build_normal_read_chunk
      add the Read chunk XDR round-up to the page_len rather than
      lengthening the tail iovec.
      
      Reported-by: default avatarOlga Kornievskaia <kolga@netapp.com>
      Fixes: 193bcb7b ("svcrdma: Populate tail iovec when receiving")
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      Tested-by: default avatarOlga Kornievskaia <kolga@netapp.com>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      175e0310
    • Arnd Bergmann's avatar
      NFSD: hide unused svcxdr_dupstr() · 2285ae76
      Arnd Bergmann authored
      
      
      There is now only one caller left for svcxdr_dupstr() and this is inside
      of an #ifdef, so we can get a warning when the option is disabled:
      
      fs/nfsd/nfs4xdr.c:241:1: error: 'svcxdr_dupstr' defined but not used [-Werror=unused-function]
      
      This changes the remaining caller to use a nicer IS_ENABLED() check,
      which lets the compiler drop the unused code silently.
      
      Fixes: e40d99e6183e ("NFSD: Clean up symlink argument XDR decoders")
      Suggested-by: default avatarRasmus Villemoes <rasmus.villemoes@prevas.dk>
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      2285ae76
    • Amir Goldstein's avatar
      nfsd: store stat times in fill_pre_wcc() instead of inode times · 39ca1bf6
      Amir Goldstein authored
      
      
      The time values in stat and inode may differ for overlayfs and stat time
      values are the correct ones to use. This is also consistent with the fact
      that fill_post_wcc() also stores stat time values.
      
      This means introducing a stat call that could fail, where previously we
      were just copying values out of the inode.  To be conservative about
      changing behavior, we fall back to copying values out of the inode in
      the error case.  It might be better just to clear fh_pre_saved (though
      note the BUG_ON in set_change_info).
      
      Signed-off-by: default avatarAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      39ca1bf6
    • Amir Goldstein's avatar
      nfsd: encode stat->mtime for getattr instead of inode->i_mtime · 76c47948
      Amir Goldstein authored
      
      
      The values of stat->mtime and inode->i_mtime may differ for overlayfs
      and stat->mtime is the correct value to use when encoding getattr.
      This is also consistent with the fact that other attr times are also
      encoded from stat values.
      
      Both callers of lease_get_mtime() already have the value of stat->mtime,
      so the only needed change is that lease_get_mtime() will not overwrite
      this value with inode->i_mtime in case the inode does not have an
      exclusive lease.
      
      Signed-off-by: default avatarAmir Goldstein <amir73il@gmail.com>
      Reviewed-by: default avatarJeff Layton <jlayton@kernel.org>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      76c47948
    • J. Bruce Fields's avatar
      nfsd: return RESOURCE not GARBAGE_ARGS on too many ops · 0078117c
      J. Bruce Fields authored
      
      
      A client that sends more than a hundred ops in a single compound
      currently gets an rpc-level GARBAGE_ARGS error.
      
      It would be more helpful to return NFS4ERR_RESOURCE, since that gives
      the client a better idea how to recover (for example by splitting up the
      compound into smaller compounds).
      
      This is all a bit academic since we've never actually seen a reason for
      clients to send such long compounds, but we may as well fix it.
      
      While we're there, just use NFSD4_MAX_OPS_PER_COMPOUND == 16, the
      constant we already use in the 4.1 case, instead of hard-coding 100.
      Chances anyone actually uses even 16 ops per compound are small enough
      that I think there's a neglible risk or any regression.
      
      This fixes pynfs test COMP6.
      
      Reported-by: default avatar"Lu, Xinyu" <luxy.fnst@cn.fujitsu.com>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      0078117c
  2. Feb 06, 2018
  3. Jan 19, 2018
  4. Dec 22, 2017
    • Benjamin Coddington's avatar
      nfsd4: permit layoutget of executable-only files · 66282ec1
      Benjamin Coddington authored
      
      
      Clients must be able to read a file in order to execute it, and for pNFS
      that means the client needs to be able to perform a LAYOUTGET on the file.
      
      This behavior for executable-only files was added for OPEN in commit
      a043226b "nfsd4: permit read opens of executable-only files".
      
      This fixes up xfstests generic/126 on block/scsi layouts.
      
      Signed-off-by: default avatarBenjamin Coddington <bcodding@redhat.com>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      66282ec1
    • Elena Reshetova's avatar
      lockd: convert nlm_rqst.a_count from atomic_t to refcount_t · d9226ec9
      Elena Reshetova authored
      atomic_t variables are currently used to implement reference
      counters with the following properties:
       - counter is initialized to 1 using atomic_set()
       - a resource is freed upon counter reaching zero
       - once counter reaches zero, its further
         increments aren't allowed
       - counter schema uses basic atomic operations
         (set, inc, inc_not_zero, dec_and_test, etc.)
      
      Such atomic variables should be converted to a newly provided
      refcount_t type and API that prevents accidental counter overflows
      and underflows. This is important since overflows and underflows
      can lead to use-after-free situation and be exploitable.
      
      The variable nlm_rqst.a_count is used as pure reference counter.
      Convert it to refcount_t and fix up the operations.
      
      **Important note for maintainers:
      
      Some functions from refcount_t API defined in lib/refcount.c
      have different memory ordering guarantees than their atomic
      counterparts.
      The full comparison can be seen in
      https://lkml.org/lkml/2017/11/15/57
      
       and it is hopefully soon
      in state to be merged to the documentation tree.
      Normally the differences should not matter since refcount_t provides
      enough guarantees to satisfy the refcounting use cases, but in
      some rare cases it might matter.
      Please double check that you don't have some undocumented
      memory guarantees for this variable usage.
      
      For the nlm_rqst.a_count it might make a difference
      in following places:
       - nlmclnt_release_call() and nlmsvc_release_call(): decrement
         in refcount_dec_and_test() only
         provides RELEASE ordering and control dependency on success
         vs. fully ordered atomic counterpart
      
      Suggested-by: default avatarKees Cook <keescook@chromium.org>
      Reviewed-by: default avatarDavid Windsor <dwindsor@gmail.com>
      Reviewed-by: default avatarHans Liljestrand <ishkamiel@gmail.com>
      Signed-off-by: default avatarElena Reshetova <elena.reshetova@intel.com>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      d9226ec9
    • Elena Reshetova's avatar
      lockd: convert nlm_lockowner.count from atomic_t to refcount_t · 8bb3ea77
      Elena Reshetova authored
      atomic_t variables are currently used to implement reference
      counters with the following properties:
       - counter is initialized to 1 using atomic_set()
       - a resource is freed upon counter reaching zero
       - once counter reaches zero, its further
         increments aren't allowed
       - counter schema uses basic atomic operations
         (set, inc, inc_not_zero, dec_and_test, etc.)
      
      Such atomic variables should be converted to a newly provided
      refcount_t type and API that prevents accidental counter overflows
      and underflows. This is important since overflows and underflows
      can lead to use-after-free situation and be exploitable.
      
      The variable nlm_lockowner.count is used as pure reference counter.
      Convert it to refcount_t and fix up the operations.
      
      **Important note for maintainers:
      
      Some functions from refcount_t API defined in lib/refcount.c
      have different memory ordering guarantees than their atomic
      counterparts.
      The full comparison can be seen in
      https://lkml.org/lkml/2017/11/15/57
      
       and it is hopefully soon
      in state to be merged to the documentation tree.
      Normally the differences should not matter since refcount_t provides
      enough guarantees to satisfy the refcounting use cases, but in
      some rare cases it might matter.
      Please double check that you don't have some undocumented
      memory guarantees for this variable usage.
      
      For the nlm_lockowner.count it might make a difference
      in following places:
       - nlm_put_lockowner(): decrement in refcount_dec_and_lock() only
         provides RELEASE ordering, control dependency on success and
         holds a spin lock on success vs. fully ordered atomic counterpart.
         No changes in spin lock guarantees.
      
      Suggested-by: default avatarKees Cook <keescook@chromium.org>
      Reviewed-by: default avatarDavid Windsor <dwindsor@gmail.com>
      Reviewed-by: default avatarHans Liljestrand <ishkamiel@gmail.com>
      Signed-off-by: default avatarElena Reshetova <elena.reshetova@intel.com>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      8bb3ea77
    • Elena Reshetova's avatar
      lockd: convert nsm_handle.sm_count from atomic_t to refcount_t · be819f7b
      Elena Reshetova authored
      atomic_t variables are currently used to implement reference
      counters with the following properties:
       - counter is initialized to 1 using atomic_set()
       - a resource is freed upon counter reaching zero
       - once counter reaches zero, its further
         increments aren't allowed
       - counter schema uses basic atomic operations
         (set, inc, inc_not_zero, dec_and_test, etc.)
      
      Such atomic variables should be converted to a newly provided
      refcount_t type and API that prevents accidental counter overflows
      and underflows. This is important since overflows and underflows
      can lead to use-after-free situation and be exploitable.
      
      The variable nsm_handle.sm_count is used as pure reference counter.
      Convert it to refcount_t and fix up the operations.
      
      **Important note for maintainers:
      
      Some functions from refcount_t API defined in lib/refcount.c
      have different memory ordering guarantees than their atomic
      counterparts.
      The full comparison can be seen in
      https://lkml.org/lkml/2017/11/15/57
      
       and it is hopefully soon
      in state to be merged to the documentation tree.
      Normally the differences should not matter since refcount_t provides
      enough guarantees to satisfy the refcounting use cases, but in
      some rare cases it might matter.
      Please double check that you don't have some undocumented
      memory guarantees for this variable usage.
      
      For the nsm_handle.sm_count it might make a difference
      in following places:
       - nsm_release(): decrement in refcount_dec_and_lock() only
         provides RELEASE ordering, control dependency on success
         and holds a spin lock on success vs. fully ordered atomic
         counterpart. No change for the spin lock guarantees.
      
      Suggested-by: default avatarKees Cook <keescook@chromium.org>
      Reviewed-by: default avatarDavid Windsor <dwindsor@gmail.com>
      Reviewed-by: default avatarHans Liljestrand <ishkamiel@gmail.com>
      Signed-off-by: default avatarElena Reshetova <elena.reshetova@intel.com>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      be819f7b
  5. Nov 28, 2017
  6. Nov 27, 2017
    • Linus Torvalds's avatar
      Linux 4.15-rc1 · 4fbd8d19
      Linus Torvalds authored
      v4.15-rc1
      4fbd8d19
    • Linus Torvalds's avatar
      Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm · bbecb1cf
      Linus Torvalds authored
      Pull ARM fixes from Russell King:
      
       - LPAE fixes for kernel-readonly regions
      
       - Fix for get_user_pages_fast on LPAE systems
      
       - avoid tying decompressor to a particular platform if DEBUG_LL is
         enabled
      
       - BUG if we attempt to return to userspace but the to-be-restored PSR
         value keeps us in privileged mode (defeating an issue that ftracetest
         found)
      
      * 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
        ARM: BUG if jumping to usermode address in kernel mode
        ARM: 8722/1: mm: make STRICT_KERNEL_RWX effective for LPAE
        ARM: 8721/1: mm: dump: check hardware RO bit for LPAE
        ARM: make decompressor debug output user selectable
        ARM: fix get_user_pages_fast
      bbecb1cf
    • Linus Torvalds's avatar
      Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · dec0029a
      Linus Torvalds authored
      Pull irq fixes from Thomas Glexiner:
      
       - unbreak the irq trigger type check for legacy platforms
      
       - a handful fixes for ARM GIC v3/4 interrupt controllers
      
       - a few trivial fixes all over the place
      
      * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        genirq/matrix: Make - vs ?: Precedence explicit
        irqchip/imgpdc: Use resource_size function on resource object
        irqchip/qcom: Fix u32 comparison with value less than zero
        irqchip/exiu: Fix return value check in exiu_init()
        irqchip/gic-v3-its: Remove artificial dependency on PCI
        irqchip/gic-v4: Add forward definition of struct irq_domain_ops
        irqchip/gic-v3: pr_err() strings should end with newlines
        irqchip/s3c24xx: pr_err() strings should end with newlines
        irqchip/gic-v3: Fix ppi-partitions lookup
        irqchip/gic-v4: Clear IRQ_DISABLE_UNLAZY again if mapping fails
        genirq: Track whether the trigger type has been set
      dec0029a
    • Linus Torvalds's avatar
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 02fc87b1
      Linus Torvalds authored
      Pull misc x86 fixes from Ingo Molnar:
       - topology enumeration fixes
       - KASAN fix
       - two entry fixes (not yet the big series related to KASLR)
       - remove obsolete code
       - instruction decoder fix
       - better /dev/mem sanity checks, hopefully working better this time
       - pkeys fixes
       - two ACPI fixes
       - 5-level paging related fixes
       - UMIP fixes that should make application visible faults more debuggable
       - boot fix for weird virtualization environment
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
        x86/decoder: Add new TEST instruction pattern
        x86/PCI: Remove unused HyperTransport interrupt support
        x86/umip: Fix insn_get_code_seg_params()'s return value
        x86/boot/KASLR: Remove unused variable
        x86/entry/64: Add missing irqflags tracing to native_load_gs_index()
        x86/mm/kasan: Don't use vmemmap_populate() to initialize shadow
        x86/entry/64: Fix entry_SYSCALL_64_after_hwframe() IRQ tracing
        x86/pkeys/selftests: Fix protection keys write() warning
        x86/pkeys/selftests: Rename 'si_pkey' to 'siginfo_pkey'
        x86/mpx/selftests: Fix up weird arrays
        x86/pkeys: Update documentation about availability
        x86/umip: Print a warning into the syslog if UMIP-protected instructions are used
        x86/smpboot: Fix __max_logical_packages estimate
        x86/topology: Avoid wasting 128k for package id array
        perf/x86/intel/uncore: Cache logical pkg id in uncore driver
        x86/acpi: Reduce code duplication in mp_override_legacy_irq()
        x86/acpi: Handle SCI interrupts above legacy space gracefully
        x86/boot: Fix boot failure when SMP MP-table is based at 0
        x86/mm: Limit mmap() of /dev/mem to valid physical addresses
        x86/selftests: Add test for mapping placement for 5-level paging
        ...
      02fc87b1