  1. Oct 30, 2020
    • vdpasim: allow to assign a MAC address · 0c86d774
      Laurent Vivier authored
      
      
      Add a macaddr parameter to the module to set the MAC address to use.
      
      Signed-off-by: Laurent Vivier <lvivier@redhat.com>
      Link: https://lore.kernel.org/r/20201029122050.776445-3-lvivier@redhat.com
      
      
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
    • vdpasim: fix MAC address configuration · 4a6a42db
      Laurent Vivier authored
      vdpa_sim generates a random MAC address but it is never used by upper
      layers because the VIRTIO_NET_F_MAC bit is not set in the features list.
      
      Because of that, virtio-net always regenerates a random MAC address each
      time it is loaded whereas the address should only change on vdpa_sim
      load/unload.
      
      Fix that by adding VIRTIO_NET_F_MAC in the features list of vdpa_sim.
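
The effect of the missing bit can be sketched in plain C. This is a minimal userspace illustration, not the vdpa_sim source: the bit positions are the ones assigned by the virtio specification, while the base feature set and every function name below are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Feature bit positions as assigned by the virtio specification. */
#define VIRTIO_NET_F_MAC          5
#define VIRTIO_F_ANY_LAYOUT       27
#define VIRTIO_F_VERSION_1        32
#define VIRTIO_F_ACCESS_PLATFORM  33

/* Hypothetical helper: turn a bit position into a 64-bit mask. */
static uint64_t feature_bit(unsigned int bit)
{
    assert(bit < 64);   /* feature masks here are 64 bits wide */
    return 1ULL << bit;
}

/* Assumed feature set before the fix: VIRTIO_NET_F_MAC is absent. */
static uint64_t sim_features_before(void)
{
    return feature_bit(VIRTIO_F_ANY_LAYOUT) |
           feature_bit(VIRTIO_F_VERSION_1) |
           feature_bit(VIRTIO_F_ACCESS_PLATFORM);
}

/* Feature set after the fix: the device advertises its MAC address. */
static uint64_t sim_features_after(void)
{
    return sim_features_before() | feature_bit(VIRTIO_NET_F_MAC);
}

/* True when a driver may take the MAC address from the device. */
static bool device_offers_mac(uint64_t features)
{
    return (features & feature_bit(VIRTIO_NET_F_MAC)) != 0;
}
```

With the "before" mask, device_offers_mac() is false, which corresponds to virtio-net falling back to a freshly generated random address on every load.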
      
      Fixes: 2c53d0f6 ("vdpasim: vDPA device simulator")
      Cc: jasowang@redhat.com
      Signed-off-by: Laurent Vivier <lvivier@redhat.com>
      Link: https://lore.kernel.org/r/20201029122050.776445-2-lvivier@redhat.com
      
      
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
    • vdpa: handle irq bypass register failure case · e01afe36
      Zhu Lingshan authored
      
      
      LKP flagged the variable 'ret' in vhost_vdpa_setup_vq_irq() as
      unused and suggested removing it. It actually stores the return
      value of irq_bypass_register_producer(), which was never checked;
      the failure case should be handled.

      This commit prints a message if irq_bypass_register_producer()
      fails; in that case, the vqs still remain functional.
      
      Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
      Reported-by: kernel test robot <lkp@intel.com>
      Link: https://lore.kernel.org/r/20201023104046.404794-1-lingshan.zhu@intel.com
      
      
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
    • vdpa_sim: Fix DMA mask · 1eca16b2
      Laurent Vivier authored
      Since commit f959dcd6 ("dma-direct: Fix potential NULL pointer dereference"),
      an error is reported when we load vdpa_sim and virtio-vdpa:
      
      [  129.351207] net eth0: Unexpected TXQ (0) queue failure: -12
      
      It seems that dma_mask is not initialized.
      
      This patch initializes dma_mask and calls dma_set_mask_and_coherent()
      to fix the problem.
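
A userspace mock can illustrate why an unset dma_mask leads to the -12 failures, keeping in mind that -12 is -ENOMEM on Linux. The struct and functions below are hypothetical stand-ins, not the kernel's struct device or DMA API, and pointing dma_mask at coherent_dma_mask is an assumption about how the initialization might look.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the kernel's DMA_BIT_MASK(): a mask with the low n bits set. */
#define MOCK_DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical, stripped-down stand-in for struct device. */
struct mock_device {
    uint64_t *dma_mask;          /* NULL until the driver sets it up */
    uint64_t coherent_dma_mask;
};

/* Mock mapping step: refuses to map when no DMA mask was configured. */
static int mock_dma_map(const struct mock_device *dev)
{
    return dev->dma_mask ? 0 : -ENOMEM;
}

/*
 * Point dma_mask at valid storage, then store the mask. Because dma_mask
 * aliases coherent_dma_mask here, the single store sets both.
 */
static int mock_setup_dma(struct mock_device *dev, unsigned int bits)
{
    assert(bits >= 1 && bits <= 64);
    dev->dma_mask = &dev->coherent_dma_mask;
    *dev->dma_mask = MOCK_DMA_BIT_MASK(bits);
    return 0;
}
```

Calling mock_dma_map() on a zero-initialized device fails with -ENOMEM; after mock_setup_dma(&dev, 64) the same call succeeds.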
      
      Full log:
      
      [  128.548628] ------------[ cut here ]------------
      [  128.553268] WARNING: CPU: 23 PID: 1105 at kernel/dma/mapping.c:149 dma_map_page_attrs+0x14c/0x1d0
      [  128.562139] Modules linked in: virtio_net net_failover failover virtio_vdpa vdpa_sim vringh vhost_iotlb vdpa xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_counter nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink tun bridge stp llc iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi rfkill intel_rapl_msr intel_rapl_common isst_if_common sunrpc skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ipmi_ssif kvm mgag200 i2c_algo_bit irqbypass drm_kms_helper crct10dif_pclmul crc32_pclmul syscopyarea ghash_clmulni_intel iTCO_wdt sysfillrect iTCO_vendor_support sysimgblt rapl fb_sys_fops dcdbas intel_cstate drm acpi_ipmi ipmi_si mei_me dell_smbios intel_uncore ipmi_devintf mei i2c_i801 dell_wmi_descriptor wmi_bmof pcspkr lpc_ich i2c_smbus ipmi_msghandler acpi_power_meter ip_tables xfs libcrc32c sd_mod t10_pi sg ahci libahci libata megaraid_sas tg3 crc32c_intel wmi dm_mirror dm_region_hash dm_log
      [  128.562188]  dm_mod
      [  128.651334] CPU: 23 PID: 1105 Comm: NetworkManager Tainted: G S        I       5.10.0-rc1+ #59
      [  128.659939] Hardware name: Dell Inc. PowerEdge R440/04JN2K, BIOS 2.8.1 06/30/2020
      [  128.667419] RIP: 0010:dma_map_page_attrs+0x14c/0x1d0
      [  128.672384] Code: 1c 25 28 00 00 00 0f 85 97 00 00 00 48 83 c4 10 5b 5d 41 5c 41 5d c3 4c 89 da eb d7 48 89 f2 48 2b 50 18 48 89 d0 eb 8d 0f 0b <0f> 0b 48 c7 c0 ff ff ff ff eb c3 48 89 d9 48 8b 40 40 e8 2d a0 aa
      [  128.691131] RSP: 0018:ffffae0f0151f3c8 EFLAGS: 00010246
      [  128.696357] RAX: ffffffffc06b7400 RBX: 00000000000005fa RCX: 0000000000000000
      [  128.703488] RDX: 0000000000000040 RSI: ffffcee3c7861200 RDI: ffff9e2bc16cd000
      [  128.710620] RBP: 0000000000000000 R08: 0000000000000002 R09: 0000000000000000
      [  128.717754] R10: 0000000000000002 R11: 0000000000000000 R12: ffff9e472cb291f8
      [  128.724886] R13: ffff9e2bc14da780 R14: ffff9e472bc20000 R15: ffff9e2bc1b14940
      [  128.732020] FS:  00007f887bae23c0(0000) GS:ffff9e4ac01c0000(0000) knlGS:0000000000000000
      [  128.740105] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  128.745852] CR2: 0000562bc09de998 CR3: 00000003c156c006 CR4: 00000000007706e0
      [  128.752982] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  128.760114] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  128.767247] PKRU: 55555554
      [  128.769961] Call Trace:
      [  128.772418]  virtqueue_add+0x81e/0xb00
      [  128.776176]  virtqueue_add_inbuf_ctx+0x26/0x30
      [  128.780625]  try_fill_recv+0x3a2/0x6e0 [virtio_net]
      [  128.785509]  virtnet_open+0xf9/0x180 [virtio_net]
      [  128.790217]  __dev_open+0xe8/0x180
      [  128.793620]  __dev_change_flags+0x1a7/0x210
      [  128.797808]  dev_change_flags+0x21/0x60
      [  128.801646]  do_setlink+0x328/0x10e0
      [  128.805227]  ? __nla_validate_parse+0x121/0x180
      [  128.809757]  ? __nla_parse+0x21/0x30
      [  128.813338]  ? inet6_validate_link_af+0x5c/0xf0
      [  128.817871]  ? cpumask_next+0x17/0x20
      [  128.821535]  ? __snmp6_fill_stats64.isra.54+0x6b/0x110
      [  128.826676]  ? __nla_validate_parse+0x47/0x180
      [  128.831120]  __rtnl_newlink+0x541/0x8e0
      [  128.834962]  ? __nla_reserve+0x38/0x50
      [  128.838713]  ? security_sock_rcv_skb+0x2a/0x40
      [  128.843158]  ? netlink_deliver_tap+0x2c/0x1e0
      [  128.847518]  ? netlink_attachskb+0x1d8/0x220
      [  128.851793]  ? skb_queue_tail+0x1b/0x50
      [  128.855641]  ? fib6_clean_node+0x43/0x170
      [  128.859652]  ? _cond_resched+0x15/0x30
      [  128.863406]  ? kmem_cache_alloc_trace+0x3a3/0x420
      [  128.868110]  rtnl_newlink+0x43/0x60
      [  128.871602]  rtnetlink_rcv_msg+0x12c/0x380
      [  128.875701]  ? rtnl_calcit.isra.39+0x110/0x110
      [  128.880147]  netlink_rcv_skb+0x50/0x100
      [  128.883987]  netlink_unicast+0x1a5/0x280
      [  128.887913]  netlink_sendmsg+0x23d/0x470
      [  128.891839]  sock_sendmsg+0x5b/0x60
      [  128.895331]  ____sys_sendmsg+0x1ef/0x260
      [  128.899255]  ? copy_msghdr_from_user+0x5c/0x90
      [  128.903702]  ___sys_sendmsg+0x7c/0xc0
      [  128.907369]  ? dev_forward_change+0x130/0x130
      [  128.911731]  ? sysctl_head_finish.part.29+0x24/0x40
      [  128.916616]  ? new_sync_write+0x11f/0x1b0
      [  128.920628]  ? mntput_no_expire+0x47/0x240
      [  128.924727]  __sys_sendmsg+0x57/0xa0
      [  128.928309]  do_syscall_64+0x33/0x40
      [  128.931887]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [  128.936937] RIP: 0033:0x7f88792e3857
      [  128.940518] Code: c3 66 90 41 54 41 89 d4 55 48 89 f5 53 89 fb 48 83 ec 10 e8 0b ed ff ff 44 89 e2 48 89 ee 89 df 41 89 c0 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 44 89 c7 48 89 44 24 08 e8 44 ed ff ff 48
      [  128.959263] RSP: 002b:00007ffdca60dea0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
      [  128.966827] RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007f88792e3857
      [  128.973960] RDX: 0000000000000000 RSI: 00007ffdca60def0 RDI: 000000000000000c
      [  128.981095] RBP: 00007ffdca60def0 R08: 0000000000000000 R09: 0000000000000000
      [  128.988224] R10: 0000000000000001 R11: 0000000000000293 R12: 0000000000000000
      [  128.995357] R13: 0000000000000000 R14: 00007ffdca60e0a8 R15: 00007ffdca60e09c
      [  129.002492] CPU: 23 PID: 1105 Comm: NetworkManager Tainted: G S        I       5.10.0-rc1+ #59
      [  129.011093] Hardware name: Dell Inc. PowerEdge R440/04JN2K, BIOS 2.8.1 06/30/2020
      [  129.018571] Call Trace:
      [  129.021027]  dump_stack+0x57/0x6a
      [  129.024346]  __warn.cold.14+0xe/0x3d
      [  129.027925]  ? dma_map_page_attrs+0x14c/0x1d0
      [  129.032283]  report_bug+0xbd/0xf0
      [  129.035602]  handle_bug+0x44/0x80
      [  129.038922]  exc_invalid_op+0x13/0x60
      [  129.042589]  asm_exc_invalid_op+0x12/0x20
      [  129.046602] RIP: 0010:dma_map_page_attrs+0x14c/0x1d0
      [  129.051566] Code: 1c 25 28 00 00 00 0f 85 97 00 00 00 48 83 c4 10 5b 5d 41 5c 41 5d c3 4c 89 da eb d7 48 89 f2 48 2b 50 18 48 89 d0 eb 8d 0f 0b <0f> 0b 48 c7 c0 ff ff ff ff eb c3 48 89 d9 48 8b 40 40 e8 2d a0 aa
      [  129.070311] RSP: 0018:ffffae0f0151f3c8 EFLAGS: 00010246
      [  129.075536] RAX: ffffffffc06b7400 RBX: 00000000000005fa RCX: 0000000000000000
      [  129.082669] RDX: 0000000000000040 RSI: ffffcee3c7861200 RDI: ffff9e2bc16cd000
      [  129.089803] RBP: 0000000000000000 R08: 0000000000000002 R09: 0000000000000000
      [  129.096936] R10: 0000000000000002 R11: 0000000000000000 R12: ffff9e472cb291f8
      [  129.104068] R13: ffff9e2bc14da780 R14: ffff9e472bc20000 R15: ffff9e2bc1b14940
      [  129.111200]  virtqueue_add+0x81e/0xb00
      [  129.114952]  virtqueue_add_inbuf_ctx+0x26/0x30
      [  129.119399]  try_fill_recv+0x3a2/0x6e0 [virtio_net]
      [  129.124280]  virtnet_open+0xf9/0x180 [virtio_net]
      [  129.128984]  __dev_open+0xe8/0x180
      [  129.132390]  __dev_change_flags+0x1a7/0x210
      [  129.136575]  dev_change_flags+0x21/0x60
      [  129.140415]  do_setlink+0x328/0x10e0
      [  129.143994]  ? __nla_validate_parse+0x121/0x180
      [  129.148528]  ? __nla_parse+0x21/0x30
      [  129.152107]  ? inet6_validate_link_af+0x5c/0xf0
      [  129.156639]  ? cpumask_next+0x17/0x20
      [  129.160306]  ? __snmp6_fill_stats64.isra.54+0x6b/0x110
      [  129.165443]  ? __nla_validate_parse+0x47/0x180
      [  129.169890]  __rtnl_newlink+0x541/0x8e0
      [  129.173731]  ? __nla_reserve+0x38/0x50
      [  129.177483]  ? security_sock_rcv_skb+0x2a/0x40
      [  129.181928]  ? netlink_deliver_tap+0x2c/0x1e0
      [  129.186286]  ? netlink_attachskb+0x1d8/0x220
      [  129.190560]  ? skb_queue_tail+0x1b/0x50
      [  129.194401]  ? fib6_clean_node+0x43/0x170
      [  129.198411]  ? _cond_resched+0x15/0x30
      [  129.202163]  ? kmem_cache_alloc_trace+0x3a3/0x420
      [  129.206869]  rtnl_newlink+0x43/0x60
      [  129.210361]  rtnetlink_rcv_msg+0x12c/0x380
      [  129.214462]  ? rtnl_calcit.isra.39+0x110/0x110
      [  129.218908]  netlink_rcv_skb+0x50/0x100
      [  129.222747]  netlink_unicast+0x1a5/0x280
      [  129.226672]  netlink_sendmsg+0x23d/0x470
      [  129.230599]  sock_sendmsg+0x5b/0x60
      [  129.234090]  ____sys_sendmsg+0x1ef/0x260
      [  129.238015]  ? copy_msghdr_from_user+0x5c/0x90
      [  129.242461]  ___sys_sendmsg+0x7c/0xc0
      [  129.246128]  ? dev_forward_change+0x130/0x130
      [  129.250487]  ? sysctl_head_finish.part.29+0x24/0x40
      [  129.255368]  ? new_sync_write+0x11f/0x1b0
      [  129.259381]  ? mntput_no_expire+0x47/0x240
      [  129.263478]  __sys_sendmsg+0x57/0xa0
      [  129.267058]  do_syscall_64+0x33/0x40
      [  129.270639]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [  129.275689] RIP: 0033:0x7f88792e3857
      [  129.279268] Code: c3 66 90 41 54 41 89 d4 55 48 89 f5 53 89 fb 48 83 ec 10 e8 0b ed ff ff 44 89 e2 48 89 ee 89 df 41 89 c0 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 44 89 c7 48 89 44 24 08 e8 44 ed ff ff 48
      [  129.298015] RSP: 002b:00007ffdca60dea0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
      [  129.305581] RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007f88792e3857
      [  129.312712] RDX: 0000000000000000 RSI: 00007ffdca60def0 RDI: 000000000000000c
      [  129.319846] RBP: 00007ffdca60def0 R08: 0000000000000000 R09: 0000000000000000
      [  129.326978] R10: 0000000000000001 R11: 0000000000000293 R12: 0000000000000000
      [  129.334109] R13: 0000000000000000 R14: 00007ffdca60e0a8 R15: 00007ffdca60e09c
      [  129.341249] ---[ end trace c551e8028fbaf59d ]---
      [  129.351207] net eth0: Unexpected TXQ (0) queue failure: -12
      [  129.360445] net eth0: Unexpected TXQ (0) queue failure: -12
      [  129.824428] net eth0: Unexpected TXQ (0) queue failure: -12
      
      Fixes: 2c53d0f6 ("vdpasim: vDPA device simulator")
      Signed-off-by: Laurent Vivier <lvivier@redhat.com>
      Link: https://lore.kernel.org/r/20201027175914.689278-1-lvivier@redhat.com
      
      
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Cc: stable@vger.kernel.org
      Acked-by: Jason Wang <jasowang@redhat.com>
    • Revert "vhost-vdpa: fix page pinning leakage in error path" · 5e1a3149
      Michael S. Tsirkin authored
      This reverts commit 7ed9e3d9.
      
      The patch creates a DoS risk since it can result in a high order memory
      allocation.
      
      Fixes: 7ed9e3d9 ("vhost-vdpa: fix page pinning leakage in error path")
      Cc: stable@vger.kernel.org
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    • vdpa/mlx5: Fix error return in map_direct_mr() · 7ba08e81
      Jing Xiangfeng authored
      Fix to return the variable "err" from the error handling case instead
      of "ret".
      
      Fixes: 94abbccd ("vdpa/mlx5: Add shared memory registration code")
      Signed-off-by: Jing Xiangfeng <jingxiangfeng@huawei.com>
      Link: https://lore.kernel.org/r/20201026070637.164321-1-jingxiangfeng@huawei.com
      
      
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: Eli Cohen <elic@nvidia.com>
      Cc: stable@vger.kernel.org
      Acked-by: Jason Wang <jasowang@redhat.com>
    • vhost_vdpa: Return -EFAULT if copy_from_user() fails · 7922460e
      Dan Carpenter authored
      The copy_to/from_user() functions return the number of bytes which we
      weren't able to copy but the ioctl should return -EFAULT if they fail.
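
The difference between the two return conventions can be sketched in userspace. The mock below only imitates the documented contract of copy_from_user(), which returns the number of bytes not copied; the `readable` parameter and all function names are hypothetical, introduced for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/*
 * Userspace mock of the copy_from_user() contract: copy up to n bytes and
 * return how many bytes could NOT be copied (0 on full success).
 * `readable` simulates how much of the source is accessible.
 */
static unsigned long mock_copy_from_user(void *to, const void *from,
                                         unsigned long n,
                                         unsigned long readable)
{
    unsigned long ok = n < readable ? n : readable;

    assert(to && from);
    memcpy(to, from, ok);
    return n - ok;
}

/* Buggy pattern: the leftover byte count leaks out as the ioctl result. */
static long set_features_buggy(uint64_t *dst, const void *src,
                               unsigned long readable)
{
    return (long)mock_copy_from_user(dst, src, sizeof(*dst), readable);
}

/* Fixed pattern: any short copy is reported as -EFAULT. */
static long set_features_fixed(uint64_t *dst, const void *src,
                               unsigned long readable)
{
    if (mock_copy_from_user(dst, src, sizeof(*dst), readable))
        return -EFAULT;
    return 0;
}
```

When only 3 of the 8 source bytes are readable, the buggy variant returns 5, a positive value that a caller could mistake for success, while the fixed variant returns -EFAULT.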
      
      Fixes: a127c5bb ("vhost-vdpa: fix backend feature ioctls")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Link: https://lore.kernel.org/r/20201023120853.GI282278@mwanda
      
      
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Cc: stable@vger.kernel.org
      Acked-by: Jason Wang <jasowang@redhat.com>
  2. Oct 23, 2020
  3. Oct 21, 2020
  4. Oct 12, 2020
  5. Oct 11, 2020
    • Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · da690031
      Linus Torvalds authored
      Pull i2c fixes from Wolfram Sang:
       "Some more driver bugfixes for I2C. Including a revert - the updated
        series for it will come during the next merge window"
      
      * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        i2c: owl: Clear NACK and BUS error bits
        Revert "i2c: imx: Fix reset of I2SR_IAL flag"
        i2c: meson: fixup rate calculation with filter delay
        i2c: meson: keep peripheral clock enabled
        i2c: meson: fix clock setting overwrite
        i2c: imx: Fix reset of I2SR_IAL flag
    • cifs: Fix incomplete memory allocation on setxattr path · 64b7f674
      Vladimir Zapolskiy authored
      On the setxattr() syscall path, due to an apparent typo, the size of a
      dynamically allocated memory chunk for storing a struct
      smb2_file_full_ea_info object is computed incorrectly: the first addend
      is the size of a pointer instead of the size of the object. Coincidentally
      this makes no difference on 64-bit platforms, but on 32-bit targets the
      following memcpy() writes 4 bytes of data outside of the dynamically
      allocated memory.
      
        =============================================================================
        BUG kmalloc-16 (Not tainted): Redzone overwritten
        -----------------------------------------------------------------------------
      
        Disabling lock debugging due to kernel taint
        INFO: 0x79e69a6f-0x9e5cdecf @offset=368. First byte 0x73 instead of 0xcc
        INFO: Slab 0xd36d2454 objects=85 used=51 fp=0xf7d0fc7a flags=0x35000201
        INFO: Object 0x6f171df3 @offset=352 fp=0x00000000
      
        Redzone 5d4ff02d: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc  ................
        Object 6f171df3: 00 00 00 00 00 05 06 00 73 6e 72 75 62 00 66 69  ........snrub.fi
        Redzone 79e69a6f: 73 68 32 0a                                      sh2.
        Padding 56254d82: 5a 5a 5a 5a 5a 5a 5a 5a                          ZZZZZZZZ
        CPU: 0 PID: 8196 Comm: attr Tainted: G    B             5.9.0-rc8+ #3
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
        Call Trace:
         dump_stack+0x54/0x6e
         print_trailer+0x12c/0x134
         check_bytes_and_report.cold+0x3e/0x69
         check_object+0x18c/0x250
         free_debug_processing+0xfe/0x230
         __slab_free+0x1c0/0x300
         kfree+0x1d3/0x220
         smb2_set_ea+0x27d/0x540
         cifs_xattr_set+0x57f/0x620
         __vfs_setxattr+0x4e/0x60
         __vfs_setxattr_noperm+0x4e/0x100
         __vfs_setxattr_locked+0xae/0xd0
         vfs_setxattr+0x4e/0xe0
         setxattr+0x12c/0x1a0
         path_setxattr+0xa4/0xc0
         __ia32_sys_lsetxattr+0x1d/0x20
         __do_fast_syscall_32+0x40/0x70
         do_fast_syscall_32+0x29/0x60
         do_SYSENTER_32+0x15/0x20
         entry_SYSENTER_32+0x9f/0xf2
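
The size computation described above can be reproduced with a small stand-in struct. This is a sketch under assumptions: `struct ea_info` only imitates an 8-byte fixed header followed by name and value bytes, and the helper names are hypothetical. The lengths in the usage note (a 5-byte name and a 6-byte value) are read off the object dump above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in for the EA header; the layout is an assumption. */
struct ea_info {
    uint32_t next_entry_offset;
    uint8_t  flags;
    uint8_t  ea_name_length;
    uint16_t ea_value_length;
    char     ea_data[];          /* name, NUL byte, then value */
};

/* Buggy sizing: sizeof(ea) is the size of a pointer, not of the header. */
static size_t ea_len_buggy(size_t name_len, size_t value_len)
{
    struct ea_info *ea = NULL;

    assert(name_len <= UINT8_MAX);   /* name length is stored in one byte */
    return sizeof(ea) + name_len + value_len + 1;
}

/* Fixed sizing: sizeof(*ea) is the size of the header itself. */
static size_t ea_len_fixed(size_t name_len, size_t value_len)
{
    struct ea_info *ea = NULL;

    assert(name_len <= UINT8_MAX);
    return sizeof(*ea) + name_len + value_len + 1;
}
```

For a 5-byte name and a 6-byte value, ea_len_fixed() gives 20 bytes. With 4-byte pointers ea_len_buggy() gives 16, which matches the kmalloc-16 object and the 4 overwritten redzone bytes in the report; with 8-byte pointers both functions agree.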
      
      Fixes: 5517554e ("cifs: Add support for writing attributes on SMB2+")
      Signed-off-by: Vladimir Zapolskiy <vladimir@tuxera.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/khugepaged: fix filemap page_to_pgoff(page) != offset · 033b5d77
      Hugh Dickins authored
      
      
      There have been elusive reports of filemap_fault() hitting its
      VM_BUG_ON_PAGE(page_to_pgoff(page) != offset, page) on kernels built
      with CONFIG_READ_ONLY_THP_FOR_FS=y.
      
      Suren has hit it on a kernel with CONFIG_READ_ONLY_THP_FOR_FS=y and
      CONFIG_NUMA is not set: and he has analyzed it down to how khugepaged
      without NUMA reuses the same huge page after collapse_file() failed
      (whereas NUMA targets its allocation to the respective node each time).
      And most of us were usually testing with CONFIG_NUMA=y kernels.
      
      collapse_file(old start)
        new_page = khugepaged_alloc_page(hpage)
        __SetPageLocked(new_page)
        new_page->index = start // hpage->index=old offset
        new_page->mapping = mapping
        xas_store(&xas, new_page)
      
                                filemap_fault
                                  page = find_get_page(mapping, offset)
                                  // if offset falls inside hpage then
                                  // compound_head(page) == hpage
                                  lock_page_maybe_drop_mmap()
                                    __lock_page(page)
      
        // collapse fails
        xas_store(&xas, old page)
        new_page->mapping = NULL
        unlock_page(new_page)
      
      collapse_file(new start)
        new_page = khugepaged_alloc_page(hpage)
        __SetPageLocked(new_page)
        new_page->index = start // hpage->index=new offset
        new_page->mapping = mapping // mapping becomes valid again
      
                                  // since compound_head(page) == hpage
                                  // page_to_pgoff(page) got changed
                                  VM_BUG_ON_PAGE(page_to_pgoff(page) != offset)
      
      An initial patch replaced __SetPageLocked() by lock_page(), which did
      fix the race which Suren illustrates above.  But testing showed that it's
      not good enough: if the racing task's __lock_page() gets delayed long
      after its find_get_page(), then it may follow collapse_file(new start)'s
      successful final unlock_page(), and crash on the same VM_BUG_ON_PAGE.
      
      It could be fixed by relaxing filemap_fault()'s VM_BUG_ON_PAGE to a
      check and retry (as is done for mapping), with similar relaxations in
      find_lock_entry() and pagecache_get_page(): but it's not obvious what
      else might get caught out; and khugepaged non-NUMA appears to be unique
      in exposing a page to page cache, then revoking, without going through
      a full cycle of freeing before reuse.
      
      Instead, non-NUMA khugepaged_prealloc_page() releases the old page
      if anyone else has a reference to it (1% of cases when I tested).
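
The release-if-shared rule can be sketched with a mock reference count. This is a userspace illustration only: `struct mock_page` and both functions are hypothetical, and a real put_page() would also free the page once the count reaches zero, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a page with a reference count. */
struct mock_page {
    int refcount;
};

/* Drop one reference (freeing at zero is deliberately left out). */
static void mock_put_page(struct mock_page *page)
{
    assert(page->refcount > 0);
    page->refcount--;
}

/*
 * Before reusing a previously allocated huge page, give it up if anyone
 * else still holds a reference, so a racing lookup never sees the same
 * page reappear at a different offset.
 */
static void mock_prealloc_check(struct mock_page **hpage)
{
    if (*hpage && (*hpage)->refcount > 1) {
        mock_put_page(*hpage);
        *hpage = NULL;   /* caller must allocate a fresh page */
    }
}
```

A page with a reference count of 1 is kept for reuse; a page with a count of 2 is dropped and the caller's pointer is cleared.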
      
      Although never reported on huge tmpfs, I believe its find_lock_entry()
      has been at similar risk; but huge tmpfs does not rely on khugepaged
      for its normal working nearly so much as READ_ONLY_THP_FOR_FS does.
      
      Reported-by: Denis Lisov <dennis.lissov@gmail.com>
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=206569
      Link: https://lore.kernel.org/linux-mm/?q=20200219144635.3b7417145de19b65f258c943%40linux-foundation.org
      
      
      Reported-by: Qian Cai <cai@lca.pw>
      Link: https://lore.kernel.org/linux-xfs/?q=20200616013309.GB815%40lca.pw
      
      
      Reported-and-analyzed-by: Suren Baghdasaryan <surenb@google.com>
      Fixes: 87c460a0 ("mm/khugepaged: collapse_shmem() without freezing new_page")
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: stable@vger.kernel.org # v4.9+
      Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. Oct 10, 2020