Skip to content
  1. Jan 10, 2020
  2. Jan 09, 2020
  3. Jan 08, 2020
    • Zahari Petkov's avatar
      leds: pca963x: Fix open-drain initialization · c8e1eab1
      Zahari Petkov authored
      commit 69752909 upstream.
      
      Before commit bb29b9cc ("leds: pca963x: Add bindings to invert
      polarity") Mode register 2 was initialized directly with either 0x01
      or 0x05 for open-drain or totem pole (push-pull) configuration.
      
      Afterwards, MODE2 initialization started using bitwise operations on
      top of the default MODE2 register value (0x05). Using bitwise OR for
      setting OUTDRV with 0x01 and 0x05 does not produce correct results.
      When open-drain is used, instead of setting OUTDRV to 0, the driver
      keeps it as 1:
      
      Open-drain: 0x05 | 0x01 -> 0x05 (0b101 - incorrect)
      Totem pole: 0x05 | 0x05 -> 0x05 (0b101 - correct but still wrong)
      
      Now OUTDRV setting uses correct bitwise operations for initialization:
      
      Open-drain: 0x05 & ~0x04 -> 0x01 (0b001 - correct)
      Totem pole: 0x05 | 0x04 -> 0x05 (0b101 - correct)
      
      Additional MODE2 register definitions are introduced now as well.
      
      Fixes: bb29b9cc
      
       ("leds: pca963x: Add bindings to invert polarity")
      Signed-off-by: default avatarZahari Petkov <zahari@balena.io>
      Signed-off-by: default avatarPavel Machek <pavel@ucw.cz>
      c8e1eab1
  4. Jan 07, 2020
  5. Jan 06, 2020
  6. Jan 05, 2020
    • Greg Kroah-Hartman's avatar
      Linux 4.19.93 · 3d40d711
      Greg Kroah-Hartman authored
      3d40d711
    • Christophe Leroy's avatar
      spi: fsl: use platform_get_irq() instead of of_irq_to_resource() · 947e8372
      Christophe Leroy authored
      commit 63aa6a69
      
       upstream.
      
      Unlike irq_of_parse_and_map() which has a dummy definition on SPARC,
      of_irq_to_resource() hasn't.
      
      But as platform_get_irq() can be used instead and is generic, use it.
      
      Reported-by: default avatarkbuild test robot <lkp@intel.com>
      Suggested-by: default avatarMark Brown <broonie@kernel.org>
      Fixes: 	3194d253
      
       ("spi: fsl: don't map irq during probe")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarChristophe Leroy <christophe.leroy@c-s.fr>
      Link: https://lore.kernel.org/r/091a277fd0b3356dca1e29858c1c96983fc9cb25.1576172743.git.christophe.leroy@c-s.fr
      
      
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      947e8372
    • Hans de Goede's avatar
      pinctrl: baytrail: Really serialize all register accesses · bbe414a1
      Hans de Goede authored
      [ Upstream commit 40ecab55 ]
      
      Commit 39ce8150 ("pinctrl: baytrail: Serialize all register access")
      added a spinlock around all register accesses because:
      
      "There is a hardware issue in Intel Baytrail where concurrent GPIO register
       access might result reads of 0xffffffff and writes might get dropped
       completely."
      
      Testing has shown that this does not catch all cases, there are still
      2 problems remaining
      
      1) The original fix uses a spinlock per byt_gpio device / struct,
      additional testing has shown that this is not sufficient concurent
      accesses to 2 different GPIO banks also suffer from the same problem.
      
      This commit fixes this by moving to a single global lock.
      
      2) The original fix did not add a lock around the register accesses in
      the suspend/resume handling.
      
      Since pinctrl-baytrail.c is using normal suspend/resume handlers,
      interrupts are still enabled during suspend/resume handling. Nothing
      should be using the GPIOs when they are being taken down, _but_ the
      GPIOs themselves may still cause interrupts, which are likely to
      use (read) the triggering GPIO. So we need to protect against
      concurrent GPIO register accesses in the suspend/resume handlers too.
      
      This commit fixes this by adding the missing spin_lock / unlock calls.
      
      The 2 fixes together fix the Acer Switch 10 SW5-012 getting completely
      confused after a suspend resume. The DSDT for this device has a bug
      in its _LID method which reprograms the home and power button trigger-
      flags requesting both high and low _level_ interrupts so the IRQs for
      these 2 GPIOs continuously fire. This combined with the saving of
      registers during suspend, triggers concurrent GPIO register accesses
      resulting in saving 0xffffffff as pconf0 value during suspend and then
      when restoring this on resume the pinmux settings get all messed up,
      resulting in various I2C busses being stuck, the wifi no longer working
      and often the tablet simply not coming out of suspend at all.
      
      Cc: stable@vger.kernel.org
      Fixes: 39ce8150
      
       ("pinctrl: baytrail: Serialize all register access")
      Signed-off-by: default avatarHans de Goede <hdegoede@redhat.com>
      Acked-by: default avatarMika Westerberg <mika.westerberg@linux.intel.com>
      Signed-off-by: default avatarAndy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      bbe414a1
    • David Engraf's avatar
      tty/serial: atmel: fix out of range clock divider handling · 3184e03c
      David Engraf authored
      [ Upstream commit cb47b9f8 ]
      
      Use MCK_DIV8 when the clock divider is > 65535. Unfortunately the mode
      register was already written thus the clock selection is ignored.
      
      Fix by doing the baud rate calulation before setting the mode.
      
      Fixes: 5bf5635a
      
       ("tty/serial: atmel: add fractional baud rate support")
      Signed-off-by: default avatarDavid Engraf <david.engraf@sysgo.com>
      Acked-by: default avatarLudovic Desroches <ludovic.desroches@microchip.com>
      Acked-by: default avatarRichard Genoud <richard.genoud@gmail.com>
      Cc: stable <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/20191216085403.17050-1-david.engraf@sysgo.com
      
      
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      3184e03c
    • Christophe Leroy's avatar
      spi: fsl: don't map irq during probe · aae4cd90
      Christophe Leroy authored
      [ Upstream commit 3194d253 ]
      
      With lastest kernel, the following warning is observed at startup:
      
      [    1.500609] ------------[ cut here ]------------
      [    1.505225] remove_proc_entry: removing non-empty directory 'irq/22', leaking at least 'fsl_spi'
      [    1.514234] WARNING: CPU: 0 PID: 1 at fs/proc/generic.c:682 remove_proc_entry+0x198/0x1c0
      [    1.522403] CPU: 0 PID: 1 Comm: swapper Not tainted 5.4.0-s3k-dev-02248-g93532430a4ff #2564
      [    1.530724] NIP:  c0197694 LR: c0197694 CTR: c0050d80
      [    1.535762] REGS: df4a5af0 TRAP: 0700   Not tainted  (5.4.0-02248-g93532430a4ff)
      [    1.543818] MSR:  00029032 <EE,ME,IR,DR,RI>  CR: 22028222  XER: 00000000
      [    1.550524]
      [    1.550524] GPR00: c0197694 df4a5ba8 df4a0000 00000054 00000000 00000000 00004a38 00000010
      [    1.550524] GPR08: c07c5a30 00000800 00000000 00001032 22000208 00000000 c0004b14 00000000
      [    1.550524] GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 c0830000 c07fc078
      [    1.550524] GPR24: c08e8ca0 df665d10 df60ea98 c07c9db8 00000001 df5d5ae3 df5d5a80 df43f8e3
      [    1.585327] NIP [c0197694] remove_proc_entry+0x198/0x1c0
      [    1.590628] LR [c0197694] remove_proc_entry+0x198/0x1c0
      [    1.595829] Call Trace:
      [    1.598280] [df4a5ba8] [c0197694] remove_proc_entry+0x198/0x1c0 (unreliable)
      [    1.605321] [df4a5bd8] [c0067acc] unregister_irq_proc+0x5c/0x70
      [    1.611238] [df4a5bf8] [c005fbc4] free_desc+0x3c/0x80
      [    1.616286] [df4a5c18] [c005fe2c] irq_free_descs+0x70/0xa8
      [    1.621778] [df4a5c38] [c033d3fc] of_fsl_spi_probe+0xdc/0x3cc
      [    1.627525] [df4a5c88] [c02f0f64] platform_drv_probe+0x44/0xa4
      [    1.633350] [df4a5c98] [c02eee44] really_probe+0x1ac/0x418
      [    1.638829] [df4a5cc8] [c02ed3e8] bus_for_each_drv+0x64/0xb0
      [    1.644481] [df4a5cf8] [c02ef950] __device_attach+0xd4/0x128
      [    1.650132] [df4a5d28] [c02ed61c] bus_probe_device+0xa0/0xbc
      [    1.655783] [df4a5d48] [c02ebbe8] device_add+0x544/0x74c
      [    1.661096] [df4a5d88] [c0382b78] of_platform_device_create_pdata+0xa4/0x100
      [    1.668131] [df4a5da8] [c0382cf4] of_platform_bus_create+0x120/0x20c
      [    1.674474] [df4a5df8] [c0382d50] of_platform_bus_create+0x17c/0x20c
      [    1.680818] [df4a5e48] [c0382e88] of_platform_bus_probe+0x9c/0xf0
      [    1.686907] [df4a5e68] [c0751404] __machine_initcall_cmpcpro_cmpcpro_declare_of_platform_devices+0x74/0x1a4
      [    1.696629] [df4a5e98] [c072a4cc] do_one_initcall+0x8c/0x1d4
      [    1.702282] [df4a5ef8] [c072a768] kernel_init_freeable+0x154/0x204
      [    1.708455] [df4a5f28] [c0004b2c] kernel_init+0x18/0x110
      [    1.713769] [df4a5f38] [c00122ac] ret_from_kernel_thread+0x14/0x1c
      [    1.719926] Instruction dump:
      [    1.722889] 2c030000 4182004c 3863ffb0 3c80c05f 80e3005c 388436a0 3c60c06d 7fa6eb78
      [    1.730630] 7fe5fb78 38840280 38634178 4be8c611 <0fe00000> 4bffff6c 3c60c071 7fe4fb78
      [    1.738556] ---[ end trace 05d0720bf2e352e2 ]---
      
      The problem comes from the error path which calls
      irq_dispose_mapping() while the IRQ has been requested with
      devm_request_irq().
      
      IRQ doesn't need to be mapped with irq_of_parse_and_map(). The only
      need is to get the IRQ virtual number. For that, use
      of_irq_to_resource() instead of the
      irq_of_parse_and_map()/irq_dispose_mapping() pair.
      
      Fixes: 500a32ab
      
       ("spi: fsl: Call irq_dispose_mapping in err path")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarChristophe Leroy <christophe.leroy@c-s.fr>
      Link: https://lore.kernel.org/r/518cfb83347d5372748e7fe72f94e2e9443d0d4a.1575905123.git.christophe.leroy@c-s.fr
      
      
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      aae4cd90
    • Taehee Yoo's avatar
      gtp: avoid zero size hashtable · cd245822
      Taehee Yoo authored
      [ Upstream commit 6a902c0f ]
      
      GTP default hashtable size is 1024 and userspace could set specific
      hashtable size with IFLA_GTP_PDP_HASHSIZE. If hashtable size is set to 0
      from userspace,  hashtable will not work and panic will occur.
      
      Fixes: 459aa660
      
       ("gtp: add initial driver for datapath of GPRS Tunneling Protocol (GTP-U)")
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cd245822
    • Taehee Yoo's avatar
      gtp: fix an use-after-free in ipv4_pdp_find() · 043a283d
      Taehee Yoo authored
      [ Upstream commit 94dc550a ]
      
      ipv4_pdp_find() is called in TX packet path of GTP.
      ipv4_pdp_find() internally uses gtp->tid_hash to lookup pdp context.
      In the current code, gtp->tid_hash and gtp->addr_hash are freed by
      ->dellink(), which is gtp_dellink().
      But gtp_dellink() would be called while packets are processing.
      So, gtp_dellink() should not free gtp->tid_hash and gtp->addr_hash.
      Instead, dev->priv_destructor() would be used because this callback
      is called after all packet processing safely.
      
      Test commands:
          ip link add veth1 type veth peer name veth2
          ip a a 172.0.0.1/24 dev veth1
          ip link set veth1 up
          ip a a 172.99.0.1/32 dev lo
      
          gtp-link add gtp1 &
      
          gtp-tunnel add gtp1 v1 200 100 172.99.0.2 172.0.0.2
          ip r a  172.99.0.2/32 dev gtp1
          ip link set gtp1 mtu 1500
      
          ip netns add ns2
          ip link set veth2 netns ns2
          ip netns exec ns2 ip a a 172.0.0.2/24 dev veth2
          ip netns exec ns2 ip link set veth2 up
          ip netns exec ns2 ip a a 172.99.0.2/32 dev lo
          ip netns exec ns2 ip link set lo up
      
          ip netns exec ns2 gtp-link add gtp2 &
          ip netns exec ns2 gtp-tunnel add gtp2 v1 100 200 172.99.0.1 172.0.0.1
          ip netns exec ns2 ip r a 172.99.0.1/32 dev gtp2
          ip netns exec ns2 ip link set gtp2 mtu 1500
      
          hping3 172.99.0.2 -2 --flood &
          ip link del gtp1
      
      Splat looks like:
      [   72.568081][ T1195] BUG: KASAN: use-after-free in ipv4_pdp_find.isra.12+0x130/0x170 [gtp]
      [   72.568916][ T1195] Read of size 8 at addr ffff8880b9a35d28 by task hping3/1195
      [   72.569631][ T1195]
      [   72.569861][ T1195] CPU: 2 PID: 1195 Comm: hping3 Not tainted 5.5.0-rc1 #199
      [   72.570547][ T1195] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [   72.571438][ T1195] Call Trace:
      [   72.571764][ T1195]  dump_stack+0x96/0xdb
      [   72.572171][ T1195]  ? ipv4_pdp_find.isra.12+0x130/0x170 [gtp]
      [   72.572761][ T1195]  print_address_description.constprop.5+0x1be/0x360
      [   72.573400][ T1195]  ? ipv4_pdp_find.isra.12+0x130/0x170 [gtp]
      [   72.573971][ T1195]  ? ipv4_pdp_find.isra.12+0x130/0x170 [gtp]
      [   72.574544][ T1195]  __kasan_report+0x12a/0x16f
      [   72.575014][ T1195]  ? ipv4_pdp_find.isra.12+0x130/0x170 [gtp]
      [   72.575593][ T1195]  kasan_report+0xe/0x20
      [   72.576004][ T1195]  ipv4_pdp_find.isra.12+0x130/0x170 [gtp]
      [   72.576577][ T1195]  gtp_build_skb_ip4+0x199/0x1420 [gtp]
      [ ... ]
      [   72.647671][ T1195] BUG: unable to handle page fault for address: ffff8880b9a35d28
      [   72.648512][ T1195] #PF: supervisor read access in kernel mode
      [   72.649158][ T1195] #PF: error_code(0x0000) - not-present page
      [   72.649849][ T1195] PGD a6c01067 P4D a6c01067 PUD 11fb07067 PMD 11f939067 PTE 800fffff465ca060
      [   72.652958][ T1195] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
      [   72.653834][ T1195] CPU: 2 PID: 1195 Comm: hping3 Tainted: G    B             5.5.0-rc1 #199
      [   72.668062][ T1195] RIP: 0010:ipv4_pdp_find.isra.12+0x86/0x170 [gtp]
      [ ... ]
      [   72.679168][ T1195] Call Trace:
      [   72.679603][ T1195]  gtp_build_skb_ip4+0x199/0x1420 [gtp]
      [   72.681915][ T1195]  ? ipv4_pdp_find.isra.12+0x170/0x170 [gtp]
      [   72.682513][ T1195]  ? lock_acquire+0x164/0x3b0
      [   72.682966][ T1195]  ? gtp_dev_xmit+0x35e/0x890 [gtp]
      [   72.683481][ T1195]  gtp_dev_xmit+0x3c2/0x890 [gtp]
      [ ... ]
      
      Fixes: 459aa660
      
       ("gtp: add initial driver for datapath of GPRS Tunneling Protocol (GTP-U)")
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      043a283d
    • Taehee Yoo's avatar
      gtp: fix wrong condition in gtp_genl_dump_pdp() · 83e00efe
      Taehee Yoo authored
      [ Upstream commit 94a6d9fb ]
      
      gtp_genl_dump_pdp() is ->dumpit() callback of GTP module and it is used
      to dump pdp contexts. it would be re-executed because of dump packet size.
      
      If dump packet size is too big, it saves current dump pointer
      (gtp interface pointer, bucket, TID value) then it restarts dump from
      last pointer.
      Current GTP code allows adding zero TID pdp context but dump code
      ignores zero TID value. So, last dump pointer will not be found.
      
      In addition, this patch adds missing rcu_read_lock() in
      gtp_genl_dump_pdp().
      
      Fixes: 459aa660
      
       ("gtp: add initial driver for datapath of GPRS Tunneling Protocol (GTP-U)")
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      83e00efe
    • Eric Dumazet's avatar
      tcp: do not send empty skb from tcp_write_xmit() · d3f384f5
      Eric Dumazet authored
      [ Upstream commit 1f85e626 ]
      
      Backport of commit fdfc5c85 ("tcp: remove empty skb from
      write queue in error cases") in linux-4.14 stable triggered
      various bugs. One of them has been fixed in commit ba2ddb43f270
      ("tcp: Don't dequeue SYN/FIN-segments from write-queue"), but
      we still have crashes in some occasions.
      
      Root-cause is that when tcp_sendmsg() has allocated a fresh
      skb and could not append a fragment before being blocked
      in sk_stream_wait_memory(), tcp_write_xmit() might be called
      and decide to send this fresh and empty skb.
      
      Sending an empty packet is not only silly, it might have caused
      many issues we had in the past with tp->packets_out being
      out of sync.
      
      Fixes: c65f7f00
      
       ("[TCP]: Simplify SKB data portion allocation with NETIF_F_SG.")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Christoph Paasch <cpaasch@apple.com>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Cc: Jason Baron <jbaron@akamai.com>
      Acked-by: default avatarSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d3f384f5
    • Eric Dumazet's avatar
      tcp/dccp: fix possible race __inet_lookup_established() · 28f0d54d
      Eric Dumazet authored
      commit 8dbd76e7 upstream.
      
      Michal Kubecek and Firo Yang did a very nice analysis of crashes
      happening in __inet_lookup_established().
      
      Since a TCP socket can go from TCP_ESTABLISH to TCP_LISTEN
      (via a close()/socket()/listen() cycle) without a RCU grace period,
      I should not have changed listeners linkage in their hash table.
      
      They must use the nulls protocol (Documentation/RCU/rculist_nulls.txt),
      so that a lookup can detect a socket in a hash list was moved in
      another one.
      
      Since we added code in commit d296ba60 ("soreuseport: Resolve
      merge conflict for v4/v6 ordering fix"), we have to add
      hlist_nulls_add_tail_rcu() helper.
      
      Fixes: 3b24d854
      
       ("tcp/dccp: do not touch listener sk_refcnt under synflood")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarMichal Kubecek <mkubecek@suse.cz>
      Reported-by: default avatarFiro Yang <firo.yang@suse.com>
      Reviewed-by: default avatarMichal Kubecek <mkubecek@suse.cz>
      Link: https://lore.kernel.org/netdev/20191120083919.GH27852@unicorn.suse.cz/
      
      
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      [stable-4.19: we also need to update code in __inet_lookup_listener() and
       inet6_lookup_listener() which has been removed in 5.0-rc1.]
      Signed-off-by: default avatarMichal Kubecek <mkubecek@suse.cz>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      28f0d54d
    • Russell King's avatar
      net: marvell: mvpp2: phylink requires the link interrupt · 3182439a
      Russell King authored
      [ Upstream commit f3f2364e ]
      
      phylink requires the MAC to report when its link status changes when
      operating in inband modes.  Failure to report link status changes
      means that phylink has no idea when the link events happen, which
      results in either the network interface's carrier remaining up or
      remaining permanently down.
      
      For example, with a fiber module, if the interface is brought up and
      link is initially established, taking the link down at the far end
      will cut the optical power.  The SFP module's LOS asserts, we
      deactivate the link, and the network interface reports no carrier.
      
      When the far end is brought back up, the SFP module's LOS deasserts,
      but the MAC may be slower to establish link.  If this happens (which
      in my tests is a certainty) then phylink never hears that the MAC
      has established link with the far end, and the network interface is
      stuck reporting no carrier.  This means the interface is
      non-functional.
      
      Avoiding the link interrupt when we have phylink is basically not
      an option, so remove the !port->phylink from the test.
      
      Fixes: 4bb04326
      
       ("net: mvpp2: phylink support")
      Tested-by: default avatarSven Auhagen <sven.auhagen@voleatech.de>
      Tested-by: default avatarAntoine Tenart <antoine.tenart@bootlin.com>
      Signed-off-by: default avatarRussell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3182439a
    • Taehee Yoo's avatar
      gtp: do not allow adding duplicate tid and ms_addr pdp context · 013b344d
      Taehee Yoo authored
      [ Upstream commit 6b01b1d9 ]
      
      GTP RX packet path lookups pdp context with TID. If duplicate TID pdp
      contexts are existing in the list, it couldn't select correct pdp context.
      So, TID value  should be unique.
      GTP TX packet path lookups pdp context with ms_addr. If duplicate ms_addr pdp
      contexts are existing in the list, it couldn't select correct pdp context.
      So, ms_addr value should be unique.
      
      Fixes: 459aa660
      
       ("gtp: add initial driver for datapath of GPRS Tunneling Protocol (GTP-U)")
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      013b344d
    • Hangbin Liu's avatar
      net/dst: do not confirm neighbor for vxlan and geneve pmtu update · a36b275f
      Hangbin Liu authored
      [ Upstream commit f081042d ]
      
      When do IPv6 tunnel PMTU update and calls __ip6_rt_update_pmtu() in the end,
      we should not call dst_confirm_neigh() as there is no two-way communication.
      
      So disable the neigh confirm for vxlan and geneve pmtu update.
      
      v5: No change.
      v4: No change.
      v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
          dst_ops.update_pmtu to control whether we should do neighbor confirm.
          Also split the big patch to small ones for each area.
      v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.
      
      Fixes: a93bf0ff ("vxlan: update skb dst pmtu on tx path")
      Fixes: 52a589d5
      
       ("geneve: update skb dst pmtu on tx path")
      Reviewed-by: default avatarGuillaume Nault <gnault@redhat.com>
      Tested-by: default avatarGuillaume Nault <gnault@redhat.com>
      Acked-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a36b275f
    • Hangbin Liu's avatar
      sit: do not confirm neighbor when do pmtu update · d7bd38cf
      Hangbin Liu authored
      [ Upstream commit 4d42df46
      
       ]
      
      When do IPv6 tunnel PMTU update and calls __ip6_rt_update_pmtu() in the end,
      we should not call dst_confirm_neigh() as there is no two-way communication.
      
      v5: No change.
      v4: No change.
      v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
          dst_ops.update_pmtu to control whether we should do neighbor confirm.
          Also split the big patch to small ones for each area.
      v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.
      
      Reviewed-by: default avatarGuillaume Nault <gnault@redhat.com>
      Acked-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d7bd38cf
    • Hangbin Liu's avatar
      vti: do not confirm neighbor when do pmtu update · 47353db5
      Hangbin Liu authored
      [ Upstream commit 8247a79e
      
       ]
      
      When do IPv6 tunnel PMTU update and calls __ip6_rt_update_pmtu() in the end,
      we should not call dst_confirm_neigh() as there is no two-way communication.
      
      Although vti and vti6 are immune to this problem because they are IFF_NOARP
      interfaces, as Guillaume pointed. There is still no sense to confirm neighbour
      here.
      
      v5: Update commit description.
      v4: No change.
      v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
          dst_ops.update_pmtu to control whether we should do neighbor confirm.
          Also split the big patch to small ones for each area.
      v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.
      
      Reviewed-by: default avatarGuillaume Nault <gnault@redhat.com>
      Acked-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      47353db5
    • Hangbin Liu's avatar
      tunnel: do not confirm neighbor when do pmtu update · 2c062783
      Hangbin Liu authored
      [ Upstream commit 7a1592bc ]
      
      When do tunnel PMTU update and calls __ip6_rt_update_pmtu() in the end,
      we should not call dst_confirm_neigh() as there is no two-way communication.
      
      v5: No Change.
      v4: Update commit description
      v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
          dst_ops.update_pmtu to control whether we should do neighbor confirm.
          Also split the big patch to small ones for each area.
      v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.
      
      Fixes: 0dec879f
      
       ("net: use dst_confirm_neigh for UDP, RAW, ICMP, L2TP")
      Reviewed-by: default avatarGuillaume Nault <gnault@redhat.com>
      Tested-by: default avatarGuillaume Nault <gnault@redhat.com>
      Acked-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2c062783
    • Hangbin Liu's avatar
      net/dst: add new function skb_dst_update_pmtu_no_confirm · dba3ed56
      Hangbin Liu authored
      [ Upstream commit 07dc35c6
      
       ]
      
      Add a new function skb_dst_update_pmtu_no_confirm() for callers who need
      update pmtu but should not do neighbor confirm.
      
      v5: No change.
      v4: No change.
      v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
          dst_ops.update_pmtu to control whether we should do neighbor confirm.
          Also split the big patch to small ones for each area.
      v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.
      
      Reviewed-by: default avatarGuillaume Nault <gnault@redhat.com>
      Acked-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dba3ed56
    • Hangbin Liu's avatar
      gtp: do not confirm neighbor when do pmtu update · 9ef0de68
      Hangbin Liu authored
      [ Upstream commit 6e9105c7
      
       ]
      
      When do IPv6 tunnel PMTU update and calls __ip6_rt_update_pmtu() in the end,
      we should not call dst_confirm_neigh() as there is no two-way communication.
      
      Although GTP only support ipv4 right now, and __ip_rt_update_pmtu() does not
      call dst_confirm_neigh(), we still set it to false to keep consistency with
      IPv6 code.
      
      v5: No change.
      v4: No change.
      v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
          dst_ops.update_pmtu to control whether we should do neighbor confirm.
          Also split the big patch to small ones for each area.
      v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.
      
      Reviewed-by: default avatarGuillaume Nault <gnault@redhat.com>
      Acked-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9ef0de68
    • Hangbin Liu's avatar
      ip6_gre: do not confirm neighbor when do pmtu update · 2e9f6ff2
      Hangbin Liu authored
      [ Upstream commit 675d76ad
      
       ]
      
      When we do ipv6 gre pmtu update, we will also do neigh confirm currently.
      This will cause the neigh cache be refreshed and set to REACHABLE before
      xmit.
      
      But if the remote mac address changed, e.g. device is deleted and recreated,
      we will not able to notice this and still use the old mac address as the neigh
      cache is REACHABLE.
      
      Fix this by disable neigh confirm when do pmtu update
      
      v5: No change.
      v4: No change.
      v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
          dst_ops.update_pmtu to control whether we should do neighbor confirm.
          Also split the big patch to small ones for each area.
      v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.
      
      Reported-by: default avatarJianlin Shi <jishi@redhat.com>
      Reviewed-by: default avatarGuillaume Nault <gnault@redhat.com>
      Acked-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2e9f6ff2
    • Hangbin Liu's avatar
      net: add bool confirm_neigh parameter for dst_ops.update_pmtu · 8bf95f28
      Hangbin Liu authored
      [ Upstream commit bd085ef6
      
       ]
      
      The MTU update code is supposed to be invoked in response to real
      networking events that update the PMTU. In IPv6 PMTU update function
      __ip6_rt_update_pmtu() we called dst_confirm_neigh() to update neighbor
      confirmed time.
      
      But for tunnel code, it will call pmtu before xmit, like:
        - tnl_update_pmtu()
          - skb_dst_update_pmtu()
            - ip6_rt_update_pmtu()
              - __ip6_rt_update_pmtu()
                - dst_confirm_neigh()
      
      If the tunnel remote dst mac address changed and we still do the neigh
      confirm, we will not be able to update neigh cache and ping6 remote
      will failed.
      
      So for this ip_tunnel_xmit() case, _EVEN_ if the MTU is changed, we
      should not be invoking dst_confirm_neigh() as we have no evidence
      of successful two-way communication at this point.
      
      On the other hand it is also important to keep the neigh reachability fresh
      for TCP flows, so we cannot remove this dst_confirm_neigh() call.
      
      To fix the issue, we have to add a new bool parameter for dst_ops.update_pmtu
      to choose whether we should do neigh update or not. I will add the parameter
      in this patch and set all the callers to true to comply with the previous
      way, and fix the tunnel code one by one on later patches.
      
      v5: No change.
      v4: No change.
      v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
          dst_ops.update_pmtu to control whether we should do neighbor confirm.
          Also split the big patch to small ones for each area.
      v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.
      
      Suggested-by: default avatarDavid Miller <davem@davemloft.net>
      Reviewed-by: default avatarGuillaume Nault <gnault@redhat.com>
      Acked-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8bf95f28
    • Stefano Garzarella's avatar
      vhost/vsock: accept only packets with the right dst_cid · c483ef73
      Stefano Garzarella authored
      [ Upstream commit 8a3cc29c
      
       ]
      
      When we receive a new packet from the guest, we check if the
      src_cid is correct, but we forgot to check the dst_cid.
      
      The host should accept only packets where dst_cid is
      equal to the host CID.
      
      Signed-off-by: default avatarStefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c483ef73
    • Antonio Messina's avatar
      udp: fix integer overflow while computing available space in sk_rcvbuf · 4840b6a7
      Antonio Messina authored
      [ Upstream commit feed8a4f
      
       ]
      
      When the size of the receive buffer for a socket is close to 2^31 when
      computing if we have enough space in the buffer to copy a packet from
      the queue to the buffer we might hit an integer overflow.
      
      When an user set net.core.rmem_default to a value close to 2^31 UDP
      packets are dropped because of this overflow. This can be visible, for
      instance, with failure to resolve hostnames.
      
      This can be fixed by casting sk_rcvbuf (which is an int) to unsigned
      int, similarly to how it is done in TCP.
      
      Signed-off-by: default avatarAntonio Messina <amessina@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4840b6a7
    • Cambda Zhu's avatar
      tcp: Fix highest_sack and highest_sack_seq · 6ed6ab5c
      Cambda Zhu authored
      [ Upstream commit 85369750 ]
      
      >From commit 50895b9d ("tcp: highest_sack fix"), the logic about
      setting tp->highest_sack to the head of the send queue was removed.
      Of course the logic is error prone, but it is logical. Before we
      remove the pointer to the highest sack skb and use the seq instead,
      we need to set tp->highest_sack to NULL when there is no skb after
      the last sack, and then replace NULL with the real skb when new skb
      inserted into the rtx queue, because the NULL means the highest sack
      seq is tp->snd_nxt. If tp->highest_sack is NULL and new data sent,
      the next ACK with sack option will increase tp->reordering unexpectedly.
      
      This patch sets tp->highest_sack to the tail of the rtx queue if
      it's NULL and new data is sent. The patch keeps the rule that the
      highest_sack can only be maintained by sack processing, except for
      this only case.
      
      Fixes: 50895b9d
      
       ("tcp: highest_sack fix")
      Signed-off-by: default avatarCambda Zhu <cambda@linux.alibaba.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6ed6ab5c
    • Vladis Dronov's avatar
      ptp: fix the race between the release of ptp_clock and cdev · 0393b872
      Vladis Dronov authored
      [ Upstream commit a33121e5 ]
      
      In a case when a ptp chardev (like /dev/ptp0) is open but an underlying
      device is removed, closing this file leads to a race. This reproduces
      easily in a kvm virtual machine:
      
      ts# cat openptp0.c
      int main() { ... fp = fopen("/dev/ptp0", "r"); ... sleep(10); }
      ts# uname -r
      5.5.0-rc3-46cf053e
      ts# cat /proc/cmdline
      ... slub_debug=FZP
      ts# modprobe ptp_kvm
      ts# ./openptp0 &
      [1] 670
      opened /dev/ptp0, sleeping 10s...
      ts# rmmod ptp_kvm
      ts# ls /dev/ptp*
      ls: cannot access '/dev/ptp*': No such file or directory
      ts# ...woken up
      [   48.010809] general protection fault: 0000 [#1] SMP
      [   48.012502] CPU: 6 PID: 658 Comm: openptp0 Not tainted 5.5.0-rc3-46cf053e #25
      [   48.014624] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), ...
      [   48.016270] RIP: 0010:module_put.part.0+0x7/0x80
      [   48.017939] RSP: 0018:ffffb3850073be00 EFLAGS: 00010202
      [   48.018339] RAX: 000000006b6b6b6b RBX: 6b6b6b6b6b6b6b6b RCX: ffff89a476c00ad0
      [   48.018936] RDX: fffff65a08d3ea08 RSI: 0000000000000247 RDI: 6b6b6b6b6b6b6b6b
      [   48.019470] ...                                              ^^^ a slub poison
      [   48.023854] Call Trace:
      [   48.024050]  __fput+0x21f/0x240
      [   48.024288]  task_work_run+0x79/0x90
      [   48.024555]  do_exit+0x2af/0xab0
      [   48.024799]  ? vfs_write+0x16a/0x190
      [   48.025082]  do_group_exit+0x35/0x90
      [   48.025387]  __x64_sys_exit_group+0xf/0x10
      [   48.025737]  do_syscall_64+0x3d/0x130
      [   48.026056]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [   48.026479] RIP: 0033:0x7f53b12082f6
      [   48.026792] ...
      [   48.030945] Modules linked in: ptp i6300esb watchdog [last unloaded: ptp_kvm]
      [   48.045001] Fixing recursive fault but reboot is needed!
      
      This happens in:
      
      static void __fput(struct file *file)
      {   ...
          if (file->f_op->release)
              file->f_op->release(inode, file); <<< cdev is kfree'd here
          if (unlikely(S_ISCHR(inode->i_mode) && inode->i_cdev != NULL &&
                   !(mode & FMODE_PATH))) {
              cdev_put(inode->i_cdev); <<< cdev fields are accessed here
      
      Namely:
      
      __fput()
        posix_clock_release()
          kref_put(&clk->kref, delete_clock) <<< the last reference
            delete_clock()
              delete_ptp_clock()
                kfree(ptp) <<< cdev is embedded in ptp
        cdev_put
          module_put(p->owner) <<< *p is kfree'd, bang!
      
      Here cdev is embedded in posix_clock which is embedded in ptp_clock.
      The race happens because ptp_clock's lifetime is controlled by two
      refcounts: kref and cdev.kobj in posix_clock. This is wrong.
      
      Make ptp_clock's sysfs device a parent of cdev with cdev_device_add()
      created especially for such cases. This way the parent device with its
      ptp_clock is not released until all references to the cdev are released.
      This adds a requirement that an initialized but not exposed struct
      device should be provided to posix_clock_register() by a caller instead
      of a simple dev_t.
      
      This approach was adopted from the commit 72139dfa ("watchdog: Fix
      the race between the release of watchdog_core_data and cdev"). See
      details of the implementation in the commit 233ed09d ("chardev: add
      helper function to register char devs with a struct device").
      
      Link: https://lore.kernel.org/linux-fsdevel/20191125125342.6189-1-vdronov@redhat.com/T/#u
      
      
      Analyzed-by: default avatarStephen Johnston <sjohnsto@redhat.com>
      Analyzed-by: default avatarVern Lovejoy <vlovejoy@redhat.com>
      Signed-off-by: default avatarVladis Dronov <vdronov@redhat.com>
      Acked-by: default avatarRichard Cochran <richardcochran@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0393b872
    • Martin Blumenstingl's avatar
      net: stmmac: dwmac-meson8b: Fix the RGMII TX delay on Meson8b/8m2 SoCs · bd52fa84
      Martin Blumenstingl authored
      [ Upstream commit bd6f4854 ]
      
      GXBB and newer SoCs use the fixed FCLK_DIV2 (1GHz) clock as input for
      the m250_sel clock. Meson8b and Meson8m2 use MPLL2 instead, whose rate
      can be adjusted at runtime.
      
      So far we have been running MPLL2 with ~250MHz (and the internal
      m250_div with value 1), which worked enough that we could transfer data
      with an TX delay of 4ns. Unfortunately there is high packet loss with
      an RGMII PHY when transferring data (receiving data works fine though).
      Odroid-C1's u-boot is running with a TX delay of only 2ns as well as
      the internal m250_div set to 2 - no lost (TX) packets can be observed
      with that setting in u-boot.
      
      Manual testing has shown that the TX packet loss goes away when using
      the following settings in Linux (the vendor kernel uses the same
      settings):
      - MPLL2 clock set to ~500MHz
      - m250_div set to 2
      - TX delay set to 2ns on the MAC side
      
      Update the m250_div divider settings to only accept dividers greater or
      equal 2 to fix the TX delay generated by the MAC.
      
      iperf3 results before the change:
      [ ID] Interval           Transfer     Bitrate         Retr
      [  5]   0.00-10.00  sec   182 MBytes   153 Mbits/sec  514      sender
      [  5]   0.00-10.00  sec   182 MBytes   152 Mbits/sec           receiver
      
      iperf3 results after the change (including an updated TX delay of 2ns):
      [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
      [  5]   0.00-10.00  sec   927 MBytes   778 Mbits/sec    0      sender
      [  5]   0.00-10.01  sec   927 MBytes   777 Mbits/sec           receiver
      
      Fixes: 4f6a71b8
      
       ("net: stmmac: dwmac-meson8b: fix internal RGMII clock configuration")
      Signed-off-by: default avatarMartin Blumenstingl <martin.blumenstingl@googlemail.com>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bd52fa84
    • Vladyslav Tarasiuk's avatar
      net/mlxfw: Fix out-of-memory error in mfa2 flash burning · 6538eea7
      Vladyslav Tarasiuk authored
      [ Upstream commit a5bcd72e ]
      
      The burning process requires to perform internal allocations of large
      chunks of memory. This memory doesn't need to be contiguous and can be
      safely allocated by vzalloc() instead of kzalloc(). This patch changes
      such allocation to avoid possible out-of-memory failure.
      
      Fixes: 410ed13c
      
       ("Add the mlxfw module for Mellanox firmware flash process")
      Signed-off-by: default avatarVladyslav Tarasiuk <vladyslavt@mellanox.com>
      Reviewed-by: default avatarAya Levin <ayal@mellanox.com>
      Signed-off-by: default avatarLeon Romanovsky <leonro@mellanox.com>
      Tested-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6538eea7
    • Netanel Belgazal's avatar
      net: ena: fix napi handler misbehavior when the napi budget is zero · 3990d690
      Netanel Belgazal authored
      [ Upstream commit 24dee0c7 ]
      
      In netpoll the napi handler could be called with budget equal to zero.
      Current ENA napi handler doesn't take that into consideration.
      
      The napi handler handles Rx packets in a do-while loop.
      Currently, the budget check happens only after decrementing the
      budget, therefore the napi handler, in rare cases, could run over
      MAX_INT packets.
      
      In addition to that, this moves all budget related variables to int
      calculation and stop mixing u32 to avoid ambiguity
      
      Fixes: 1738cd3e
      
       ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
      Signed-off-by: default avatarNetanel Belgazal <netanel@amazon.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3990d690
    • Eric Dumazet's avatar
      hrtimer: Annotate lockless access to timer->state · 779807c7
      Eric Dumazet authored
      commit 56144737
      
       upstream.
      
      syzbot reported various data-race caused by hrtimer_is_queued() reading
      timer->state. A READ_ONCE() is required there to silence the warning.
      
      Also add the corresponding WRITE_ONCE() when timer->state is set.
      
      In remove_hrtimer() the hrtimer_is_queued() helper is open coded to avoid
      loading timer->state twice.
      
      KCSAN reported these cases:
      
      BUG: KCSAN: data-race in __remove_hrtimer / tcp_pacing_check
      
      write to 0xffff8880b2a7d388 of 1 bytes by interrupt on cpu 0:
       __remove_hrtimer+0x52/0x130 kernel/time/hrtimer.c:991
       __run_hrtimer kernel/time/hrtimer.c:1496 [inline]
       __hrtimer_run_queues+0x250/0x600 kernel/time/hrtimer.c:1576
       hrtimer_run_softirq+0x10e/0x150 kernel/time/hrtimer.c:1593
       __do_softirq+0x115/0x33f kernel/softirq.c:292
       run_ksoftirqd+0x46/0x60 kernel/softirq.c:603
       smpboot_thread_fn+0x37d/0x4a0 kernel/smpboot.c:165
       kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352
      
      read to 0xffff8880b2a7d388 of 1 bytes by task 24652 on cpu 1:
       tcp_pacing_check net/ipv4/tcp_output.c:2235 [inline]
       tcp_pacing_check+0xba/0x130 net/ipv4/tcp_output.c:2225
       tcp_xmit_retransmit_queue+0x32c/0x5a0 net/ipv4/tcp_output.c:3044
       tcp_xmit_recovery+0x7c/0x120 net/ipv4/tcp_input.c:3558
       tcp_ack+0x17b6/0x3170 net/ipv4/tcp_input.c:3717
       tcp_rcv_established+0x37e/0xf50 net/ipv4/tcp_input.c:5696
       tcp_v4_do_rcv+0x381/0x4e0 net/ipv4/tcp_ipv4.c:1561
       sk_backlog_rcv include/net/sock.h:945 [inline]
       __release_sock+0x135/0x1e0 net/core/sock.c:2435
       release_sock+0x61/0x160 net/core/sock.c:2951
       sk_stream_wait_memory+0x3d7/0x7c0 net/core/stream.c:145
       tcp_sendmsg_locked+0xb47/0x1f30 net/ipv4/tcp.c:1393
       tcp_sendmsg+0x39/0x60 net/ipv4/tcp.c:1434
       inet_sendmsg+0x6d/0x90 net/ipv4/af_inet.c:807
       sock_sendmsg_nosec net/socket.c:637 [inline]
       sock_sendmsg+0x9f/0xc0 net/socket.c:657
      
      BUG: KCSAN: data-race in __remove_hrtimer / __tcp_ack_snd_check
      
      write to 0xffff8880a3a65588 of 1 bytes by interrupt on cpu 0:
       __remove_hrtimer+0x52/0x130 kernel/time/hrtimer.c:991
       __run_hrtimer kernel/time/hrtimer.c:1496 [inline]
       __hrtimer_run_queues+0x250/0x600 kernel/time/hrtimer.c:1576
       hrtimer_run_softirq+0x10e/0x150 kernel/time/hrtimer.c:1593
       __do_softirq+0x115/0x33f kernel/softirq.c:292
       invoke_softirq kernel/softirq.c:373 [inline]
       irq_exit+0xbb/0xe0 kernel/softirq.c:413
       exiting_irq arch/x86/include/asm/apic.h:536 [inline]
       smp_apic_timer_interrupt+0xe6/0x280 arch/x86/kernel/apic/apic.c:1137
       apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:830
      
      read to 0xffff8880a3a65588 of 1 bytes by task 22891 on cpu 1:
       __tcp_ack_snd_check+0x415/0x4f0 net/ipv4/tcp_input.c:5265
       tcp_ack_snd_check net/ipv4/tcp_input.c:5287 [inline]
       tcp_rcv_established+0x750/0xf50 net/ipv4/tcp_input.c:5708
       tcp_v4_do_rcv+0x381/0x4e0 net/ipv4/tcp_ipv4.c:1561
       sk_backlog_rcv include/net/sock.h:945 [inline]
       __release_sock+0x135/0x1e0 net/core/sock.c:2435
       release_sock+0x61/0x160 net/core/sock.c:2951
       sk_stream_wait_memory+0x3d7/0x7c0 net/core/stream.c:145
       tcp_sendmsg_locked+0xb47/0x1f30 net/ipv4/tcp.c:1393
       tcp_sendmsg+0x39/0x60 net/ipv4/tcp.c:1434
       inet_sendmsg+0x6d/0x90 net/ipv4/af_inet.c:807
       sock_sendmsg_nosec net/socket.c:637 [inline]
       sock_sendmsg+0x9f/0xc0 net/socket.c:657
       __sys_sendto+0x21f/0x320 net/socket.c:1952
       __do_sys_sendto net/socket.c:1964 [inline]
       __se_sys_sendto net/socket.c:1960 [inline]
       __x64_sys_sendto+0x89/0xb0 net/socket.c:1960
       do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 24652 Comm: syz-executor.3 Not tainted 5.4.0-rc3+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      [ tglx: Added comments ]
      
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/20191106174804.74723-1-edumazet@google.com
      
      
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      779807c7
    • Eric Dumazet's avatar
      net: icmp: fix data-race in cmp_global_allow() · 0c78a0e6
      Eric Dumazet authored
      commit bbab7ef2 upstream.
      
      This code reads two global variables without protection
      of a lock. We need READ_ONCE()/WRITE_ONCE() pairs to
      avoid load/store-tearing and better document the intent.
      
      KCSAN reported :
      BUG: KCSAN: data-race in icmp_global_allow / icmp_global_allow
      
      read to 0xffffffff861a8014 of 4 bytes by task 11201 on cpu 0:
       icmp_global_allow+0x36/0x1b0 net/ipv4/icmp.c:254
       icmpv6_global_allow net/ipv6/icmp.c:184 [inline]
       icmpv6_global_allow net/ipv6/icmp.c:179 [inline]
       icmp6_send+0x493/0x1140 net/ipv6/icmp.c:514
       icmpv6_send+0x71/0xb0 net/ipv6/ip6_icmp.c:43
       ip6_link_failure+0x43/0x180 net/ipv6/route.c:2640
       dst_link_failure include/net/dst.h:419 [inline]
       vti_xmit net/ipv4/ip_vti.c:243 [inline]
       vti_tunnel_xmit+0x27f/0xa50 net/ipv4/ip_vti.c:279
       __netdev_start_xmit include/linux/netdevice.h:4420 [inline]
       netdev_start_xmit include/linux/netdevice.h:4434 [inline]
       xmit_one net/core/dev.c:3280 [inline]
       dev_hard_start_xmit+0xef/0x430 net/core/dev.c:3296
       __dev_queue_xmit+0x14c9/0x1b60 net/core/dev.c:3873
       dev_queue_xmit+0x21/0x30 net/core/dev.c:3906
       neigh_direct_output+0x1f/0x30 net/core/neighbour.c:1530
       neigh_output include/net/neighbour.h:511 [inline]
       ip6_finish_output2+0x7a6/0xec0 net/ipv6/ip6_output.c:116
       __ip6_finish_output net/ipv6/ip6_output.c:142 [inline]
       __ip6_finish_output+0x2d7/0x330 net/ipv6/ip6_output.c:127
       ip6_finish_output+0x41/0x160 net/ipv6/ip6_output.c:152
       NF_HOOK_COND include/linux/netfilter.h:294 [inline]
       ip6_output+0xf2/0x280 net/ipv6/ip6_output.c:175
       dst_output include/net/dst.h:436 [inline]
       ip6_local_out+0x74/0x90 net/ipv6/output_core.c:179
      
      write to 0xffffffff861a8014 of 4 bytes by task 11183 on cpu 1:
       icmp_global_allow+0x174/0x1b0 net/ipv4/icmp.c:272
       icmpv6_global_allow net/ipv6/icmp.c:184 [inline]
       icmpv6_global_allow net/ipv6/icmp.c:179 [inline]
       icmp6_send+0x493/0x1140 net/ipv6/icmp.c:514
       icmpv6_send+0x71/0xb0 net/ipv6/ip6_icmp.c:43
       ip6_link_failure+0x43/0x180 net/ipv6/route.c:2640
       dst_link_failure include/net/dst.h:419 [inline]
       vti_xmit net/ipv4/ip_vti.c:243 [inline]
       vti_tunnel_xmit+0x27f/0xa50 net/ipv4/ip_vti.c:279
       __netdev_start_xmit include/linux/netdevice.h:4420 [inline]
       netdev_start_xmit include/linux/netdevice.h:4434 [inline]
       xmit_one net/core/dev.c:3280 [inline]
       dev_hard_start_xmit+0xef/0x430 net/core/dev.c:3296
       __dev_queue_xmit+0x14c9/0x1b60 net/core/dev.c:3873
       dev_queue_xmit+0x21/0x30 net/core/dev.c:3906
       neigh_direct_output+0x1f/0x30 net/core/neighbour.c:1530
       neigh_output include/net/neighbour.h:511 [inline]
       ip6_finish_output2+0x7a6/0xec0 net/ipv6/ip6_output.c:116
       __ip6_finish_output net/ipv6/ip6_output.c:142 [inline]
       __ip6_finish_output+0x2d7/0x330 net/ipv6/ip6_output.c:127
       ip6_finish_output+0x41/0x160 net/ipv6/ip6_output.c:152
       NF_HOOK_COND include/linux/netfilter.h:294 [inline]
       ip6_output+0xf2/0x280 net/ipv6/ip6_output.c:175
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 11183 Comm: syz-executor.2 Not tainted 5.4.0-rc3+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: 4cdf507d
      
       ("icmp: add a global rate limitation")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0c78a0e6
    • Eric Dumazet's avatar
      net: add a READ_ONCE() in skb_peek_tail() · 9079830e
      Eric Dumazet authored
      commit f8cc62ca
      
       upstream.
      
      skb_peek_tail() can be used without protection of a lock,
      as spotted by KCSAN [1]
      
      In order to avoid load-stearing, add a READ_ONCE()
      
      Note that the corresponding WRITE_ONCE() are already there.
      
      [1]
      BUG: KCSAN: data-race in sk_wait_data / skb_queue_tail
      
      read to 0xffff8880b36a4118 of 8 bytes by task 20426 on cpu 1:
       skb_peek_tail include/linux/skbuff.h:1784 [inline]
       sk_wait_data+0x15b/0x250 net/core/sock.c:2477
       kcm_wait_data+0x112/0x1f0 net/kcm/kcmsock.c:1103
       kcm_recvmsg+0xac/0x320 net/kcm/kcmsock.c:1130
       sock_recvmsg_nosec net/socket.c:871 [inline]
       sock_recvmsg net/socket.c:889 [inline]
       sock_recvmsg+0x92/0xb0 net/socket.c:885
       ___sys_recvmsg+0x1a0/0x3e0 net/socket.c:2480
       do_recvmmsg+0x19a/0x5c0 net/socket.c:2601
       __sys_recvmmsg+0x1ef/0x200 net/socket.c:2680
       __do_sys_recvmmsg net/socket.c:2703 [inline]
       __se_sys_recvmmsg net/socket.c:2696 [inline]
       __x64_sys_recvmmsg+0x89/0xb0 net/socket.c:2696
       do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      write to 0xffff8880b36a4118 of 8 bytes by task 451 on cpu 0:
       __skb_insert include/linux/skbuff.h:1852 [inline]
       __skb_queue_before include/linux/skbuff.h:1958 [inline]
       __skb_queue_tail include/linux/skbuff.h:1991 [inline]
       skb_queue_tail+0x7e/0xc0 net/core/skbuff.c:3145
       kcm_queue_rcv_skb+0x202/0x310 net/kcm/kcmsock.c:206
       kcm_rcv_strparser+0x74/0x4b0 net/kcm/kcmsock.c:370
       __strp_recv+0x348/0xf50 net/strparser/strparser.c:309
       strp_recv+0x84/0xa0 net/strparser/strparser.c:343
       tcp_read_sock+0x174/0x5c0 net/ipv4/tcp.c:1639
       strp_read_sock+0xd4/0x140 net/strparser/strparser.c:366
       do_strp_work net/strparser/strparser.c:414 [inline]
       strp_work+0x9a/0xe0 net/strparser/strparser.c:423
       process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
       worker_thread+0xa0/0x800 kernel/workqueue.c:2415
       kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 451 Comm: kworker/u4:3 Not tainted 5.4.0-rc3+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: kstrp strp_work
      
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9079830e
    • Eric Dumazet's avatar
      inetpeer: fix data-race in inet_putpeer / inet_putpeer · 610f01f1
      Eric Dumazet authored
      commit 71685eb4 upstream.
      
      We need to explicitely forbid read/store tearing in inet_peer_gc()
      and inet_putpeer().
      
      The following syzbot report reminds us about inet_putpeer()
      running without a lock held.
      
      BUG: KCSAN: data-race in inet_putpeer / inet_putpeer
      
      write to 0xffff888121fb2ed0 of 4 bytes by interrupt on cpu 0:
       inet_putpeer+0x37/0xa0 net/ipv4/inetpeer.c:240
       ip4_frag_free+0x3d/0x50 net/ipv4/ip_fragment.c:102
       inet_frag_destroy_rcu+0x58/0x80 net/ipv4/inet_fragment.c:228
       __rcu_reclaim kernel/rcu/rcu.h:222 [inline]
       rcu_do_batch+0x256/0x5b0 kernel/rcu/tree.c:2157
       rcu_core+0x369/0x4d0 kernel/rcu/tree.c:2377
       rcu_core_si+0x12/0x20 kernel/rcu/tree.c:2386
       __do_softirq+0x115/0x33f kernel/softirq.c:292
       invoke_softirq kernel/softirq.c:373 [inline]
       irq_exit+0xbb/0xe0 kernel/softirq.c:413
       exiting_irq arch/x86/include/asm/apic.h:536 [inline]
       smp_apic_timer_interrupt+0xe6/0x280 arch/x86/kernel/apic/apic.c:1137
       apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:830
       native_safe_halt+0xe/0x10 arch/x86/kernel/paravirt.c:71
       arch_cpu_idle+0x1f/0x30 arch/x86/kernel/process.c:571
       default_idle_call+0x1e/0x40 kernel/sched/idle.c:94
       cpuidle_idle_call kernel/sched/idle.c:154 [inline]
       do_idle+0x1af/0x280 kernel/sched/idle.c:263
      
      write to 0xffff888121fb2ed0 of 4 bytes by interrupt on cpu 1:
       inet_putpeer+0x37/0xa0 net/ipv4/inetpeer.c:240
       ip4_frag_free+0x3d/0x50 net/ipv4/ip_fragment.c:102
       inet_frag_destroy_rcu+0x58/0x80 net/ipv4/inet_fragment.c:228
       __rcu_reclaim kernel/rcu/rcu.h:222 [inline]
       rcu_do_batch+0x256/0x5b0 kernel/rcu/tree.c:2157
       rcu_core+0x369/0x4d0 kernel/rcu/tree.c:2377
       rcu_core_si+0x12/0x20 kernel/rcu/tree.c:2386
       __do_softirq+0x115/0x33f kernel/softirq.c:292
       run_ksoftirqd+0x46/0x60 kernel/softirq.c:603
       smpboot_thread_fn+0x37d/0x4a0 kernel/smpboot.c:165
       kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 16 Comm: ksoftirqd/1 Not tainted 5.4.0-rc3+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: 4b9d9be8
      
       ("inetpeer: remove unused list")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      610f01f1
    • Eric Dumazet's avatar
      netfilter: bridge: make sure to pull arp header in br_nf_forward_arp() · 2ad86afc
      Eric Dumazet authored
      commit 56042858 upstream.
      
      syzbot is kind enough to remind us we need to call skb_may_pull()
      
      BUG: KMSAN: uninit-value in br_nf_forward_arp+0xe61/0x1230 net/bridge/br_netfilter_hooks.c:665
      CPU: 1 PID: 11631 Comm: syz-executor.1 Not tainted 5.4.0-rc8-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       <IRQ>
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1c9/0x220 lib/dump_stack.c:118
       kmsan_report+0x128/0x220 mm/kmsan/kmsan_report.c:108
       __msan_warning+0x64/0xc0 mm/kmsan/kmsan_instr.c:245
       br_nf_forward_arp+0xe61/0x1230 net/bridge/br_netfilter_hooks.c:665
       nf_hook_entry_hookfn include/linux/netfilter.h:135 [inline]
       nf_hook_slow+0x18b/0x3f0 net/netfilter/core.c:512
       nf_hook include/linux/netfilter.h:260 [inline]
       NF_HOOK include/linux/netfilter.h:303 [inline]
       __br_forward+0x78f/0xe30 net/bridge/br_forward.c:109
       br_flood+0xef0/0xfe0 net/bridge/br_forward.c:234
       br_handle_frame_finish+0x1a77/0x1c20 net/bridge/br_input.c:162
       nf_hook_bridge_pre net/bridge/br_input.c:245 [inline]
       br_handle_frame+0xfb6/0x1eb0 net/bridge/br_input.c:348
       __netif_receive_skb_core+0x20b9/0x51a0 net/core/dev.c:4830
       __netif_receive_skb_one_core net/core/dev.c:4927 [inline]
       __netif_receive_skb net/core/dev.c:5043 [inline]
       process_backlog+0x610/0x13c0 net/core/dev.c:5874
       napi_poll net/core/dev.c:6311 [inline]
       net_rx_action+0x7a6/0x1aa0 net/core/dev.c:6379
       __do_softirq+0x4a1/0x83a kernel/softirq.c:293
       do_softirq_own_stack+0x49/0x80 arch/x86/entry/entry_64.S:1091
       </IRQ>
       do_softirq kernel/softirq.c:338 [inline]
       __local_bh_enable_ip+0x184/0x1d0 kernel/softirq.c:190
       local_bh_enable+0x36/0x40 include/linux/bottom_half.h:32
       rcu_read_unlock_bh include/linux/rcupdate.h:688 [inline]
       __dev_queue_xmit+0x38e8/0x4200 net/core/dev.c:3819
       dev_queue_xmit+0x4b/0x60 net/core/dev.c:3825
       packet_snd net/packet/af_packet.c:2959 [inline]
       packet_sendmsg+0x8234/0x9100 net/packet/af_packet.c:2984
       sock_sendmsg_nosec net/socket.c:637 [inline]
       sock_sendmsg net/socket.c:657 [inline]
       __sys_sendto+0xc44/0xc70 net/socket.c:1952
       __do_sys_sendto net/socket.c:1964 [inline]
       __se_sys_sendto+0x107/0x130 net/socket.c:1960
       __x64_sys_sendto+0x6e/0x90 net/socket.c:1960
       do_syscall_64+0xb6/0x160 arch/x86/entry/common.c:291
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x45a679
      Code: ad b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007f0a3c9e5c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
      RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 000000000045a679
      RDX: 000000000000000e RSI: 0000000020000200 RDI: 0000000000000003
      RBP: 000000000075bf20 R08: 00000000200000c0 R09: 0000000000000014
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007f0a3c9e66d4
      R13: 00000000004c8ec1 R14: 00000000004dfe28 R15: 00000000ffffffff
      
      Uninit was created at:
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:149 [inline]
       kmsan_internal_poison_shadow+0x5c/0x110 mm/kmsan/kmsan.c:132
       kmsan_slab_alloc+0x97/0x100 mm/kmsan/kmsan_hooks.c:86
       slab_alloc_node mm/slub.c:2773 [inline]
       __kmalloc_node_track_caller+0xe27/0x11a0 mm/slub.c:4381
       __kmalloc_reserve net/core/skbuff.c:141 [inline]
       __alloc_skb+0x306/0xa10 net/core/skbuff.c:209
       alloc_skb include/linux/skbuff.h:1049 [inline]
       alloc_skb_with_frags+0x18c/0xa80 net/core/skbuff.c:5662
       sock_alloc_send_pskb+0xafd/0x10a0 net/core/sock.c:2244
       packet_alloc_skb net/packet/af_packet.c:2807 [inline]
       packet_snd net/packet/af_packet.c:2902 [inline]
       packet_sendmsg+0x63a6/0x9100 net/packet/af_packet.c:2984
       sock_sendmsg_nosec net/socket.c:637 [inline]
       sock_sendmsg net/socket.c:657 [inline]
       __sys_sendto+0xc44/0xc70 net/socket.c:1952
       __do_sys_sendto net/socket.c:1964 [inline]
       __se_sys_sendto+0x107/0x130 net/socket.c:1960
       __x64_sys_sendto+0x6e/0x90 net/socket.c:1960
       do_syscall_64+0xb6/0x160 arch/x86/entry/common.c:291
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: c4e70a87
      
       ("netfilter: bridge: rename br_netfilter.c to br_netfilter_hooks.c")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Reviewed-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2ad86afc