Skip to content
  1. Feb 17, 2021
  2. Feb 13, 2021
    • Greg Kroah-Hartman's avatar
    • Phillip Lougher's avatar
      squashfs: add more sanity checks in xattr id lookup · bddcce15
      Phillip Lougher authored
      commit 506220d2 upstream.
      
      Sysbot has reported a warning where a kmalloc() attempt exceeds the
      maximum limit.  This has been identified as corruption of the xattr_ids
      count when reading the xattr id lookup table.
      
      This patch adds a number of additional sanity checks to detect this
      corruption and others.
      
      1. It checks for a corrupted xattr index read from the inode.  This could
         be because the metadata block is uncompressed, or because the
         "compression" bit has been corrupted (turning a compressed block
         into an uncompressed block).  This would cause an out of bounds read.
      
      2. It checks against corruption of the xattr_ids count.  This can either
         lead to the above kmalloc failure, or a smaller than expected
         table to be read.
      
      3. It checks the contents of the index table for corruption.
      
      [phillip@squashfs.org.uk: fix checkpatch issue]
        Link: https://lkml.kernel.org/r/270245655.754655.1612770082682@webmail.123-reg.co.uk
      
      Link: https://lkml.kernel.org/r/20210204130249.4495-5-phillip@squashfs.org.uk
      
      
      Signed-off-by: default avatarPhillip Lougher <phillip@squashfs.org.uk>
      Reported-by: default avatar <syzbot+2ccea6339d368360800d@syzkaller.appspotmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bddcce15
    • Phillip Lougher's avatar
      squashfs: add more sanity checks in inode lookup · 5e22b39b
      Phillip Lougher authored
      commit eabac19e upstream.
      
      Sysbot has reported an "slab-out-of-bounds read" error which has been
      identified as being caused by a corrupted "ino_num" value read from the
      inode.  This could be because the metadata block is uncompressed, or
      because the "compression" bit has been corrupted (turning a compressed
      block into an uncompressed block).
      
      This patch adds additional sanity checks to detect this, and the
      following corruption.
      
      1. It checks against corruption of the inodes count.  This can either
         lead to a larger table to be read, or a smaller than expected
         table to be read.
      
         In the case of a too large inodes count, this would often have been
         trapped by the existing sanity checks, but this patch introduces
         a more exact check, which can identify too small values.
      
      2. It checks the contents of the index table for corruption.
      
      [phillip@squashfs.org.uk: fix checkpatch issue]
        Link: https://lkml.kernel.org/r/527909353.754618.1612769948607@webmail.123-reg.co.uk
      
      Link: https://lkml.kernel.org/r/20210204130249.4495-4-phillip@squashfs.org.uk
      
      
      Signed-off-by: default avatarPhillip Lougher <phillip@squashfs.org.uk>
      Reported-by: default avatar <syzbot+04419e3ff19d2970ea28@syzkaller.appspotmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5e22b39b
    • Phillip Lougher's avatar
      squashfs: add more sanity checks in id lookup · 6634147f
      Phillip Lougher authored
      commit f37aa4c7 upstream.
      
      Sysbot has reported a number of "slab-out-of-bounds reads" and
      "use-after-free read" errors which has been identified as being caused
      by a corrupted index value read from the inode.  This could be because
      the metadata block is uncompressed, or because the "compression" bit has
      been corrupted (turning a compressed block into an uncompressed block).
      
      This patch adds additional sanity checks to detect this, and the
      following corruption.
      
      1. It checks against corruption of the ids count.  This can either
         lead to a larger table to be read, or a smaller than expected
         table to be read.
      
         In the case of a too large ids count, this would often have been
         trapped by the existing sanity checks, but this patch introduces
         a more exact check, which can identify too small values.
      
      2. It checks the contents of the index table for corruption.
      
      Link: https://lkml.kernel.org/r/20210204130249.4495-3-phillip@squashfs.org.uk
      
      
      Signed-off-by: default avatarPhillip Lougher <phillip@squashfs.org.uk>
      Reported-by: default avatar <syzbot+b06d57ba83f604522af2@syzkaller.appspotmail.com>
      Reported-by: default avatar <syzbot+c021ba012da41ee9807c@syzkaller.appspotmail.com>
      Reported-by: default avatar <syzbot+5024636e8b5fd19f0f19@syzkaller.appspotmail.com>
      Reported-by: default avatar <syzbot+bcbc661df46657d0fa4f@syzkaller.appspotmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6634147f
    • Phillip Lougher's avatar
      squashfs: avoid out of bounds writes in decompressors · ff3a75bd
      Phillip Lougher authored
      commit e812cbbb upstream.
      
      Patch series "Squashfs: fix BIO migration regression and add sanity checks".
      
      Patch [1/4] fixes a regression introduced by the "migrate from
      ll_rw_block usage to BIO" patch, which has produced a number of
      Sysbot/Syzkaller reports.
      
      Patches [2/4], [3/4], and [4/4] fix a number of filesystem corruption
      issues which have produced Sysbot reports in the id, inode and xattr
      lookup code.
      
      Each patch has been tested against the Sysbot reproducers using the
      given kernel configuration.  They have the appropriate "Reported-by:"
      lines added.
      
      Additionally, all of the reproducer filesystems are indirectly fixed by
      patch [4/4] due to the fact they all have xattr corruption which is now
      detected there.
      
      Additional testing with other configurations and architectures (32bit,
      big endian), and normal filesystems has also been done to trap any
      inadvertent regressions caused by the additional sanity checks.
      
      This patch (of 4):
      
      This is a regression introduced by the patch "migrate from ll_rw_block
      usage to BIO".
      
      Sysbot/Syskaller has reported a number of "out of bounds writes" and
      "unable to handle kernel paging request in squashfs_decompress" errors
      which have been identified as a regression introduced by the above
      patch.
      
      Specifically, the patch removed the following sanity check
      
              if (length < 0 || length > output->length ||
      		(index + length) > msblk->bytes_used)
      
      This check did two things:
      
      1. It ensured any reads were not beyond the end of the filesystem
      
      2. It ensured that the "length" field read from the filesystem
         was within the expected maximum length.  Without this any
         corrupted values can over-run allocated buffers.
      
      Link: https://lkml.kernel.org/r/20210204130249.4495-1-phillip@squashfs.org.uk
      Link: https://lkml.kernel.org/r/20210204130249.4495-2-phillip@squashfs.org.uk
      Fixes: 93e72b3c
      
       ("squashfs: migrate from ll_rw_block usage to BIO")
      Reported-by: default avatar <syzbot+6fba78f99b9afd4b5634@syzkaller.appspotmail.com>
      Signed-off-by: default avatarPhillip Lougher <phillip@squashfs.org.uk>
      Cc: Philippe Liard <pliard@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ff3a75bd
    • Johannes Weiner's avatar
      Revert "mm: memcontrol: avoid workload stalls when lowering memory.high" · dd0a41bc
      Johannes Weiner authored
      commit e82553c1 upstream.
      
      This reverts commit 536d3bf2, as it can
      cause writers to memory.high to get stuck in the kernel forever,
      performing page reclaim and consuming excessive amounts of CPU cycles.
      
      Before the patch, a write to memory.high would first put the new limit
      in place for the workload, and then reclaim the requested delta.  After
      the patch, the kernel tries to reclaim the delta before putting the new
      limit into place, in order to not overwhelm the workload with a sudden,
      large excess over the limit.  However, if reclaim is actively racing
      with new allocations from the uncurbed workload, it can keep the write()
      working inside the kernel indefinitely.
      
      This is causing problems in Facebook production.  A privileged
      system-level daemon that adjusts memory.high for various workloads
      running on a host can get unexpectedly stuck in the kernel and
      essentially turn into a sort of involuntary kswapd for one of the
      workloads.  We've observed that daemon busy-spin in a write() for
      minutes at a time, neglecting its other duties on the system, and
      expending privileged system resources on behalf of a workload.
      
      To remedy this, we have first considered changing the reclaim logic to
      break out after a couple of loops - whether the workload has converged
      to the new limit or not - and bound the write() call this way.  However,
      the root cause that inspired the sequence change in the first place has
      been fixed through other means, and so a revert back to the proven
      limit-setting sequence, also used by memory.max, is preferable.
      
      The sequence was changed to avoid extreme latencies in the workload when
      the limit was lowered: the sudden, large excess created by the limit
      lowering would erroneously trigger the penalty sleeping code that is
      meant to throttle excessive growth from below.  Allocating threads could
      end up sleeping long after the write() had already reclaimed the delta
      for which they were being punished.
      
      However, erroneous throttling also caused problems in other scenarios at
      around the same time.  This resulted in commit b3ff9291 ("mm, memcg:
      reclaim more aggressively before high allocator throttling"), included
      in the same release as the offending commit.  When allocating threads
      now encounter large excess caused by a racing write() to memory.high,
      instead of entering punitive sleeps, they will simply be tasked with
      helping reclaim down the excess, and will be held no longer than it
      takes to accomplish that.  This is in line with regular limit
      enforcement - i.e.  if the workload allocates up against or over an
      otherwise unchanged limit from below.
      
      With the patch breaking userspace, and the root cause addressed by other
      means already, revert it again.
      
      Link: https://lkml.kernel.org/r/20210122184341.292461-1-hannes@cmpxchg.org
      Fixes: 536d3bf2
      
       ("mm: memcontrol: avoid workload stalls when lowering memory.high")
      Signed-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Reported-by: default avatarTejun Heo <tj@kernel.org>
      Acked-by: default avatarChris Down <chris@chrisdown.name>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Michal Koutný <mkoutny@suse.com>
      Cc: <stable@vger.kernel.org>	[5.8+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dd0a41bc
    • Joachim Henke's avatar
      nilfs2: make splice write available again · 237ee288
      Joachim Henke authored
      commit a35d8f01 upstream.
      
      Since 5.10, splice() or sendfile() to NILFS2 return EINVAL.  This was
      caused by commit 36e2c742 ("fs: don't allow splice read/write
      without explicit ops").
      
      This patch initializes the splice_write field in file_operations, like
      most file systems do, to restore the functionality.
      
      Link: https://lkml.kernel.org/r/1612784101-14353-1-git-send-email-konishi.ryusuke@gmail.com
      
      
      Signed-off-by: default avatarJoachim Henke <joachim.henke@t-systems.com>
      Signed-off-by: default avatarRyusuke Konishi <konishi.ryusuke@gmail.com>
      Tested-by: default avatarRyusuke Konishi <konishi.ryusuke@gmail.com>
      Cc: <stable@vger.kernel.org>	[5.10+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      237ee288
    • Ville Syrjälä's avatar
      drm/i915: Skip vswing programming for TBT · 4e78c338
      Ville Syrjälä authored
      commit eaf5bfe3
      
       upstream.
      
      In thunderbolt mode the PHY is owned by the thunderbolt controller.
      We are not supposed to touch it. So skip the vswing programming
      as well (we already skipped the other steps not applicable to TBT).
      
      Touching this stuff could supposedly interfere with the PHY
      programming done by the thunderbolt controller.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarVille Syrjälä <ville.syrjala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210128155948.13678-1-ville.syrjala@linux.intel.com
      
      
      Reviewed-by: default avatarImre Deak <imre.deak@intel.com>
      (cherry picked from commit f8c6b615
      
      )
      Signed-off-by: default avatarJani Nikula <jani.nikula@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4e78c338
    • Ville Syrjälä's avatar
      drm/i915: Fix ICL MG PHY vswing handling · 43f39b85
      Ville Syrjälä authored
      commit a2a5f562 upstream.
      
      The MH PHY vswing table does have all the entries these days. Get
      rid of the old hacks in the code which claim otherwise.
      
      This hack was totally bogus anyway. The correct way to handle the
      lack of those two entries would have been to declare our max
      vswing and pre-emph to both be level 2.
      
      Cc: José Roberto de Souza <jose.souza@intel.com>
      Cc: Clinton Taylor <clinton.a.taylor@intel.com>
      Fixes: 9f7ffa29
      
       ("drm/i915/tc/icl: Update TC vswing tables")
      Signed-off-by: default avatarVille Syrjälä <ville.syrjala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20201207203512.1718-1-ville.syrjala@linux.intel.com
      
      
      Reviewed-by: default avatarImre Deak <imre.deak@intel.com>
      Reviewed-by: default avatarJosé Roberto de Souza <jose.souza@intel.com>
      (cherry picked from commit 5ec34647
      
      )
      Signed-off-by: default avatarJani Nikula <jani.nikula@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      43f39b85
    • Daniel Borkmann's avatar
      bpf: Fix verifier jsgt branch analysis on max bound · 67afdc7d
      Daniel Borkmann authored
      commit ee114dd6 upstream.
      
      Fix incorrect is_branch{32,64}_taken() analysis for the jsgt case. The return
      code for both will tell the caller whether a given conditional jump is taken
      or not, e.g. 1 means branch will be taken [for the involved registers] and the
      goto target will be executed, 0 means branch will not be taken and instead we
      fall-through to the next insn, and last but not least a -1 denotes that it is
      not known at verification time whether a branch will be taken or not. Now while
      the jsgt has the branch-taken case correct with reg->s32_min_value > sval, the
      branch-not-taken case is off-by-one when testing for reg->s32_max_value < sval
      since the branch will also be taken for reg->s32_max_value == sval. The jgt
      branch analysis, for example, gets this right.
      
      Fixes: 3f50f132 ("bpf: Verifier, do explicit ALU32 bounds tracking")
      Fixes: 4f7b3e82
      
       ("bpf: improve verifier branch analysis")
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      67afdc7d
    • Daniel Borkmann's avatar
      bpf: Fix 32 bit src register truncation on div/mod · 1d16cc21
      Daniel Borkmann authored
      commit e88b2c6e upstream.
      
      While reviewing a different fix, John and I noticed an oddity in one of the
      BPF program dumps that stood out, for example:
      
        # bpftool p d x i 13
         0: (b7) r0 = 808464450
         1: (b4) w4 = 808464432
         2: (bc) w0 = w0
         3: (15) if r0 == 0x0 goto pc+1
         4: (9c) w4 %= w0
        [...]
      
      In line 2 we noticed that the mov32 would 32 bit truncate the original src
      register for the div/mod operation. While for the two operations the dst
      register is typically marked unknown e.g. from adjust_scalar_min_max_vals()
      the src register is not, and thus verifier keeps tracking original bounds,
      simplified:
      
        0: R1=ctx(id=0,off=0,imm=0) R10=fp0
        0: (b7) r0 = -1
        1: R0_w=invP-1 R1=ctx(id=0,off=0,imm=0) R10=fp0
        1: (b7) r1 = -1
        2: R0_w=invP-1 R1_w=invP-1 R10=fp0
        2: (3c) w0 /= w1
        3: R0_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R1_w=invP-1 R10=fp0
        3: (77) r1 >>= 32
        4: R0_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R1_w=invP4294967295 R10=fp0
        4: (bf) r0 = r1
        5: R0_w=invP4294967295 R1_w=invP4294967295 R10=fp0
        5: (95) exit
        processed 6 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
      
      Runtime result of r0 at exit is 0 instead of expected -1. Remove the
      verifier mov32 src rewrite in div/mod and replace it with a jmp32 test
      instead. After the fix, we result in the following code generation when
      having dividend r1 and divisor r6:
      
        div, 64 bit:                             div, 32 bit:
      
         0: (b7) r6 = 8                           0: (b7) r6 = 8
         1: (b7) r1 = 8                           1: (b7) r1 = 8
         2: (55) if r6 != 0x0 goto pc+2           2: (56) if w6 != 0x0 goto pc+2
         3: (ac) w1 ^= w1                         3: (ac) w1 ^= w1
         4: (05) goto pc+1                        4: (05) goto pc+1
         5: (3f) r1 /= r6                         5: (3c) w1 /= w6
         6: (b7) r0 = 0                           6: (b7) r0 = 0
         7: (95) exit                             7: (95) exit
      
        mod, 64 bit:                             mod, 32 bit:
      
         0: (b7) r6 = 8                           0: (b7) r6 = 8
         1: (b7) r1 = 8                           1: (b7) r1 = 8
         2: (15) if r6 == 0x0 goto pc+1           2: (16) if w6 == 0x0 goto pc+1
         3: (9f) r1 %= r6                         3: (9c) w1 %= w6
         4: (b7) r0 = 0                           4: (b7) r0 = 0
         5: (95) exit                             5: (95) exit
      
      x86 in particular can throw a 'divide error' exception for div
      instruction not only for divisor being zero, but also for the case
      when the quotient is too large for the designated register. For the
      edx:eax and rdx:rax dividend pair it is not an issue in x86 BPF JIT
      since we always zero edx (rdx). Hence really the only protection
      needed is against divisor being zero.
      
      Fixes: 68fda450
      
       ("bpf: fix 32-bit divide by zero")
      Co-developed-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1d16cc21
    • Daniel Borkmann's avatar
      bpf: Fix verifier jmp32 pruning decision logic · 569033c0
      Daniel Borkmann authored
      commit fd675184 upstream.
      
      Anatoly has been fuzzing with kBdysch harness and reported a hang in
      one of the outcomes:
      
        func#0 @0
        0: R1=ctx(id=0,off=0,imm=0) R10=fp0
        0: (b7) r0 = 808464450
        1: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R10=fp0
        1: (b4) w4 = 808464432
        2: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP808464432 R10=fp0
        2: (9c) w4 %= w0
        3: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0
        3: (66) if w4 s> 0x30303030 goto pc+0
         R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff),s32_max_value=808464432) R10=fp0
        4: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff),s32_max_value=808464432) R10=fp0
        4: (7f) r0 >>= r0
        5: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff),s32_max_value=808464432) R10=fp0
        5: (9c) w4 %= w0
        6: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
        6: (66) if w0 s> 0x3030 goto pc+0
         R0_w=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
        7: R0=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4=invP(id=0) R10=fp0
        7: (d6) if w0 s<= 0x303030 goto pc+1
        9: R0=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4=invP(id=0) R10=fp0
        9: (95) exit
        propagating r0
      
        from 6 to 7: safe
        4: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umin_value=808464433,umax_value=2147483647,var_off=(0x0; 0x7fffffff)) R10=fp0
        4: (7f) r0 >>= r0
        5: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umin_value=808464433,umax_value=2147483647,var_off=(0x0; 0x7fffffff)) R10=fp0
        5: (9c) w4 %= w0
        6: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
        6: (66) if w0 s> 0x3030 goto pc+0
         R0_w=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
        propagating r0
        7: safe
        propagating r0
      
        from 6 to 7: safe
        processed 15 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1
      
      The underlying program was xlated as follows:
      
        # bpftool p d x i 10
         0: (b7) r0 = 808464450
         1: (b4) w4 = 808464432
         2: (bc) w0 = w0
         3: (15) if r0 == 0x0 goto pc+1
         4: (9c) w4 %= w0
         5: (66) if w4 s> 0x30303030 goto pc+0
         6: (7f) r0 >>= r0
         7: (bc) w0 = w0
         8: (15) if r0 == 0x0 goto pc+1
         9: (9c) w4 %= w0
        10: (66) if w0 s> 0x3030 goto pc+0
        11: (d6) if w0 s<= 0x303030 goto pc+1
        12: (05) goto pc-1
        13: (95) exit
      
      The verifier rewrote original instructions it recognized as dead code with
      'goto pc-1', but reality differs from verifier simulation in that we are
      actually able to trigger a hang due to hitting the 'goto pc-1' instructions.
      
      Taking a closer look at the verifier analysis, the reason is that it misjudges
      its pruning decision at the first 'from 6 to 7: safe' occasion. What happens
      is that while both old/cur registers are marked as precise, they get misjudged
      for the jmp32 case as range_within() yields true, meaning that the prior
      verification path with a wider register bound could be verified successfully
      and therefore the current path with a narrower register bound is deemed safe
      as well whereas in reality it's not. R0 old/cur path's bounds compare as
      follows:
      
        old: smin_value=0x8000000000000000,smax_value=0x7fffffffffffffff,umin_value=0x0,umax_value=0xffffffffffffffff,var_off=(0x0; 0xffffffffffffffff)
        cur: smin_value=0x8000000000000000,smax_value=0x7fffffff7fffffff,umin_value=0x0,umax_value=0xffffffff7fffffff,var_off=(0x0; 0xffffffff7fffffff)
      
        old: s32_min_value=0x80000000,s32_max_value=0x00003030,u32_min_value=0x00000000,u32_max_value=0xffffffff
        cur: s32_min_value=0x00003031,s32_max_value=0x7fffffff,u32_min_value=0x00003031,u32_max_value=0x7fffffff
      
      The 64 bit bounds generally look okay and while the information that got
      propagated from 32 to 64 bit looks correct as well, it's not precise enough
      for judging a conditional jmp32. Given the latter only operates on subregisters
      we also need to take these into account as well for a range_within() probe
      in order to be able to prune paths. Extending the range_within() constraint
      to both bounds will be able to tell us that the old signed 32 bit bounds are
      not wider than the cur signed 32 bit bounds.
      
      With the fix in place, the program will now verify the 'goto' branch case as
      it should have been:
      
        [...]
        6: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
        6: (66) if w0 s> 0x3030 goto pc+0
         R0_w=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
        7: R0=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4=invP(id=0) R10=fp0
        7: (d6) if w0 s<= 0x303030 goto pc+1
        9: R0=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4=invP(id=0) R10=fp0
        9: (95) exit
      
        7: R0_w=invP(id=0,smax_value=9223372034707292159,umax_value=18446744071562067967,var_off=(0x0; 0xffffffff7fffffff),s32_min_value=12337,u32_min_value=12337,u32_max_value=2147483647) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
        7: (d6) if w0 s<= 0x303030 goto pc+1
         R0_w=invP(id=0,smax_value=9223372034707292159,umax_value=18446744071562067967,var_off=(0x0; 0xffffffff7fffffff),s32_min_value=3158065,u32_min_value=3158065,u32_max_value=2147483647) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
        8: R0_w=invP(id=0,smax_value=9223372034707292159,umax_value=18446744071562067967,var_off=(0x0; 0xffffffff7fffffff),s32_min_value=3158065,u32_min_value=3158065,u32_max_value=2147483647) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
        8: (30) r0 = *(u8 *)skb[808464432]
        BPF_LD_[ABS|IND] uses reserved fields
        processed 11 insns (limit 1000000) max_states_per_insn 1 total_states 1 peak_states 1 mark_read 1
      
      The bug is quite subtle in the sense that when verifier would determine that
      a given branch is dead code, it would (here: wrongly) remove these instructions
      from the program and hard-wire the taken branch for privileged programs instead
      of the 'goto pc-1' rewrites which will cause hard to debug problems.
      
      Fixes: 3f50f132
      
       ("bpf: Verifier, do explicit ALU32 bounds tracking")
      Reported-by: default avatarAnatoly Trosinenko <anatoly.trosinenko@gmail.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      569033c0