  1. Mar 24, 2021
    • nvmet: don't check iosqes,iocqes for discovery controllers · 09537286
      Sagi Grimberg authored
      commit d218a8a3 upstream.
      
      From the base spec, Figure 78:
      
        "Controller Configuration, these fields are defined as parameters to
         configure an "I/O Controller (IOC)" and not to configure a "Discovery
         Controller (DC)".
      
         ...
         If the controller does not support I/O queues, then this field shall
         be read-only with a value of 0h"
      
      Just perform this check for I/O controllers.
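
      A minimal sketch of the resulting logic, assuming the CC register
      layout from the base spec (IOSQES in bits 19:16, IOCQES in bits 23:20)
      and the NVM command set entry sizes of 64-byte submission and 16-byte
      completion entries. The enum and function names are hypothetical and
      are not the nvmet code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Log2 of the entry sizes required by the NVM command set. */
#define NVM_IOSQES 6 /* 64-byte submission queue entries */
#define NVM_IOCQES 4 /* 16-byte completion queue entries */

/* Hypothetical controller type, for illustration only. */
enum ctrl_type { CTRL_IO, CTRL_DISCOVERY };

static inline uint32_t cc_iosqes(uint32_t cc) { return (cc >> 16) & 0xf; }
static inline uint32_t cc_iocqes(uint32_t cc) { return (cc >> 20) & 0xf; }

/* Return true if the host-written CC value is acceptable. */
static bool cc_queue_sizes_ok(enum ctrl_type type, uint32_t cc)
{
	/* Discovery controllers have no I/O queues, so skip the check. */
	if (type == CTRL_DISCOVERY)
		return true;

	return cc_iosqes(cc) == NVM_IOSQES && cc_iocqes(cc) == NVM_IOCQES;
}
```

      With this shape, a discovery controller that leaves both fields at 0h
      is accepted, while an I/O controller with the same CC value is not.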
      
      Fixes: a07b4970 ("nvmet: add a generic NVMe target")
      Reported-by: Belanger, Martin <Martin.Belanger@dell.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • btrfs: fix race when cloning extent buffer during rewind of an old root · 0fbf4100
      Filipe Manana authored
      commit dbcc7d57 upstream.
      
      While resolving backreferences, as part of a logical ino ioctl call or
      fiemap, we can end up hitting a BUG_ON() when replaying tree mod log
      operations of a root, triggering a stack trace like the following:
      
        ------------[ cut here ]------------
        kernel BUG at fs/btrfs/ctree.c:1210!
        invalid opcode: 0000 [#1] SMP KASAN PTI
        CPU: 1 PID: 19054 Comm: crawl_335 Tainted: G        W         5.11.0-2d11c0084b02-misc-next+ #89
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
        RIP: 0010:__tree_mod_log_rewind+0x3b1/0x3c0
        Code: 05 48 8d 74 10 (...)
        RSP: 0018:ffffc90001eb70b8 EFLAGS: 00010297
        RAX: 0000000000000000 RBX: ffff88812344e400 RCX: ffffffffb28933b6
        RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff88812344e42c
        RBP: ffffc90001eb7108 R08: 1ffff11020b60a20 R09: ffffed1020b60a20
        R10: ffff888105b050f9 R11: ffffed1020b60a1f R12: 00000000000000ee
        R13: ffff8880195520c0 R14: ffff8881bc958500 R15: ffff88812344e42c
        FS:  00007fd1955e8700(0000) GS:ffff8881f5600000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00007efdb7928718 CR3: 000000010103a006 CR4: 0000000000170ee0
        Call Trace:
         btrfs_search_old_slot+0x265/0x10d0
         ? lock_acquired+0xbb/0x600
         ? btrfs_search_slot+0x1090/0x1090
         ? free_extent_buffer.part.61+0xd7/0x140
         ? free_extent_buffer+0x13/0x20
         resolve_indirect_refs+0x3e9/0xfc0
         ? lock_downgrade+0x3d0/0x3d0
         ? __kasan_check_read+0x11/0x20
         ? add_prelim_ref.part.11+0x150/0x150
         ? lock_downgrade+0x3d0/0x3d0
         ? __kasan_check_read+0x11/0x20
         ? lock_acquired+0xbb/0x600
         ? __kasan_check_write+0x14/0x20
         ? do_raw_spin_unlock+0xa8/0x140
         ? rb_insert_color+0x30/0x360
         ? prelim_ref_insert+0x12d/0x430
         find_parent_nodes+0x5c3/0x1830
         ? resolve_indirect_refs+0xfc0/0xfc0
         ? lock_release+0xc8/0x620
         ? fs_reclaim_acquire+0x67/0xf0
         ? lock_acquire+0xc7/0x510
         ? lock_downgrade+0x3d0/0x3d0
         ? lockdep_hardirqs_on_prepare+0x160/0x210
         ? lock_release+0xc8/0x620
         ? fs_reclaim_acquire+0x67/0xf0
         ? lock_acquire+0xc7/0x510
         ? poison_range+0x38/0x40
         ? unpoison_range+0x14/0x40
         ? trace_hardirqs_on+0x55/0x120
         btrfs_find_all_roots_safe+0x142/0x1e0
         ? find_parent_nodes+0x1830/0x1830
         ? btrfs_inode_flags_to_xflags+0x50/0x50
         iterate_extent_inodes+0x20e/0x580
         ? tree_backref_for_extent+0x230/0x230
         ? lock_downgrade+0x3d0/0x3d0
         ? read_extent_buffer+0xdd/0x110
         ? lock_downgrade+0x3d0/0x3d0
         ? __kasan_check_read+0x11/0x20
         ? lock_acquired+0xbb/0x600
         ? __kasan_check_write+0x14/0x20
         ? _raw_spin_unlock+0x22/0x30
         ? __kasan_check_write+0x14/0x20
         iterate_inodes_from_logical+0x129/0x170
         ? iterate_inodes_from_logical+0x129/0x170
         ? btrfs_inode_flags_to_xflags+0x50/0x50
         ? iterate_extent_inodes+0x580/0x580
         ? __vmalloc_node+0x92/0xb0
         ? init_data_container+0x34/0xb0
         ? init_data_container+0x34/0xb0
         ? kvmalloc_node+0x60/0x80
         btrfs_ioctl_logical_to_ino+0x158/0x230
         btrfs_ioctl+0x205e/0x4040
         ? __might_sleep+0x71/0xe0
         ? btrfs_ioctl_get_supported_features+0x30/0x30
         ? getrusage+0x4b6/0x9c0
         ? __kasan_check_read+0x11/0x20
         ? lock_release+0xc8/0x620
         ? __might_fault+0x64/0xd0
         ? lock_acquire+0xc7/0x510
         ? lock_downgrade+0x3d0/0x3d0
         ? lockdep_hardirqs_on_prepare+0x210/0x210
         ? lockdep_hardirqs_on_prepare+0x210/0x210
         ? __kasan_check_read+0x11/0x20
         ? do_vfs_ioctl+0xfc/0x9d0
         ? ioctl_file_clone+0xe0/0xe0
         ? lock_downgrade+0x3d0/0x3d0
         ? lockdep_hardirqs_on_prepare+0x210/0x210
         ? __kasan_check_read+0x11/0x20
         ? lock_release+0xc8/0x620
         ? __task_pid_nr_ns+0xd3/0x250
         ? lock_acquire+0xc7/0x510
         ? __fget_files+0x160/0x230
         ? __fget_light+0xf2/0x110
         __x64_sys_ioctl+0xc3/0x100
         do_syscall_64+0x37/0x80
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7fd1976e2427
        Code: 00 00 90 48 8b 05 (...)
        RSP: 002b:00007fd1955e5cf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
        RAX: ffffffffffffffda RBX: 00007fd1955e5f40 RCX: 00007fd1976e2427
        RDX: 00007fd1955e5f48 RSI: 00000000c038943b RDI: 0000000000000004
        RBP: 0000000001000000 R08: 0000000000000000 R09: 00007fd1955e6120
        R10: 0000557835366b00 R11: 0000000000000246 R12: 0000000000000004
        R13: 00007fd1955e5f48 R14: 00007fd1955e5f40 R15: 00007fd1955e5ef8
        Modules linked in:
        ---[ end trace ec8931a1c36e57be ]---
      
        (gdb) l *(__tree_mod_log_rewind+0x3b1)
        0xffffffff81893521 is in __tree_mod_log_rewind (fs/btrfs/ctree.c:1210).
        1205                     * the modification. as we're going backwards, we do the
        1206                     * opposite of each operation here.
        1207                     */
        1208                    switch (tm->op) {
        1209                    case MOD_LOG_KEY_REMOVE_WHILE_FREEING:
        1210                            BUG_ON(tm->slot < n);
        1211                            fallthrough;
        1212                    case MOD_LOG_KEY_REMOVE_WHILE_MOVING:
        1213                    case MOD_LOG_KEY_REMOVE:
        1214                            btrfs_set_node_key(eb, &tm->key, tm->slot);
      
      Here's what happens to hit that BUG_ON():
      
      1) We have one tree mod log user (through fiemap or the logical ino ioctl),
         with a sequence number of 1, so we have fs_info->tree_mod_seq == 1;
      
      2) Another task is at ctree.c:balance_level() and we have eb X currently as
         the root of the tree, and we promote its single child, eb Y, as the new
         root.
      
         Then, at ctree.c:balance_level(), we call:
      
            tree_mod_log_insert_root(eb X, eb Y, 1);
      
      3) At tree_mod_log_insert_root() we create tree mod log elements for each
         slot of eb X, of operation type MOD_LOG_KEY_REMOVE_WHILE_FREEING each
         with a ->logical pointing to ebX->start. These are placed in an array
         named tm_list.
         Let's assume there are N elements (N pointers in eb X);
      
      4) Then, still at tree_mod_log_insert_root(), we create a tree mod log
         element of operation type MOD_LOG_ROOT_REPLACE, ->logical set to
         ebY->start, ->old_root.logical set to ebX->start, ->old_root.level set
         to the level of eb X and ->generation set to the generation of eb X;
      
      5) Then tree_mod_log_insert_root() calls tree_mod_log_free_eb() with
         tm_list as argument. After that, tree_mod_log_free_eb() calls
         __tree_mod_log_insert() for each member of tm_list in reverse order,
         from highest slot in eb X, slot N - 1, to slot 0 of eb X;
      
      6) __tree_mod_log_insert() sets the sequence number of each given tree mod
         log operation - it increments fs_info->tree_mod_seq and sets
         fs_info->tree_mod_seq as the sequence number of the given tree mod log
         operation.
      
         This means that for the tm_list created at tree_mod_log_insert_root(),
         the element corresponding to slot 0 of eb X has the highest sequence
         number (1 + N), and the element corresponding to the last slot has the
         lowest sequence number (2);
      
      7) Then, after inserting tm_list's elements into the tree mod log rbtree,
         the MOD_LOG_ROOT_REPLACE element is inserted, which gets the highest
         sequence number, which is N + 2;
      
      8) Back to ctree.c:balance_level(), we free eb X by calling
         btrfs_free_tree_block() on it. Because eb X was created in the current
         transaction, has no other references and writeback did not happen for
         it, we add it back to the free space cache/tree;
      
      9) Later some other task T allocates the metadata extent from eb X, since
         it is marked as free space in the space cache/tree, and uses it as a
         node for some other btree;
      
      10) The tree mod log user task calls btrfs_search_old_slot(), which calls
          get_old_root(), and finally that calls __tree_mod_log_oldest_root()
          with time_seq == 1 and eb_root == eb Y;
      
      11) First iteration of the while loop finds the tree mod log element with
          sequence number N + 2, for the logical address of eb Y and of type
          MOD_LOG_ROOT_REPLACE;
      
      12) Because the operation type is MOD_LOG_ROOT_REPLACE, we don't break out
          of the loop, and set root_logical to point to tm->old_root.logical
          which corresponds to the logical address of eb X;
      
      13) On the next iteration of the while loop, the call to
          tree_mod_log_search_oldest() returns the smallest tree mod log element
          for the logical address of eb X, which has a sequence number of 2, an
          operation type of MOD_LOG_KEY_REMOVE_WHILE_FREEING and corresponds to
          the old slot N - 1 of eb X (eb X had N items in it before being freed);
      
      14) We then break out of the while loop and return the tree mod log operation
          of type MOD_LOG_ROOT_REPLACE (eb Y), and not the one for slot N - 1 of
          eb X, to get_old_root();
      
      15) At get_old_root(), we process the MOD_LOG_ROOT_REPLACE operation
          and set "logical" to the logical address of eb X, which was the old
          root. We then call tree_mod_log_search() passing it the logical
          address of eb X and time_seq == 1;
      
      16) Then before calling tree_mod_log_search(), task T adds a key to eb X,
          which results in adding a tree mod log operation of type
          MOD_LOG_KEY_ADD to the tree mod log - this is done at
          ctree.c:insert_ptr() - but after adding the tree mod log operation
          and before updating the number of items in eb X from 0 to 1...
      
      17) The task at get_old_root() calls tree_mod_log_search() and gets the
          tree mod log operation of type MOD_LOG_KEY_ADD just added by task T.
          Then it enters the following if branch:
      
          if (old_root && tm && tm->op != MOD_LOG_KEY_REMOVE_WHILE_FREEING) {
             (...)
          } (...)
      
          It calls read_tree_block() for eb X, which gets a reference on eb X but
          does not lock it - task T has it locked.
          Then it clones eb X while it has nritems set to 0 in its header, before
          task T sets nritems to 1 in eb X's header. From here on we use the
          clone of eb X, which no other task has access to;
      
      18) Then we call __tree_mod_log_rewind(), passing it the MOD_LOG_KEY_ADD
          mod log operation we just got from tree_mod_log_search() in the
          previous step and the cloned version of eb X;
      
      19) At __tree_mod_log_rewind(), we set the local variable "n" to the number
          of items set in eb X's clone, which is 0. Then we enter the while loop,
          and in its first iteration we process the MOD_LOG_KEY_ADD operation,
          which just decrements "n" from 0 to (u32)-1, since "n" is declared with
          a type of u32. At the end of this iteration we call rb_next() to find the
          next tree mod log operation for eb X, that gives us the mod log operation
          of type MOD_LOG_KEY_REMOVE_WHILE_FREEING, for slot 0, with a sequence
          number of N + 1 (steps 3 to 6);
      
      20) Then we go back to the top of the while loop and trigger the following
          BUG_ON():
      
              (...)
              switch (tm->op) {
              case MOD_LOG_KEY_REMOVE_WHILE_FREEING:
                       BUG_ON(tm->slot < n);
                       fallthrough;
              (...)
      
          Because "n" has a value of (u32)-1 (4294967295) and tm->slot is 0.
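
      The arithmetic in the steps above can be checked with a small
      userspace model. This is only a sketch under the assumptions stated
      in the steps (one existing tree mod log user, N slots in eb X); the
      function names are hypothetical and none of this is btrfs code:

```c
#include <stdint.h>

/*
 * Sequence number given to the removal element for a slot of eb X,
 * following steps 5 and 6: elements are inserted from slot N - 1 down
 * to slot 0, and the counter starts at 1 and is incremented first.
 */
static uint64_t seq_for_slot(uint32_t nr_slots, uint32_t slot)
{
	uint64_t tree_mod_seq = 1; /* the one existing tree mod log user */
	uint32_t i;

	for (i = nr_slots; i-- > 0; ) {
		tree_mod_seq++;
		if (i == slot)
			return tree_mod_seq;
	}
	return 0; /* slot out of range */
}

/*
 * Model of the rewind in steps 19 and 20: undo one KEY_ADD on a clone
 * with the given nritems, then evaluate the BUG_ON() condition for a
 * REMOVE_WHILE_FREEING element with the given slot.
 */
static int rewind_hits_bug_on(uint32_t clone_nritems, uint32_t tm_slot)
{
	uint32_t n = clone_nritems;

	n--;			/* MOD_LOG_KEY_ADD: wraps from 0 to UINT32_MAX */
	return tm_slot < n;	/* the BUG_ON(tm->slot < n) condition */
}
```

      With N = 4, slot 3 gets sequence number 2 and slot 0 gets 5 (1 + N).
      A clone taken with nritems == 0 makes the condition true for slot 0,
      while a clone taken after nritems was set to 1 does not.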
      
      Fix this by taking a read lock on the extent buffer before cloning it at
      ctree.c:get_old_root(). This should be done regardless of the extent
      buffer having been freed and reused, as a concurrent task might be
      modifying it (while holding a write lock on it).
      
      Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
      Link: https://lore.kernel.org/linux-btrfs/20210227155037.GN28049@hungrycats.org/
      Fixes: 834328a8 ("Btrfs: tree mod log's old roots could still be part of the tree")
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tools build feature: Check if pthread_barrier_t is available · bb125663
      Arnaldo Carvalho de Melo authored
      commit 25ab5abf upstream.
      
      'perf bench futex wake-parallel' will use this, and it is not
      available on older systems such as the versions of the Android NDK used
      in my container build tests (r12b and r15c at the moment).
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: James Yang <james.yang@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kim Phillips <kim.phillips@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-1i7iv54in4wj08lwo55b0pzv@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • perf: Make perf able to build with latest libbfd · 635a0021
      Changbin Du authored
      commit 0ada120c upstream.
      
      libbfd changed the bfd_section_* macros to inline functions
      bfd_section_<field> on 2019-09-18. See the two commits below:
        o http://www.sourceware.org/ml/gdb-cvs/2019-09/msg00064.html
        o https://www.sourceware.org/ml/gdb-cvs/2019-09/msg00072.html
      
      This fix makes perf able to build with both old and new libbfd.
      
      Signed-off-by: Changbin Du <changbin.du@gmail.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200128152938.31413-1-changbin.du@gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tools build: Check if gettid() is available before providing helper · aa8a3376
      Arnaldo Carvalho de Melo authored
      commit 4541a8bb upstream.
      
      Laura reported that the perf build failed in fedora when we got a glibc
      that provides gettid(), which I reproduced using fedora rawhide with the
      glibc-devel-2.29.9000-26.fc31.x86_64 package.
      
      Add a feature check to avoid providing a gettid() helper in such
      systems.
      
      On a fedora rawhide system with this patch applied we now get:
      
        [root@7a5f55352234 perf]# grep gettid /tmp/build/perf/FEATURE-DUMP
        feature-gettid=1
        [root@7a5f55352234 perf]# cat /tmp/build/perf/feature/test-gettid.make.output
        [root@7a5f55352234 perf]# ldd /tmp/build/perf/feature/test-gettid.bin
                linux-vdso.so.1 (0x00007ffc6b1f6000)
                libc.so.6 => /lib64/libc.so.6 (0x00007f04e0a74000)
                /lib64/ld-linux-x86-64.so.2 (0x00007f04e0c47000)
        [root@7a5f55352234 perf]# nm /tmp/build/perf/feature/test-gettid.bin | grep -w gettid
                         U gettid@@GLIBC_2.30
        [root@7a5f55352234 perf]#
      
      While on a fedora:29 system:
      
        [acme@quaco perf]$ grep gettid /tmp/build/perf/FEATURE-DUMP
        feature-gettid=0
        [acme@quaco perf]$ cat /tmp/build/perf/feature/test-gettid.make.output
        test-gettid.c: In function ‘main’:
        test-gettid.c:8:9: error: implicit declaration of function ‘gettid’; did you mean ‘getgid’? [-Werror=implicit-function-declaration]
          return gettid();
                 ^~~~~~
                 getgid
        cc1: all warnings being treated as errors
        [acme@quaco perf]$
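
      The pattern behind the feature check can be sketched as follows. It
      assumes the build system passes -DHAVE_GETTID when the test program
      compiles. The helper is given a distinct, hypothetical name here so
      that the sketch builds on any glibc; the tree's own helper is not
      reproduced:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * HAVE_GETTID is assumed to be set by the build system when the
 * feature test compiles. Without it, fall back to the raw syscall.
 */
static inline pid_t gettid_sketch(void)
{
#ifdef HAVE_GETTID
	return gettid();		/* provided by libc */
#else
	return (pid_t)syscall(SYS_gettid);
#endif
}
```

      In the main thread of a process the returned thread ID equals
      getpid(), which gives a quick way to sanity-check either branch.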
      
      Reported-by: Laura Abbott <labbott@redhat.com>
      Tested-by: Laura Abbott <labbott@redhat.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Florian Weimer <fweimer@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: https://lkml.kernel.org/n/tip-yfy3ch53agmklwu9o7rlgf9c@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tools build feature: Check if eventfd() is available · 45c4df37
      Arnaldo Carvalho de Melo authored
      commit 11c6cbe7 upstream.
      
      A new 'perf bench epoll' will use this, and to disable it for older
      systems, add a feature test for this API.
      
      This is just a simple program that, if successfully compiled, means that
      the feature is present, at least at the library level. In a build that
      sets the output directory to /tmp/build/perf (using O=/tmp/build/perf),
      we end up with:
      
        $ ls -la /tmp/build/perf/feature/test-eventfd*
        -rwxrwxr-x. 1 acme acme 8176 Nov 21 15:58 /tmp/build/perf/feature/test-eventfd.bin
        -rw-rw-r--. 1 acme acme  588 Nov 21 15:58 /tmp/build/perf/feature/test-eventfd.d
        -rw-rw-r--. 1 acme acme    0 Nov 21 15:58 /tmp/build/perf/feature/test-eventfd.make.output
        $ ldd /tmp/build/perf/feature/test-eventfd.bin
                linux-vdso.so.1 (0x00007fff3bf3f000)
                libc.so.6 => /lib64/libc.so.6 (0x00007fa984061000)
                /lib64/ld-linux-x86-64.so.2 (0x00007fa984417000)
        $ grep eventfd -A 2 -B 2 /tmp/build/perf/FEATURE-DUMP
        feature-dwarf=1
        feature-dwarf_getlocations=1
        feature-eventfd=1
        feature-fortify-source=1
        feature-sync-compare-and-swap=1
        $
      
      The main thing here is that in the end we'll have -DHAVE_EVENTFD in
      CFLAGS, and then the 'perf bench' entry needing that API can be
      selectively pruned.
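
      For illustration, the API that such a feature test probes can be
      exercised as below. This is a sketch rather than the test program
      from the tree, and the function name is hypothetical. Code that
      depends on it would sit behind #ifdef HAVE_EVENTFD:

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create an eventfd, post a value to its counter and read it back.
 * Returns the value read, or -1 on any failure.
 */
static int64_t eventfd_roundtrip(uint64_t value)
{
	uint64_t out = 0;
	int fd = eventfd(0, EFD_NONBLOCK);

	if (fd < 0)
		return -1;
	if (write(fd, &value, sizeof(value)) != (ssize_t)sizeof(value) ||
	    read(fd, &out, sizeof(out)) != (ssize_t)sizeof(out)) {
		close(fd);
		return -1;
	}
	close(fd);
	return (int64_t)out;
}
```

      If <sys/eventfd.h> or the eventfd() symbol is missing, this fails to
      compile or link, which is what the feature test detects.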
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Davidlohr Bueso <dbueso@suse.de>
      Cc: Jason Baron <jbaron@akamai.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-wkeldwob7dpx6jvtuzl8164k@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tools build feature: Check if get_current_dir_name() is available · e95a2735
      Arnaldo Carvalho de Melo authored
      commit 8feb8efe upstream.
      
      The namespace support code will use this, and it is not available in
      some non-_GNU_SOURCE libraries such as Android's bionic, used in my
      container build tests (r12b and r15c at the moment).
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-x56ypm940pwclwu45d7jfj47@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • perf tools: Use %define api.pure full instead of %pure-parser · 3c9decd4
      Jiri Olsa authored
      commit fc8c0a99 upstream.
      
      bison deprecated the "%pure-parser" directive in favor of "%define
      api.pure full".
      
      api.pure was introduced in bison 2.3 (Oct 2007), so it seems safe to
      use it without any version check.
      
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Clark Williams <williams@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lore.kernel.org/lkml/20200112192259.GA35080@krava
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Revert "PM: runtime: Update device status before letting suppliers suspend" · 866f2901
      Rafael J. Wysocki authored
      commit 0cab893f upstream.
      
      Revert commit 44cc89f7 ("PM: runtime: Update device status
      before letting suppliers suspend") that introduced a race condition
      into __rpm_callback() which allowed a concurrent rpm_resume() to
      run and resume the device prematurely after its status had been
      changed to RPM_SUSPENDED by __rpm_callback().
      
      Fixes: 44cc89f7 ("PM: runtime: Update device status before letting suppliers suspend")
      Link: https://lore.kernel.org/linux-pm/24dfb6fc-5d54-6ee2-9195-26428b7ecf8a@intel.com/
      Reported-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: 4.10+ <stable@vger.kernel.org> # 4.10+
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • bpf: Prohibit alu ops for pointer types not defining ptr_limit · c49e70a5
      Piotr Krysiuk authored
      commit f232326f upstream.
      
      The purpose of this patch is to streamline error propagation and in particular
      to propagate retrieve_ptr_limit() errors for pointer types that are not defining
      a ptr_limit such that register-based alu ops against these types can be rejected.
      
      The main rationale is that a gap has been identified by Piotr in the existing
      protection against speculatively out-of-bounds loads, for example, in case of
      ctx pointers, unprivileged programs can still perform pointer arithmetic. This
      can be abused to execute speculatively out-of-bounds loads without restrictions
      and thus extract contents of kernel memory.
      
      Fix this by rejecting unprivileged programs that attempt any pointer arithmetic
      on unprotected pointer types. The two affected ones are pointer to ctx as well
      as pointer to map. Field access to a modified ctx pointer is rejected at a
      later point in time in the verifier, and 7c696732 ("bpf: Permit map_ptr
      arithmetic with opcode add and offset 0") is only relevant for root-only use
      cases. The risk of unprivileged program breakage is considered very low.
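
      The error propagation described above can be modeled in a few lines.
      This is a sketch with hypothetical types, names and error codes. It
      only illustrates "no ptr_limit defined means reject" and is not the
      verifier code:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of register pointer types. */
enum ptr_type { PTR_STACK, PTR_MAP_VALUE, PTR_CTX, PTR_MAP };

/* Only stack and map value pointers define a limit in this model. */
static int ptr_limit_sketch(enum ptr_type type, uint32_t area_size,
			    uint32_t *limit)
{
	switch (type) {
	case PTR_STACK:
	case PTR_MAP_VALUE:
		*limit = area_size;
		return 0;
	default:
		return -EINVAL;	/* no ptr_limit defined for this type */
	}
}

/* Reject unprivileged pointer arithmetic when no limit can be derived. */
static int check_ptr_alu(enum ptr_type type, uint32_t area_size,
			 bool privileged)
{
	uint32_t limit;

	if (privileged)
		return 0;	/* the restriction targets unprivileged programs */
	return ptr_limit_sketch(type, area_size, &limit);
}
```

      In this model, arithmetic on a ctx or map pointer by an unprivileged
      program yields an error that the caller can turn into a rejection.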
      
      Fixes: 7c696732 ("bpf: Permit map_ptr arithmetic with opcode add and offset 0")
      Fixes: b2157399 ("bpf: prevent out-of-bounds speculation")
      Signed-off-by: Piotr Krysiuk <piotras@gmail.com>
      Co-developed-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: dsa: b53: Support setting learning on port · 4d8880bd
      Florian Fainelli authored
      commit f9b3827e upstream.
      
      Add support for setting the learning attribute on a port, and
      make sure that the standalone ports start up with learning disabled.

      We can remove the code in bcm_sf2 that configured the ports' learning
      attribute because we want the standalone ports to have learning disabled
      by default and port 7 cannot be bridged, so its learning attribute will
      not change past its initial configuration.
      
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • bpf: Add sanity check for upper ptr_limit · b4aa37d9
      Piotr Krysiuk authored
      commit 1b1597e6 upstream.
      
      Given we know the max possible value of ptr_limit at the time of retrieving
      the latter, add basic assertions, so that the verifier can bail out if
      anything looks odd and reject the program. Nothing triggered this so far,
      but it also does not hurt to have these.
      
      Signed-off-by: Piotr Krysiuk <piotras@gmail.com>
      Co-developed-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • bpf: Simplify alu_limit masking for pointer arithmetic · 59ce8e5e
      Piotr Krysiuk authored
      commit b5871dca upstream.
      
      Instead of having the mov32 with aux->alu_limit - 1 immediate, move this
      operation to retrieve_ptr_limit() to simplify the logic and to
      allow for subsequent sanity boundary checks inside retrieve_ptr_limit().
      This avoids a future situation where, at the time of the verifier's
      masking rewrite, we would run into an underflow that would not
      sign-extend due to the nature of the mov32 instruction.
      
      Signed-off-by: Piotr Krysiuk <piotras@gmail.com>
      Co-developed-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • bpf: Fix off-by-one for area size in creating mask to left · 92df5a17
      Piotr Krysiuk authored
      commit 10d2bb2e upstream.
      
      retrieve_ptr_limit() computes the ptr_limit for registers with stack and
      map_value type. ptr_limit is the size of the memory area that is still
      valid / in-bounds from the point of the current position and direction
      of the operation (add / sub). This size will later be used for masking
      the operation such that attempting out-of-bounds access in the speculative
      domain is redirected to remain within the bounds of the current map value.
      
      When masking to the right the size is correct, however, when masking to
      the left, the size is off-by-one which would lead to an incorrect mask
      and thus incorrect arithmetic operation in the non-speculative domain.
      Piotr found that if the resulting alu_limit value is zero, then the
      BPF_MOV32_IMM() from the fixup_bpf_calls() rewrite will end up loading
      0xffffffff into AX instead of sign-extending to the full 64 bit range,
      and as a result, this can be abused to execute speculative
      out-of-bounds loads against a 4GB window of address space and thus
      extract the contents of kernel memory via a side channel.
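
      The effect of the zero-extended immediate can be reproduced in plain
      C. The masking function below is a simplified branch-free model that
      is assumed here for illustration; it is not claimed to be the exact
      instruction sequence emitted by the verifier rewrite, and both
      function names are hypothetical:

```c
#include <stdint.h>

/* A 32-bit immediate move zero-extends into the 64-bit register. */
static uint64_t mov32_imm(int32_t imm)
{
	return (uint64_t)(uint32_t)imm;
}

/*
 * Mask an offset against a limit held in ax: returns off when
 * 0 <= off <= ax (both taken as signed values), else 0.
 */
static uint64_t mask_offset(uint64_t ax, uint64_t off)
{
	ax -= off;	/* sign bit set if off exceeds the limit */
	ax |= off;	/* sign bit set if off itself is negative */
	ax = 0 - ax;	/* negate */
	/* The comparison stands in for an arithmetic shift by 63. */
	return (ax >> 63 ? ~0ULL : 0) & off;
}
```

      With a limit of 0, mov32_imm(0 - 1) yields 0xffffffff rather than
      all ones, so in this model an offset such as 0x1000 passes the mask
      unchanged, whereas a sign-extended -1 would force it to 0.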
      
      Fixes: 979d63d5 ("bpf: prevent out of bounds speculation on pointer arithmetic")
      Signed-off-by: Piotr Krysiuk <piotras@gmail.com>
      Co-developed-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: check journal inode extents more carefully · 95080b44
      Jan Kara authored
      commit ce9f24cc upstream.
      
      Currently, system zones just track ranges of blocks that are "important"
      fs metadata (bitmaps, group descriptors, journal blocks, etc.). This,
      however, complicates how the extent tree (or indirect blocks) can be
      checked for inodes that actually track such metadata: currently the
      journal inode, but arguably we should be treating quota files or the
      resize inode similarly. We cannot run __ext4_ext_check() on such metadata
      inodes when loading their extents as that would immediately trigger the
      validity checks, and so we just hack around that and special-case the
      journal inode. This, however, leads to a situation where a journal inode
      which has an extent tree of depth at least one can have an invalid extent
      tree that goes unnoticed until ext4_cache_extents() crashes.
      
      To overcome this limitation, track the inode number each system zone
      belongs to (0 is used for zones not belonging to any inode). We can then
      verify that the inode number matches the expected one when verifying the
      extent tree and thus avoid the false errors. With this there's no need to
      special-case the journal inode during extent tree checking anymore, so
      remove the special case.
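
      The ownership rule can be sketched with a flat array of zones. The
      struct layout and function name are hypothetical and chosen for
      brevity; this is not the ext4 implementation:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flat representation of a system zone. */
struct zone {
	uint64_t start;	/* first block */
	uint64_t count;	/* number of blocks */
	uint32_t ino;	/* owning inode, 0 if none */
};

/*
 * A block range referenced by inode `ino` is valid unless it overlaps
 * a system zone that belongs to a different inode (or to no inode).
 */
static bool range_valid_for_inode(const struct zone *zones, size_t nr,
				  uint64_t start, uint64_t count, uint32_t ino)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		const struct zone *z = &zones[i];
		bool overlap = start < z->start + z->count &&
			       z->start < start + count;

		if (overlap && z->ino != ino)
			return false;
	}
	return true;
}
```

      With a zone owned by inode 8, extents of inode 8 that fall inside it
      pass, while the same range referenced by any other inode is flagged.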
      
      Fixes: 0a944e8a ("ext4: don't perform block validity checks on the journal inode")
      Reported-by: Wolfgang Frisch <wolfgang.frisch@suse.com>
      Reviewed-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20200728130437.7804-4-jack@suse.cz
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: don't allow overlapping system zones · 495729f8
      Jan Kara authored
      commit bf9a379d upstream.
      
      Currently, add_system_zone() just silently merges two added system zones
      that overlap. However, the overlap should not happen, and it generally
      suggests that some unrelated metadata overlap, which indicates the fs is
      corrupted. We should have caught such problems earlier (e.g. in
      ext4_check_descriptors()), but add this check as another line of defense.
      In a later patch we also use this for stricter checking of the journal
      inode extent tree.
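
      A strict insertion that refuses overlaps can be sketched as below.
      The fixed-size array, the names and the choice of EUCLEAN as the
      "corrupted" error are assumptions made for this illustration and do
      not reproduce add_system_zone():

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ZONES 16

/* Hypothetical container: a small array instead of a tree. */
struct zone_list {
	struct { uint64_t start, count; } z[MAX_ZONES];
	size_t nr;
};

/* Add [start, start + count) unless it overlaps an existing zone. */
static int add_zone_strict(struct zone_list *l, uint64_t start, uint64_t count)
{
	size_t i;

	for (i = 0; i < l->nr; i++) {
		if (start < l->z[i].start + l->z[i].count &&
		    l->z[i].start < start + count)
			return -EUCLEAN;	/* overlap: treat as corruption */
	}
	if (l->nr == MAX_ZONES)
		return -ENOMEM;
	l->z[l->nr].start = start;
	l->z[l->nr].count = count;
	l->nr++;
	return 0;
}
```

      Adjacent zones are still accepted in this sketch; only ranges that
      share at least one block are rejected.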
      
      Reviewed-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20200728130437.7804-3-jack@suse.cz
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: handle error of ext4_setup_system_zone() on remount · da493317
      Jan Kara authored
      commit d176b1f6 upstream.
      
      ext4_setup_system_zone() can fail. Handle the failure in ext4_remount().
      
      Reviewed-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20200728130437.7804-2-jack@suse.cz
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  2. Mar 17, 2021