Skip to content
  1. Feb 17, 2021
  2. Feb 10, 2021
  3. Dec 01, 2020
    • Nathan Chancellor's avatar
      kbuild: Hoist '--orphan-handling' into Kconfig · 59612b24
      Nathan Chancellor authored
      Currently, '--orphan-handling=warn' is spread out across four different
      architectures in their respective Makefiles, which makes it a little
      unruly to deal with in case it needs to be disabled for a specific
      linker version (in this case, ld.lld 10.0.1).
      
      To make it easier to control this, hoist this warning into Kconfig and
      the main Makefile so that disabling it is simpler, as the warning will
      only be enabled in a couple places (main Makefile and a couple of
      compressed boot folders that blow away LDFLAGS_vmlinx) and making it
      conditional is easier due to Kconfig syntax. One small additional
      benefit of this is saving a call to ld-option on incremental builds
      because we will have already evaluated it for CONFIG_LD_ORPHAN_WARN.
      
      To keep the list of supported architectures the same, introduce
      CONFIG_ARCH_WANT_LD_ORPHAN_WARN, which an architecture can select to
      gain this automatically after all of the sections are specified and size
      asserted. A special thanks to Kees Cook for the help text on this
      config.
      
      Link: https://github.com/ClangBuiltLinux/linux/issues/1187
      
      
      Acked-by: default avatarKees Cook <keescook@chromium.org>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
      Reviewed-by: default avatarNick Desaulniers <ndesaulniers@google.com>
      Tested-by: default avatarNick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: default avatarNathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: default avatarMasahiro Yamada <masahiroy@kernel.org>
      59612b24
  4. Sep 03, 2020
  5. Jul 24, 2020
  6. Jul 07, 2020
  7. Apr 22, 2020
  8. Apr 08, 2020
  9. Mar 05, 2020
  10. Aug 28, 2019
    • Linus Torvalds's avatar
      x86/build: Add -Wnoaddress-of-packed-member to REALMODE_CFLAGS, to silence GCC9 build warning · 42e0e954
      Linus Torvalds authored
      One of the very few warnings I have in the current build comes from
      arch/x86/boot/edd.c, where I get the following with a gcc9 build:
      
         arch/x86/boot/edd.c: In function ‘query_edd’:
         arch/x86/boot/edd.c:148:11: warning: taking address of packed member of ‘struct boot_params’ may result in an unaligned pointer value [-Waddress-of-packed-member]
           148 |  mbrptr = boot_params.edd_mbr_sig_buffer;
               |           ^~~~~~~~~~~
      
      This warning triggers because we throw away all the CFLAGS and then make
      a new set for REALMODE_CFLAGS, so the -Wno-address-of-packed-member we
      added in the following commit is not present:
      
        6f303d60
      
       ("gcc-9: silence 'address-of-packed-member' warning")
      
      The simplest solution for now is to adjust the warning for this version
      of CFLAGS as well, but it would definitely make sense to examine whether
      REALMODE_CFLAGS could be derived from CFLAGS, so that it picks up changes
      in the compiler flags environment automatically.
      
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Acked-by: default avatarBorislav Petkov <bp@alien8.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      42e0e954
  11. Apr 05, 2019
  12. Mar 28, 2019
    • Daniel Borkmann's avatar
      x86/retpolines: Disable switch jump tables when retpolines are enabled · a9d57ef1
      Daniel Borkmann authored
      Commit ce02ef06 ("x86, retpolines: Raise limit for generating indirect
      calls from switch-case") raised the limit under retpolines to 20 switch
      cases where gcc would only then start to emit jump tables, and therefore
      effectively disabling the emission of slow indirect calls in this area.
      
      After this has been brought to attention to gcc folks [0], Martin Liska
      has then fixed gcc to align with clang by avoiding to generate switch jump
      tables entirely under retpolines. This is taking effect in gcc starting
      from stable version 8.4.0. Given kernel supports compilation with older
      versions of gcc where the fix is not being available or backported anymore,
      we need to keep the extra KBUILD_CFLAGS around for some time and generally
      set the -fno-jump-tables to align with what more recent gcc is doing
      automatically today.
      
      More than 20 switch cases are not expected to be fast-path critical, but
      it would still be good to align with gcc behavior for versions < 8.4.0 in
      order to have consistency across supported gcc versions. vmlinux size is
      slightly growing by 0.27% for older gcc. This flag is only set to work
      around affected gcc, no change for clang.
      
        [0] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86952
      
      
      
      Suggested-by: default avatarMartin Liska <mliska@suse.cz>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Cc: Björn Töpel<bjorn.topel@intel.com>
      Cc: Magnus Karlsson <magnus.karlsson@intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: H.J. Lu <hjl.tools@gmail.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: David S. Miller <davem@davemloft.net>
      Link: https://lkml.kernel.org/r/20190325135620.14882-1-daniel@iogearbox.net
      a9d57ef1
  13. Feb 28, 2019
    • Daniel Borkmann's avatar
      x86, retpolines: Raise limit for generating indirect calls from switch-case · ce02ef06
      Daniel Borkmann authored
      From networking side, there are numerous attempts to get rid of indirect
      calls in fast-path wherever feasible in order to avoid the cost of
      retpolines, for example, just to name a few:
      
        * 283c16a2 ("indirect call wrappers: helpers to speed-up indirect calls of builtin")
        * aaa5d90b ("net: use indirect call wrappers at GRO network layer")
        * 028e0a47 ("net: use indirect call wrappers at GRO transport layer")
        * 356da6d0 ("dma-mapping: bypass indirect calls for dma-direct")
        * 09772d92 ("bpf: avoid retpoline for lookup/update/delete calls on maps")
        * 10870dd8 ("netfilter: nf_tables: add direct calls for all builtin expressions")
        [...]
      
      Recent work on XDP from Björn and Magnus additionally found that manually
      transforming the XDP return code switch statement with more than 5 cases
      into if-else combination would result in a considerable speedup in XDP
      layer due to avoidance of indirect calls in CONFIG_RETPOLINE enabled
      builds. On i40e driver with XDP prog attached, a 20-26% speedup has been
      observed [0]. Aside from XDP, there are many other places later in the
      networking stack's critical path with similar switch-case
      processing. Rather than fixing every XDP-enabled driver and locations in
      stack by hand, it would be good to instead raise the limit where gcc would
      emit expensive indirect calls from the switch under retpolines and stick
      with the default as-is in case of !retpoline configured kernels. This would
      also have the advantage that for archs where this is not necessary, we let
      compiler select the underlying target optimization for these constructs and
      avoid potential slow-downs by if-else hand-rewrite.
      
      In case of gcc, this setting is controlled by case-values-threshold which
      has an architecture global default that selects 4 or 5 (latter if target
      does not have a case insn that compares the bounds) where some arch back
      ends like arm64 or s390 override it with their own target hooks, for
      example, in gcc commit db7a90aa0de5 ("S/390: Disable prediction of indirect
      branches") the threshold pretty much disables jump tables by limit of 20
      under retpoline builds.  Comparing gcc's and clang's default code
      generation on x86-64 under O2 level with retpoline build results in the
      following outcome for 5 switch cases:
      
      * gcc with -mindirect-branch=thunk-inline -mindirect-branch-register:
      
        # gdb -batch -ex 'disassemble dispatch' ./c-switch
        Dump of assembler code for function dispatch:
         0x0000000000400be0 <+0>:     cmp    $0x4,%edi
         0x0000000000400be3 <+3>:     ja     0x400c35 <dispatch+85>
         0x0000000000400be5 <+5>:     lea    0x915f8(%rip),%rdx        # 0x4921e4
         0x0000000000400bec <+12>:    mov    %edi,%edi
         0x0000000000400bee <+14>:    movslq (%rdx,%rdi,4),%rax
         0x0000000000400bf2 <+18>:    add    %rdx,%rax
         0x0000000000400bf5 <+21>:    callq  0x400c01 <dispatch+33>
         0x0000000000400bfa <+26>:    pause
         0x0000000000400bfc <+28>:    lfence
         0x0000000000400bff <+31>:    jmp    0x400bfa <dispatch+26>
         0x0000000000400c01 <+33>:    mov    %rax,(%rsp)
         0x0000000000400c05 <+37>:    retq
         0x0000000000400c06 <+38>:    nopw   %cs:0x0(%rax,%rax,1)
         0x0000000000400c10 <+48>:    jmpq   0x400c90 <fn_3>
         0x0000000000400c15 <+53>:    nopl   (%rax)
         0x0000000000400c18 <+56>:    jmpq   0x400c70 <fn_2>
         0x0000000000400c1d <+61>:    nopl   (%rax)
         0x0000000000400c20 <+64>:    jmpq   0x400c50 <fn_1>
         0x0000000000400c25 <+69>:    nopl   (%rax)
         0x0000000000400c28 <+72>:    jmpq   0x400c40 <fn_0>
         0x0000000000400c2d <+77>:    nopl   (%rax)
         0x0000000000400c30 <+80>:    jmpq   0x400cb0 <fn_4>
         0x0000000000400c35 <+85>:    push   %rax
         0x0000000000400c36 <+86>:    callq  0x40dd80 <abort>
        End of assembler dump.
      
      * clang with -mretpoline emitting search tree:
      
        # gdb -batch -ex 'disassemble dispatch' ./c-switch
        Dump of assembler code for function dispatch:
         0x0000000000400b30 <+0>:     cmp    $0x1,%edi
         0x0000000000400b33 <+3>:     jle    0x400b44 <dispatch+20>
         0x0000000000400b35 <+5>:     cmp    $0x2,%edi
         0x0000000000400b38 <+8>:     je     0x400b4d <dispatch+29>
         0x0000000000400b3a <+10>:    cmp    $0x3,%edi
         0x0000000000400b3d <+13>:    jne    0x400b52 <dispatch+34>
         0x0000000000400b3f <+15>:    jmpq   0x400c50 <fn_3>
         0x0000000000400b44 <+20>:    test   %edi,%edi
         0x0000000000400b46 <+22>:    jne    0x400b5c <dispatch+44>
         0x0000000000400b48 <+24>:    jmpq   0x400c20 <fn_0>
         0x0000000000400b4d <+29>:    jmpq   0x400c40 <fn_2>
         0x0000000000400b52 <+34>:    cmp    $0x4,%edi
         0x0000000000400b55 <+37>:    jne    0x400b66 <dispatch+54>
         0x0000000000400b57 <+39>:    jmpq   0x400c60 <fn_4>
         0x0000000000400b5c <+44>:    cmp    $0x1,%edi
         0x0000000000400b5f <+47>:    jne    0x400b66 <dispatch+54>
         0x0000000000400b61 <+49>:    jmpq   0x400c30 <fn_1>
         0x0000000000400b66 <+54>:    push   %rax
         0x0000000000400b67 <+55>:    callq  0x40dd20 <abort>
        End of assembler dump.
      
        For sake of comparison, clang without -mretpoline:
      
        # gdb -batch -ex 'disassemble dispatch' ./c-switch
        Dump of assembler code for function dispatch:
         0x0000000000400b30 <+0>:	cmp    $0x4,%edi
         0x0000000000400b33 <+3>:	ja     0x400b57 <dispatch+39>
         0x0000000000400b35 <+5>:	mov    %edi,%eax
         0x0000000000400b37 <+7>:	jmpq   *0x492148(,%rax,8)
         0x0000000000400b3e <+14>:	jmpq   0x400bf0 <fn_0>
         0x0000000000400b43 <+19>:	jmpq   0x400c30 <fn_4>
         0x0000000000400b48 <+24>:	jmpq   0x400c10 <fn_2>
         0x0000000000400b4d <+29>:	jmpq   0x400c20 <fn_3>
         0x0000000000400b52 <+34>:	jmpq   0x400c00 <fn_1>
         0x0000000000400b57 <+39>:	push   %rax
         0x0000000000400b58 <+40>:	callq  0x40dcf0 <abort>
        End of assembler dump.
      
      Raising the cases to a high number (e.g. 100) will still result in similar
      code generation pattern with clang and gcc as above, in other words clang
      generally turns off jump table emission by having an extra expansion pass
      under retpoline build to turn indirectbr instructions from their IR into
      switch instructions as a built-in -mno-jump-table lowering of a switch (in
      this case, even if IR input already contained an indirect branch).
      
      For gcc, adding --param=case-values-threshold=20 as in similar fashion as
      s390 in order to raise the limit for x86 retpoline enabled builds results
      in a small vmlinux size increase of only 0.13% (before=18,027,528
      after=18,051,192). For clang this option is ignored due to i) not being
      needed as mentioned and ii) not having above cmdline
      parameter. Non-retpoline-enabled builds with gcc continue to use the
      default case-values-threshold setting, so nothing changes here.
      
      [0] https://lore.kernel.org/netdev/20190129095754.9390-1-bjorn.topel@gmail.com/
          and "The Path to DPDK Speeds for AF_XDP", LPC 2018, networking track:
        - http://vger.kernel.org/lpc_net2018_talks/lpc18_pres_af_xdp_perf-v3.pdf
        - http://vger.kernel.org/lpc_net2018_talks/lpc18_paper_af_xdp_perf-v2.pdf
      
      
      
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Acked-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: default avatarBjörn Töpel <bjorn.topel@intel.com>
      Acked-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Cc: netdev@vger.kernel.org
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Magnus Karlsson <magnus.karlsson@intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Link: https://lkml.kernel.org/r/20190221221941.29358-1-daniel@iogearbox.net
      ce02ef06
  14. Jan 22, 2019
  15. Jan 06, 2019
    • Masahiro Yamada's avatar
      jump_label: move 'asm goto' support test to Kconfig · e9666d10
      Masahiro Yamada authored
      
      
      Currently, CONFIG_JUMP_LABEL just means "I _want_ to use jump label".
      
      The jump label is controlled by HAVE_JUMP_LABEL, which is defined
      like this:
      
        #if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
        # define HAVE_JUMP_LABEL
        #endif
      
      We can improve this by testing 'asm goto' support in Kconfig, then
      make JUMP_LABEL depend on CC_HAS_ASM_GOTO.
      
      Ugly #ifdef HAVE_JUMP_LABEL will go away, and CONFIG_JUMP_LABEL will
      match to the real kernel capability.
      
      Signed-off-by: default avatarMasahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
      Tested-by: default avatarSedat Dilek <sedat.dilek@gmail.com>
      e9666d10
  16. Dec 19, 2018
  17. Dec 09, 2018
  18. Dec 05, 2018
  19. Nov 28, 2018
  20. Nov 05, 2018
  21. Oct 04, 2018
    • Nadav Amit's avatar
      kbuild/Makefile: Prepare for using macros in inline assembly code to work... · 77b0bf55
      Nadav Amit authored
      
      kbuild/Makefile: Prepare for using macros in inline assembly code to work around asm() related GCC inlining bugs
      
      Using macros in inline assembly allows us to work around bugs
      in GCC's inlining decisions.
      
      Compile macros.S and use it to assemble all C files.
      Currently only x86 will use it.
      
      Background:
      
      The inlining pass of GCC doesn't include an assembler, so it's not aware
      of basic properties of the generated code, such as its size in bytes,
      or that there are such things as discontiuous blocks of code and data
      due to the newfangled linker feature called 'sections' ...
      
      Instead GCC uses a lazy and fragile heuristic: it does a linear count of
      certain syntactic and whitespace elements in inlined assembly block source
      code, such as a count of new-lines and semicolons (!), as a poor substitute
      for "code size and complexity".
      
      Unsurprisingly this heuristic falls over and breaks its neck whith certain
      common types of kernel code that use inline assembly, such as the frequent
      practice of putting useful information into alternative sections.
      
      As a result of this fresh, 20+ years old GCC bug, GCC's inlining decisions
      are effectively disabled for inlined functions that make use of such asm()
      blocks, because GCC thinks those sections of code are "large" - when in
      reality they are often result in just a very low number of machine
      instructions.
      
      This absolute lack of inlining provess when GCC comes across such asm()
      blocks both increases generated kernel code size and causes performance
      overhead, which is particularly noticeable on paravirt kernels, which make
      frequent use of these inlining facilities in attempt to stay out of the
      way when running on baremetal hardware.
      
      Instead of fixing the compiler we use a workaround: we set an assembly macro
      and call it from the inlined assembly block. As a result GCC considers the
      inline assembly block as a single instruction. (Which it often isn't but I digress.)
      
      This uglifies and bloats the source code - for example just the refcount
      related changes have this impact:
      
       Makefile                 |    9 +++++++--
       arch/x86/Makefile        |    7 +++++++
       arch/x86/kernel/macros.S |    7 +++++++
       scripts/Kbuild.include   |    4 +++-
       scripts/mod/Makefile     |    2 ++
       5 files changed, 26 insertions(+), 3 deletions(-)
      
      Yay readability and maintainability, it's not like assembly code is hard to read
      and maintain ...
      
      We also hope that GCC will eventually get fixed, but we are not holding
      our breath for that. Yet we are optimistic, it might still happen, any decade now.
      
      [ mingo: Wrote new changelog describing the background. ]
      
      Tested-by: default avatarKees Cook <keescook@chromium.org>
      Signed-off-by: default avatarNadav Amit <namit@vmware.com>
      Acked-by: default avatarMasahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Marek <michal.lkml@markovi.net>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kbuild@vger.kernel.org
      Link: http://lkml.kernel.org/r/20181003213100.189959-3-namit@vmware.com
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      77b0bf55
  22. Oct 01, 2018
  23. Aug 31, 2018
  24. Aug 30, 2018
  25. Aug 24, 2018
  26. Jul 16, 2018
  27. Jun 21, 2018
  28. Jun 01, 2018
  29. Mar 31, 2018
  30. Mar 20, 2018
  31. Feb 21, 2018
  32. Feb 20, 2018