Skip to content
  1. Aug 10, 2017
  2. Aug 09, 2017
    • Naftali Goldstein's avatar
      iwlwifi: mvm: send delba upon rx ba session timeout · 20fc690f
      Naftali Goldstein authored
      
      
      When an RX block-ack session times out, the firmware, which offloads
      RX reordering but not the BA session negotiation, stops the session
      but doesn't send a DELBA.  This causes the the session to remain
      active in the remote device, so no more BA sessions will be
      established, causing a severe throughput degradation due to the lack
      of aggregation.
      
      Use the new ieee80211_rx_ba_timer_expired API when the ba session timer
      expires, since this will tear down the ba session and also send a delba.
      
      The previous API used is intended for drivers that offload the
      addba/delba negotiation, but not the rx reordering, while our driver
      does the opposite.
      
      This patch depends on "mac80211: add api to start ba session timer
      expired flow".
      
      Signed-off-by: default avatarNaftali Goldstein <naftali.goldstein@intel.com>
      Signed-off-by: default avatarLuca Coelho <luciano.coelho@intel.com>
      20fc690f
    • Naftali Goldstein's avatar
      mac80211: add api to start ba session timer expired flow · 04c2cf34
      Naftali Goldstein authored
      
      
      Some drivers handle rx buffer reordering internally (and by extension
      handle also the rx ba session timer internally), but do not ofload the
      addba/delba negotiation.
      Add an api for these drivers to properly tear-down the ba session,
      including sending a delba.
      
      Signed-off-by: default avatarNaftali Goldstein <naftali.goldstein@intel.com>
      Signed-off-by: default avatarLuca Coelho <luciano.coelho@intel.com>
      04c2cf34
    • Emmanuel Grumbach's avatar
      iwlwifi: mvm: don't WARN when a legit race happens in A-MPDU · a600852a
      Emmanuel Grumbach authored
      When we start an Rx A-MPDU session, we first get the AddBA
      request, then we send an ADD_STA command to the firmware
      that will reply with a BAID which is a hardware resource
      that tracks the BA session.
      This BAID will appear on each and every frame that we get
      from the firwmare until the A-MPDU session is torn down.
      In the Rx path, we look at this BAID to manage the
      reordering buffer.
      
      This flow is inherently racy since the hardware will start
      to put the BAID in the frames it receives even if the
      firmware hasn't sent the response to the ADD_STA command.
      This basically means that the driver can get frames with
      a valid BAID that it doesn't know yet.
      When that happens, the driver used to WARN.
      Fix this by simply not WARN in this case. When the driver
      will know abou the BAID, it will initialise the relevant
      states and the next frame with a valid BAID will refresh
      them.
      
      Fixes: b915c101
      
       ("iwlwifi: mvm: add reorder buffer per queue")
      Signed-off-by: default avatarEmmanuel Grumbach <emmanuel.grumbach@intel.com>
      Signed-off-by: default avatarLuca Coelho <luciano.coelho@intel.com>
      a600852a
    • Avraham Stern's avatar
      iwlwifi: mvm: start mac queues when deferred tx frames are purged · 7e39a00d
      Avraham Stern authored
      In AP mode, if a station is removed just as it is adding a new stream,
      the queue in question will remain stopped and no more TX will happen
      in this queue, leading to connection failures and other problems.
      
      This is because under DQA, when tx is deferred because a queue needs
      to be allocated, the mac queue for that TID is stopped until the new
      stream is added.  If at this point the station that this stream
      belongs to is removed, all the deferred tx frames are purged, but the
      mac queue is not restarted. As a result, all following tx on this
      queue will not be transmitted.
      
      Fix this by starting the relevant mac queues when the deferred tx
      frames are purged.
      
      Fixes: 24afba76
      
       ("iwlwifi: mvm: support bss dynamic alloc/dealloc of queues")
      Signed-off-by: default avatarAvraham Stern <avraham.stern@intel.com>
      Signed-off-by: default avatarLuca Coelho <luciano.coelho@intel.com>
      7e39a00d
  3. Aug 06, 2017
  4. Aug 03, 2017
    • Kalle Valo's avatar
      Merge tag 'iwlwifi-for-kalle-2017-08-02' of... · 368bd88e
      Kalle Valo authored
      Merge tag 'iwlwifi-for-kalle-2017-08-02' of git://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-fixes
      
      Some fixes in iwlwifi for 4.13
      
      * Some simple PCI HW ID fix-ups and additions for family 9000;
      * A couple of bugzilla fixes:
        - Remove a bogus warning message with new FWs (196915)
        - Don't allow illegal channel options to be used (195299)
      * A fix for checksum offload in family 9000;
      * A fix serious throughput degradation in 11ac with multiple streams;
      * An old bug in SMPS where the firmware was not aware of SMPS changes;
      368bd88e
  5. Aug 01, 2017
  6. Jul 27, 2017
  7. Jul 21, 2017
    • Kalle Valo's avatar
      Merge tag 'iwlwifi-for-kalle-2017-07-21' of... · d755cbc2
      Kalle Valo authored
      Merge tag 'iwlwifi-for-kalle-2017-07-21' of git://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-fixes
      
      Some iwlwifi fixes for 4.13:
      
      * A few NULL pointer dereferences in the recovery flow;
      * A small but important fix for IBSS;
      * A one-liner fix for tracing, which was including too much data;
      * Some of these are bugzilla bug fixes;
      d755cbc2
    • Johannes Berg's avatar
      iwlwifi: mvm: defer setting IWL_MVM_STATUS_IN_HW_RESTART · bf8b286f
      Johannes Berg authored
      A hardware/firmware error may happen at any point in time. In
      particular, it might happen while mac80211 is in the middle of
      a flow. We observed the following situation:
       * mac80211 is in authentication flow, in ieee80211_prep_connection()
       * iwlwifi firmware crashes, but no error can be reported at this
         precise point (mostly because the driver method is void, but even
         if it wasn't we'd just shift to a race condition)
       * mac80211 continues the flow, trying to add the AP station
       * iwlwifi has already set its internal restart flag, and so thinks
         that adding the station is part of the restart and already set up,
         so it uses the information that's supposed to already be in the
         struct
      
      This can happen with any flow in mac80211 and with any information
      we try to preserve across hardware restarts.
      
      To fix this, only set a new HW_RESTART_REQUESTED flag and translate
      that to IN_HW_RESTART once mac80211 actually starts the restart by
      calling our start() method. As a consequence, any mac80211 flow in
      progress at the time of the restart will properly finish (certainly
      with errors), before the restart is attempted.
      
      This fixes https://bugzilla.kernel.org/show_bug.cgi?id=195299
      
      .
      
      Reported-by: default avatardjagoo <dev@djagoo.io>
      Reported-by: default avatarŁukasz Siudut <lsiudut@gmail.com>
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: default avatarLuca Coelho <luciano.coelho@intel.com>
      bf8b286f
    • Luca Coelho's avatar
      iwlwifi: mvm: handle IBSS probe_queue in a few missing places · 7b758a11
      Luca Coelho authored
      When IBSS was implemented for DQA, we missid a few places where it
      should be handled in the same way as AP.
      
      Fixes: ee48b722
      
       ("iwlwifi: mvm: support ibss in dqa mode")
      Signed-off-by: default avatarLuca Coelho <luciano.coelho@intel.com>
      7b758a11
    • Emmanuel Grumbach's avatar
      iwlwifi: fix tracing when tx only is enabled · 5462bcd8
      Emmanuel Grumbach authored
      iwl_trace_data is somewhat confusing. It returns a bool
      that tells if the payload of the skb should be added to
      the tx_data event. If it returns false, then the payload
      of the skb is added to the tx event.
      
      The purpose is to be able to start tracing with
      -e iwlwifi
      and record non-data packets only which saves bandwidth.
      
      Since EAPOLs are important, seldom and not real data
      packet (despite being WiFi data packets), they are
      included in tx event and thus iwl_trace_data returns false
      on those. This last part was buggy, and because of that,
      all the data packets were included in the tx event.
      
      Fix that.
      
      Fixes: 0c4cb731
      
       ("iwlwifi: tracing: decouple from mac80211")
      Signed-off-by: default avatarEmmanuel Grumbach <emmanuel.grumbach@intel.com>
      Signed-off-by: default avatarLuca Coelho <luciano.coelho@intel.com>
      5462bcd8
    • Dan Carpenter's avatar
      iwlwifi: missing error code in iwl_trans_pcie_alloc() · 2388bd7b
      Dan Carpenter authored
      We don't set the error code here so we end up returning ERR_PTR(0) which
      is NULL.  The caller doesn't expect that so it results in a NULL
      dereference.
      
      Fixes: 2e5d4a8f
      
       ("iwlwifi: pcie: Add new configuration to enable MSIX")
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarLuca Coelho <luciano.coelho@intel.com>
      2388bd7b
    • Emmanuel Grumbach's avatar
      iwlwifi: mvm: fix a NULL pointer dereference of error in recovery · 61dd8a8a
      Emmanuel Grumbach authored
      Sometimes, we can have an firmware crash while trying to
      recover from a previous firmware problem.
      When that happens, lots of things can go wrong. For example
      the stations don't get added properly to mvm->fw_id_to_mac_id.
      
      Mac80211 tries to stop A-MPDU upon reconfig but in case of
      a firmware crash we will bail out fairly early and in the
      end, we won't delete the A-MPDU Rx timeout.
      When that timer expired after a double firmware crash,
      we end up dereferencing mvm->fw_id_to_mac_id[sta_id]
      which is NULL.
      
      Fixes: 10b2b201
      
       ("iwlwifi: mvm: add infrastructure for tracking BA session in driver")
      Signed-off-by: default avatarEmmanuel Grumbach <emmanuel.grumbach@intel.com>
      Signed-off-by: default avatarLuca Coelho <luciano.coelho@intel.com>
      61dd8a8a
    • Mordechai Goodstein's avatar
      iwlwifi: pcie: fix unused txq NULL pointer dereference · f6eac740
      Mordechai Goodstein authored
      
      
      Before TVQM, all TX queues were allocated straight at init.
      With TVQM, queues are allocated on demand and hence we need
      to check if a queue exists before dereferencing it.
      
      Fixes: 66128fa08806 ("iwlwifi: move to TVQM mode")
      Signed-off-by: default avatarMordechai Goodstein <mordechay.goodstein@intel.com>
      Signed-off-by: default avatarLuca Coelho <luciano.coelho@intel.com>
      f6eac740
    • Emmanuel Grumbach's avatar
      iwlwifi: dvm: prevent an out of bounds access · 0b0f934e
      Emmanuel Grumbach authored
      
      
      iwlagn_check_ratid_empty takes the tid as a parameter, but
      it doesn't check that it is not IWL_TID_NON_QOS.
      Since IWL_TID_NON_QOS = 8 and iwl_priv::tid_data is an array
      with 8 entries, accessing iwl_priv::tid_data[IWL_TID_NON_QOS]
      is a bad idea.
      This happened in iwlagn_rx_reply_tx. Since
      iwlagn_check_ratid_empty is relevant only to check whether
      we can open A-MPDU, this flow is irrelevant if tid is
      IWL_TID_NON_QOS. Call iwlagn_check_ratid_empty only inside
      the
      	if (tid != IWL_TID_NON_QOS)
      
      a few lines earlier in the function.
      
      Cc: <stable@vger.kernel.org>
      Reported-by: default avatarSeraphime Kirkovski <kirkseraph@gmail.com>
      Tested-by: default avatarSeraphime Kirkovski <kirkseraph@gmail.com>
      Signed-off-by: default avatarEmmanuel Grumbach <emmanuel.grumbach@intel.com>
      Signed-off-by: default avatarLuca Coelho <luciano.coelho@intel.com>
      0b0f934e
    • Larry Finger's avatar
      Revert "rtlwifi: btcoex: rtl8723be: fix ant_sel not work" · 271612d7
      Larry Finger authored
      This reverts commit f95d95a7.
      
      With commit f95d95a7 ("rtlwifi: btcoex: rtl8723be: fix ant_sel not
      work"), the kernel has a NULL pointer dereference oops. This content and
      the proper fix will be included in a later patch.
      
      Fixes: f95d95a7
      
       ("rtlwifi: btcoex: rtl8723be: fix ant_sel not work")
      Signed-off-by: default avatarLarry Finger <Larry.Finger@lwfinger.net>
      Cc: Ping-Ke Shih <pkshih@realtek.com>
      Cc: Yan-Hsuan Chuang <yhchuang@realtek.com>
      Cc: Birming Chiu <birming@realtek.com>
      Cc: Shaofu <shaofu@realtek.com>
      Cc: Steven Ting <steventing@realtek.com>
      Signed-off-by: default avatarKalle Valo <kvalo@codeaurora.org>
      271612d7
    • Arend Van Spriel's avatar
      brcmfmac: fix regression in brcmf_sdio_txpkt_hdalign() · 0a166282
      Arend Van Spriel authored
      Recent change in brcmf_sdio_txpkt_hdalign() changed the
      behavior and now always returns 0. This resulted in a
      regression which basically renders the device useless.
      
      Fixes: 270a6c1f
      
       ("brcmfmac: rework headroom check in .start_xmit()")
      Reported-by: default avatarS. Gilles <sgilles@math.umd.edu>
      Tested-by: default avatarS. Gilles <sgilles@math.umd.edu>
      Signed-off-by: default avatarArend van Spriel <arend.vanspriel@broadcom.com>
      Signed-off-by: default avatarKalle Valo <kvalo@codeaurora.org>
      0a166282
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 96080f69
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) BPF verifier signed/unsigned value tracking fix, from Daniel
          Borkmann, Edward Cree, and Josef Bacik.
      
       2) Fix memory allocation length when setting up calls to
          ->ndo_set_mac_address, from Cong Wang.
      
       3) Add a new cxgb4 device ID, from Ganesh Goudar.
      
       4) Fix FIB refcount handling, we have to set it's initial value before
          the configure callback (which can bump it). From David Ahern.
      
       5) Fix double-free in qcom/emac driver, from Timur Tabi.
      
       6) A bunch of gcc-7 string format overflow warning fixes from Arnd
          Bergmann.
      
       7) Fix link level headroom tests in ip_do_fragment(), from Vasily
          Averin.
      
       8) Fix chunk walking in SCTP when iterating over error and parameter
          headers. From Alexander Potapenko.
      
       9) TCP BBR congestion control fixes from Neal Cardwell.
      
      10) Fix SKB fragment handling in bcmgenet driver, from Doug Berger.
      
      11) BPF_CGROUP_RUN_PROG_SOCK_OPS needs to check for null __sk, from Cong
          Wang.
      
      12) xmit_recursion in ppp driver needs to be per-device not per-cpu,
          from Gao Feng.
      
      13) Cannot release skb->dst in UDP if IP options processing needs it.
          From Paolo Abeni.
      
      14) Some netdev ioctl ifr_name[] NULL termination fixes. From Alexander
          Levin and myself.
      
      15) Revert some rtnetlink notification changes that are causing
          regressions, from David Ahern.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (83 commits)
        net: bonding: Fix transmit load balancing in balance-alb mode
        rds: Make sure updates to cp_send_gen can be observed
        net: ethernet: ti: cpsw: Push the request_irq function to the end of probe
        ipv4: initialize fib_trie prior to register_netdev_notifier call.
        rtnetlink: allocate more memory for dev_set_mac_address()
        net: dsa: b53: Add missing ARL entries for BCM53125
        bpf: more tests for mixed signed and unsigned bounds checks
        bpf: add test for mixed signed and unsigned bounds checks
        bpf: fix up test cases with mixed signed/unsigned bounds
        bpf: allow to specify log level and reduce it for test_verifier
        bpf: fix mixed signed/unsigned derived min/max value bounds
        ipv6: avoid overflow of offset in ip6_find_1stfragopt
        net: tehuti: don't process data if it has not been copied from userspace
        Revert "rtnetlink: Do not generate notifications for CHANGEADDR event"
        net: dsa: mv88e6xxx: Enable CMODE config support for 6390X
        dt-binding: ptp: Add SoC compatibility strings for dte ptp clock
        NET: dwmac: Make dwmac reset unconditional
        net: Zero terminate ifr_name in dev_ifname().
        wireless: wext: terminate ifr name coming from userspace
        netfilter: fix netfilter_net_init() return
        ...
      96080f69
    • Kosuke Tatsukawa's avatar
      net: bonding: Fix transmit load balancing in balance-alb mode · cbf5ecb3
      Kosuke Tatsukawa authored
      balance-alb mode used to have transmit dynamic load balancing feature
      enabled by default.  However, transmit dynamic load balancing no longer
      works in balance-alb after commit 8b426dc5 ("bonding: remove
      hardcoded value").
      
      Both balance-tlb and balance-alb use the function bond_do_alb_xmit() to
      send packets.  This function uses the parameter tlb_dynamic_lb.
      tlb_dynamic_lb used to have the default value of 1 for balance-alb, but
      now the value is set to 0 except in balance-tlb.
      
      Re-enable transmit dyanmic load balancing by initializing tlb_dynamic_lb
      for balance-alb similar to balance-tlb.
      
      Fixes: 8b426dc5
      
       ("bonding: remove hardcoded value")
      Signed-off-by: default avatarKosuke Tatsukawa <tatsu@ab.jp.nec.com>
      Acked-by: default avatarAndy Gospodarek <andy@greyhouse.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cbf5ecb3
    • Håkon Bugge's avatar
      rds: Make sure updates to cp_send_gen can be observed · e623a48e
      Håkon Bugge authored
      
      
      cp->cp_send_gen is treated as a normal variable, although it may be
      used by different threads.
      
      This is fixed by using {READ,WRITE}_ONCE when it is incremented and
      READ_ONCE when it is read outside the {acquire,release}_in_xmit
      protection.
      
      Normative reference from the Linux-Kernel Memory Model:
      
          Loads from and stores to shared (but non-atomic) variables should
          be protected with the READ_ONCE(), WRITE_ONCE(), and
          ACCESS_ONCE().
      
      Clause 5.1.2.4/25 in the C standard is also relevant.
      
      Signed-off-by: default avatarHåkon Bugge <haakon.bugge@oracle.com>
      Reviewed-by: default avatarKnut Omang <knut.omang@oracle.com>
      Acked-by: default avatarSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e623a48e
    • Keerthy's avatar
      net: ethernet: ti: cpsw: Push the request_irq function to the end of probe · 070f9c65
      Keerthy authored
      
      
      Push the request_irq function to the end of probe so as
      to ensure all the required fields are populated in the event
      of an ISR getting executed right after requesting the irq.
      
      Currently while loading the crash kernel a crash was seen as
      soon as devm_request_threaded_irq was called. This was due to
      n->poll being NULL which is called as part of net_rx_action
      function.
      
      Suggested-by: default avatarSekhar Nori <nsekhar@ti.com>
      Signed-off-by: default avatarKeerthy <j-keerthy@ti.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      070f9c65
    • Mahesh Bandewar's avatar
      ipv4: initialize fib_trie prior to register_netdev_notifier call. · 8799a221
      Mahesh Bandewar authored
      Net stack initialization currently initializes fib-trie after the
      first call to netdevice_notifier() call. In fact fib_trie initialization
      needs to happen before first rtnl_register(). It does not cause any problem
      since there are no devices UP at this moment, but trying to bring 'lo'
      UP at initialization would make this assumption wrong and exposes the issue.
      
      Fixes following crash
      
       Call Trace:
        ? alternate_node_alloc+0x76/0xa0
        fib_table_insert+0x1b7/0x4b0
        fib_magic.isra.17+0xea/0x120
        fib_add_ifaddr+0x7b/0x190
        fib_netdev_event+0xc0/0x130
        register_netdevice_notifier+0x1c1/0x1d0
        ip_fib_init+0x72/0x85
        ip_rt_init+0x187/0x1e9
        ip_init+0xe/0x1a
        inet_init+0x171/0x26c
        ? ipv4_offload_init+0x66/0x66
        do_one_initcall+0x43/0x160
        kernel_init_freeable+0x191/0x219
        ? rest_init+0x80/0x80
        kernel_init+0xe/0x150
        ret_from_fork+0x22/0x30
       Code: f6 46 23 04 74 86 4c 89 f7 e8 ae 45 01 00 49 89 c7 4d 85 ff 0f 85 7b ff ff ff 31 db eb 08 4c 89 ff e8 16 47 01 00 48 8b 44 24 38 <45> 8b 6e 14 4d 63 76 74 48 89 04 24 0f 1f 44 00 00 48 83 c4 08
       RIP: kmem_cache_alloc+0xcf/0x1c0 RSP: ffff9b1500017c28
       CR2: 0000000000000014
      
      Fixes: 7b1a74fd ("[NETNS]: Refactor fib initialization so it can handle multiple namespaces.")
      Fixes: 7f9b8052
      
       ("[IPV4]: fib hash|trie initialization")
      
      Signed-off-by: default avatarMahesh Bandewar <maheshb@google.com>
      Acked-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8799a221
    • WANG Cong's avatar
      rtnetlink: allocate more memory for dev_set_mac_address() · 153711f9
      WANG Cong authored
      
      
      virtnet_set_mac_address() interprets mac address as struct
      sockaddr, but upper layer only allocates dev->addr_len
      which is ETH_ALEN + sizeof(sa_family_t) in this case.
      
      We lack a unified definition for mac address, so just fix
      the upper layer, this also allows drivers to interpret it
      to struct sockaddr freely.
      
      Reported-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      153711f9
    • Florian Fainelli's avatar
      net: dsa: b53: Add missing ARL entries for BCM53125 · be35e8c5
      Florian Fainelli authored
      The BCM53125 entry was missing an arl_entries member which would
      basically prevent the ARL search from terminating properly. This switch
      has 4 ARL entries, so add that.
      
      Fixes: 1da6df85
      
       ("net: dsa: b53: Implement ARL add/del/dump operations")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: default avatarVivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      be35e8c5
    • David S. Miller's avatar
      Merge branch 'BPF-map-value-adjust-fix' · 5067f4cf
      David S. Miller authored
      
      
      Daniel Borkmann says:
      
      ====================
      BPF map value adjust fix
      
      First patch in the series is the actual fix and the remaining
      patches are just updates to selftests.
      ====================
      
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5067f4cf
    • Daniel Borkmann's avatar
      bpf: more tests for mixed signed and unsigned bounds checks · 86412502
      Daniel Borkmann authored
      
      
      Add a couple of more test cases to BPF selftests that are related
      to mixed signed and unsigned checks.
      
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      86412502
    • Edward Cree's avatar
      bpf: add test for mixed signed and unsigned bounds checks · b712296a
      Edward Cree authored
      
      
      These failed due to a bug in verifier bounds handling.
      
      Signed-off-by: default avatarEdward Cree <ecree@solarflare.com>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b712296a
    • Daniel Borkmann's avatar
      bpf: fix up test cases with mixed signed/unsigned bounds · a1502132
      Daniel Borkmann authored
      
      
      Fix the few existing test cases that used mixed signed/unsigned
      bounds and switch them only to one flavor. Reason why we need this
      is that proper boundaries cannot be derived from mixed tests.
      
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a1502132
    • Daniel Borkmann's avatar
      bpf: allow to specify log level and reduce it for test_verifier · d6554904
      Daniel Borkmann authored
      
      
      For the test_verifier case, it's quite hard to parse log level 2 to
      figure out what's causing an issue when used to log level 1. We do
      want to use bpf_verify_program() in order to simulate some of the
      tests with strict alignment. So just add an argument to pass the level
      and put it to 1 for test_verifier.
      
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d6554904
    • Daniel Borkmann's avatar
      bpf: fix mixed signed/unsigned derived min/max value bounds · 4cabc5b1
      Daniel Borkmann authored
      Edward reported that there's an issue in min/max value bounds
      tracking when signed and unsigned compares both provide hints
      on limits when having unknown variables. E.g. a program such
      as the following should have been rejected:
      
         0: (7a) *(u64 *)(r10 -8) = 0
         1: (bf) r2 = r10
         2: (07) r2 += -8
         3: (18) r1 = 0xffff8a94cda93400
         5: (85) call bpf_map_lookup_elem#1
         6: (15) if r0 == 0x0 goto pc+7
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R10=fp
         7: (7a) *(u64 *)(r10 -16) = -8
         8: (79) r1 = *(u64 *)(r10 -16)
         9: (b7) r2 = -1
        10: (2d) if r1 > r2 goto pc+3
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=0
        R2=imm-1,max_value=18446744073709551615,min_align=1 R10=fp
        11: (65) if r1 s> 0x1 goto pc+2
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=0,max_value=1
        R2=imm-1,max_value=18446744073709551615,min_align=1 R10=fp
        12: (0f) r0 += r1
        13: (72) *(u8 *)(r0 +0) = 0
        R0=map_value_adj(ks=8,vs=8,id=0),min_value=0,max_value=1 R1=inv,min_value=0,max_value=1
        R2=imm-1,max_value=18446744073709551615,min_align=1 R10=fp
        14: (b7) r0 = 0
        15: (95) exit
      
      What happens is that in the first part ...
      
         8: (79) r1 = *(u64 *)(r10 -16)
         9: (b7) r2 = -1
        10: (2d) if r1 > r2 goto pc+3
      
      ... r1 carries an unsigned value, and is compared as unsigned
      against a register carrying an immediate. Verifier deduces in
      reg_set_min_max() that since the compare is unsigned and operation
      is greater than (>), that in the fall-through/false case, r1's
      minimum bound must be 0 and maximum bound must be r2. Latter is
      larger than the bound and thus max value is reset back to being
      'invalid' aka BPF_REGISTER_MAX_RANGE. Thus, r1 state is now
      'R1=inv,min_value=0'. The subsequent test ...
      
        11: (65) if r1 s> 0x1 goto pc+2
      
      ... is a signed compare of r1 with immediate value 1. Here,
      verifier deduces in reg_set_min_max() that since the compare
      is signed this time and operation is greater than (>), that
      in the fall-through/false case, we can deduce that r1's maximum
      bound must be 1, meaning with prior test, we result in r1 having
      the following state: R1=inv,min_value=0,max_value=1. Given that
      the actual value this holds is -8, the bounds are wrongly deduced.
      When this is being added to r0 which holds the map_value(_adj)
      type, then subsequent store access in above case will go through
      check_mem_access() which invokes check_map_access_adj(), that
      will then probe whether the map memory is in bounds based
      on the min_value and max_value as well as access size since
      the actual unknown value is min_value <= x <= max_value; commit
      fce366a9 ("bpf, verifier: fix alu ops against map_value{,
      _adj} register types") provides some more explanation on the
      semantics.
      
      It's worth to note in this context that in the current code,
      min_value and max_value tracking are used for two things, i)
      dynamic map value access via check_map_access_adj() and since
      commit 06c1c049 ("bpf: allow helpers access to variable memory")
      ii) also enforced at check_helper_mem_access() when passing a
      memory address (pointer to packet, map value, stack) and length
      pair to a helper and the length in this case is an unknown value
      defining an access range through min_value/max_value in that
      case. The min_value/max_value tracking is /not/ used in the
      direct packet access case to track ranges. However, the issue
      also affects case ii), for example, the following crafted program
      based on the same principle must be rejected as well:
      
         0: (b7) r2 = 0
         1: (bf) r3 = r10
         2: (07) r3 += -512
         3: (7a) *(u64 *)(r10 -16) = -8
         4: (79) r4 = *(u64 *)(r10 -16)
         5: (b7) r6 = -1
         6: (2d) if r4 > r6 goto pc+5
        R1=ctx R2=imm0,min_value=0,max_value=0,min_align=2147483648 R3=fp-512
        R4=inv,min_value=0 R6=imm-1,max_value=18446744073709551615,min_align=1 R10=fp
         7: (65) if r4 s> 0x1 goto pc+4
        R1=ctx R2=imm0,min_value=0,max_value=0,min_align=2147483648 R3=fp-512
        R4=inv,min_value=0,max_value=1 R6=imm-1,max_value=18446744073709551615,min_align=1
        R10=fp
         8: (07) r4 += 1
         9: (b7) r5 = 0
        10: (6a) *(u16 *)(r10 -512) = 0
        11: (85) call bpf_skb_load_bytes#26
        12: (b7) r0 = 0
        13: (95) exit
      
      Meaning, while we initialize the max_value stack slot that the
      verifier thinks we access in the [1,2] range, in reality we
      pass -7 as length which is interpreted as u32 in the helper.
      Thus, this issue is relevant also for the case of helper ranges.
      Resetting both bounds in check_reg_overflow() in case only one
      of them exceeds limits is also not enough as similar test can be
      created that uses values which are within range, thus also here
      learned min value in r1 is incorrect when mixed with later signed
      test to create a range:
      
         0: (7a) *(u64 *)(r10 -8) = 0
         1: (bf) r2 = r10
         2: (07) r2 += -8
         3: (18) r1 = 0xffff880ad081fa00
         5: (85) call bpf_map_lookup_elem#1
         6: (15) if r0 == 0x0 goto pc+7
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R10=fp
         7: (7a) *(u64 *)(r10 -16) = -8
         8: (79) r1 = *(u64 *)(r10 -16)
         9: (b7) r2 = 2
        10: (3d) if r2 >= r1 goto pc+3
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3
        R2=imm2,min_value=2,max_value=2,min_align=2 R10=fp
        11: (65) if r1 s> 0x4 goto pc+2
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0
        R1=inv,min_value=3,max_value=4 R2=imm2,min_value=2,max_value=2,min_align=2 R10=fp
        12: (0f) r0 += r1
        13: (72) *(u8 *)(r0 +0) = 0
        R0=map_value_adj(ks=8,vs=8,id=0),min_value=3,max_value=4
        R1=inv,min_value=3,max_value=4 R2=imm2,min_value=2,max_value=2,min_align=2 R10=fp
        14: (b7) r0 = 0
        15: (95) exit
      
      This leaves us with two options for fixing this: i) to invalidate
      all prior learned information once we switch signed context, ii)
      to track min/max signed and unsigned boundaries separately as
      done in [0]. (Given latter introduces major changes throughout
      the whole verifier, it's rather net-next material, thus this
      patch follows option i), meaning we can derive bounds either
      from only signed tests or only unsigned tests.) There is still the
      case of adjust_reg_min_max_vals(), where we adjust bounds on ALU
      operations, meaning programs like the following where boundaries
      on the reg get mixed in context later on when bounds are merged
      on the dst reg must get rejected, too:
      
         0: (7a) *(u64 *)(r10 -8) = 0
         1: (bf) r2 = r10
         2: (07) r2 += -8
         3: (18) r1 = 0xffff89b2bf87ce00
         5: (85) call bpf_map_lookup_elem#1
         6: (15) if r0 == 0x0 goto pc+6
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R10=fp
         7: (7a) *(u64 *)(r10 -16) = -8
         8: (79) r1 = *(u64 *)(r10 -16)
         9: (b7) r2 = 2
        10: (3d) if r2 >= r1 goto pc+2
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3
        R2=imm2,min_value=2,max_value=2,min_align=2 R10=fp
        11: (b7) r7 = 1
        12: (65) if r7 s> 0x0 goto pc+2
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3
        R2=imm2,min_value=2,max_value=2,min_align=2 R7=imm1,max_value=0 R10=fp
        13: (b7) r0 = 0
        14: (95) exit
      
        from 12 to 15: R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0
        R1=inv,min_value=3 R2=imm2,min_value=2,max_value=2,min_align=2 R7=imm1,min_value=1 R10=fp
        15: (0f) r7 += r1
        16: (65) if r7 s> 0x4 goto pc+2
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3
        R2=imm2,min_value=2,max_value=2,min_align=2 R7=inv,min_value=4,max_value=4 R10=fp
        17: (0f) r0 += r7
        18: (72) *(u8 *)(r0 +0) = 0
        R0=map_value_adj(ks=8,vs=8,id=0),min_value=4,max_value=4 R1=inv,min_value=3
        R2=imm2,min_value=2,max_value=2,min_align=2 R7=inv,min_value=4,max_value=4 R10=fp
        19: (b7) r0 = 0
        20: (95) exit
      
      Meaning, in adjust_reg_min_max_vals() we must also reset range
      values on the dst when src/dst registers have mixed signed/
      unsigned derived min/max value bounds with one unbounded value
      as otherwise they can be added together deducing false boundaries.
      Once both boundaries are established from either ALU ops or
      compare operations w/o mixing signed/unsigned insns, then they
      can safely be added to other regs also having both boundaries
      established. Adding regs with one unbounded side to a map value
      where the bounded side has been learned w/o mixing ops is
      possible, but the resulting map value won't recover from that,
      meaning such op is considered invalid on the time of actual
      access. Invalid bounds are set on the dst reg in case i) src reg,
      or ii) in case dst reg already had them. The only way to recover
      would be to perform i) ALU ops but only 'add' is allowed on map
      value types or ii) comparisons, but these are disallowed on
      pointers in case they span a range. This is fine as only BPF_JEQ
      and BPF_JNE may be performed on PTR_TO_MAP_VALUE_OR_NULL registers
      which potentially turn them into PTR_TO_MAP_VALUE type depending
      on the branch, so only here min/max value cannot be invalidated
      for them.
      
      In terms of state pruning, value_from_signed is considered
      as well in states_equal() when dealing with adjusted map values.
      With regards to breaking existing programs, there is a small
      risk, but use-cases are rather quite narrow where this could
      occur and mixing compares probably unlikely.
      
      Joint work with Josef and Edward.
      
        [0] https://lists.iovisor.org/pipermail/iovisor-dev/2017-June/000822.html
      
      Fixes: 48461135
      
       ("bpf: allow access into map value arrays")
      Reported-by: default avatarEdward Cree <ecree@solarflare.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarEdward Cree <ecree@solarflare.com>
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4cabc5b1