  1. Nov 15, 2011
    • IPv6 routing, NLM_F_* flag support: REPLACE and EXCL flags support, warn about missing CREATE flag · 4a287eba
      Matti Vaittinen authored
      
      
      Add support for NLM_F_* flags in IPv6 routing requests.
      
      If the NLM_F_CREATE flag is not set on an RTM_NEWROUTE request, a
      warning is printed but no error is returned; the new route is still
      added. NLM_F_CREATE may later become mandatory for new route creation.
      
      The exception is when NLM_F_REPLACE is given without NLM_F_CREATE and
      no matching route is found. In this case it should be safe to assume
      that the request issuer is familiar with the NLM_F_* flags and really
      does not want a route to be created.
      
      Specifying NLM_F_REPLACE now makes the kernel search for a matching
      route and replace it with the new one. If no route is found and
      NLM_F_CREATE is specified as well, a new route is created.
      
      Also, specifying NLM_F_EXCL returns an error if a matching route is
      found.
      
      Patch created against linux-3.2-rc1
      
      Signed-off-by: Matti Vaittinen <Mazziesaccount@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
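      The flag handling described in this commit message can be read as a
      small decision table. The following is a minimal userspace sketch of
      that table, not the kernel code: the enum, the function name, and the
      check order (NLM_F_EXCL before NLM_F_REPLACE) are assumptions made for
      illustration. The flag values are the ones from <linux/netlink.h>.
      The message does not say what happens when a matching route exists and
      neither NLM_F_EXCL nor NLM_F_REPLACE is set, so the sketch reports that
      case as unspecified.

```c
#include <assert.h>
#include <stdbool.h>

/* Values as defined in <linux/netlink.h>; repeated here so the sketch
 * builds without kernel headers. */
#ifndef NLM_F_REPLACE
#define NLM_F_REPLACE 0x100
#define NLM_F_EXCL    0x200
#define NLM_F_CREATE  0x400
#endif

/* Hypothetical outcome names, for illustration only. */
enum newroute_action {
	ROUTE_ADD,          /* a new route is created */
	ROUTE_REPLACE,      /* the matching route is replaced */
	ROUTE_ERR_EXISTS,   /* NLM_F_EXCL set and a matching route exists */
	ROUTE_ERR_MISSING,  /* NLM_F_REPLACE without NLM_F_CREATE, no match */
	ROUTE_UNSPECIFIED   /* case the commit message does not describe */
};

/* Decide what an RTM_NEWROUTE request does; *warn is set when the
 * "missing NLM_F_CREATE" warning would be printed. */
enum newroute_action newroute_decide(unsigned int flags, bool match_found,
				     bool *warn)
{
	*warn = false;

	if (match_found) {
		if (flags & NLM_F_EXCL)      /* assumption: EXCL checked first */
			return ROUTE_ERR_EXISTS;
		if (flags & NLM_F_REPLACE)
			return ROUTE_REPLACE;
		return ROUTE_UNSPECIFIED;
	}

	if (flags & NLM_F_CREATE)
		return ROUTE_ADD;
	if (flags & NLM_F_REPLACE)           /* caller knows the flags: no create */
		return ROUTE_ERR_MISSING;

	*warn = true;                        /* legacy caller: add, but warn */
	return ROUTE_ADD;
}
```

      For example, a request with no flags and no matching route yields
      ROUTE_ADD with the warning set, while NLM_F_REPLACE alone on a missing
      route yields ROUTE_ERR_MISSING.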
    • IPv6 routing, NLM_F_* flag support: warn if new route is created without NLM_F_CREATE · d71314b4
      Matti Vaittinen authored
      
      
      Add support for NLM_F_* flags in IPv6 routing requests.
      
      Warn if the NLM_F_CREATE flag is not set on an RTM_NEWROUTE request
      that creates a new table. NLM_F_CREATE may later become mandatory for
      new route creation.
      
      Patch created against linux-3.2-rc1
      
      Signed-off-by: Matti Vaittinen <Mazziesaccount@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/can/mscan: Fix buggy listen only mode setting · abbd00b8
      Wolfgang Grandegger authored
      This patch fixes an issue introduced recently with commit 452448f9.
      
      CC: Marc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: Wolfgang Grandegger <wg@grandegger.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Sweep the last of the active .get_drvinfo floors under ethernet/ · 612a94d6
      Rick Jones authored
      
      
      This round of floor sweeping converts strncpy calls in various .get_drvinfo
      routines to the preferred strlcpy.  It also does a modicum of other
      cleaning in those routines.
      
      Signed-off-by: Rick Jones <rick.jones2@hp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
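      The reason strlcpy is preferred in these routines is that it always
      NUL-terminates the destination, whereas strncpy leaves it unterminated
      when the source fills the buffer. Below is a minimal userspace sketch
      of that behaviour. sketch_strlcpy is a local stand-in written for this
      example (not the kernel's implementation), and struct sketch_drvinfo
      and the strings copied into it are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local stand-in with strlcpy() semantics: copy at most size - 1 bytes,
 * always NUL-terminate when size > 0, and return strlen(src) so the
 * caller can detect truncation. */
size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
	size_t len = strlen(src);

	if (size > 0) {
		size_t n = len < size - 1 ? len : size - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return len;
}

/* Hypothetical, cut-down driver info record with fixed-size fields. */
struct sketch_drvinfo {
	char driver[32];
	char version[32];
};

/* Shape of a .get_drvinfo routine after the conversion. */
void sketch_get_drvinfo(struct sketch_drvinfo *info)
{
	sketch_strlcpy(info->driver, "sketch_eth", sizeof(info->driver));
	sketch_strlcpy(info->version, "1.0", sizeof(info->version));
}
```

      With a 4-byte destination and the source "bnx2x", strncpy would write
      'b', 'n', 'x', '2' and no terminator, while the strlcpy-style copy
      stores "bnx" plus a NUL and returns 5 to signal truncation.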
    • bnx2x: uses build_skb() in receive path · e52fcb24
      Eric Dumazet authored
      
      
      bnx2x uses the following formula to compute its rx_buf_sz:
      
      dev->mtu + 2*L1_CACHE_BYTES + 14 + 8 + 8 + 2
      
      The core network stack then adds NET_SKB_PAD and
      SKB_DATA_ALIGN(sizeof(struct skb_shared_info)).
      
      The final allocated size for the skb head on x86_64 (L1_CACHE_BYTES = 64,
      MTU = 1500) is 2112 bytes, which SLUB/SLAB rounds up to 4096 bytes.
      
      Since the skb truesize is then bigger than SK_MEM_QUANTUM, we get a lot
      of false sharing because of mem_reclaim in the UDP stack.
      
      One possible way to halve truesize is to reduce the need by 64 bytes
      (2112 -> 2048 bytes).
      
      Instead of allocating a full cache line at the end of the packet for
      alignment, we can use the fact that skb_shared_info sits at the end of
      skb->head, and we can use this room if we convert bnx2x to the new
      build_skb() infrastructure.
      
      skb_shared_info is initialized after the hardware has finished its
      transfer, so the final padding can safely be overwritten.
      
      Using build_skb() also reduces cache line misses in the driver, since we
      use cache-hot skbs instead of cold ones. The number of in-flight sk_buff
      structures is lower, and they are recycled while still hot.
      
      Performance results:
      
      820,000 pps on a single-threaded RX UDP benchmark, instead of 720,000 pps.
      
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Eilon Greenstein <eilong@broadcom.com>
      CC: Ben Hutchings <bhutchings@solarflare.com>
      CC: Tom Herbert <therbert@google.com>
      CC: Jamal Hadi Salim <hadi@mojatatu.com>
      CC: Stephen Hemminger <shemminger@vyatta.com>
      CC: Thomas Graf <tgraf@infradead.org>
      CC: Herbert Xu <herbert@gondor.apana.org.au>
      CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Acked-by: Eilon Greenstein <eilong@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
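      The size arithmetic in this commit message can be checked with a short
      calculation. The sketch below shows one way the quoted 2112-byte figure
      can be reproduced. It rests on assumptions that the message does not
      state: NET_SKB_PAD is taken as 64 bytes, the aligned skb_shared_info
      size as 384 bytes (chosen so that the total matches 2112), the padded
      buffer is rounded up to a cache line before the shared-info area is
      added, and the allocator is modelled as plain power-of-two buckets.
      All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_L1_CACHE_BYTES 64   /* x86_64 value quoted in the message */
#define SKETCH_NET_SKB_PAD    64   /* assumption */
#define SKETCH_SHINFO_ALIGNED 384  /* assumption: makes the total 2112 */

/* Round x up to a multiple of a (a must be a power of two). */
size_t align_up(size_t x, size_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/* rx_buf_sz formula quoted in the commit message. */
size_t rx_buf_sz(size_t mtu)
{
	return mtu + 2 * SKETCH_L1_CACHE_BYTES + 14 + 8 + 8 + 2;
}

/* Head size once the pad and the shared-info area are added. */
size_t skb_head_size(size_t buf_sz)
{
	return align_up(buf_sz + SKETCH_NET_SKB_PAD, SKETCH_L1_CACHE_BYTES) +
	       SKETCH_SHINFO_ALIGNED;
}

/* Assumption: sizes in this range come from power-of-two buckets. */
size_t slab_bucket(size_t size)
{
	size_t bucket = 1;

	while (bucket < size)
		bucket <<= 1;
	return bucket;
}
```

      Under these assumptions, an MTU of 1500 gives rx_buf_sz = 1660 and a
      head size of 2112, which lands in the 4096-byte bucket. Trimming 64
      bytes brings the request to 2048, which fits the 2048-byte bucket
      exactly; this is the halving the message refers to.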
    • net: introduce build_skb() · b2b5ce9d
      Eric Dumazet authored
      
      
      One of the things we discussed during the netdev 2011 conference was
      the idea of changing some network drivers to allocate/populate their
      skb at RX completion time, right before feeding the skb to the network
      stack.
      
      In the old days, we allocated skbs when populating the RX ring.
      
      This means bringing the sk_buff and skb_shared_info cache lines into
      the cpu cache (since we clear/initialize them), then 'queueing'
      skb->data to the NIC.
      
      By the time the NIC fills a frame into the skb->data buffer and the
      host can process it, the cpu has probably evicted those cache lines,
      because a lot of things happened between the allocation and the final
      use.
      
      So the deal would be to allocate only the data buffer for the NIC to
      populate its RX ring buffer, and to use build_skb() at RX completion to
      attach a data buffer (now filled with an Ethernet frame) to a new skb,
      initialize the skb_shared_info portion, and give the hot skb to the
      network stack.
      
      build_skb() is the function that allocates an skb, with the caller
      providing the data buffer that should be attached to it. Drivers are
      expected to call skb_reserve() right after build_skb() to adjust
      skb->data to the Ethernet frame (usually skipping NET_SKB_PAD and
      NET_IP_ALIGN, but some drivers might add a hardware-provided
      alignment).
      
      Data provided to build_skb() MUST have been allocated by a prior
      kmalloc() call, with enough room to add SKB_DATA_ALIGN(sizeof(struct
      skb_shared_info)) bytes at the end of the data without corrupting the
      incoming frame.
      
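      /* Room for headroom, the frame, and the trailing skb_shared_info. */
      data = kmalloc(NET_SKB_PAD + NET_IP_ALIGN + 1536 +
                     SKB_DATA_ALIGN(sizeof(struct skb_shared_info)),
                     GFP_ATOMIC);
      ...
      /* At RX completion: wrap the filled buffer in a fresh skb. */
      skb = build_skb(data);
      if (!skb) {
              recycle_data(data);
      } else {
              /* Point skb->data at the Ethernet frame. */
              skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
              ...
      }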
      
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Eilon Greenstein <eilong@broadcom.com>
      CC: Ben Hutchings <bhutchings@solarflare.com>
      CC: Tom Herbert <therbert@google.com>
      CC: Jamal Hadi Salim <hadi@mojatatu.com>
      CC: Stephen Hemminger <shemminger@vyatta.com>
      CC: Thomas Graf <tgraf@infradead.org>
      CC: Herbert Xu <herbert@gondor.apana.org.au>
      CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. Nov 14, 2011
  3. Nov 13, 2011
  4. Nov 10, 2011
    • ipv4: PKTINFO doesnt need dst reference · d826eb14
      Eric Dumazet authored
      
      
      On Monday 7 November 2011 at 15:33 +0100, Eric Dumazet wrote:
      
      > At least, in recent kernels we don't change dst->refcnt in the
      > forwarding path (using NOREF skb->dst)
      >
      > One particular point is the atomic_inc(dst->refcnt) we have to perform
      > when queuing a UDP packet if the socket asked for PKTINFO stuff (for
      > example a typical DNS server has to set up this option)
      >
      > I have one patch somewhere that stores the information in skb->cb[] and
      > avoids the atomic_{inc|dec}(dst->refcnt).
      >
      
      OK, I found it. I did some extra tests and believe it's ready.
      
      [PATCH net-next] ipv4: IP_PKTINFO doesnt need dst reference
      
      When a socket uses IP_PKTINFO notifications, we currently force a dst
      reference for each received skb. The reader has to access dst to get
      the needed information (rt_iif & rt_spec_dst) and must release the dst
      reference.
      
      We also forced a dst reference if the skb was put in the socket
      backlog, even without IP_PKTINFO handling. This happens under
      stress/load.
      
      We can instead store the needed information in skb->cb[], so that only
      the softirq handler really accesses dst, improving cache hit ratios.
      
      This removes two atomic operations per packet, and false sharing as
      well.
      
      On a benchmark using a single-threaded receiver (doing only recvmsg()
      calls), I can reach 720,000 pps instead of 570,000 pps.
      
      IP_PKTINFO is typically used by DNS servers and by any
      multihoming-aware UDP application.
      
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
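      For context, this is roughly what the receiving side of IP_PKTINFO
      looks like from userspace, i.e. the kind of single-threaded recvmsg()
      consumer the message has in mind. It is a minimal sketch that uses the
      standard socket calls; the two function names are hypothetical, and
      the loopback self-test assumes a Linux host with a working lo
      interface.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* struct in_pktinfo on glibc */
#endif
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Receive one datagram and copy out its IP_PKTINFO ancillary data.
 * Returns the payload length, or -1 on error or if no pktinfo arrived. */
ssize_t recv_with_pktinfo(int fd, void *buf, size_t len,
			  struct in_pktinfo *out)
{
	union {
		char raw[CMSG_SPACE(sizeof(struct in_pktinfo))];
		struct cmsghdr align;   /* forces suitable alignment */
	} ctl;
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	struct msghdr msg;
	struct cmsghdr *c;
	ssize_t n;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.raw;
	msg.msg_controllen = sizeof(ctl.raw);

	n = recvmsg(fd, &msg, 0);
	if (n < 0)
		return -1;
	for (c = CMSG_FIRSTHDR(&msg); c; c = CMSG_NXTHDR(&msg, c)) {
		if (c->cmsg_level == IPPROTO_IP && c->cmsg_type == IP_PKTINFO) {
			memcpy(out, CMSG_DATA(c), sizeof(*out));
			return n;
		}
	}
	return -1;
}

/* Hypothetical self-test: send one datagram to ourselves over loopback
 * and report the pktinfo that came with it. Returns 0 on success. */
int pktinfo_loopback_demo(struct in_pktinfo *out)
{
	struct sockaddr_in sa;
	socklen_t slen = sizeof(sa);
	char buf[16];
	int on = 1, ret = -1;
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd < 0)
		return -1;
	memset(&sa, 0, sizeof(sa));
	sa.sin_family = AF_INET;
	sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);   /* port 0: kernel picks */
	if (setsockopt(fd, IPPROTO_IP, IP_PKTINFO, &on, sizeof(on)) == 0 &&
	    bind(fd, (struct sockaddr *)&sa, sizeof(sa)) == 0 &&
	    getsockname(fd, (struct sockaddr *)&sa, &slen) == 0 &&
	    sendto(fd, "ping", 4, 0, (struct sockaddr *)&sa, sizeof(sa)) == 4 &&
	    recv_with_pktinfo(fd, buf, sizeof(buf), out) == 4)
		ret = 0;
	close(fd);
	return ret;
}
```

      On success, out->ipi_addr holds the destination address of the
      datagram and out->ipi_ifindex the index of the interface it arrived
      on, which is the information a multihomed server needs to answer from
      the right address.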
    • ipv4: reduce percpu needs for icmpmsg mibs · acb32ba3
      Eric Dumazet authored
      
      
      Reading /proc/net/snmp on a machine with a lot of cpus is very expensive
      (can be ~88000 us).
      
      This is because ICMPMSG MIB uses 4096 bytes per cpu, and folding values
      for all possible cpus can read 16 Mbytes of memory.
      
      ICMP messages are not considered fast path on a typical server, and
      probably only a few cpus handle them anyway. We can afford an atomic
      operation instead of using percpu data.
      
      This saves 4096 bytes per cpu and per network namespace.
      
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
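      The trade-off described here, per-cpu counters that are cheap to
      update but expensive to read versus one shared set of atomic counters,
      can be sketched in userspace with C11 atomics. This is an illustration
      of the idea only, not the kernel code. The names are hypothetical, and
      the sketch assumes 512 counters of 8 bytes each (which gives the 4096
      bytes quoted) and 4096 possible cpus (which gives the 16 Mbytes
      quoted).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MIB_ENTRIES 512 /* assumption: 512 x 8 bytes = 4096 bytes */

/* Before: one full copy of the counters per possible cpu. */
struct percpu_mib { uint64_t c[SKETCH_MIB_ENTRIES]; };

/* After: a single shared copy, updated with atomic operations. */
struct shared_mib { _Atomic uint64_t c[SKETCH_MIB_ENTRIES]; };

/* Per-cpu read: walk every cpu's copy to fold one counter. */
uint64_t percpu_fold(const struct percpu_mib *cpus, size_t ncpus, size_t idx)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < ncpus; i++)
		sum += cpus[i].c[idx];
	return sum;
}

/* Bytes touched when every counter is folded across ncpus copies. */
size_t percpu_fold_bytes(size_t ncpus)
{
	return ncpus * sizeof(struct percpu_mib);
}

/* Shared update: one atomic add, no per-cpu storage. */
void shared_inc(struct shared_mib *m, size_t idx)
{
	atomic_fetch_add_explicit(&m->c[idx], 1, memory_order_relaxed);
}

/* Shared read: a single load, independent of the cpu count. */
uint64_t shared_read(struct shared_mib *m, size_t idx)
{
	return atomic_load_explicit(&m->c[idx], memory_order_relaxed);
}
```

      The per-cpu layout makes every read scale with the number of possible
      cpus, while the shared layout moves the cost to the update, which is
      acceptable when updates are rare, as the message argues for ICMP.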
  5. Nov 09, 2011