Skip to content
  1. Sep 21, 2018
    • Roman Gushchin's avatar
      mm: slowly shrink slabs with a relatively small number of objects · 172b06c3
      Roman Gushchin authored
      9092c71b ("mm: use sc->priority for slab shrink targets") changed the
      way that the target slab pressure is calculated and made it
      priority-based:
      
          delta = freeable >> priority;
          delta *= 4;
          do_div(delta, shrinker->seeks);
      
      The problem is that on a default priority (which is 12) no pressure is
      applied at all, if the number of potentially reclaimable objects is less
      than 4096 (1<<12).
      
      This causes the last objects on slab caches of no longer used cgroups to
      (almost) never get reclaimed.  It's obviously a waste of memory.
      
      It can be especially painful, if these stale objects are holding a
      reference to a dying cgroup.  Slab LRU lists are reparented on memcg
      offlining, but corresponding objects are still holding a reference to the
      dying cgroup.  If we don't scan these objects, the dying cgroup can't go
      away.  Most likely, the parent cgroup hasn't any directly charged objects,
      only remaining objects from dying children cgroups.  So it can easily hold
      a reference to hundreds of dying cgroups.
      
      If there are no big spikes in memory pressure, and new memory cgroups are
      created and destroyed periodically, this causes the number of dying
      cgroups grow steadily, causing a slow-ish and hard-to-detect memory
      "leak".  It's not a real leak, as the memory can be eventually reclaimed,
      but it could not happen in a real life at all.  I've seen hosts with a
      steadily climbing number of dying cgroups, which doesn't show any signs of
      a decline in months, despite the host is loaded with a production
      workload.
      
      It is an obvious waste of memory, and to prevent it, let's apply a minimal
      pressure even on small shrinker lists.  E.g.  if there are freeable
      objects, let's scan at least min(freeable, scan_batch) objects.
      
      This fix significantly improves a chance of a dying cgroup to be
      reclaimed, and together with some previous patches stops the steady growth
      of the dying cgroups number on some of our hosts.
      
      Link: http://lkml.kernel.org/r/20180905230759.12236-1-guro@fb.com
      
      
      Fixes: 9092c71b ("mm: use sc->priority for slab shrink targets")
      Signed-off-by: default avatarRoman Gushchin <guro@fb.com>
      Acked-by: default avatarRik van Riel <riel@surriel.com>
      Cc: Josef Bacik <jbacik@fb.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      172b06c3
    • YueHaibing's avatar
    • Joel Fernandes (Google)'s avatar
      mm: shmem.c: Correctly annotate new inodes for lockdep · b45d71fb
      Joel Fernandes (Google) authored
      Directories and inodes don't necessarily need to be in the same lockdep
      class.  For ex, hugetlbfs splits them out too to prevent false positives
      in lockdep.  Annotate correctly after new inode creation.  If its a
      directory inode, it will be put into a different class.
      
      This should fix a lockdep splat reported by syzbot:
      
      > ======================================================
      > WARNING: possible circular locking dependency detected
      > 4.18.0-rc8-next-20180810+ #36 Not tainted
      > ------------------------------------------------------
      > syz-executor900/4483 is trying to acquire lock:
      > 00000000d2bfc8fe (&sb->s_type->i_mutex_key#9){++++}, at: inode_lock
      > include/linux/fs.h:765 [inline]
      > 00000000d2bfc8fe (&sb->s_type->i_mutex_key#9){++++}, at:
      > shmem_fallocate+0x18b/0x12e0 mm/shmem.c:2602
      >
      > but task is already holding lock:
      > 0000000025208078 (ashmem_mutex){+.+.}, at: ashmem_shrink_scan+0xb4/0x630
      > drivers/staging/android/ashmem.c:448
      >
      > which lock already depends on the new lock.
      >
      > -> #2 (ashmem_mutex){+.+.}:
      >        __mutex_lock_common kernel/locking/mutex.c:925 [inline]
      >        __mutex_lock+0x171/0x1700 kernel/locking/mutex.c:1073
      >        mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:1088
      >        ashmem_mmap+0x55/0x520 drivers/staging/android/ashmem.c:361
      >        call_mmap include/linux/fs.h:1844 [inline]
      >        mmap_region+0xf27/0x1c50 mm/mmap.c:1762
      >        do_mmap+0xa10/0x1220 mm/mmap.c:1535
      >        do_mmap_pgoff include/linux/mm.h:2298 [inline]
      >        vm_mmap_pgoff+0x213/0x2c0 mm/util.c:357
      >        ksys_mmap_pgoff+0x4da/0x660 mm/mmap.c:1585
      >        __do_sys_mmap arch/x86/kernel/sys_x86_64.c:100 [inline]
      >        __se_sys_mmap arch/x86/kernel/sys_x86_64.c:91 [inline]
      >        __x64_sys_mmap+0xe9/0x1b0 arch/x86/kernel/sys_x86_64.c:91
      >        do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
      >        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      >
      > -> #1 (&mm->mmap_sem){++++}:
      >        __might_fault+0x155/0x1e0 mm/memory.c:4568
      >        _copy_to_user+0x30/0x110 lib/usercopy.c:25
      >        copy_to_user include/linux/uaccess.h:155 [inline]
      >        filldir+0x1ea/0x3a0 fs/readdir.c:196
      >        dir_emit_dot include/linux/fs.h:3464 [inline]
      >        dir_emit_dots include/linux/fs.h:3475 [inline]
      >        dcache_readdir+0x13a/0x620 fs/libfs.c:193
      >        iterate_dir+0x48b/0x5d0 fs/readdir.c:51
      >        __do_sys_getdents fs/readdir.c:231 [inline]
      >        __se_sys_getdents fs/readdir.c:212 [inline]
      >        __x64_sys_getdents+0x29f/0x510 fs/readdir.c:212
      >        do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
      >        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      >
      > -> #0 (&sb->s_type->i_mutex_key#9){++++}:
      >        lock_acquire+0x1e4/0x540 kernel/locking/lockdep.c:3924
      >        down_write+0x8f/0x130 kernel/locking/rwsem.c:70
      >        inode_lock include/linux/fs.h:765 [inline]
      >        shmem_fallocate+0x18b/0x12e0 mm/shmem.c:2602
      >        ashmem_shrink_scan+0x236/0x630 drivers/staging/android/ashmem.c:455
      >        ashmem_ioctl+0x3ae/0x13a0 drivers/staging/android/ashmem.c:797
      >        vfs_ioctl fs/ioctl.c:46 [inline]
      >        file_ioctl fs/ioctl.c:501 [inline]
      >        do_vfs_ioctl+0x1de/0x1720 fs/ioctl.c:685
      >        ksys_ioctl+0xa9/0xd0 fs/ioctl.c:702
      >        __do_sys_ioctl fs/ioctl.c:709 [inline]
      >        __se_sys_ioctl fs/ioctl.c:707 [inline]
      >        __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:707
      >        do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
      >        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      >
      > other info that might help us debug this:
      >
      > Chain exists of:
      >   &sb->s_type->i_mutex_key#9 --> &mm->mmap_sem --> ashmem_mutex
      >
      >  Possible unsafe locking scenario:
      >
      >        CPU0                    CPU1
      >        ----                    ----
      >   lock(ashmem_mutex);
      >                                lock(&mm->mmap_sem);
      >                                lock(ashmem_mutex);
      >   lock(&sb->s_type->i_mutex_key#9);
      >
      >  *** DEADLOCK ***
      >
      > 1 lock held by syz-executor900/4483:
      >  #0: 0000000025208078 (ashmem_mutex){+.+.}, at:
      > ashmem_shrink_scan+0xb4/0x630 drivers/staging/android/ashmem.c:448
      
      Link: http://lkml.kernel.org/r/20180821231835.166639-1-joel@joelfernandes.org
      
      
      Signed-off-by: default avatarJoel Fernandes (Google) <joel@joelfernandes.org>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Reviewed-by: default avatarNeilBrown <neilb@suse.com>
      Suggested-by: default avatarNeilBrown <neilb@suse.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b45d71fb
    • Dominique Martinet's avatar
      fs/proc/kcore.c: fix invalid memory access in multi-page read optimization · a1b3d2f2
      Dominique Martinet authored
      The 'm' kcore_list item could point to kclist_head, and it is incorrect to
      look at m->addr / m->size in this case.
      
      There is no choice but to run through the list of entries for every
      address if we did not find any entry in the previous iteration
      
      Reset 'm' to NULL in that case at Omar Sandoval's suggestion.
      
      [akpm@linux-foundation.org: add comment]
      Link: http://lkml.kernel.org/r/1536100702-28706-1-git-send-email-asmadeus@codewreck.org
      
      
      Fixes: bf991c22 ("proc/kcore: optimize multiple page reads")
      Signed-off-by: default avatarDominique Martinet <asmadeus@codewreck.org>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Omar Sandoval <osandov@osandov.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Bhupesh Sharma <bhsharma@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a1b3d2f2
    • Pasha Tatashin's avatar
      mm: disable deferred struct page for 32-bit arches · 889c695d
      Pasha Tatashin authored
      Deferred struct page init is needed only on systems with large amount of
      physical memory to improve boot performance.  32-bit systems do not
      benefit from this feature.
      
      Jiri reported a problem where deferred struct pages do not work well with
      x86-32:
      
      [    0.035162] Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
      [    0.035725] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
      [    0.036269] Initializing CPU#0
      [    0.036513] Initializing HighMem for node 0 (00036ffe:0007ffe0)
      [    0.038459] page:f6780000 is uninitialized and poisoned
      [    0.038460] raw: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
      [    0.039509] page dumped because: VM_BUG_ON_PAGE(1 && PageCompound(page))
      [    0.040038] ------------[ cut here ]------------
      [    0.040399] kernel BUG at include/linux/page-flags.h:293!
      [    0.040823] invalid opcode: 0000 [#1] SMP PTI
      [    0.041166] CPU: 0 PID: 0 Comm: swapper Not tainted 4.19.0-rc1_pt_jiri #9
      [    0.041694] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
      [    0.042496] EIP: free_highmem_page+0x64/0x80
      [    0.042839] Code: 13 46 d8 c1 e8 18 5d 83 e0 03 8d 04 c0 c1 e0 06 ff 80 ec 5f 44 d8 c3 8d b4 26 00 00 00 00 ba 08 65 28 d8 89 d8 e8 fc 71 02 00 <0f> 0b 8d 76 00 8d bc 27 00 00 00 00 ba d0 b1 26 d8 89 d8 e8 e4 71
      [    0.044338] EAX: 0000003c EBX: f6780000 ECX: 00000000 EDX: d856cbe8
      [    0.044868] ESI: 0007ffe0 EDI: d838df20 EBP: d838df00 ESP: d838defc
      [    0.045372] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210086
      [    0.045913] CR0: 80050033 CR2: 00000000 CR3: 18556000 CR4: 00040690
      [    0.046413] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
      [    0.046913] DR6: fffe0ff0 DR7: 00000400
      [    0.047220] Call Trace:
      [    0.047419]  add_highpages_with_active_regions+0xbd/0x10d
      [    0.047854]  set_highmem_pages_init+0x5b/0x71
      [    0.048202]  mem_init+0x2b/0x1e8
      [    0.048460]  start_kernel+0x1d2/0x425
      [    0.048757]  i386_start_kernel+0x93/0x97
      [    0.049073]  startup_32_smp+0x164/0x168
      [    0.049379] Modules linked in:
      [    0.049626] ---[ end trace 337949378db0abbb ]---
      
      We free highmem pages before their struct pages are initialized:
      
      mem_init()
       set_highmem_pages_init()
        add_highpages_with_active_regions()
         free_highmem_page()
          .. Access uninitialized struct page here..
      
      Because there is no reason to have this feature on 32-bit systems, just
      disable it.
      
      Link: http://lkml.kernel.org/r/20180831150506.31246-1-pavel.tatashin@microsoft.com
      
      
      Fixes: 2e3ca40f ("mm: relax deferred struct page requirements")
      Signed-off-by: default avatarPavel Tatashin <pavel.tatashin@microsoft.com>
      Reported-by: default avatarJiri Slaby <jslaby@suse.cz>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      889c695d
    • KJ Tsanaktsidis's avatar
      fork: report pid exhaustion correctly · f83606f5
      KJ Tsanaktsidis authored
      Make the clone and fork syscalls return EAGAIN when the limit on the
      number of pids /proc/sys/kernel/pid_max is exceeded.
      
      Currently, when the pid_max limit is exceeded, the kernel will return
      ENOSPC from the fork and clone syscalls.  This is contrary to the
      documented behaviour, which explicitly calls out the pid_max case as one
      where EAGAIN should be returned.  It also leads to really confusing error
      messages in userspace programs which will complain about a lack of disk
      space when they fail to create processes/threads for this reason.
      
      This error is being returned because alloc_pid() uses the idr api to find
      a new pid; when there are none available, idr_alloc_cyclic() returns
      -ENOSPC, and this is being propagated back to userspace.
      
      This behaviour has been broken before, and was explicitly fixed in
      commit 35f71bc0 ("fork: report pid reservation failure properly"),
      so I think -EAGAIN is definitely the right thing to return in this case.
      The current behaviour change dates from commit 95846ecf ("pid:
      replace pid bitmap implementation with IDR AIP") and was I believe
      unintentional.
      
      This patch has no impact on the case where allocating a pid fails because
      the child reaper for the namespace is dead; that case will still return
      -ENOMEM.
      
      Link: http://lkml.kernel.org/r/20180903111016.46461-1-ktsanaktsidis@zendesk.com
      
      
      Fixes: 95846ecf ("pid: replace pid bitmap implementation with IDR AIP")
      Signed-off-by: default avatarKJ Tsanaktsidis <ktsanaktsidis@zendesk.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Gargi Sharma <gs051095@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f83606f5
  2. Sep 20, 2018
  3. Sep 19, 2018
  4. Sep 18, 2018
  5. Sep 17, 2018
    • zhong jiang's avatar
      net: ethernet: Fix a unused function warning. · c7348091
      zhong jiang authored
      
      
      Fix the following compile warning:
      
      drivers/net/ethernet/microchip/lan743x_main.c:2964:12: warning: ‘lan743x_pm_suspend’ defined but not used [-Wunused-function]
       static int lan743x_pm_suspend(struct device *dev)
      drivers/net/ethernet/microchip/lan743x_main.c:2987:12: warning: ‘lan743x_pm_resume’ defined but not used [-Wunused-function]
       static int lan743x_pm_resume(struct device *dev)
      
      Signed-off-by: default avatarzhong jiang <zhongjiang@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c7348091
    • Andrew Lunn's avatar
      net: dsa: mv88e6xxx: Fix ATU Miss Violation · ddca24df
      Andrew Lunn authored
      
      
      Fix a cut/paste error and a typo which results in ATU miss violations
      not being reported.
      
      Fixes: 0977644c ("net: dsa: mv88e6xxx: Decode ATU problem interrupt")
      Signed-off-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ddca24df
    • Daniel Borkmann's avatar
      tls: fix currently broken MSG_PEEK behavior · 50c6b58a
      Daniel Borkmann authored
      
      
      In kTLS MSG_PEEK behavior is currently failing, strace example:
      
        [pid  2430] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 3
        [pid  2430] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 4
        [pid  2430] bind(4, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
        [pid  2430] listen(4, 10)               = 0
        [pid  2430] getsockname(4, {sa_family=AF_INET, sin_port=htons(38855), sin_addr=inet_addr("0.0.0.0")}, [16]) = 0
        [pid  2430] connect(3, {sa_family=AF_INET, sin_port=htons(38855), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
        [pid  2430] setsockopt(3, SOL_TCP, 0x1f /* TCP_??? */, [7564404], 4) = 0
        [pid  2430] setsockopt(3, 0x11a /* SOL_?? */, 1, "\3\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0
        [pid  2430] accept(4, {sa_family=AF_INET, sin_port=htons(49636), sin_addr=inet_addr("127.0.0.1")}, [16]) = 5
        [pid  2430] setsockopt(5, SOL_TCP, 0x1f /* TCP_??? */, [7564404], 4) = 0
        [pid  2430] setsockopt(5, 0x11a /* SOL_?? */, 2, "\3\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0
        [pid  2430] close(4)                    = 0
        [pid  2430] sendto(3, "test_read_peek", 14, 0, NULL, 0) = 14
        [pid  2430] sendto(3, "_mult_recs\0", 11, 0, NULL, 0) = 11
        [pid  2430] recvfrom(5, "test_read_peektest_read_peektest"..., 64, MSG_PEEK, NULL, NULL) = 64
      
      As can be seen from strace, there are two TLS records sent,
      i) 'test_read_peek' and ii) '_mult_recs\0' where we end up
      peeking 'test_read_peektest_read_peektest'. This is clearly
      wrong, and what happens is that given peek cannot call into
      tls_sw_advance_skb() to unpause strparser and proceed with
      the next skb, we end up looping over the current one, copying
      the 'test_read_peek' over and over into the user provided
      buffer.
      
      Here, we can only peek into the currently held skb (current,
      full TLS record) as otherwise we would end up having to hold
      all the original skb(s) (depending on the peek depth) in a
      separate queue when unpausing strparser to process next
      records, minimally intrusive is to return only up to the
      current record's size (which likely was what c46234eb
      ("tls: RX path for ktls") originally intended as well). Thus,
      after patch we properly peek the first record:
      
        [pid  2046] wait4(2075,  <unfinished ...>
        [pid  2075] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 3
        [pid  2075] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 4
        [pid  2075] bind(4, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
        [pid  2075] listen(4, 10)               = 0
        [pid  2075] getsockname(4, {sa_family=AF_INET, sin_port=htons(55115), sin_addr=inet_addr("0.0.0.0")}, [16]) = 0
        [pid  2075] connect(3, {sa_family=AF_INET, sin_port=htons(55115), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
        [pid  2075] setsockopt(3, SOL_TCP, 0x1f /* TCP_??? */, [7564404], 4) = 0
        [pid  2075] setsockopt(3, 0x11a /* SOL_?? */, 1, "\3\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0
        [pid  2075] accept(4, {sa_family=AF_INET, sin_port=htons(45732), sin_addr=inet_addr("127.0.0.1")}, [16]) = 5
        [pid  2075] setsockopt(5, SOL_TCP, 0x1f /* TCP_??? */, [7564404], 4) = 0
        [pid  2075] setsockopt(5, 0x11a /* SOL_?? */, 2, "\3\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0
        [pid  2075] close(4)                    = 0
        [pid  2075] sendto(3, "test_read_peek", 14, 0, NULL, 0) = 14
        [pid  2075] sendto(3, "_mult_recs\0", 11, 0, NULL, 0) = 11
        [pid  2075] recvfrom(5, "test_read_peek", 64, MSG_PEEK, NULL, NULL) = 14
      
      Fixes: c46234eb ("tls: RX path for ktls")
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      50c6b58a
    • David S. Miller's avatar
      Merge branch 'hv_netvsc-associate-VF-and-PV-device-by-serial-number' · aa079bd0
      David S. Miller authored
      
      
      Stephen Hemminger says:
      
      ====================
      hv_netvsc: associate VF and PV device by serial number
      
      The Hyper-V implementation of PCI controller has concept of 32 bit serial number
      (not to be confused with PCI-E serial number).  This value is sent in the protocol
      from the host to indicate SR-IOV VF device is attached to a synthetic NIC.
      
      Using the serial number (instead of MAC address) to associate the two devices
      avoids lots of potential problems when there are duplicate MAC addresses from
      tunnels or layered devices.
      
      The patch set is broken into two parts, one is for the PCI controller
      and the other is for the netvsc device. Normally, these go through different
      trees but sending them together here for better review. The PCI changes
      were submitted previously, but the main review comment was "why do you
      need this?". This is why.
      
      v2 - slot name can be shorter.
           remove locking when creating pci_slots; see comment for explaination
      ====================
      
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      aa079bd0
    • Stephen Hemminger's avatar
      hv_netvsc: pair VF based on serial number · 00d7ddba
      Stephen Hemminger authored
      
      
      Matching network device based on MAC address is problematic
      since a non VF network device can be creted with a duplicate MAC
      address causing confusion and problems.  The VMBus API does provide
      a serial number that is a better matching method.
      
      Signed-off-by: default avatarStephen Hemminger <sthemmin@microsoft.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      00d7ddba
    • Stephen Hemminger's avatar
      PCI: hv: support reporting serial number as slot information · a15f2c08
      Stephen Hemminger authored
      
      
      The Hyper-V host API for PCI provides a unique "serial number" which
      can be used as basis for sysfs PCI slot table. This can be useful
      for cases where userspace wants to find the PCI device based on
      serial number.
      
      When an SR-IOV NIC is added, the host sends an attach message
      with serial number. The kernel doesn't use the serial number, but
      it is useful when doing the same thing in a userspace driver such
      as the DPDK. By having /sys/bus/pci/slots/N it provides a direct
      way to find the matching PCI device.
      
      There maybe some cases where serial number is not unique such
      as when using GPU's. But the PCI slot infrastructure will handle
      that.
      
      This has a side effect which may also be useful. The common udev
      network device naming policy uses the slot information (rather
      than PCI address).
      
      Signed-off-by: default avatarStephen Hemminger <sthemmin@microsoft.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a15f2c08
    • Michael Chan's avatar
      bnxt_en: Fix VF mac address regression. · 28ea334b
      Michael Chan authored
      
      
      The recent commit to always forward the VF MAC address to the PF for
      approval may not work if the PF driver or the firmware is older.  This
      will cause the VF driver to fail during probe:
      
        bnxt_en 0000:00:03.0 (unnamed net_device) (uninitialized): hwrm req_type 0xf seq id 0x5 error 0xffff
        bnxt_en 0000:00:03.0 (unnamed net_device) (uninitialized): VF MAC address 00:00:17:02:05:d0 not approved by the PF
        bnxt_en 0000:00:03.0: Unable to initialize mac address.
        bnxt_en: probe of 0000:00:03.0 failed with error -99
      
      We fix it by treating the error as fatal only if the VF MAC address is
      locally generated by the VF.
      
      Fixes: 707e7e96 ("bnxt_en: Always forward VF MAC address to the PF.")
      Reported-by: default avatarSeth Forshee <seth.forshee@canonical.com>
      Reported-by: default avatarSiwei Liu <loseweigh@gmail.com>
      Signed-off-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      28ea334b
    • Eric Dumazet's avatar
      ipv6: fix possible use-after-free in ip6_xmit() · bbd6528d
      Eric Dumazet authored
      
      
      In the unlikely case ip6_xmit() has to call skb_realloc_headroom(),
      we need to call skb_set_owner_w() before consuming original skb,
      otherwise we risk a use-after-free.
      
      Bring IPv6 in line with what we do in IPv4 to fix this.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bbd6528d
    • Colin Ian King's avatar
      net: hp100: fix always-true check for link up state · a7f38002
      Colin Ian King authored
      
      
      The operation ~(p100_inb(VG_LAN_CFG_1) & HP100_LINK_UP) returns a value
      that is always non-zero and hence the wait for the link to drop always
      terminates prematurely.  Fix this by using a logical not operator instead
      of a bitwise complement.  This issue has been in the driver since
      pre-2.6.12-rc2.
      
      Detected by CoverityScan, CID#114157 ("Logical vs. bitwise operator")
      
      Signed-off-by: default avatarColin Ian King <colin.king@canonical.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a7f38002
    • Nicolas Ferre's avatar
      ARM: dts: at91: add new compatibility string for macb on sama5d3 · 321cc359
      Nicolas Ferre authored
      
      
      We need this new compatibility string as we experienced different behavior
      for this 10/100Mbits/s macb interface on this particular SoC.
      Backward compatibility is preserved as we keep the alternative strings.
      
      Signed-off-by: default avatarNicolas Ferre <nicolas.ferre@microchip.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      321cc359