  1. Dec 10, 2012
    • inet_diag: validate port comparison byte code to prevent unsafe reads · 5e1f5420
      Neal Cardwell authored
      
      
      Add logic to verify that a port comparison byte code operation
      actually has the second inet_diag_bc_op from which we read the port
      for such operations.
      
      Previously the code blindly referenced op[1] without first checking
      whether a second inet_diag_bc_op struct could fit there. So a
      malicious user could make the kernel read 4 bytes beyond the end of
      the bytecode array by claiming to have a whole port comparison byte
      code (2 inet_diag_bc_op structs) when in fact the bytecode was not
      long enough to hold both.
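The check described above can be sketched in userspace C. This is a minimal sketch, not the kernel's code: `struct bc_op` mirrors the layout of the UAPI `struct inet_diag_bc_op` (code, yes, no), the sketch assumes the port sits in the `no` field of the second slot, and `port_cond_fits()` / `port_operand()` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Mirrors the layout of the UAPI struct inet_diag_bc_op. */
struct bc_op {
    unsigned char  code;  /* operation, e.g. a port comparison */
    unsigned char  yes;   /* bytes to jump forward if the test succeeds */
    unsigned short no;    /* bytes to jump forward if the test fails */
};

/*
 * Hypothetical validator: a port comparison occupies two bc_op slots,
 * the operation itself followed by a second slot carrying the port.
 * op[1] may only be read if `len` bytes cover both slots.
 */
static bool port_cond_fits(size_t len)
{
    return len >= 2 * sizeof(struct bc_op);
}

/* Returns the port operand, or -1 if the bytecode is too short. */
static int port_operand(const struct bc_op *op, size_t len)
{
    if (!port_cond_fits(len))
        return -1;            /* the bug: op[1] was read unconditionally */
    return op[1].no;
}
```

A validator would run this once per operation at load time, so the per-connection matching loop can read `op[1]` without rechecking.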
      
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • inet_diag: avoid unsafe and nonsensical prefix matches in inet_diag_bc_run() · f67caec9
      Neal Cardwell authored
      
      
      Add logic to check the address family of the user-supplied conditional
      and the address family of the connection entry. We now do not do
      prefix matching of addresses from different address families (AF_INET
      vs AF_INET6), except for the previously existing support for having an
      IPv4 prefix match an IPv4-mapped IPv6 address (which this commit
      maintains as-is).
      
      This change is needed for two reasons:
      
      (1) The addresses are different lengths, so comparing a 128-bit IPv6
      prefix match condition to a 32-bit IPv4 connection address can cause
      us to unwittingly walk off the end of the IPv4 address and read
      garbage or oops.
      
      (2) The IPv4 and IPv6 address spaces are semantically distinct, so a
      simple bit-wise comparison of the prefixes is not meaningful, and
      would lead to bogus results (except for the IPv4-mapped IPv6 case,
      which this commit maintains).
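The family rule above, including the IPv4-mapped exception, can be sketched as follows. This is an illustration under stated assumptions, not the kernel's code: `struct entry_addr`, `struct host_cond`, `prefix_match()` and `cond_matches()` are hypothetical, addresses are kept as four 32-bit words in network byte order, prefix lengths are assumed already validated for their family, and AF_UNSPEC conditions are not modelled.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <arpa/inet.h>

/* Hypothetical stand-ins; addresses are in network byte order. */
struct entry_addr { int family; uint32_t addr[4]; };
struct host_cond  { int family; int prefix_len; uint32_t addr[4]; };

/* Compare the first `bits` bits of a and b. */
static bool prefix_match(const uint32_t *a, const uint32_t *b, int bits)
{
    int words = bits / 32, rest = bits % 32;

    if (words && memcmp(a, b, words * sizeof(uint32_t)) != 0)
        return false;
    if (rest) {
        uint32_t mask = htonl(~0u << (32 - rest));
        if ((a[words] ^ b[words]) & mask)
            return false;
    }
    return true;
}

static bool cond_matches(const struct host_cond *c, const struct entry_addr *e)
{
    /* Same family: a plain bit-wise prefix comparison is meaningful. */
    if (c->family == e->family)
        return prefix_match(e->addr, c->addr, c->prefix_len);

    /* IPv4 condition vs IPv4-mapped IPv6 entry (::ffff:a.b.c.d):
     * compare only the embedded 32-bit IPv4 address. */
    if (c->family == AF_INET && e->family == AF_INET6 &&
        e->addr[0] == 0 && e->addr[1] == 0 && e->addr[2] == htonl(0xffff))
        return prefix_match(&e->addr[3], c->addr, c->prefix_len);

    /* Different address spaces: never compare their bits. */
    return false;
}
```

Because the IPv4 case only ever reads one word of the entry, a /24 condition can no longer walk off the end of a 4-byte address.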
      
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • inet_diag: validate byte code to prevent oops in inet_diag_bc_run() · 405c0059
      Neal Cardwell authored
      
      
      Add logic to validate INET_DIAG_BC_S_COND and INET_DIAG_BC_D_COND
      operations.
      
      Previously we did not validate the inet_diag_hostcond, address family,
      address length, and prefix length. So a malicious user could make the
      kernel read beyond the end of the bytecode array by claiming to have a
      whole inet_diag_hostcond when the bytecode was not long enough to
      contain a whole inet_diag_hostcond of the given address family. Or
      they could make the kernel read up to about 27 bytes beyond the end of
      a connection address by passing a prefix length that exceeded the
      length of addresses of the given family.
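The three checks described above (header fits, address of the stated family fits, prefix no longer than the address) can be sketched in this order. The header layout is only modelled loosely on `inet_diag_hostcond`, and `struct hostcond_hdr` and `valid_hostcond()` are hypothetical names; `memcpy` is used so the sketch makes no alignment assumptions about the input buffer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical header modelled loosely on inet_diag_hostcond. */
struct hostcond_hdr {
    uint8_t family;
    uint8_t prefix_len;
    int     port;
    /* followed by 0, 4 or 16 address bytes depending on family */
};

static bool valid_hostcond(const unsigned char *p, size_t len)
{
    struct hostcond_hdr h;
    size_t addr_len;

    /* 1. The fixed header must fit before any field is read. */
    if (len < sizeof(h))
        return false;
    memcpy(&h, p, sizeof(h));

    /* 2. The family decides how many address bytes must follow. */
    switch (h.family) {
    case AF_UNSPEC: addr_len = 0;  break;
    case AF_INET:   addr_len = 4;  break;
    case AF_INET6:  addr_len = 16; break;
    default:        return false;
    }
    if (len < sizeof(h) + addr_len)
        return false;

    /* 3. The prefix may not be longer than the address itself. */
    if (h.prefix_len > 8 * addr_len)
        return false;
    return true;
}
```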
      
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • inet_diag: fix oops for IPv4 AF_INET6 TCP SYN-RECV state · 1c95df85
      Neal Cardwell authored
      
      
      Fix inet_diag to be aware of the fact that AF_INET6 TCP connections
      instantiated for IPv4 traffic and in the SYN-RECV state were actually
      created with inet_reqsk_alloc(), instead of inet6_reqsk_alloc(). This
      means that for such connections inet6_rsk(req) returns a pointer to a
      random spot in memory up to roughly 64KB beyond the end of the
      request_sock.
      
      With this bug, for a server using AF_INET6 TCP sockets and serving
      IPv4 traffic, an inet_diag user like `ss state SYN-RECV` would lead to
      inet_diag_fill_req() causing an oops or the export to user space of 16
      bytes of kernel memory as a garbage IPv6 address, depending on where
      the garbage inet6_rsk(req) pointed.
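The general hazard here is downcasting to the larger IPv6 request type based on the listener's family instead of the request's own. A minimal sketch of the safe pattern, using hypothetical types (`struct req_common`, `struct req_v6`, `fill_src()`) rather than the kernel's request_sock hierarchy:

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical request objects: the v6 variant is larger. */
struct req_common { int family; uint32_t v4_src; };
struct req_v6     { struct req_common common; uint8_t v6_src[16]; };

/* Fill a 16-byte export buffer from a request of either kind. */
static void fill_src(const struct req_common *req, uint8_t out[16])
{
    memset(out, 0, 16);
    if (req->family == AF_INET6) {
        /* Safe only because the object itself says it is a req_v6. */
        const struct req_v6 *r6 = (const struct req_v6 *)req;
        memcpy(out, r6->v6_src, 16);
    } else {
        /* A v4 request has no v6 fields: never read past its end. */
        memcpy(out, &req->v4_src, 4);
    }
}
```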
      
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. Dec 09, 2012
    • mm: vmscan: fix inappropriate zone congestion clearing · ed23ec4f
      Johannes Weiner authored
      commit c702418f ("mm: vmscan: do not keep kswapd looping forever due
      to individual uncompactable zones") removed zone watermark checks from
      the compaction code in kswapd but left in the zone congestion clearing,
      which now happens unconditionally on higher order reclaim.
      
      This messes up the reclaim throttling logic for zones with
      dirty/writeback pages, where zones should only lose their congestion
      status when their watermarks have been restored.
      
      Remove the clearing from the zone compaction section entirely.  The
      preliminary zone check and the reclaim loop in kswapd will clear it if
      the zone is considered balanced.
      
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • vfs: fix O_DIRECT read past end of block device · 684c9aae
      Linus Torvalds authored
      The direct-IO write path already had the i_size checks in mm/filemap.c,
      but it turns out the read path did not, and removing the block size
      checks in fs/block_dev.c (commit bbec0270: "blkdev_max_block: make
      private to fs/buffer.c") removed the magic "shrink IO to past the end of
      the device" code there.
      
      Fix it by truncating the IO to the size of the block device, like the
      write path already does.
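Truncating the I/O to the size of the device comes down to clamping the transfer length against the bytes remaining after the start offset. A minimal sketch of that arithmetic; `clamp_to_device()` is a hypothetical helper working on a plain byte count, not the function the patch touches.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: shorten a read of `count` bytes at offset `pos`
 * so that it never extends past a device of `dev_size` bytes. */
static size_t clamp_to_device(uint64_t pos, size_t count, uint64_t dev_size)
{
    if (pos >= dev_size)
        return 0;                          /* at or past the end */
    if ((uint64_t)count > dev_size - pos)
        return (size_t)(dev_size - pos);   /* cut the tail off */
    return count;
}
```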
      
      NOTE! I suspect the write path would be *much* better off doing it this
      way in fs/block_dev.c, rather than hidden deep in mm/filemap.c.  The
      mm/filemap.c code is extremely hard to follow, and has various
      conditionals on the target being a block device (ie the flag passed in
      to 'generic_write_checks()', along with a conditional update of the
      inode timestamp etc).
      
      It is also quite possible that we should treat this whole block device
      size as a "s_maxbytes" issue, and try to make the logic even more
      generic.  However, in the meantime this is the fairly minimal targeted
      fix.
      
      Noted by Milan Broz thanks to a regression test for the cryptsetup
      reencrypt tool.
      
      Reported-and-tested-by: Milan Broz <mbroz@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. Dec 08, 2012
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 1b3c393c
      Linus Torvalds authored
      Pull networking fixes from David Miller:
       "Two stragglers:
      
         1) The new code that adds new flushing semantics to GRO can cause SKB
            pointer list corruption, manage the lists differently to avoid the
            OOPS.  Fix from Eric Dumazet.
      
         2) When TCP fast open does a retransmit of data in a SYN-ACK or
            similar, we update retransmit state that we shouldn't, triggering a
            WARN_ON later.  Fix from Yuchung Cheng."
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
        net: gro: fix possible panic in skb_gro_receive()
        tcp: bug fix Fast Open client retransmission
    • net: gro: fix possible panic in skb_gro_receive() · c3c7c254
      Eric Dumazet authored
      commit 2e71a6f8 (net: gro: selective flush of packets) added
      a bug for skbs using frag_list. This part of the GRO stack is rarely
      used, as it is only reached by skbs that do not use a page fragment
      for their skb->head.
      
      Most drivers do use a page fragment, but some of them use GFP_KERNEL
      allocations for the initial fill of their RX ring buffer.
      
      napi_gro_flush() overwrites skb->prev, which these skbs used to
      point to the last skb in frag_list.
      
      Fix this using a separate field in struct napi_gro_cb to point to the
      last fragment.
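The fix pattern, keeping the list tail in a field that no other path writes, can be sketched as a generic O(1) tail append. The names (`struct pkt`, `struct gro_state`, `gro_append()`) are hypothetical; in the kernel the dedicated field lives in `struct napi_gro_cb`.

```c
#include <stddef.h>

struct pkt {
    int         id;
    struct pkt *next;  /* link within the frag list */
    struct pkt *prev;  /* may be rewritten by other code (e.g. a flush) */
};

/* Hypothetical merge state: the tail has its own field, so code that
 * rewrites pkt->prev cannot corrupt our notion of the last fragment. */
struct gro_state {
    struct pkt *frag_list;
    struct pkt *last;
};

static void gro_append(struct gro_state *s, struct pkt *p)
{
    p->next = NULL;
    if (s->frag_list == NULL)
        s->frag_list = p;
    else
        s->last->next = p;
    s->last = p;          /* O(1) append without walking the list */
}
```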
      
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: bug fix Fast Open client retransmission · 93b174ad
      Yuchung Cheng authored
      
      
      If SYN-ACK partially acks SYN-data, the client retransmits the
      remaining data by tcp_retransmit_skb(). This increments loss recovery
      state variables like tp->retrans_out in Open state. If loss recovery
      happens before the retransmission is acked, it triggers the WARN_ON
      check in tcp_fastretrans_alert(). For example: the client sends
      SYN-data, gets SYN-ACK acking only ISN, retransmits data, sends
      another 4 data packets and gets 3 dupacks.
      
      Since the retransmission is not caused by network drop it should not
      update the recovery state variables. Further, the server may return a
      smaller MSS than the cached MSS used for SYN-data, so the retransmission
      needs a loop. Otherwise some data will not be retransmitted until timeout
      or other loss recovery events.
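The need for a loop can be illustrated by splitting the unacknowledged SYN-data into segments no larger than the (possibly smaller) MSS the server returned. This is a hypothetical helper, not the kernel's retransmit code; it only computes segment sizes and, in line with the commit's reasoning, touches no loss-recovery counters.

```c
#include <stddef.h>

/*
 * Hypothetical helper: split `remaining` unacknowledged bytes into
 * segments of at most `mss` bytes.  Writes up to `max` sizes into `seg`
 * and returns how many segments are needed.
 */
static size_t split_for_retransmit(size_t remaining, size_t mss,
                                   size_t *seg, size_t max)
{
    size_t n = 0;

    if (mss == 0)
        return 0;                 /* avoid looping forever */
    while (remaining > 0) {
        size_t len = remaining < mss ? remaining : mss;
        if (n < max)
            seg[n] = len;
        n++;
        remaining -= len;
    }
    return n;
}
```

With a cached MSS of 3000 but a server MSS of 1400, a single retransmission would leave 1600 bytes waiting for a timeout; the loop sends all three pieces.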
      
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'mmc-fixes-for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc · 1afa4717
      Linus Torvalds authored
      Pull MMC fixes from Chris Ball:
       "Two small regression fixes:
      
         - sdhci-s3c: Fix runtime PM regression against 3.7-rc1
         - sh-mmcif: Fix oops against 3.6"
      
      * tag 'mmc-fixes-for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc:
        mmc: sh-mmcif: avoid oops on spurious interrupts (second try)
        Revert misapplied "mmc: sh-mmcif: avoid oops on spurious interrupts"
        mmc: sdhci-s3c: fix missing clock for gpio card-detect
  4. Dec 07, 2012
  5. Dec 06, 2012
  6. Dec 05, 2012
    • ASN.1: Fix an indefinite length skip error · f3537f91
      David Howells authored
      
      
      Fix an error in asn1_find_indefinite_length() whereby small definite length
      elements of size 0x7f are incorrectly classified as non-small.  Without this
      fix, an error will be given as the length of the length will be perceived as
      being very much greater than the maximum supported size.
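In BER, a length octet of 0x00 to 0x7f inclusive is the length itself (short form); 0x80 introduces indefinite length, and 0x81 and above say how many length octets follow (long form). A minimal decoder sketch with the inclusive boundary the commit restores; `decode_length()` is a hypothetical helper, and it returns -1 for the indefinite form because that case is handled elsewhere.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the number of length octets consumed, or -1 on error or
 * indefinite form; on success stores the decoded length in *out. */
static int decode_length(const uint8_t *p, size_t avail, size_t *out)
{
    if (avail < 1)
        return -1;
    uint8_t b = p[0];
    if (b <= 0x7f) {              /* short form: 0x7f itself is included */
        *out = b;
        return 1;
    }
    if (b == 0x80)
        return -1;                /* indefinite length */
    size_t n = b & 0x7f;          /* long form: n length octets follow */
    if (n > sizeof(size_t) || n + 1 > avail)
        return -1;
    size_t len = 0;
    for (size_t i = 0; i < n; i++)
        len = (len << 8) | p[1 + i];
    *out = len;
    return (int)(n + 1);
}
```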
      
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • MODSIGN: Don't use enum-type bitfields in module signature info block · 12e130b0
      David Howells authored
      
      
      Don't use enum-type bitfields in the module signature info block as we can't be
      certain how the compiler will handle them.  As I understand it, it is arch
      dependent, and it is possible for the compiler to rearrange them based on
      endianness and to insert a byte of padding to pad the three enums out to four
      bytes.
      
      Instead use u8 fields for these, which the compiler should emit in the right
      order without padding.
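The u8 approach can be pinned down with compile-time layout checks. The struct below is only modelled loosely on the module signature info block (the three enum-valued fields become plain bytes); the fields after `id_type` and the name `struct sig_info` are illustrative assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical info block: every small field is a u8, so there is no
 * compiler-chosen bitfield order and no hidden padding. */
struct sig_info {
    uint8_t  algo;        /* public-key algorithm, stored as a number */
    uint8_t  hash;        /* digest algorithm */
    uint8_t  id_type;     /* key identifier type */
    uint8_t  signer_len;  /* length of signer's name */
    uint8_t  key_id_len;  /* length of key identifier */
    uint8_t  pad[3];      /* explicit padding to a 4-byte boundary */
    uint32_t sig_len;     /* signature length (stored big-endian) */
};

/* Fail the build if any compiler lays the block out differently. */
_Static_assert(sizeof(struct sig_info) == 12, "no hidden padding");
_Static_assert(offsetof(struct sig_info, sig_len) == 8, "sig_len at byte 8");
```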
      
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • watchdog: Fix CPU hotplug regression · 8d451690
      Thomas Gleixner authored
      Norbert reported:
      "3.7-rc6 booted with nmi_watchdog=0 fails to suspend to RAM or
       offline CPUs. It's reproducible with a KVM guest and physical
       system."
      
      The reason is that commit bcd951cf (watchdog: Use hotplug thread
      infrastructure) failed to take this into account. So the CPU offline
      code gets stuck in the teardown function because it accesses
      uninitialized data structures.
      
      Add a check for watchdog_enabled into that path to cure the issue.
      
      Reported-and-tested-by: Norbert Warmuth <nwarmuth@t-online.de>
      Tested-by: Joseph Salisbury <joseph.salisbury@canonical.com>
      Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1211231033230.2701@ionos
      Link: http://bugs.launchpad.net/bugs/1079534
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux · df2fc246
      Linus Torvalds authored
      Pull module fixes from Rusty Russell:
       "Module signing build fixes for blackfin and metag"
      
      * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
        modsign: add symbol prefix to certificate list
        linux/kernel.h: define SYMBOL_PREFIX
    • Merge tag 'upstream-3.7-rc9' of git://git.infradead.org/linux-ubi · 70dcc535
      Linus Torvalds authored
      Pull UBI changes from Artem Bityutskiy:
       "Fixes for 2 brown-paperbag bugs introduced this merge window by the
        fastmap code:
      
         1.  The UBI background thread got stuck when a bit-flip happened
             because free LEBs were not removed from the "free" tree when we
             started using it.
         2.  I/O debugging checks did not work because we called a sleeping
             function in atomic context."
      
      * tag 'upstream-3.7-rc9' of git://git.infradead.org/linux-ubi:
        UBI: dont call ubi_self_check_all_ff() in __wl_get_peb()
        UBI: remove PEB from free tree in get_peb_for_wl()
    • Merge branch 'for-3.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq · ca50496e
      Linus Torvalds authored
      Pull workqueue fixes from Tejun Heo:
       "So, safe fixes my ass.
      
        Commit 8852aac2 ("workqueue: mod_delayed_work_on() shouldn't queue
        timer on 0 delay") had the side-effect of performing delayed_work
        sanity checks even when @delay is 0, which should be fine for any sane
        use cases.
      
        Unfortunately, megaraid was being overly ingenious.  It seemingly
        wanted to use cancel_delayed_work_sync() before cancel_work_sync() was
        introduced, but didn't want to waste the space for full delayed_work
        as it was only going to use 0 @delay.  So, it only allocated space for
        struct work_struct and then cast it to struct delayed_work and passed
        it into delayed_work functions - truly awesome engineering tradeoff to
        save some bytes.
      
        Xiaotian fixed it by making megaraid allocate full delayed_work for
        now.  It should be converted to use work_struct and cancel_work_sync()
        but I think we better do that after 3.7.
      
        I added another commit to change BUG_ON()s in __queue_delayed_work()
        to WARN_ON_ONCE()s so that the kernel doesn't crash even if there are
        more such abuses."
      
      * 'for-3.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
        workqueue: convert BUG_ON()s in __queue_delayed_work() to WARN_ON_ONCE()s
        megaraid: fix BUG_ON() from incorrect use of delayed work
    • MIPS: N32: Fix preadv(2) and pwritev(2) entry points. · d5563715
      Ralf Baechle authored
      By using the native syscall entry point the kernel was also expecting
      64-bit iovec structures.
      
      This has been broken since ddd9e91b [preadv/
      pwritev: MIPS: Add preadv(2) and pwritev(2) syscalls.], which originally
      added these two syscalls.  I walked through piles of code, including
      libc, and couldn't find anything that would have worked around the issue,
      so this changes the API to what it should always have been.
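The mismatch is one of element size: an N32 caller's iovec array has 8-byte entries, while the native 64-bit entry point reads 16-byte entries, so it misparses the array. A minimal sketch of a compat path that reads the caller's layout and widens each element; the struct and function names are hypothetical, and pointer-extension rules specific to MIPS are deliberately not modelled.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts for the two ABIs. */
struct iovec_n32 { uint32_t iov_base; uint32_t iov_len; };  /* 8 bytes  */
struct iovec_n64 { uint64_t iov_base; uint64_t iov_len; };  /* 16 bytes */

/* Read the caller's array in its own layout, then copy each field into
 * the wider native layout. */
static void widen_iovecs(const struct iovec_n32 *in,
                         struct iovec_n64 *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i].iov_base = in[i].iov_base;
        out[i].iov_len  = in[i].iov_len;
    }
}
```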
      
      Noticed and patch suggested by Al Viro.
      
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc · 609e3ff3
      Linus Torvalds authored
      Pull sparc fixes from David Miller:
       "Two small fixes for Sparc, nobody uses sparc, so these are low risk :-)
      
         1) Piggyback is too picky about the symbol types that _start and _end
            have in the final kernel image, and it thus breaks with newer
            binutils.  Future proof by getting rid of the symbol type checks.
      
         2) exit_group() should kill register windows on sparc64 the same way
            we do for plain exit().  Thanks to Al Viro for spotting this."
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
        sparc: Fix piggyback with newer binutils.
        sparc64: exit_group should kill register windows just like plain exit.
    • vfs: avoid "attempt to access beyond end of device" warnings · 57302e0d
      Linus Torvalds authored
      The block device access simplification that avoided accessing the (racy)
      block size information (commit bbec0270: "blkdev_max_block: make
      private to fs/buffer.c") no longer checks the maximum block size in the
      block mapping path.
      
      That was _almost_ as simple as just removing the code entirely, because
      the readers and writers all check the size of the device anyway, so
      under normal circumstances it "just worked".
      
      However, the block size may be such that the end of the device may
      straddle one single buffer_head.  At which point we may still want to
      access the end of the device, but the buffer we use to access it
      partially extends past the end.
      
      The 'bd_set_size()' function intentionally sets the block size to avoid
      this, but mounting the device - or setting the block size by hand to
      some other value - can modify that block size.
      
      So instead, teach 'submit_bh()' about the special case of the buffer
      head straddling the end of the device, and turn such an access into a
      smaller IO access, avoiding the problem.
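The trimming arithmetic for a buffer that straddles the end of the device can be sketched in 512-byte sectors. Both helpers are hypothetical; zeroing the unread tail of a shortened read is an assumption of this sketch, added so a caller never sees stale buffer contents.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: an I/O of `bytes` bytes starts at 512-byte
 * sector `sector` on a device `dev_bytes` long.  Return how many bytes
 * can be transferred without touching a sector past the end. */
static uint32_t trim_to_device_end(uint64_t sector, uint32_t bytes,
                                   uint64_t dev_bytes)
{
    uint64_t max_sector = dev_bytes >> 9;         /* whole sectors */
    if (sector >= max_sector)
        return 0;                                 /* starts past the end */
    uint64_t avail = (max_sector - sector) << 9;  /* bytes left */
    return bytes <= avail ? bytes : (uint32_t)avail;
}

/* For a shortened read, clear the part of the buffer beyond the end. */
static void zero_buffer_tail(unsigned char *buf, uint32_t bytes,
                             uint32_t done)
{
    if (done < bytes)
        memset(buf + done, 0, bytes - done);
}
```

With a 5120-byte device and a 4096-byte block starting at sector 8, only 1024 bytes are transferred, so the last sector stays reachable whatever block size is set.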
      
      This, btw, also means that unlike before, we can now access the whole
      device regardless of device block size setting.  So now, even if the
      device size is only 512-byte aligned, we can read and write the
      last sector even when using a much bigger block size for accessing the
      rest of the device.
      
      So with this, we could now get rid of the 'bd_set_size()' block size
      code entirely - resulting in faster IO for the common case - but that
      would be a separate patch.
      
      Reported-and-tested-by: Romain Francoise <romain@orebokech.com>
      Reported-and-tested-by: Meelis Roos <mroos@linux.ee>
      Reported-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. Dec 04, 2012
    • workqueue: convert BUG_ON()s in __queue_delayed_work() to WARN_ON_ONCE()s · fc4b514f
      Tejun Heo authored
      8852aac2 ("workqueue: mod_delayed_work_on() shouldn't queue timer on
      0 delay") unexpectedly uncovered a very nasty abuse of delayed_work in
      megaraid - it allocated a work_struct, cast it to delayed_work and
      then passed that into queue_delayed_work().
      
      Previously, this was okay because 0 @delay short-circuited to
      queue_work() before doing anything with delayed_work.  8852aac2
      moved the 0 @delay test into __queue_delayed_work() after the sanity
      check on delayed_work, making megaraid trigger BUG_ON().
      
      Although megaraid is already fixed by c1d390d8 ("megaraid: fix
      BUG_ON() from incorrect use of delayed work"), this patch converts
      BUG_ON()s in __queue_delayed_work() to WARN_ON_ONCE()s so that such
      abusers, if there are more, trigger a warning but don't crash the
      machine.
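The difference between the two assertion styles can be sketched in userspace: a WARN_ON_ONCE()-style macro evaluates to its condition and reports at most once per call site instead of halting. The macro below imitates the kernel's shape using a GNU statement expression (GCC/Clang); `struct fake_dwork` and `queue_fake_dwork()` are hypothetical. Whether to bail out or carry on after the warning is a separate choice; this sketch bails out.

```c
#include <stdbool.h>
#include <stdio.h>

static int warnings_emitted;   /* counts warnings actually printed */

/* Evaluates to the condition; prints at most once per call site. */
#define WARN_ON_ONCE(cond) ({                                        \
    static bool warned_;                                             \
    bool ret_ = !!(cond);                                            \
    if (ret_ && !warned_) {                                          \
        warned_ = true;                                              \
        warnings_emitted++;                                          \
        fprintf(stderr, "WARNING at %s:%d\n", __FILE__, __LINE__);   \
    }                                                                \
    ret_;                                                            \
})

/* Hypothetical stand-in for a delayed work item. */
struct fake_dwork {
    bool timer_initialized;  /* false if a plain work item was cast */
    int  queued;
};

static bool queue_fake_dwork(struct fake_dwork *w)
{
    /* With BUG_ON() here, one bad caller would halt the machine. */
    if (WARN_ON_ONCE(!w->timer_initialized))
        return false;        /* this sketch refuses, but keeps running */
    w->queued++;
    return true;
}
```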
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Xiaotian Feng <xtfeng@gmail.com>
    • MIPS: Avoid mcheck by flushing page range in huge_ptep_set_access_flags() · ac53c4fc
      David Daney authored
      
      
      Problem:
      
      1) Huge page mapping of anonymous memory is initially invalid.  It will
         be faulted in by the copy-on-write mechanism.

      2) Userspace attempts a store at the end of the huge mapping.

      3) The TLB Refill exception handler fills the TLB with a normal (4K
         sized) invalid page at the end of the huge mapping virtual address
         range.

      4) Userspace is restarted and re-attempts the store at the end of the
         huge mapping.

      5) The page from #3 is invalid, so we get a fault and go to the
         hugepage fault handler.  This tries to map a huge page and calls
         huge_ptep_set_access_flags() to install the mapping.

      6) We just call the generic ptep_set_access_flags() to set up the page
         tables, but the flush there assumes a normal (4K sized) page and
         only tries to flush the first part of the huge page virtual address
         out of the TLB.  Since the existing entry from step #3 doesn't
         conflict with that first part, nothing is flushed.
      
      7) We attempt to load the mapping into the TLB, but because it
         conflicts with the entry from step #3, we get a Machine Check
         exception.
      
      The fix: Flush the entire range covered by the huge page in
      huge_ptep_set_access_flags(), and remove the optimization in
      local_flush_tlb_range() so that the flush actually does the correct
      thing.
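The range arithmetic behind the fix can be sketched as follows: align the faulting address down to the huge page and flush every base page up to its end, so a stale 4K entry at the end of the mapping (step #3) is inside the flushed range. `huge_flush_range()` and `in_range()` are hypothetical helpers, and the 2 MiB huge page size in the usage check is an assumption, not a MIPS constant.

```c
#include <stdint.h>

#define SMALL_PAGE 0x1000UL   /* 4K base page */

/* Hypothetical helper: the [start, end) range to flush when a huge page
 * of `huge_size` bytes (a power of two) is installed covering `addr`. */
static void huge_flush_range(uintptr_t addr, uintptr_t huge_size,
                             uintptr_t *start, uintptr_t *end)
{
    *start = addr & ~(huge_size - 1);  /* align down to the huge page */
    *end = *start + huge_size;         /* include every 4K page inside */
}

static int in_range(uintptr_t a, uintptr_t start, uintptr_t end)
{
    return a >= start && a < end;
}
```

A flush of only `[start, start + SMALL_PAGE)` misses the stale entry at `end - SMALL_PAGE`, which is the machine check scenario in steps 6 and 7.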
      
      Signed-off-by: David Daney <david.daney@cavium.com>
      Cc: linux-mips@linux-mips.org
      Cc: linux-kernel@vger.kernel.org
      Cc: Hillf Danton <dhillf@gmail.com>
      Patchwork: https://patchwork.linux-mips.org/patch/4661/
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
      (cherry picked from commit dd617f258cc39d36be26afee9912624a2d23112c)
    • megaraid: fix BUG_ON() from incorrect use of delayed work · c1d390d8
      Xiaotian Feng authored
      megaraid uses INIT_WORK to declare a hotplug_work, but casts the
      hotplug_work from work_struct to delayed_work and calls
      schedule_delayed_work() on it.  This is very dangerous, as the other
      parts of delayed_work might overlap kernel memory allocated by others.
      
      With commit 8852aac2 ("workqueue: mod_delayed_work_on() shouldn't queue
      timer on 0 delay"), schedule_delayed_work() will check dwork->timer
      before queue_work even when @delay is 0; this causes megaraid code to
      hit the BUG_ON() in workqueue code.  Change megaraid code to use
      delayed work.
      
      Signed-off-by: Xiaotian Feng <dannyfeng@tencent.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Neela Syam Kolli <megaraidlinux@lsi.com>
      Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
      Cc: linux-scsi@vger.kernel.org
    • UBI: dont call ubi_self_check_all_ff() in __wl_get_peb() · 894aef21
      Richard Weinberger authored
      
      
      As ubi_self_check_all_ff() might sleep we are not allowed
      to call it from atomic context.
      For now we call it only from ubi_wl_get_peb().
      There are some code paths where it would also make sense,
      but these paths are currently atomic and only enabled
      when fastmap is used.
      
      Signed-off-by: Richard Weinberger <richard@nod.at>
      Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
    • UBI: remove PEB from free tree in get_peb_for_wl() · ed4b7021
      Richard Weinberger authored
      
      
      If UBI is built without fastmap, get_peb_for_wl() has to
      remove the PEB manually from the free tree.
      Otherwise the requested PEB lives in two trees.
      
      Reported-by: Zach Sadecki <zsadecki@itwatchdogs.com>
      Signed-off-by: Richard Weinberger <richard@nod.at>
      Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
    • sparc: Fix piggyback with newer binutils. · 0032c857
      David S. Miller authored
      
      
      Newer versions of binutils mark '_end' as 'B' instead of 'A' for
      whatever reason.
      
      To be honest, the piggyback code doesn't actually care what kind
      of symbol _start and _end are, it just wants to find them and
      record the address.
      
      So remove the type from the match strings.
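Matching on the symbol name alone, whatever its type letter, can be sketched for `nm`/System.map-style lines ("<hex address> <type> <name>"). `find_symbol()` is a hypothetical helper and the line format is an assumption; piggyback's own parsing code may differ.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Return true and store the address if `line` names `sym`.  The type
 * letter (A, B, T, ...) is read but deliberately ignored. */
static bool find_symbol(const char *line, const char *sym,
                        unsigned long *addr)
{
    unsigned long a;
    char type;
    char name[128];

    if (sscanf(line, "%lx %c %127s", &a, &type, name) != 3)
        return false;
    if (strcmp(name, sym) != 0)
        return false;
    *addr = a;
    return true;
}
```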
      
      Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>