  1. May 24, 2011
    • Btrfs: don't try to allocate from a block group that doesn't have enough space · cca1c81f
      Josef Bacik authored
      
      
      If we have a very large filesystem, we can spend a lot of time in
      find_free_extent just trying to allocate from empty block groups.  So instead
      check to see if the block group even has enough space for the allocation, and if
      not go on to the next block group.
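
      A minimal sketch of the idea, not the actual patch: the struct and
      function below are hypothetical, simplified stand-ins for a btrfs block
      group and the allocator loop.  It only shows the cheap "is there enough
      room at all" test that lets the search skip a group early.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a btrfs block group. */
struct block_group {
    uint64_t free_space;    /* bytes currently free in this group */
};

/*
 * Return the index of the first block group with at least num_bytes free,
 * or -1 if none qualifies.  Groups that cannot satisfy the request are
 * skipped with one comparison instead of a full allocation attempt.
 */
int find_candidate_group(const struct block_group *groups, size_t count,
                         uint64_t num_bytes)
{
    assert(groups != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        if (groups[i].free_space < num_bytes)
            continue;       /* not enough room: go on to the next group */
        return (int)i;
    }
    return -1;
}
```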
      
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: don't always do readahead · 026fd317
      Josef Bacik authored
      
      
      Our readahead is sort of sloppy, and really isn't always needed.  For example if
      ls is doing a stating ls (which is the default) it's going to stat in non-disk
      order, so if say you have a directory with a stupid amount of files, readahead
      is going to do nothing but waste time in the case of doing the stat.  Taking the
      unconditional readahead out made my test go from 57 minutes to 36 minutes.  This
      means that everywhere we do loop through the tree we want to make sure we do set
      path->reada properly, so I went through and found all of the places where we
      loop through the path and set reada to 1.  Thanks,
      
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: try not to sleep as much when doing slow caching · 589d8ade
      Josef Bacik authored
      
      
      When the fs is super full and we unmount the fs, we could get stuck in this
      thing where unmount is waiting for the caching kthread to make progress and the
      caching kthread keeps scheduling because we're in the middle of a commit.  So
      instead just let the caching kthread keep going and only yield if
      need_resched().  This makes my horrible umount case go from taking up to 10
      minutes to taking less than 20 seconds.  Thanks,
      
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: kill BTRFS_I(inode)->block_group · d82a6f1d
      Josef Bacik authored
      
      
      Originally this was going to be used as a way to give hints to the allocator,
      but frankly we can get much better hints elsewhere and it's not even used at all
      for anything useful.  In addition to being completely useless, when we initialize
      an inode we try to find a freeish block group to set as the inode's block group,
      and with a completely full 40gb fs this takes _forever_, so I imagine with say
      a 1tb fs this is just unbearable.  So just axe the thing altogether, we don't
      need it and it saves us 8 bytes in the inode and saves us 500 microseconds per
      inode lookup in my testcase.  Thanks,
      
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: don't look at the extent buffer level 3 times in a row · 7e2355ba
      Josef Bacik authored
      
      
      We have a bit of debugging in btrfs_search_slot to make sure the level of the
      cow block is the same as the original block we were cow'ing.  I don't think I've
      ever seen this tripped, so kill it.  This saves us 2 kmap's per level in our
      search.  Thanks,
      
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: map the node block when looking for readahead targets · cb25c2ea
      Josef Bacik authored
      
      
      If we have particularly full nodes, we could call btrfs_node_blockptr up to 32
      times, which is 32 pairs of kmap/kunmap, which _sucks_.  So go ahead and map the
      extent buffer while we look for readahead targets.  Thanks,
      
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: set range_start to the right start in count_range_bits · af60bed2
      Josef Bacik authored
      
      
      In count_range_bits we are adjusting total_bytes based on the range we are
      searching for, but we don't adjust the range start according to the range we are
      searching for, which makes for weird results.  For example, if the range
      
      [0-8192]
      
      is set DELALLOC, but I search for 4096-8192, I will get back 4096 for the number
      of bytes found, but the range_start will be 0, which makes it look like the
      range is [0-4096].  So instead set range_start = max(cur_start, state->start).
      This makes everything come out right.  Thanks,
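
      The clipping can be sketched as follows.  This is an illustration under
      assumptions, not the kernel code: struct byte_range and count_overlap()
      are hypothetical names, and ranges are treated as half-open
      [start, end) so that the numbers from the example above come out the
      same.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical half-open byte range [start, end). */
struct byte_range {
    uint64_t start;
    uint64_t end;
};

/*
 * Count the bytes of `state` that fall inside `search`.  On a hit,
 * *range_start is set to max(search.start, state.start), so the reported
 * start always lies inside the searched range.  On a miss, 0 is returned
 * and *range_start is left untouched.
 */
uint64_t count_overlap(struct byte_range state, struct byte_range search,
                       uint64_t *range_start)
{
    assert(range_start != NULL);

    uint64_t lo = search.start > state.start ? search.start : state.start;
    uint64_t hi = search.end < state.end ? search.end : state.end;

    if (lo >= hi)
        return 0;           /* no overlap */

    *range_start = lo;
    return hi - lo;
}
```

      With a state of [0, 8192) and a search of [4096, 8192), this returns
      4096 bytes and reports a start of 4096 rather than 0.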
      
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: fix how we do space reservation for truncate · fcb80c2a
      Josef Bacik authored
      
      
      The ceph guys keep running into problems where we have space reserved in our
      orphan block rsv when freeing it up.  This is because they tend to do snapshots
      a lot, so their truncates tend to use a bunch of space, so when we go to do
      things like update the inode we have to steal reservation space in order to make
      the reservation happen.  This happens because truncate can use as much space as
      it freaking feels like, but we still have to hold space for removing the orphan
      item and updating the inode, which will definitely always happen.  So in order
      to fix this we need to split all of the reservation stuff up.  So with this patch
      we have
      
      1) The orphan block reserve which only holds the space for deleting our orphan
      item when everything is over.
      
      2) The truncate block reserve which gets allocated and used specifically for the
      space that the truncate will use on a per truncate basis.
      
      3) The transaction will always have 1 item's worth of data reserved so we can
      update the inode normally.
      
      Hopefully this will make the ceph problem go away.  Thanks,
      
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: kill trans_mutex · a4abeea4
      Josef Bacik authored
      
      
      We use trans_mutex for lots of things, here's a basic list
      
      1) To serialize trans_handles joining the currently running transaction
      2) To make sure that no new trans handles are started while we are committing
      3) To protect the dead_roots list and the transaction lists
      
      Really the serializing trans_handles joining is not too hard, and can really get
      bogged down in acquiring a reference to the transaction.  So replace the
      trans_mutex with a trans_lock spinlock and use it to do the following
      
      1) Protect fs_info->running_transaction.  All trans handles have to do is check
      this, and then take a reference of the transaction and keep on going.
      2) Protect the fs_info->trans_list.  This doesn't get used too much, basically
      it just holds the current transactions, which will usually just be the currently
      committing transaction and the currently running transaction at most.
      3) Protect the dead roots list.  This is only ever processed by splicing the
      list so this is relatively simple.
      4) Protect the fs_info->reloc_ctl stuff.  This is very lightweight and was using
      the trans_mutex before, so this is a pretty straightforward change.
      5) Protect fs_info->no_trans_join.  Because we don't hold the trans_lock over
      the entirety of the commit we need to have a way to block new people from
      creating a new transaction while we're doing our work.  So we set no_trans_join
      and in join_transaction we test to see if that is set, and if it is we do a
      wait_on_commit.
      6) Make the transaction use count atomic so we don't need to take locks to
      modify it when we're dropping references.
      7) Add a commit_lock to the transaction to make sure multiple people trying to
      commit the same transaction don't race and commit at the same time.
      8) Make open_ioctl_trans an atomic so we don't have to take any locks for ioctl
      trans.
      
      I have tested this with xfstests, but obviously it is a pretty hairy change so
      lots of testing is greatly appreciated.  Thanks,
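
      The core of points 1, 5 and 6 can be sketched in userspace C11.  This is
      a rough illustration only: the types and helpers are hypothetical, an
      atomic_flag stands in for the kernel spinlock, and the real code waits
      for the commit or creates a new transaction where this sketch simply
      returns NULL.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, stripped-down transaction with an atomic use count. */
struct transaction {
    atomic_int use_count;
};

/* Hypothetical stand-in for the relevant fs_info fields. */
struct fs_info {
    atomic_flag trans_lock;         /* stands in for the trans_lock spinlock */
    struct transaction *running;    /* protected by trans_lock */
    bool no_trans_join;             /* protected by trans_lock */
};

void fs_info_init(struct fs_info *fs)
{
    atomic_flag_clear(&fs->trans_lock);
    fs->running = NULL;
    fs->no_trans_join = false;
}

static void trans_lock_acquire(struct fs_info *fs)
{
    while (atomic_flag_test_and_set_explicit(&fs->trans_lock,
                                             memory_order_acquire))
        ;                           /* spin until the flag is free */
}

static void trans_lock_release(struct fs_info *fs)
{
    atomic_flag_clear_explicit(&fs->trans_lock, memory_order_release);
}

/*
 * Join the running transaction: the lock is held only long enough to look
 * at the pointer and take a reference.  Returns NULL when joins are blocked
 * or nothing is running.
 */
struct transaction *join_transaction(struct fs_info *fs)
{
    struct transaction *t = NULL;

    assert(fs != NULL);
    trans_lock_acquire(fs);
    if (!fs->no_trans_join && fs->running != NULL) {
        t = fs->running;
        atomic_fetch_add(&t->use_count, 1);
    }
    trans_lock_release(fs);
    return t;
}

/* Drop a reference without any lock; true if it was the last one. */
bool put_transaction(struct transaction *t)
{
    return atomic_fetch_sub(&t->use_count, 1) == 1;
}
```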
      
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: if we've already started a trans handle, use that one · 2a1eb461
      Josef Bacik authored
      
      
      We currently track trans handles in current->journal_info, but we don't actually
      use it.  This patch fixes it.  This will cover the case where we have multiple
      people starting transactions down the call chain.  This keeps us from having to
      allocate a new handle and all of that, we just increase the use count of the
      current handle, save the old block_rsv, and return.  I tested this with xfstests
      and it worked out fine.  Thanks,
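
      A userspace sketch of the reuse pattern, under assumptions: the structs
      are hypothetical and reduced to the fields mentioned here, and a
      thread-local pointer stands in for current->journal_info.  Restoring
      the saved reserve when a nested user finishes is this sketch's own
      choice, not something stated above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical block reserve; only its identity matters here. */
struct block_rsv {
    int id;
};

/* Hypothetical, reduced transaction handle. */
struct trans_handle {
    int use_count;
    struct block_rsv *block_rsv;    /* reserve currently in effect */
    struct block_rsv *orig_rsv;     /* reserve saved by a nested start */
};

/* Stand-in for current->journal_info: one handle per thread. */
static _Thread_local struct trans_handle *journal_info;

/*
 * Start a transaction.  If this thread already has a handle, bump its use
 * count, save the old block_rsv and return it instead of allocating.
 */
struct trans_handle *start_transaction(void)
{
    struct trans_handle *h = journal_info;

    if (h != NULL) {
        h->use_count++;
        h->orig_rsv = h->block_rsv;
        return h;
    }

    h = calloc(1, sizeof(*h));
    if (h == NULL)
        return NULL;
    h->use_count = 1;
    journal_info = h;
    return h;
}

/* End a transaction; only the outermost caller frees the handle. */
void end_transaction(struct trans_handle *h)
{
    assert(h != NULL && h->use_count > 0);

    if (--h->use_count > 0) {
        h->block_rsv = h->orig_rsv; /* nested user done: restore reserve */
        return;
    }
    journal_info = NULL;
    free(h);
}
```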
      
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: take away the num_items argument from btrfs_join_transaction · 7a7eaa40
      Josef Bacik authored
      
      
      I keep forgetting that btrfs_join_transaction() just ignores the num_items
      argument, which leads me to sending pointless patches and looking stupid :).  So
      just kill the num_items argument from btrfs_join_transaction and
      btrfs_start_ioctl_transaction, since neither of them use it.  Thanks,
      
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: make sure to use the delalloc reserve when filling delalloc · 74b21075
      Josef Bacik authored
      
      
      In the prealloc filling code and compressed code we don't set trans->block_rsv
      to the delalloc block reserve properly, which is going to make us use metadata
      from the wrong pool.  This patch fixes that.  Thanks,
      
      Signed-off-by: Josef Bacik <josef@redhat.com>
  2. May 19, 2011
    • Linux 2.6.39 · 61c4f2c8
      Linus Torvalds authored
      v2.6.39
    • Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 · 3f80fbff
      Linus Torvalds authored
      * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
        configfs: Fix race between configfs_readdir() and configfs_d_iput()
        configfs: Don't try to d_delete() negative dentries.
        ocfs2/dlm: Target node death during resource migration leads to thread spin
        ocfs2: Skip mount recovery for hard-ro mounts
        ocfs2/cluster: Heartbeat mismatch message improved
        ocfs2/cluster: Increase the live threshold for global heartbeat
        ocfs2/dlm: Use negotiated o2dlm protocol version
        ocfs2: skip existing hole when removing the last extent_rec in punching-hole codes.
        ocfs2: Initialize data_ac (might be used uninitialized)
    • Merge branch 'devicetree/merge' of git://git.secretlab.ca/git/linux-2.6 · fce51958
      Linus Torvalds authored
      * 'devicetree/merge' of git://git.secretlab.ca/git/linux-2.6:
        drivercore: revert addition of of_match to struct device
        of: fix race when matching drivers
    • Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/upstream-linus · 7103dbed
      Linus Torvalds authored
      * 'upstream' of git://git.linux-mips.org/pub/scm/upstream-linus:
        MIPS: Kludge IP27 build for 2.6.39.
        MIPS: AR7: Fix GPIO register size for Titan variant.
        MIPS: Fix duplicate invocation of notify_die.
        MIPS: RB532: Fix iomap resource size miscalculation.
    • drivercore: revert addition of of_match to struct device · b1608d69
      Grant Likely authored
      Commit b826291c, "drivercore/dt: add a match table pointer to struct
      device" added an of_match pointer to struct device to cache the
      of_match_table entry discovered at driver match time.  This was unsafe
      because matching and probing a driver do not happen atomically.  If
      two or more drivers try to match the same device at the same time,
      the cached match entry pointer could get overwritten.
      
      This patch reverts the of_match cache pointer and reworks all users to
      call of_match_device() directly instead.
      
      Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
    • of: fix race when matching drivers · 01294d82
      Milton Miller authored
      
      
      If two drivers are probing devices at the same time, both will write
      their match table result to the dev->of_match cache at the same time.
      
      Only write the result if the device matches.
      
      In a thread titled "SBus devices sometimes detected, sometimes not",
      Meelis reported his SBus hme was not detected about 50% of the time.
      From the debug suggested by Grant it was obvious another driver matched
      some devices between the call to match the hme and the hme discovery
      failing.
      
      Reported-by: Meelis Roos <mroos@linux.ee>
      Signed-off-by: Milton Miller <miltonm@bga.com>
      [grant.likely: modified to only call of_match_device() once]
      Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
  3. May 18, 2011
  4. May 17, 2011
    • Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip · a085963a
      Linus Torvalds authored
      
      * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
        tick: Clear broadcast active bit when switching to oneshot
        rtc: mc13xxx: Don't call rtc_device_register while holding lock
        rtc: rp5c01: Initialize drvdata before registering device
        rtc: pcap: Initialize drvdata before registering device
        rtc: msm6242: Initialize drvdata before registering device
        rtc: max8998: Initialize drvdata before registering device
        rtc: max8925: Initialize drvdata before registering device
        rtc: m41t80: Initialize clientdata before registering device
        rtc: ds1286: Initialize drvdata before registering device
        rtc: ep93xx: Initialize drvdata before registering device
        rtc: davinci: Initialize drvdata before registering device
        rtc: mxc: Initialize drvdata before registering device
        clocksource: Install completely before selecting
    • x86, AMD: Fix ARAT feature setting again · 14fb57dc
      Borislav Petkov authored
      
      
      Trying to enable the local APIC timer on early K8 revisions
      uncovers a number of other issues with it, in conjunction with
      the C1E enter path on AMD. Fixing those causes much more churn
      and troubles than the benefit of using that timer brings so
      don't enable it on K8 at all, falling back to the original
      functionality the kernel had with respect to that.
      
      Reported-and-bisected-by: Nick Bowler <nbowler@elliptictech.com>
      Cc: Boris Ostrovsky <Boris.Ostrovsky@amd.com>
      Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
      Cc: Greg Kroah-Hartman <greg@kroah.com>
      Cc: Hans Rosenfeld <hans.rosenfeld@amd.com>
      Cc: Nick Bowler <nbowler@elliptictech.com>
      Cc: Joerg-Volker-Peetz <jvpeetz@web.de>
      Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
      Link: http://lkml.kernel.org/r/1305636919-31165-3-git-send-email-bp@amd64.org
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • Revert "x86, AMD: Fix APIC timer erratum 400 affecting K8 Rev.A-E processors" · 328935e6
      Borislav Petkov authored
      This reverts commit e20a2d20, as it crashes
      certain boxes with specific AMD CPU models.
      
      Moving the lower endpoint of the Erratum 400 check to accommodate
      earlier K8 revisions (A-E) opens a can of worms which is simply
      not worth fixing properly by tweaking the errata checking
      framework:

      * missing IntPending MSR on revisions < CG causes #GP:
      
      http://marc.info/?l=linux-kernel&m=130541471818831
      
      * makes earlier revisions use the LAPIC timer instead of the C1E
      idle routine which switches to HPET, thus not waking up in
      deeper C-states:
      
      http://lkml.org/lkml/2011/4/24/20
      
      Therefore, leave the original boundary starting with K8-revF.
      
      Signed-off-by: Ingo Molnar <mingo@elte.hu>