Commits · f758eeabeb96f878c860e8f110f94ec8820822a9 · 方亚芬 / linux

Jun 08, 2011

writeback: split inode_wb_list_lock into bdi_writeback.list_lock · f758eeab

Christoph Hellwig authored Apr 21, 2011



Split the global inode_wb_list_lock into a per-bdi_writeback list_lock,
as it's currently the most contended lock in the system for metadata
heavy workloads.  It won't help for single-filesystem workloads for
which we'll need the I/O-less balance_dirty_pages, but at least we
can dedicate a cpu to spinning on each bdi now for larger systems.

Based on earlier patches from Nick Piggin and Dave Chinner.

It reduces lock contentions to 1/4 in this test case:
10 HDD JBOD, 100 dd on each disk, XFS, 6GB ram

lock_stat version 0.3
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                              class name    con-bounces    contentions   waittime-min   waittime-max waittime-total    acq-bounces   acquisitions   holdtime-min   holdtime-max holdtime-total
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
vanilla 2.6.39-rc3:
                      inode_wb_list_lock:         42590          44433           0.12         147.74      144127.35         252274         886792           0.08         121.34      917211.23
                      ------------------
                      inode_wb_list_lock              2          [<ffffffff81165da5>] bdev_inode_switch_bdi+0x29/0x85
                      inode_wb_list_lock             34          [<ffffffff8115bd0b>] inode_wb_list_del+0x22/0x49
                      inode_wb_list_lock          12893          [<ffffffff8115bb53>] __mark_inode_dirty+0x170/0x1d0
                      inode_wb_list_lock          10702          [<ffffffff8115afef>] writeback_single_inode+0x16d/0x20a
                      ------------------
                      inode_wb_list_lock              2          [<ffffffff81165da5>] bdev_inode_switch_bdi+0x29/0x85
                      inode_wb_list_lock             19          [<ffffffff8115bd0b>] inode_wb_list_del+0x22/0x49
                      inode_wb_list_lock           5550          [<ffffffff8115bb53>] __mark_inode_dirty+0x170/0x1d0
                      inode_wb_list_lock           8511          [<ffffffff8115b4ad>] writeback_sb_inodes+0x10f/0x157

2.6.39-rc3 + patch:
                &(&wb->list_lock)->rlock:         11383          11657           0.14         151.69       40429.51          90825         527918           0.11         145.90      556843.37
                ------------------------
                &(&wb->list_lock)->rlock             10          [<ffffffff8115b189>] inode_wb_list_del+0x5f/0x86
                &(&wb->list_lock)->rlock           1493          [<ffffffff8115b1ed>] writeback_inodes_wb+0x3d/0x150
                &(&wb->list_lock)->rlock           3652          [<ffffffff8115a8e9>] writeback_sb_inodes+0x123/0x16f
                &(&wb->list_lock)->rlock           1412          [<ffffffff8115a38e>] writeback_single_inode+0x17f/0x223
                ------------------------
                &(&wb->list_lock)->rlock              3          [<ffffffff8110b5af>] bdi_lock_two+0x46/0x4b
                &(&wb->list_lock)->rlock              6          [<ffffffff8115b189>] inode_wb_list_del+0x5f/0x86
                &(&wb->list_lock)->rlock           2061          [<ffffffff8115af97>] __mark_inode_dirty+0x173/0x1cf
                &(&wb->list_lock)->rlock           2629          [<ffffffff8115a8e9>] writeback_sb_inodes+0x123/0x16f

hughd@google.com: fix recursive lock when bdi_lock_two() is called with new the same as old
akpm@linux-foundation.org: cleanup bdev_inode_switch_bdi() comment

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>

f758eeab

writeback: refill b_io iff empty · 424b351f

Wu Fengguang authored Jul 21, 2010



There is no point to carry different refill policies between for_kupdate
and other type of works. Use a consistent "refill b_io iff empty" policy
which can guarantee fairness in an easy to understand way.

A b_io refill will setup a _fixed_ work set with all currently eligible
inodes and start a new round of walk through b_io. The "fixed" work set
means no new inodes will be added to the work set during the walk.
Only when a complete walk over b_io is done, new inodes that are
eligible at the time will be enqueued and the walk be started over.

This procedure provides fairness among the inodes because it guarantees
each inode to be synced once and only once at each round. So all inodes
will be free from starvations.

This change relies on wb_writeback() to keep retrying as long as we made
some progress on cleaning some pages and/or inodes. Without that ability,
the old logic on background works relies on aggressively queuing all
eligible inodes into b_io at every time. But that's not a guarantee.

The below test script completes a slightly faster now:

             2.6.39-rc3	  2.6.39-rc3-dyn-expire+
------------------------------------------------
all elapsed     256.043      252.367
stddev           24.381       12.530

tar elapsed      30.097       28.808
dd  elapsed      13.214       11.782

	#!/bin/zsh

	cp /c/linux-2.6.38.3.tar.bz2 /dev/shm/

	umount /dev/sda7
	mkfs.xfs -f /dev/sda7
	mount /dev/sda7 /fs

	echo 3 > /proc/sys/vm/drop_caches

	tic=$(cat /proc/uptime|cut -d' ' -f2)

	cd /fs
	time tar jxf /dev/shm/linux-2.6.38.3.tar.bz2 &
	time dd if=/dev/zero of=/fs/zero bs=1M count=1000 &

	wait
	sync
	tac=$(cat /proc/uptime|cut -d' ' -f2)
	echo elapsed: $((tac - tic))

It maintains roughly the same small vs. large file writeout shares, and
offers large files better chances to be written in nice 4M chunks.

Analyzes from Dave Chinner in great details:

Let's say we have lots of inodes with 100 dirty pages being created,
and one large writeback going on. We expire 8 new inodes for every
1024 pages we write back.

With the old code, we do:

	b_more_io (large inode) -> b_io (1l)
	8 newly expired inodes -> b_io (1l, 8s)

	writeback  large inode 1024 pages -> b_more_io

	b_more_io (large inode) -> b_io (8s, 1l)
	8 newly expired inodes -> b_io (8s, 1l, 8s)

	writeback  8 small inodes 800 pages
		   1 large inode 224 pages -> b_more_io

	b_more_io (large inode) -> b_io (8s, 1l)
	8 newly expired inodes -> b_io (8s, 1l, 8s)
	.....

Your new code:

	b_more_io (large inode) -> b_io (1l)
	8 newly expired inodes -> b_io (1l, 8s)

	writeback  large inode 1024 pages -> b_more_io
	(b_io == 8s)
	writeback  8 small inodes 800 pages

	b_io empty: (1800 pages written)
		b_more_io (large inode) -> b_io (1l)
		14 newly expired inodes -> b_io (1l, 14s)

	writeback  large inode 1024 pages -> b_more_io
	(b_io == 14s)
	writeback  10 small inodes 1000 pages
		   1 small inode 24 pages -> b_more_io (1l, 1s(24))
	writeback  5 small inodes 500 pages
	b_io empty: (2548 pages written)
		b_more_io (large inode) -> b_io (1l, 1s(24))
		20 newly expired inodes -> b_io (1l, 1s(24), 20s)
	......

Rough progression of pages written at b_io refill:

Old code:

	total	large file	% of writeback
	1024	224		21.9% (fixed)

New code:
	total	large file	% of writeback
	1800	1024		~55%
	2550	1024		~40%
	3050	1024		~33%
	3500	1024		~29%
	3950	1024		~26%
	4250	1024		~24%
	4500	1024		~22.7%
	4700	1024		~21.7%
	4800	1024		~21.3%
	4800	1024		~21.3%
	(pretty much steady state from here)

Ok, so the steady state is reached with a similar percentage of
writeback to the large file as the existing code. Ok, that's good,
but providing some evidence that is doesn't change the shared of
writeback to the large should be in the commit message ;)

The other advantage to this is that we always write 1024 page chunks
to the large file, rather than smaller "whatever remains" chunks.

CC: Jan Kara <jack@suse.cz>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>

424b351f

writeback: the kupdate expire timestamp should be a moving target · ba9aa839

Wu Fengguang authored Jul 21, 2010



Dynamically compute the dirty expire timestamp at queue_io() time.

writeback_control.older_than_this used to be determined at entrance to
the kupdate writeback work. This _static_ timestamp may go stale if the
kupdate work runs on and on. The flusher may then stuck with some old
busy inodes, never considering newly expired inodes thereafter.

This has two possible problems:

- It is unfair for a large dirty inode to delay (for a long time) the
  writeback of small dirty inodes.

- As time goes by, the large and busy dirty inode may contain only
  _freshly_ dirtied pages. Ignoring newly expired dirty inodes risks
  delaying the expired dirty pages to the end of LRU lists, triggering
  the evil pageout(). Nevertheless this patch merely addresses part
  of the problem.

v2: keep policy changes inside wb_writeback() and keep the
wbc.older_than_this visibility as suggested by Dave.

CC: Dave Chinner <david@fromorbit.com>
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Itaru Kitayama <kitayama@cl.bb4u.ne.jp>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>

ba9aa839

writeback: try more writeback as long as something was written · e6fb6da2

Wu Fengguang authored Jul 22, 2010



writeback_inodes_wb()/__writeback_inodes_sb() are not aggressive in that
they only populate possibly a subset of eligible inodes into b_io at
entrance time. When the queued set of inodes are all synced, they just
return, possibly with all queued inode pages written but still
wbc.nr_to_write > 0.

For kupdate and background writeback, there may be more eligible inodes
sitting in b_dirty when the current set of b_io inodes are completed. So
it is necessary to try another round of writeback as long as we made some
progress in this round. When there are no more eligible inodes, no more
inodes will be enqueued in queue_io(), hence nothing could/will be
synced and we may safely bail.

For example, imagine 100 inodes

        i0, i1, i2, ..., i90, i91, i99

At queue_io() time, i90-i99 happen to be expired and moved to s_io for
IO. When finished successfully, if their total size is less than
MAX_WRITEBACK_PAGES, nr_to_write will be > 0. Then wb_writeback() will
quit the background work (w/o this patch) while it's still over
background threshold. This will be a fairly normal/frequent case I guess.

Now that we do tagged sync and update inode->dirtied_when after the sync,
this change won't livelock sync(1).  I actually tried to write 1 page
per 1ms with this command

	write-and-fsync -n10000 -S 1000 -c 4096 /fs/test

and do sync(1) at the same time. The sync completes quickly on ext4,
xfs, btrfs.

Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>

e6fb6da2

writeback: introduce writeback_control.inodes_written · cb9bd115

Wu Fengguang authored Jul 21, 2010



The flusher works on dirty inodes in batches, and may quit prematurely
if the batch of inodes happen to be metadata-only dirtied: in this case
wbc->nr_to_write won't be decreased at all, which stands for "no pages
written" but also mis-interpreted as "no progress".

So introduce writeback_control.inodes_written to count the inodes get
cleaned from VFS POV.  A non-zero value means there are some progress on
writeback, in which case more writeback can be tried.

Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>

cb9bd115

writeback: update dirtied_when for synced inode to prevent livelock · 94c3dcbb

Wu Fengguang authored Apr 27, 2011

Explicitly update .dirtied_when on synced inodes, so that they are no
longer considered for writeback in the next round.

It can prevent both of the following livelock schemes:

- while true; do echo data >> f; done
- while true; do touch f;        done (in theory)

The exact livelock condition is, during sync(1):

(1) no new inodes are dirtied
(2) an inode being actively dirtied

On (2), the inode will be tagged and synced with .nr_to_write=LONG_MAX.
When finished, it will be redirty_tail()ed because it's still dirty
and (.nr_to_write > 0). redirty_tail() won't update its ->dirtied_when
on condition (1). The sync work will then revisit it on the next
queue_io() and find it eligible again because its old ->dirtied_when
predates the sync work start time.

We'll do more aggressive "keep writeback as long as we wrote something"
logic in wb_writeback(). The "use LONG_MAX .nr_to_write" trick in commit
b9543dac

 ("writeback: avoid livelocking WB_SYNC_ALL writeback") will
no longer be enough to stop sync livelock.

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>

94c3dcbb

writeback: introduce .tagged_writepages for the WB_SYNC_NONE sync stage · 6e6938b6

Wu Fengguang authored Jun 06, 2010

sync(2) is performed in two stages: the WB_SYNC_NONE sync and the
WB_SYNC_ALL sync. Identify the first stage with .tagged_writepages and
do livelock prevention for it, too.

Jan's commit f446daae ("mm: implement writeback livelock avoidance
using page tagging") is a partial fix in that it only fixed the
WB_SYNC_ALL phase livelock.

Although ext4 is tested to no longer livelock with commit f446daae

,
it may due to some "redirty_tail() after pages_skipped" effect which
is by no means a guarantee for _all_ the file systems.

Note that writeback_inodes_sb() is called by not only sync(), they are
treated the same because the other callers also need livelock prevention.

Impact:  It changes the order in which pages/inodes are synced to disk.
Now in the WB_SYNC_NONE stage, it won't proceed to write the next inode
until finished with the current inode.

Acked-by: Jan Kara <jack@suse.cz>
CC: Dave Chinner <david@fromorbit.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>

6e6938b6

Jun 06, 2011

Linux 3.0-rc2 · 59c5f46f
Linus Torvalds authored Jun 06, 2011

v3.0-rc2

59c5f46f

mm: fix ENOSPC returned by handle_mm_fault() · e0dcd8a0

Hugh Dickins authored Jun 05, 2011



Al Viro observes that in the hugetlb case, handle_mm_fault() may return
a value of the kind ENOSPC when its caller is expecting a value of the
kind VM_FAULT_SIGBUS: fix alloc_huge_page()'s failure returns.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

e0dcd8a0

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6 · 0d6925d4

Linus Torvalds authored Jun 06, 2011

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
  ALSA: usb - turn off de-emphasis in s/pdif for cm6206
  ALSA: asihpi: Use angle brackets for system includes
  ALSA: fm801: add error handling if auto-detect fails
  ALSA: hda - Check pin support EAPD in ad198x_power_eapd_write
  ALSA: hda - Fix HP and Front pins of ad1988/ad1989 in ad198x_power_eapd()
  ALSA: 6fire: Don't leak firmware in error path
  ASoC: Fix wm_hubs input PGA ZC bits
  ASoC: Fix dapm_is_shared_kcontrol so everything isn't shared

0d6925d4

Merge branch 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/staging · c12f667e

Linus Torvalds authored Jun 06, 2011

* 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/staging:
  hwmon: (max6642): Better chip detection schema
  hwmon: (coretemp) Further relax temperature range checks
  hwmon: (coretemp) Fix TjMax detection for older CPUs
  hwmon: (coretemp) Relax target temperature range check
  hwmon: (max6642) Rename temp_fault sysfs attribute to temp2_fault

c12f667e

Merge branch 'fix/asoc' into for-linus · 3190dad9
Takashi Iwai authored Jun 06, 2011

3190dad9

Jun 05, 2011

Merge branch 'for-linus' of git://android.git.kernel.org/kernel/tegra · 0792644d
Linus Torvalds authored Jun 05, 2011
```
* 'for-linus' of git://android.git.kernel.org/kernel/tegra:
  ARM: Tegra: Harmony: Fix conflicting GPIO numbering
```
0792644d

ARM: Tegra: Harmony: Fix conflicting GPIO numbering · f2a4d8ae

Stephen Warren authored May 31, 2011

Currently, both the WM8903 and TPS6586x chips attempt to register with
gpiolib using the same GPIO numbers. This causes the audio driver to
fail to initialize.

To solve this, add a define to board-harmony.h for the TPS6586x, and make
board-harmony-power.c use this define, instead of directly referencing
TEGRA_NR_GPIOS.

This fixes a regression introduced by commit
6f168f2f

.
ARM: tegra: harmony: initialize the TPS65862 PMIC

Signed-off-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Colin Cross <ccross@android.com>

f2a4d8ae

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable · e6ece707

Linus Torvalds authored Jun 05, 2011

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (25 commits)
  btrfs: fix uninitialized variable warning
  btrfs: add helper for fs_info->closing
  Btrfs: add mount -o inode_cache
  btrfs: scrub: add explicit plugging
  btrfs: use btrfs_ino to access inode number
  Btrfs: don't save the inode cache if we are deleting this root
  btrfs: false BUG_ON when degraded
  Btrfs: don't save the inode cache in non-FS roots
  Btrfs: make sure we don't overflow the free space cache crc page
  Btrfs: fix uninit variable in the delayed inode code
  btrfs: scrub: don't reuse bios and pages
  Btrfs: leave spinning on lookup and map the leaf
  Btrfs: check for duplicate entries in the free space cache
  Btrfs: don't try to allocate from a block group that doesn't have enough space
  Btrfs: don't always do readahead
  Btrfs: try not to sleep as much when doing slow caching
  Btrfs: kill BTRFS_I(inode)->block_group
  Btrfs: don't look at the extent buffer level 3 times in a row
  Btrfs: map the node block when looking for readahead targets
  Btrfs: set range_start to the right start in count_range_bits
  ...

e6ece707

hwmon: (max6642): Better chip detection schema · 942c1a92

Per Dalén authored May 26, 2011



Improve detection of MAX6642 by reading non existing registers (0x04, 0x06
and 0xff). Reading those registers returns the previously read value.

Signed-off-by: Per Dalen <per.dalen@appeartv.com>
[guenter.roeck@ericsson.com: added second set of register reads]
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>

942c1a92

Jun 04, 2011

Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6 · 23c79d31

Linus Torvalds authored Jun 04, 2011

* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6:
  [SCSI] Fix oops caused by queue refcounting failure

23c79d31

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 · 0e833d8c

Linus Torvalds authored Jun 04, 2011

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (40 commits)
  tg3: Fix tg3_skb_error_unmap()
  net: tracepoint of net_dev_xmit sees freed skb and causes panic
  drivers/net/can/flexcan.c: add missing clk_put
  net: dm9000: Get the chip in a known good state before enabling interrupts
  drivers/net/davinci_emac.c: add missing clk_put
  af-packet: Add flag to distinguish VID 0 from no-vlan.
  caif: Fix race when conditionally taking rtnl lock
  usbnet/cdc_ncm: add missing .reset_resume hook
  vlan: fix typo in vlan_dev_hard_start_xmit()
  net/ipv4: Check for mistakenly passed in non-IPv4 address
  iwl4965: correctly validate temperature value
  bluetooth l2cap: fix locking in l2cap_global_chan_by_psm
  ath9k: fix two more bugs in tx power
  cfg80211: don't drop p2p probe responses
  Revert "net: fix section mismatches"
  drivers/net/usb/catc.c: Fix potential deadlock in catc_ctrl_run()
  sctp: stop pending timers and purge queues when peer restart asoc
  drivers/net: ks8842 Fix crash on received packet when in PIO mode.
  ip_options_compile: properly handle unaligned pointer
  iwlagn: fix incorrect PCI subsystem id for 6150 devices
  ...

0e833d8c

btrfs: fix uninitialized variable warning · aa0467d8

David Sterba authored Jun 03, 2011

With Linus' tree, today's linux-next build (powercp ppc64_defconfig)
produced this warning:

fs/btrfs/delayed-inode.c: In function 'btrfs_delayed_update_inode':
fs/btrfs/delayed-inode.c:1598:6: warning: 'ret' may be used
uninitialized in this function

Introduced by commit 16cdcec7

 ("btrfs: implement delayed inode items
operation").

This fixes a bug in btrfs_update_inode(): if the returned value from
btrfs_delayed_update_inode is a nonzero garbage, inode stat data are not
updated and several call paths may hit a BUG_ON or fail with strange
code.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David Sterba <dsterba@suse.cz>

aa0467d8

btrfs: add helper for fs_info->closing · 7841cb28

David Sterba authored May 31, 2011



wrap checking of filesystem 'closing' flag and fix a few missing memory
barriers.

Signed-off-by: David Sterba <dsterba@suse.cz>

7841cb28

Btrfs: add mount -o inode_cache · 4b9465cb

Chris Mason authored Jun 03, 2011



This makes the inode map cache default to off until we
fix the overflow problem when the free space crcs don't fit
inside a single page.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

4b9465cb

btrfs: scrub: add explicit plugging · e7786c3a

Arne Jansen authored May 28, 2011



With the removal of the implicit plugging scrub ends up doing more and
smaller I/O than necessary. This patch adds explicit plugging per chunk.

Signed-off-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

e7786c3a

btrfs: use btrfs_ino to access inode number · a4689d2b

David Sterba authored May 31, 2011

commit 4cb5300b

 ("Btrfs: add mount -o auto_defrag") accesses inode
number directly while it should use the helper with the new inode
number allocator.

Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

a4689d2b

Btrfs: don't save the inode cache if we are deleting this root · d132a538

Josef Bacik authored May 31, 2011

With xfstest 254 I can panic the box every time with the inode number caching
stuff on. This is because we clean the inodes out when we delete the subvolume,
but then we write out the inode cache which adds an inode to the subvolume inode
tree, and then when it gets evicted again the root gets added back on the dead
roots list and is deleted again, so we have a double free. To stop this from
happening just return 0 if refs is 0 (and we're not the tree root since tree
root always has refs of 0). With this fix 254 no longer panics. Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Tested-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

d132a538

btrfs: false BUG_ON when degraded · 5f3f302a

Arne Jansen authored May 30, 2011



In degraded mode the struct btrfs_device of missing devs don't have
device->name set. A kstrdup of NULL correctly returns NULL. Don't
BUG in this case.

Signed-off-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

5f3f302a

Btrfs: don't save the inode cache in non-FS roots · ca456ae2

liubo authored Jun 01, 2011



This adds extra checks to make sure the inode map we are caching really
belongs to a FS root instead of a special relocation tree.  It
prevents crashes during balancing operations.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

ca456ae2

Btrfs: make sure we don't overflow the free space cache crc page · 211f96c2

Chris Mason authored Jun 03, 2011



The free space cache uses only one page for crcs right now,
which means we can't have a cache file bigger than the
crcs we can fit in the first page.  This adds a check to
enforce that restriction.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

211f96c2

Btrfs: fix uninit variable in the delayed inode code · 17aca1c9
Chris Mason authored Jun 03, 2011
```
The nitems counter needs to start at zero

Signed-off-by: Chris Mason <chris.mason@oracle.com>
```
17aca1c9

btrfs: scrub: don't reuse bios and pages · 1bc87793

Arne Jansen authored May 28, 2011

The current scrub implementation reuses bios and pages as often as possible,
allocating them only on start and releasing them when finished. This leads
to more problems with the block layer than it's worth. The elevator gets
confused when there are more pages added to the bio than bi_size suggests.
This patch completely rips out the reuse of bios and pages and allocates
them freshly for each submit.

Signed-off-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Chris Maosn <chris.mason@oracle.com>

1bc87793

Merge branch 'for-linus' of git://git.kernel.dk/linux-block · 4f1ba49e

Linus Torvalds authored Jun 04, 2011

* 'for-linus' of git://git.kernel.dk/linux-block:
  block: Use hlist_entry() for io_context.cic_list.first
  cfq-iosched: Remove bogus check in queue_fail path
  xen/blkback: potential null dereference in error handling
  xen/blkback: don't call vbd_size() if bd_disk is NULL
  block: blkdev_get() should access ->bd_disk only after success
  CFQ: Fix typo and remove unnecessary semicolon
  block: remove unwanted semicolons
  Revert "block: Remove extra discard_alignment from hd_struct."
  nbd: adjust 'max_part' according to part_shift
  nbd: limit module parameters to a sane value
  nbd: pass MSG_* flags to kernel_recvmsg()
  block: improve the bio_add_page() and bio_add_pc_page() descriptions

4f1ba49e

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/vapier/blackfin · 39b4a46f
Linus Torvalds authored Jun 04, 2011
```
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/vapier/blackfin:
  Blackfin: strncpy: fix handling of zero lengths
```
39b4a46f

Merge branch 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile · cb37bbd9

Linus Torvalds authored Jun 04, 2011

* 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
  asm-generic/unistd.h: support sendmmsg syscall
  tile: enable CONFIG_BUGVERBOSE

cb37bbd9

Merge branch 'linux-next' of git://git.infradead.org/ubifs-2.6 · 3af91a12

Linus Torvalds authored Jun 04, 2011

* 'linux-next' of git://git.infradead.org/ubifs-2.6:
  UBIFS: fix-up free space earlier
  UBIFS: intialize LPT earlier
  UBIFS: assert no fixup when writing a node
  UBIFS: fix clean znode counter corruption in error cases
  UBIFS: fix memory leak on error path
  UBIFS: fix shrinker object count reports
  UBIFS: fix recovery broken by the previous recovery fix
  UBIFS: amend ubifs_recover_leb interface
  UBIFS: introduce a "grouped" journal head flag
  UBIFS: supress false error messages

3af91a12

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-ktest · 9a44fde3

Linus Torvalds authored Jun 04, 2011

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-ktest:
  ktest: Ignore unset values of the minconfig in config_bisect
  ktest: Fix result of rebooting the kernel
  ktest: Fix off-by-one in config bisect result

9a44fde3

Merge branch 'rmobile-fixes-for-linus' of... · cd4ecf87

Linus Torvalds authored Jun 04, 2011

Merge branch 'rmobile-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6

* 'rmobile-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
  ARM: mach-shmobile: add DMAC clock definitions on SH7372
  ARM: arch-shmobile: support SDHI card detection on mackerel, using a GPIO
  sh_mobile_meram: MERAM platform data for LCDC

cd4ecf87

Merge branch 'sh-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6 · 1f3bd0f2

Linus Torvalds authored Jun 04, 2011

* 'sh-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
  dmaengine: shdma: fix a regression: initialise DMA channels for memcpy
  dmaengine: shdma: Fix up fallout from runtime PM changes.
  Revert "clocksource: sh_cmt: Runtime PM support"
  Revert "clocksource: sh_tmu: Runtime PM support"
  sh: Fix up asm-generic/ptrace.h fallout.
  sh64: Move from P1SEG to CAC_ADDR for consistent sync.
  sh64: asm/pgtable.h needs asm/mmu.h
  sh: asm/tlb.h needs linux/swap.h
  sh: mark DMA slave ID 0 as invalid
  sh: Update shmin to reflect PIO dependency.
  sh: arch/sh/kernel/process_32.c needs linux/prefetch.h.
  sh: add MMCIF runtime PM support on ecovec
  sh: switch ap325rxa to dynamically manage the platform camera

1f3bd0f2

Revert "ASoC: Update cx20442 for TTY API change" · bb3d6bf1

Linus Torvalds authored Jun 04, 2011

This reverts commit ed0bd233

.

Since we reverted the TTY API change, we should revert the ASoC update
to it too.

Cc: Mark Brown <broonie@opensource.wolfsonmicro.com>
Cc: Liam Girdwood <lrg@ti.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

bb3d6bf1

Revert "tty: make receive_buf() return the amout of bytes received" · 55db4c64

Linus Torvalds authored Jun 04, 2011

This reverts commit b1c43f82.

It was broken in so many ways, and results in random odd pty issues.

It re-introduced the buggy schedule_work() in flush_to_ldisc() that can
cause endless work-loops (see commit a5660b41: "tty: fix endless
work loop when the buffer fills up").

It also used an "unsigned int" return value fo the ->receive_buf()
function, but then made multiple functions return a negative error code,
and didn't actually check for the error in the caller.

And it didn't actually work at all.  BenH bisected down odd tty behavior
to it:
  "It looks like the patch is causing some major malfunctions of the X
   server for me, possibly related to PTYs.  For example, cat'ing a
   large file in a gnome terminal hangs the kernel for -minutes- in a
   loop of what looks like flush_to_ldisc/workqueue code, (some ftrace
   data in the quoted bits further down).

   ...

   Some more data: It -looks- like what happens is that the
   flush_to_ldisc work queue entry constantly re-queues itself (because
   the PTY is full ?) and the workqueue thread will basically loop
   forver calling it without ever scheduling, thus starving the consumer
   process that could have emptied the PTY."

which is pretty much exactly the problem we fixed in a5660b41

.

Milton Miller pointed out the 'unsigned int' issue.

Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reported-by: Milton Miller <miltonm@bga.com>
Cc: Stefan Bigler <stefan.bigler@keymile.com>
Cc: Toby Gray <toby.gray@realvnc.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

55db4c64

Merge branch 'master' of... · 7b29dc21

John W. Linville authored Jun 03, 2011

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 into for-davem

7b29dc21

ALSA: usb - turn off de-emphasis in s/pdif for cm6206 · 157186bc

Eric Lammerts authored May 27, 2011



CM6206: Turn off de-emphasis channel status bit in S/PDIF output.

Signed-off-by: Eric Lammerts <eric@lammerts.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

157186bc