Commits · a36b44988cef1fc007535107013571fa691a2d7f · 方亚芬 / linux

Aug 26, 2009

ext4: use ext4_grpblk_t more extensively · a36b4498

Eric Sandeen authored Aug 25, 2009



unsigned short is potentially too small to track blocks within
a group; today it is safe due to restrictions in e2fsprogs but
we have _lo / _hi bits for group blocks with the intent to go
up to 32 bits, so clean this up now.

There are many more places where we use unsigned/int/unsigned int
to contain a group block but this should at least fix all the
short types.

I added a few comments to the struct ext4_group_info definition
as well.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

a36b4498
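
As an illustration of the truncation risk described in the ext4_grpblk_t commit above, here is a minimal user-space sketch. The typedef `grpblk_t`, its 32-bit width, and both helper names are assumptions made for this example, not the kernel's definitions; `uint16_t` models a 16-bit unsigned short.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for ext4_grpblk_t; the width is an assumption for illustration. */
typedef int32_t grpblk_t;

/* Store a block offset in a 16-bit field, as a short-typed variable would. */
uint16_t store_in_short(uint32_t offset_in_group)
{
	/* The conversion silently keeps only the low 16 bits. */
	return (uint16_t)offset_in_group;
}

/* Store the same offset in the wider group-block type. */
grpblk_t store_in_grpblk(uint32_t offset_in_group)
{
	assert(offset_in_group <= INT32_MAX);
	return (grpblk_t)offset_in_group;
}
```

With today's group sizes both helpers agree; an offset of 70000 would only survive in the wider type.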

ext4: use variables not types in sizeofs() for allocations · 1927805e

Eric Sandeen authored Aug 25, 2009



Precursor to changing some types; to keep things in sync, it 
seems better to allocate/memset based on the size of the 
variables we are using rather than on some disconnected 
basic type like "unsigned short"

Signed-off-by: Eric Sandeen <sandeen@redhat.com>

1927805e
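
The idea in the sizeof commit above can be sketched in user space as follows. `struct counters` and its helpers are hypothetical names used only for this example; the point is that the allocation size is taken from the variable, so it follows any later change of the element type.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical structure holding an array of small counters. */
struct counters {
	unsigned short *counts;	/* element type may be widened later */
	size_t nr;
};

int counters_init(struct counters *c, size_t nr)
{
	assert(c != NULL && nr > 0);
	/* sizeof(*c->counts) tracks the field's type automatically. */
	c->counts = malloc(nr * sizeof(*c->counts));
	if (!c->counts)
		return -1;
	memset(c->counts, 0, nr * sizeof(*c->counts));
	c->nr = nr;
	return 0;
}

void counters_free(struct counters *c)
{
	free(c->counts);
	c->counts = NULL;
	c->nr = 0;
}
```

Had the size been written as `nr * sizeof(unsigned short)`, widening `counts` would leave the buffer too small without any compiler diagnostic.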

ext4: Add missing unlock_new_inode() call in extent migration code · a8526e84

Aneesh Kumar K.V authored Aug 25, 2009



We need to unlock the new inode before iput.  This patch fixes the
following warning when calling chattr +e to migrate a file to use
extents.  It also fixes problems when e4defrag attempts to
defragment an inode.

[  470.400044] ------------[ cut here ]------------
[  470.400065] WARNING: at fs/inode.c:1210 generic_delete_inode+0x65/0x16a()
[  470.400072] Hardware name: N/A
.....
...
[  470.400353] Pid: 4451, comm: chattr Not tainted 2.6.31-rc7-red-debug #4
[  470.400359] Call Trace:
[  470.400372]  [<ffffffff81037771>] warn_slowpath_common+0x77/0x8f
[  470.400385]  [<ffffffff81037798>] warn_slowpath_null+0xf/0x11
[  470.400395]  [<ffffffff810b7f28>] generic_delete_inode+0x65/0x16a
[  470.400405]  [<ffffffff810b8044>] generic_drop_inode+0x17/0x1bd
[  470.400413]  [<ffffffff810b7083>] iput+0x61/0x65
[  470.400455]  [<ffffffffa003b229>] ext4_ext_migrate+0x5eb/0x66a [ext4]
[  470.400492]  [<ffffffffa002b1f8>] ext4_ioctl+0x340/0x756 [ext4]
[  470.400507]  [<ffffffff810b1a91>] vfs_ioctl+0x1d/0x82
[  470.400517]  [<ffffffff810b1ff0>] do_vfs_ioctl+0x483/0x4c9
[  470.400527]  [<ffffffff81059c30>] ? trace_hardirqs_on+0xd/0xf
[  470.400537]  [<ffffffff810b2087>] sys_ioctl+0x51/0x74
[  470.400549]  [<ffffffff8100ba6b>] system_call_fastpath+0x16/0x1b
[  470.400557] ---[ end trace ab85723542352dac ]---

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

a8526e84

Aug 18, 2009

ext4: Add feature set check helper for mount & remount paths · a13fb1a4

Eric Sandeen authored Aug 18, 2009



A user reported that although his root ext4 filesystem was mounting
fine, other filesystems would not mount, with the:

"Filesystem with huge files cannot be mounted RDWR without CONFIG_LBDAF"

error on his 32-bit box built without CONFIG_LBDAF.  This is because
the test at mount time for this situation was not being re-checked
on remount, and the normal boot process makes an ro->rw transition,
so this was being missed.

Refactor to make a common helper function to test the filesystem
features against the type of mount request (RO vs. RW) so that we 
stay consistent.

Addresses Red-Hat-Bugzilla: #517650

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

a13fb1a4

simplify some logic in ext4_mb_normalize_request · 38877f4e

Eric Sandeen authored Aug 17, 2009



While reading through some of the mballoc code it seems that a couple
spots in the size normalization function could be streamlined.

The test for non-overlapping PAs can be or'd for the start & end
conditions, and the tests for adjacent PAs can be else-if'd - 
it's essentially independently testing:

	if (A + B <= C)
		...
	if (A > C)
		...

These cannot both be true so it seems like the else-if might
be slightly more efficient and/or informative.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

38877f4e
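
A small sketch of why the else-if in the ext4_mb_normalize_request commit above is safe. The function and enum names are hypothetical, and the length is assumed to be non-negative: under that assumption, `A > C` implies `A + B > C`, so the two tests can never both hold.

```c
#include <assert.h>

enum pa_side { PA_NEITHER, PA_LEFT, PA_RIGHT };

/*
 * Hypothetical classifier: a preallocation starts at pa_start and spans
 * pa_len blocks; target is the logical block being allocated.
 */
enum pa_side classify_pa(long pa_start, long pa_len, long target)
{
	assert(pa_len >= 0);
	if (pa_start + pa_len <= target)
		return PA_LEFT;		/* PA ends at or before the target */
	else if (pa_start > target)
		return PA_RIGHT;	/* PA begins after the target */
	return PA_NEITHER;		/* PA covers the target */
}
```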

ext4: open-code ext4_mb_update_group_info · 0373130d

Eric Sandeen authored Aug 17, 2009



ext4_mb_update_group_info is only called in one place, and it's
extremely simple.  There's no reason to have it in a separate function
in a separate file as far as I can tell, it just obfuscates what's
really going on.

Perhaps it was intended to keep the grp->bb_* manipulation local to
mballoc.c but we're already accessing other grp-> fields in balloc.c
directly so this seems ok.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

0373130d

ext4: reject too-large filesystems on 32-bit kernels · bf43d84b

Eric Sandeen authored Aug 17, 2009



ext4 will happily mount a > 16T filesystem on a 32-bit box, but
this is not safe; writes to the block device will wrap past 16T
and the page cache can't index past 16T (2^32 index * 4k pages).

Adding another test to the existing "too many sectors" test
should do the trick.

Add a comment, a relevant return value, and fix the reference
to the CONFIG_LBD(AF) option as well.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

bf43d84b
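
The 16T figure in the commit above follows from the arithmetic 2^32 page indexes times 4096-byte pages. A minimal sketch of that calculation, with hypothetical helper names and the index width passed in as a parameter rather than read from any kernel type:

```c
#include <assert.h>
#include <stdint.h>

/* Bytes addressable by a page cache index of index_bits bits. */
uint64_t page_cache_limit_bytes(unsigned index_bits, uint64_t page_size)
{
	assert(index_bits < 64);
	return ((uint64_t)1 << index_bits) * page_size;
}

/* Nonzero when a filesystem of fs_bytes cannot be indexed safely. */
int fs_too_large(uint64_t fs_bytes, unsigned index_bits, uint64_t page_size)
{
	return fs_bytes > page_cache_limit_bytes(index_bits, page_size);
}
```

For a 32-bit index and 4k pages the limit is 2^44 bytes, which is 16 TiB.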

jbd2: bitfields should be unsigned · 0ccff1a4

H Hartley Sweeten authored Aug 17, 2009



This fixes sparse noise:
  error: dubious one-bit signed bitfield

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jan Kara <jack@ucw.cz>

0ccff1a4
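
To show why sparse calls a one-bit signed bitfield dubious, here is a small user-space sketch. The struct and function names are made up for this example. A signed field one bit wide has no room for the value 1, so a flag stored in it does not read back as 1.

```c
#include <assert.h>

struct signed_flag   { signed int   flag : 1; };
struct unsigned_flag { unsigned int flag : 1; };

/* Store v (0 or 1) in a one-bit signed field and read it back. */
int signed_roundtrip(int v)
{
	struct signed_flag s;

	assert(v == 0 || v == 1);
	s.flag = v;	/* 1 does not fit in a one-bit signed field */
	return s.flag;
}

/* The unsigned variant holds exactly 0 and 1. */
int unsigned_roundtrip(int v)
{
	struct unsigned_flag u;

	assert(v == 0 || v == 1);
	u.flag = (unsigned int)v;
	return (int)u.flag;
}
```

Code that tests such a flag with `if (s.flag)` still works, but a comparison like `s.flag == 1` would not, which is the kind of surprise the unsigned type avoids.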

ext4: Fix possible deadlock between ext4_truncate() and ext4_get_blocks() · 487caeef

Jan Kara authored Aug 17, 2009



During truncate we are sometimes forced to start a new transaction as
the amount of blocks to be journaled is both quite large and hard to
predict. So far we restarted a transaction while holding i_data_sem
and that violates lock ordering because i_data_sem ranks below a
transaction start (and it can lead to a real deadlock with
ext4_get_blocks() mapping blocks in some page while having a
transaction open).

We fix the problem by dropping the i_data_sem before restarting the
transaction and acquire it afterwards. It's slightly subtle that this
works:

1) By the time ext4_truncate() is called, all the page cache for the
truncated part of the file is dropped so get_block() should not be
called on it (we only have to invalidate extent cache after we
reacquire i_data_sem because some extent from not-truncated part could
extend also into the part we are going to truncate).

2) Writes, migrate or defrag hold i_mutex so they are stopped for all
the time of the truncate.

This bug has been found and analyzed by Theodore Tso <tytso@mit.edu>.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

487caeef
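
The ordering rule in the truncate commit above can be modelled with a toy, single-threaded sketch. Everything here is hypothetical: the booleans merely stand in for i_data_sem and the extent cache, and no real locking or journaling is performed. It only shows the sequence drop, restart, reacquire, invalidate.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: flags stand in for i_data_sem and the extent cache. */
struct toy_inode {
	bool data_sem_held;
	bool extent_cache_valid;
	int restarts;
};

/* Restarting a transaction is only legal with the lower-ranked lock dropped. */
int toy_journal_restart(struct toy_inode *inode)
{
	if (inode->data_sem_held)
		return -1;	/* would violate the lock ordering */
	inode->restarts++;
	return 0;
}

/* Drop the lock, restart, retake the lock, then invalidate cached extents. */
int toy_truncate_restart(struct toy_inode *inode)
{
	int err;

	assert(inode->data_sem_held);
	inode->data_sem_held = false;
	err = toy_journal_restart(inode);
	inode->data_sem_held = true;
	inode->extent_cache_valid = false;
	return err;
}
```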

jbd2: Annotate transaction start also for jbd2_journal_restart() · 9599b0e5

Jan Kara authored Aug 17, 2009



lockdep annotation for a transaction start has been at the end of
jbd2_journal_start(). But a transaction is also started from
jbd2_journal_restart(). Move the lockdep annotation to start_this_handle()
which covers both cases.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

9599b0e5

Sep 19, 2009

ext4: Show unwritten extent flag in ext4_ext_show_leaf() · 553f9008

Mingming authored Sep 18, 2009



ext4_ext_show_leaf() will display the leaf extents when extent
debugging is enabled.

Printing out the unwritten bit is useful for debugging unwritten
extents: it allows us to tell unwritten extents from written ones
after the unwritten extents are split or converted.

Signed-off-by: Mingming Cao <cmm@us.ibm.com>

553f9008

Sep 01, 2009

ext4: Compile warning fix when EXT_DEBUG enabled · 84fe3bef

Mingming authored Sep 01, 2009



When EXT_DEBUG is enabled I received the following compile warning on
PPC64:

  CC [M]  fs/ext4/inode.o
  CC [M]  fs/ext4/extents.o
fs/ext4/extents.c: In function ‘ext4_ext_rm_leaf’:
fs/ext4/extents.c:2097: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 2 has type ‘ext4_lblk_t’
fs/ext4/extents.c: In function ‘ext4_ext_get_blocks’:
fs/ext4/extents.c:2789: warning: format ‘%u’ expects type ‘unsigned int’, but argument 4 has type ‘long unsigned int’
fs/ext4/extents.c:2852: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 3 has type ‘ext4_lblk_t’
fs/ext4/extents.c:2953: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 4 has type ‘unsigned int’
  CC [M]  fs/ext4/migrate.o

This patch fixes the compile warnings.

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

84fe3bef

Sep 19, 2009

ext4: Avoid group preallocation for closed files · 50797481

Theodore Ts'o authored Sep 18, 2009

Currently the group preallocation code tries to find a large (512)
free block from which to do per-cpu group allocation for small files.
The problem with this scheme is that it leaves the filesystem horribly
fragmented. In the worst case, if the filesystem is unmounted and
remounted (after a system shutdown, for example) we forget the fact
that we were using a particular (now-partially filled) 512 block
extent. So the next time we try to allocate space for a small file,
we will find *another* completely free 512 block chunk to allocate
small files. Given that there are 32,768 blocks in a block group,
after 64 iterations of "mount, write one 4k file in a directory,
unmount", the block group will have 64 files, each separated by 511
blocks, and the block group will no longer have any completely
free 512-block chunks for group preallocation space.

So if we try to allocate blocks for a file that has been closed, such
that we know the final size of the file, and the filesystem is not
busy, avoid using group preallocation.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

50797481
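
The worst case described in the group preallocation commit above can be reproduced with a toy simulation. The structure and function names are hypothetical; the model only tracks how many blocks are used in each 512-block chunk of a 32,768-block group, and each call stands for one "mount, write one 4k file, unmount" cycle that forgets the previously used chunk.

```c
#include <assert.h>
#include <string.h>

#define GROUP_BLOCKS 32768
#define CHUNK_BLOCKS 512
#define NR_CHUNKS (GROUP_BLOCKS / CHUNK_BLOCKS)

/* used[i] counts the blocks consumed in chunk i. */
struct toy_group { int used[NR_CHUNKS]; };

void toy_group_init(struct toy_group *g)
{
	memset(g, 0, sizeof(*g));
}

/* Take one block from the first completely free chunk; -1 if none is left. */
int alloc_from_fresh_chunk(struct toy_group *g)
{
	for (int i = 0; i < NR_CHUNKS; i++) {
		if (g->used[i] == 0) {
			g->used[i] = 1;
			return i * CHUNK_BLOCKS;	/* offset of the new file */
		}
	}
	return -1;
}

/* Total number of free blocks left in the group. */
int free_blocks(const struct toy_group *g)
{
	int total = 0;

	for (int i = 0; i < NR_CHUNKS; i++)
		total += CHUNK_BLOCKS - g->used[i];
	return total;
}
```

After 64 calls the group still has 64 * 511 free blocks, yet no completely free chunk remains.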

Aug 10, 2009

ext4: Fix bugs in mballoc's stream allocation mode · 4ba74d00

Theodore Ts'o authored Aug 09, 2009



The logic around sbi->s_mb_last_group and sbi->s_mb_last_start was all
screwed up.  These fields were being set unconditionally, even when
stream allocation had not taken place, and they were being used even
when the file was smaller than s_mb_stream_request, which is when the
allocator should _not_ be doing stream allocation.

Fix this by determining whether or not stream allocation should
take place once, in ext4_mb_group_or_file(), and setting a flag which
gets used in ext4_mb_regular_allocator() and ext4_mb_use_best_found().
This simplifies the code and assures that we are consistently using
(or not using) the stream allocation logic.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

4ba74d00

ext4: Display the mballoc flags in mb_history in hex instead of decimal · 0ef90db9

Theodore Ts'o authored Aug 09, 2009



Displaying the flags in base 16 makes it easier to see which flags
have been set.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

0ef90db9
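
A short sketch of why hex is easier to read for a flags word, as the mb_history commit above argues. The flag values and the helper name are invented for illustration and are not the mballoc flag definitions.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical flag bits, chosen only for illustration. */
#define TOY_HINT_A 0x0001u
#define TOY_HINT_B 0x0020u
#define TOY_HINT_C 0x0400u

/* Format flags as decimal, or as zero-padded hex when hex is nonzero. */
int format_flags(char *buf, size_t len, unsigned int flags, int hex)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, hex ? "%04x" : "%u", flags);
}
```

With all three bits set, the hex form is 0421, where each digit maps to one flag; the decimal form is 1057, which has to be converted before the bits can be read.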

Sep 19, 2009

ext4: Add configurable run-time mballoc debugging · 6ba495e9

Theodore Ts'o authored Sep 18, 2009



Allow mballoc debugging to be enabled at run-time instead of just at
compile time.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

6ba495e9

Aug 11, 2009

ext4: fix journal ref count in move_extent_par_page · 91cc219a

Peng Tao authored Aug 10, 2009

move_extent_par_page calls a_ops->write_begin(), which increases the
journal handle's reference count. However, if either mext_replace_branches()
or ext4_get_block fails, the increased reference count isn't
decreased. This will cause a later attempt to umount the fs to hang
forever. The patch addresses the issue by calling ext4_journal_stop()
if page is not NULL (which means a_ops->write_end() isn't invoked).

Signed-off-by: Peng Tao <bergwolf@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

91cc219a

jbd2: round commit timer up to avoid uncommitted transaction · b1f485f2

Andreas Dilger authored Aug 10, 2009



Fix jiffy rounding in the jbd commit timer setup code.  Rounding down
could cause the timer to fire before the corresponding transaction
has expired.  That transaction can stay uncommitted forever if no
new transaction is created and no explicit sync/umount happens.

Signed-off-by: Alex Zhuravlev (Tomas) <alex.zhuravlev@sun.com>
Signed-off-by: Andreas Dilger <adilger@sun.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

b1f485f2
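
The rounding problem in the commit timer fix above can be illustrated with plain integer arithmetic. This is a simplified sketch: the helper names are hypothetical, the rounding is to a plain multiple of a step, and jiffies wraparound is ignored, so it is not how the kernel's jiffy rounding helpers are implemented.

```c
#include <assert.h>

/* Round t down to a multiple of step (step > 0). */
unsigned long round_down_to(unsigned long t, unsigned long step)
{
	assert(step > 0);
	return t - (t % step);
}

/* Round t up to a multiple of step (step > 0). */
unsigned long round_up_to(unsigned long t, unsigned long step)
{
	assert(step > 0);
	return round_down_to(t + step - 1, step);
}

/* A timer armed before the expiry time fires too early. */
int fires_before_expiry(unsigned long timer, unsigned long expires)
{
	return timer < expires;
}
```

For an expiry at tick 1234 and a step of 250, rounding down arms the timer at 1000, before the transaction has expired, while rounding up arms it at 1250.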

ext4: remove redundant test on unsigned · c333e073

Roel Kluin authored Aug 10, 2009



unsigned i_block cannot be less than 0.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

c333e073

Jul 28, 2009

ext4: fix build warning when EXT4FS_DEBUG is on · 785b4b3a

Peng Tao authored Jul 27, 2009



When compiling with EXT4FS_DEBUG on, gcc will complain with following warnings:

linux-2.6/fs/ext4/ialloc.c: In function ‘ext4_count_free_inodes’:
linux-2.6/fs/ext4/ialloc.c:1192: warning: format ‘%lu’ expects type
‘long unsigned int’, but argument 2 has type ‘ext4_group_t’

So add a type cast to suppress it. 

Signed-off-by: Peng Tao <bergwolf@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

785b4b3a
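
The warning in the EXT4FS_DEBUG commit above arises when a typedef'd integer is printed with a specifier meant for a different width. A minimal sketch of the two usual remedies follows; `group_t` is a stand-in typedef assumed here to be an unsigned int, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in typedef, assumed to be unsigned int for this example. */
typedef unsigned int group_t;

/* Option 1: cast the argument to match an existing %lu specifier. */
int format_group(char *buf, size_t len, group_t group)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "group %lu", (unsigned long)group);
}

/* Option 2: use the specifier that matches the typedef itself. */
int format_group_native(char *buf, size_t len, group_t group)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "group %u", group);
}
```

The cast keeps the format string untouched and stays correct if the typedef is later widened to unsigned long; the native specifier avoids the cast but must be updated together with the type.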

Jul 06, 2009

ext4: Fix compile warnings with MB_DEBUG · 1c718505

Akira Fujita authored Jul 05, 2009



When MB_DEBUG is enabled, we get some compile warnings because
ext4_group_t is unsigned int.  This patch fixes them.

Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

1c718505

ext4: Remove unnecessary semicolons in mballoc.c · 5a4a7989

Joe Perches authored Jul 05, 2009



Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

5a4a7989

Jul 17, 2009

ext4: More buffer head reference leaks · 6487a9d3

Curt Wohlgemuth authored Jul 17, 2009



After the patch I posted last week regarding buffer head ref leaks in
no-journal mode, I looked at all the code that uses buffer heads and
searched for more potential leaks.

The patch below fixes the issues I found; these can occur even when a
journal is present.

The change to inode.c fixes a double release if
ext4_journal_get_create_access() fails.

The changes to namei.c are more complicated.  add_dirent_to_buf() will
release the input buffer head EXCEPT when it returns -ENOSPC.  There are
some callers of this routine that don't always do the brelse() in the event
that -ENOSPC is returned.  Unfortunately, to put this fix into ext4_add_entry()
required capturing the return value of make_indexed_dir() and
add_dirent_to_buf().

Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

6487a9d3
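
The ownership convention described in the buffer head commit above (the callee releases the buffer on every path except -ENOSPC) can be modelled with a toy reference count. All names below are hypothetical stand-ins and none of this is the namei.c code; the sketch only shows where the caller has to drop the reference itself.

```c
#include <assert.h>
#include <errno.h>

/* Toy buffer head with a reference count. */
struct toy_bh { int refcount; };

void toy_brelse(struct toy_bh *bh)
{
	assert(bh->refcount > 0);
	bh->refcount--;
}

/*
 * Toy callee following the convention: it releases bh on every return
 * path except -ENOSPC, where the caller keeps the reference.
 */
int toy_add_dirent(struct toy_bh *bh, int space_left)
{
	if (space_left <= 0)
		return -ENOSPC;	/* bh is NOT released here */
	toy_brelse(bh);
	return 0;
}

/* A caller that gives up on -ENOSPC must drop the reference itself. */
int toy_add_entry(struct toy_bh *bh, int space_left)
{
	int err = toy_add_dirent(bh, space_left);

	if (err == -ENOSPC)
		toy_brelse(bh);
	return err;
}
```

Capturing the return value is what makes the conditional release possible, which matches the commit's remark about having to capture the callee's return value.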

jbd2: Fail to load a journal if it is too short · f6f50e28

Jan Kara authored Jul 17, 2009



Due to on-disk corruption, it can happen that the journal is too short.
Fail to load it in that case so that we don't oops somewhere later.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

f6f50e28

Jul 28, 2009

ext4: Avoid null pointer dereference when decoding EROFS w/o a journal · 78f1ddbb

Theodore Ts'o authored Jul 27, 2009



We need to check to make sure a journal is present before checking the
journal flags in ext4_decode_error().

Signed-off-by: Eric Sesterhenn <eric.sesterhenn@lsexperts.de>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

78f1ddbb

ext4: Fix typo in ext4/Kconfig · 43b38520

Manish Katiyar authored Jul 27, 2009



Signed-off-by: Manish Katiyar <mkatiyar@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

43b38520

Jul 17, 2009

ext4: Fix memory leak fix when mounting an ext4 filesystem · 024eab4d

Aneesh Kumar K.V authored Jul 17, 2009

The allocation of the ext4_group_info array was moved to a new
function ext4_mb_add_group_info() in commit 5f21b0e6 so that online
resize would use a common (and correct) codepath.  Unfortunately, the
call to the new ext4_mb_add_group_info() function was added without
removing the code which originally allocated the array.  This caused a
memory leak each time an ext4 filesystem was mounted.

The fix is simple; remove the code that did the original allocation,
since it is no longer needed.

Reported-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

024eab4d

Sep 16, 2009

Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6 · ab86e576

Linus Torvalds authored Sep 16, 2009

* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6:
  Driver Core: devtmpfs - kernel-maintained tmpfs-based /dev
  debugfs: Modify default debugfs directory for debugging pktcdvd.
  debugfs: Modified default dir of debugfs for debugging UHCI.
  debugfs: Change debugfs directory of IWMC3200
  debugfs: Change debuhgfs directory of trace-events-sample.h
  debugfs: Fix mount directory of debugfs by default in events.txt
  hpilo: add poll f_op
  hpilo: add interrupt handler
  hpilo: staging for interrupt handling
  driver core: platform_device_add_data(): use kmemdup()
  Driver core: Add support for compatibility classes
  uio: add generic driver for PCI 2.3 devices
  driver-core: move dma-coherent.c from kernel to driver/base
  mem_class: fix bug
  mem_class: use minor as index instead of searching the array
  driver model: constify attribute groups
  UIO: remove 'default n' from Kconfig
  Driver core: Add accessor for device platform data
  Driver core: move dev_get/set_drvdata to drivers/base/dd.c
  Driver core: add new device to bus's list before probing

ab86e576

Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging-2.6 · 7ea61767

Linus Torvalds authored Sep 16, 2009

* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging-2.6: (641 commits)
  Staging: remove sxg driver
  Staging: remove heci driver
  Staging: remove at76_usb wireless driver.
  Staging: rspiusb: remove the driver
  Staging: meilhaus: remove the drivers
  Staging: remove me4000 driver.
  Staging: line6: ffzb returns an unsigned integer
  Staging: line6: pod.c: style cleanups
  Staging: iio: introduce missing kfree
  Staging: dream: introduce missing kfree
  Staging: comedi: addi-data: NULL dereference of amcc in v_pci_card_list_init()
  Staging: vt665x: fix built-in compiling
  Staging: rt3090: enable NATIVE_WPA_SUPPLICANT_SUPPORT option
  Staging: rt3090: port changes in WPA_MIX_PAIR_CIPHER to rt3090
  Staging: rt3090: rename device from raX to wlanX
  Staging: rt3090: remove possible conflict with rt2860
  Staging: rt2860/rt2870/rt3070/rt3090: fix compiler warning on x86_64
  Staging: rt2860: add new device ids
  Staging: rt3090: add device id 1462:891a
  Staging: asus_oled: Cleaned up checkpatch issues.
  ...

7ea61767

Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/pcmcia-2.6 · 0950efd1

Linus Torvalds authored Sep 16, 2009

* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/pcmcia-2.6:
  pcmcia: document return value of pcmcia_loop_config
  pcmcia: dtl1_cs: fix pcmcia_loop_config logic
  pcmcia: drop non-existant includes
  pcmcia: disable prefetch/burst for OZ6933
  pcmcia: fix incorrect argument order to list_add_tail()
  pcmcia: drivers/pcmcia/pcmcia_resource.c: Remove unnecessary semicolons
  pcmcia: Use phys_addr_t for physical addresses
  pcmcia: drivers/pcmcia: Make static

0950efd1

Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 · 4406c56d

Linus Torvalds authored Sep 16, 2009

* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (75 commits)
  PCI hotplug: clean up acpi_run_hpp()
  PCI hotplug: acpiphp: use generic pci_configure_slot()
  PCI hotplug: shpchp: use generic pci_configure_slot()
  PCI hotplug: pciehp: use generic pci_configure_slot()
  PCI hotplug: add pci_configure_slot()
  PCI hotplug: clean up acpi_get_hp_params_from_firmware() interface
  PCI hotplug: acpiphp: don't cache hotplug_params in acpiphp_bridge
  PCI hotplug: acpiphp: remove superfluous _HPP/_HPX evaluation
  PCI: Clear saved_state after the state has been restored
  PCI PM: Return error codes from pci_pm_resume()
  PCI: use dev_printk in quirk messages
  PCI / PCIe portdrv: Fix pcie_portdrv_slot_reset()
  PCI Hotplug: convert acpi_pci_detect_ejectable() to take an acpi_handle
  PCI Hotplug: acpiphp: find bridges the easy way
  PCI: pcie portdrv: remove unused variable
  PCI / ACPI PM: Propagate wake-up enable for devices w/o ACPI support
  ACPI PM: Replace wakeup.prepared with reference counter
  PCI PM: Introduce device flag wakeup_prepared
  PCI / ACPI PM: Rework some debug messages
  PCI PM: Simplify PCI wake-up code
  ...

Fixed up conflict in arch/powerpc/kernel/pci_64.c due to OF device tree
scanning having been moved and merged for the 32- and 64-bit cases.  The
'needs_freset' initialization added in 6e19314c ("PCI/powerpc: support
PCIe fundamental reset") is now in arch/powerpc/kernel/pci_of_scan.c.

4406c56d

Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block · 6b7b352f

Linus Torvalds authored Sep 16, 2009

* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  block: fix linkage problem with blk_iopoll and !CONFIG_BLOCK

6b7b352f

Merge branch 'writeback' of git://git.kernel.dk/linux-2.6-block · a3eb51ec

Linus Torvalds authored Sep 16, 2009

* 'writeback' of git://git.kernel.dk/linux-2.6-block:
  writeback: fix possible bdi writeback refcounting problem
  writeback: Fix bdi use after free in wb_work_complete()
  writeback: improve scalability of bdi writeback work queues
  writeback: remove smp_mb(), it's not needed with list_add_tail_rcu()
  writeback: use schedule_timeout_interruptible()
  writeback: add comments to bdi_work structure
  writeback: splice dirty inode entries to default bdi on bdi_destroy()
  writeback: separate starting of sync vs opportunistic writeback
  writeback: inline allocation failure handling in bdi_alloc_queue_work()
  writeback: use RCU to protect bdi_list
  writeback: only use bdi_writeback_all() for WB_SYNC_NONE writeout
  fs: Assign bdi in super_block
  writeback: make wb_writeback() take an argument structure
  writeback: merely wakeup flusher thread if work allocation fails for WB_SYNC_NONE
  writeback: get rid of wbc->for_writepages
  fs: remove bdev->bd_inode_backing_dev_info

a3eb51ec

writeback: fix possible bdi writeback refcounting problem · 1ef7d9aa

Nick Piggin authored Sep 15, 2009



wb_clear_pending AFAIKS should not be called after the item has been
put on the list, except by the worker threads. It could lead to the
situation where the refcount is decremented below 0 and cause lots of
problems.

Presumably the !wb_has_dirty_io case is not a common one, so it can
be discovered when the thread wakes up to check?

Also add a comment in bdi_work_clear.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

1ef7d9aa

writeback: Fix bdi use after free in wb_work_complete() · 77b9d059

Nick Piggin authored Sep 15, 2009



By the time bdi_work_on_stack gets evaluated again in bdi_work_free, it
can already have been deallocated and used for something else in the
!on stack case, giving a false positive in this test and causing
corruption.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

77b9d059

writeback: improve scalability of bdi writeback work queues · 77fad5e6

Nick Piggin authored Sep 15, 2009

If you're going to do an atomic RMW on each list entry, there's not much
point in all the RCU complexities of the list walking. This is only going
to help the multi-thread case I guess, but it doesn't hurt to do now.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

77fad5e6

writeback: remove smp_mb(), it's not needed with list_add_tail_rcu() · deed62ed

Nick Piggin authored Sep 15, 2009



list_add_tail_rcu contains required barriers.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

deed62ed

writeback: use schedule_timeout_interruptible() · 49db0414

Jens Axboe authored Sep 15, 2009



Gets rid of a manual set_current_state().

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

49db0414

writeback: add comments to bdi_work structure · 8010c3b6

Jens Axboe authored Sep 15, 2009



And document its retriever, get_next_work_item().

Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

8010c3b6

writeback: splice dirty inode entries to default bdi on bdi_destroy() · ce5f8e77

Jens Axboe authored Sep 14, 2009



We cannot safely ensure that the inodes are all gone at this point
in time, and we must not destroy this bdi with inodes hanging off it.
So just splice our entries to the default bdi since that one will
always persist.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

ce5f8e77