linux.git - Linux kernel mainline tree

	Commit message (Collapse)	Author	Age	Files	Lines
*	fs/Kconfig: move nilfs2 outside misc filesystems	Ryusuke Konishi	2009-09-14	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some people asked me questions like the following: On Wed, 15 Jul 2009 13:11:21 +0200, Leon Woestenberg wrote: > just wondering, any reasons why NILFS2 is one of the miscellaneous > filesystems and, for example, btrfs, is not in Kconfig? Actually, nilfs is NOT a filesystem came from other operating systems, but a filesystem created purely for Linux. Nor is it a flash filesystem but that for generic block devices. So, this moves nilfs outside the misc category as I responded in LKML "Re: Why does NILFS2 hide under Miscellaneous filesystems?" (Message-Id: <20090716.002526.93465395.ryusuke@osrg.net>). Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
*	nilfs2: convert nilfs_bmap_lookup to an inline function	Ryusuke Konishi	2009-09-14	3	-36/+28
\| \| \| \| \| \| \| \| \| \|	The nilfs_bmap_lookup() is now a wrapper function of nilfs_bmap_lookup_at_level(). This moves the nilfs_bmap_lookup() to a header file converting it to an inline function and gives an opportunity for optimization. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
*	nilfs2: allow btree code to directly call dat operations	Ryusuke Konishi	2009-09-14	4	-299/+167
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The current btree code is written so that btree functions call dat operations via wrapper functions in bmap.c when they allocate, free, or modify virtual block addresses. This abstraction requires additional function calls and causes frequent call of nilfs_bmap_get_dat() function since it is used in the every wrapper function. This removes the wrapper functions and makes them available from btree.c and direct.c, which will increase the opportunity of compiler optimization. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
*	nilfs2: add update functions of virtual block address to dat	Ryusuke Konishi	2009-09-14	3	-20/+44
\| \| \| \| \| \| \| \| \| \|	This is a preparation for the successive cleanup ("nilfs2: allow btree to directly call dat operations"). This adds functions bundling a few operations to change an entry of virtual block address on the dat file. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
*	nilfs2: remove individual gfp constants for each metadata file	Ryusuke Konishi	2009-09-14	8	-19/+9
\| \| \| \| \| \| \| \|	This gets rid of NILFS_CPFILE_GFP, NILFS_SUFILE_GFP, NILFS_DAT_GFP, and NILFS_IFILE_GFP. All of these constants refer to NILFS_MDT_GFP, and can be removed. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
*	nilfs2: stop zero-fill of btree path just before free it	Ryusuke Konishi	2009-09-14	1	-24/+12
\| \| \| \| \| \| \| \|	The btree path object is cleared just before it is freed. This will remove the code doing the unnecessary clear operation. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
*	nilfs2: remove unused btree argument from btree functions	Ryusuke Konishi	2009-09-14	1	-248/+208
\| \| \| \| \| \| \| \| \| \| \| \|	Even though many btree functions take a btree object as their first argument, most of them are not used in their functions. This sticky use of the btree argument is hurting code readability and giving the possibility of inefficient code generation. So, this removes the unnecessary btree arguments. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
*	nilfs2: remove nilfs_dat_abort_start and nilfs_dat_abort_free	Ryusuke Konishi	2009-09-14	2	-12/+0
\| \| \| \| \| \|	These functions are not called from any functions. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
*	nilfs2: shorten freeze period due to GC in write operation v3	Jiro SEKIBA	2009-09-14	2	-7/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a re-revised patch to shorten freeze period. This version include a fix of the bug Konishi-san mentioned last time. When GC is runnning, GC moves live block to difference segments. Copying live blocks into memory is done in a transaction, however it is not necessarily to be in the transaction. This patch will get the nilfs_ioctl_move_blocks() out from transaction lock and put it before the transaction. I ran sysbench fileio test against nilfs partition. I copied some DVD/CD images and created snapshot to create live blocks before starting the benchmark. Followings are summary of rc8 and rc8 w/ the patch of per-request statistics, which is min/max and avg. I ran each test three times and bellow is average of those numers. According to this benchmark result, average time is slightly degrated. However, worstcase (max) result is significantly improved. This can address a few seconds write freeze. - random write per-request performance of rc8 min 0.843ms max 680.406ms avg 3.050ms - random write per-request performance of rc8 w/ this patch min 0.843ms -> 100.00% max 380.490ms -> 55.90% avg 3.233ms -> 106.00% - sequential write per-request performance of rc8 min 0.736ms max 774.343ms avg 2.883ms - sequential write per-request performance of rc8 w/ this patch min 0.720ms -> 97.80% max 644.280ms-> 83.20% avg 3.130ms -> 108.50% -----8<-----8<-----nilfs_cleanerd.conf-----8<-----8<----- protection_period 150 selection_policy timestamp # timestamp in ascend order nsegments_per_clean 2 cleaning_interval 2 retry_interval 60 use_mmap log_priority info -----8<-----8<-----nilfs_cleanerd.conf-----8<-----8<----- Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
*	nilfs2: add more check routines in mount process	Zhu Yanhai	2009-09-14	2	-4/+14
\| \| \| \| \| \| \| \| \|	nilfs2: Add more safeguard routines and protections in mount process, which also makes nilfs2 report consistency error messages when checkpoint number is invalid. Signed-off-by: Zhu Yanhai <zhu.yanhai@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
*	nilfs2: An unassigned variable is assigned to a never used structure member	Zhang Qiang	2009-09-14	1	-4/+0
\| \| \| \| \| \| \| \| \| \| \|	nilfs2: In procedure 'nilfs_get_sb()', when a nilfs filesysttem is mounted for the first time, local variable 'nilfs->ns_last_cno' is used before loading the latest checkpoint number from disk (in 'nilfs_fill_super'). 'nilfs->ns_last_cno' is assigned to 'sd.cno', but 'sd.cno' has never been used in the procedure. Signed-off-by: Zhang Qiang <zhangqiang.buaa@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
*	nilfs2: use GFP_NOIO for bio_alloc instead of GFP_NOWAIT	Ryusuke Konishi	2009-09-14	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Alberto Bertogli advised me about bio_alloc() use in nilfs: On Sat, 13 Jun 2009 22:52:40 -0300, Alberto Bertogli wrote: > By the way, those bio_alloc()s are using GFP_NOWAIT but it looks > like they could use at least GFP_NOIO or GFP_NOFS, since the caller > can (and sometimes do) sleep. The only caller is nilfs_submit_bh(), > which calls nilfs_submit_seg_bio() which can sleep calling > wait_for_completion(). This takes in the comment and replaces the use of GFP_NOWAIT flag with GFP_NOIO. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
*	nilfs2: stop using periodic write_super callback	Jiro SEKIBA	2009-09-14	2	-46/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This removes nilfs_write_super and commit super block in nilfs internal thread, instead of periodic write_super callback. VFS layer calls ->write_super callback periodically. However, it looks like that calling back is ommited when disk I/O is busy. And when cleanerd (nilfs GC) is runnig, disk I/O tend to be busy thus nilfs superblock is not synchronized as nilfs designed. To avoid it, syncing superblock by nilfs thread instead of pdflush. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
*	nilfs2: clean up nilfs_write_super	Jiro SEKIBA	2009-09-14	2	-8/+17
\| \| \| \| \| \| \| \|	Separate conditions that check if syncing super block and alternative super block are required as inline functions to reuse the conditions. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
*	nilfs2: fix disorder of nilfs_write_super in nilfs_sync_fs	Jiro SEKIBA	2009-09-14	1	-2/+8
\| \| \| \| \| \| \| \| \| \| \| \|	This fixes disorder of nilfs_write_super in nilfs_sync_fs. Commiting super block must be the end of the function so that every changes are reflected. ->sync_fs() is not called frequently so this makes nilfs_sync_fs call nilfs_commit_super instead of nilfs_write_super. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
*	nilfs2: remove redundant super block commit	Jiro SEKIBA	2009-09-14	1	-3/+0
\| \| \| \| \| \| \| \| \| \| \| \|	This removes redundant super block commit. nilfs_write_super will call nilfs_commit_super to store super block into block device. However, nilfs_put_super will call nilfs_commit_super right after calling nilfs_write_super. So calling nilfs_write_super in nilfs_put_super would be redundant. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
*	nilfs2: implement nilfs_show_options to display mount options in /proc/mounts	Jiro SEKIBA	2009-09-14	1	-1/+23
\| \| \| \| \| \| \| \| \| \| \| \|	This is a patch to display mount options in procfs. Mount options will show up in the /proc/mounts as other fs does. ... /dev/sda6 /mnt nilfs2 ro,relatime,barrier=off,cp=3,order=strict 0 0 ... Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
*	nilfs2: always lookup disk block address before reading metadata block	Ryusuke Konishi	2009-09-14	1	-15/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The current metadata file code skips disk address lookup for its data block if the buffer has a mapped flag. This has a potential risk to cause read request to be performed against the stale block address that GC moved, and it may lead to meta data corruption. The mapped flag is safe if the buffer has an uptodate flag, otherwise it may prevent necessary update of disk address in the next read. This will avoid the potential problem by ensuring disk address lookup before reading metadata block even for buffers with the mapped flag. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
*	nilfs2: use semaphore to protect pointer to a writable FS-instance	Ryusuke Konishi	2009-09-14	3	-26/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	will get rid of nilfs_get_writer() and nilfs_put_writer() pair used to retain a writable FS-instance for a period. The pair functions were making up some kind of recursive lock with a mutex, but they became overkill since the commit 201913ed746c7724a40d33ee5a0b6a1fd2ef3193. Furthermore, they caused the following lockdep warning because the mutex can be released by a task which didn't lock it: ===================================== [ BUG: bad unlock balance detected! ] ------------------------------------- kswapd0/422 is trying to release lock (&nilfs->ns_writer_mutex) at: [<c1359ff5>] mutex_unlock+0x8/0xa but there are no more locks to release! other info that might help us debug this: no locks held by kswapd0/422. stack backtrace: Pid: 422, comm: kswapd0 Not tainted 2.6.31-rc4-nilfs #51 Call Trace: [<c1358f97>] ? printk+0xf/0x18 [<c104fea7>] print_unlock_inbalance_bug+0xcc/0xd7 [<c11578de>] ? prop_put_global+0x3/0x35 [<c1050195>] lock_release+0xed/0x1dc [<c1359ff5>] ? mutex_unlock+0x8/0xa [<c1359f83>] __mutex_unlock_slowpath+0xaf/0x119 [<c1359ff5>] mutex_unlock+0x8/0xa [<d1284add>] nilfs_mdt_write_page+0xd8/0xe1 [nilfs2] [<c1092653>] shrink_page_list+0x379/0x68d [<c109171b>] ? isolate_pages_global+0xb4/0x18c [<c1092bd2>] shrink_list+0x26b/0x54b [<c10930be>] shrink_zone+0x20c/0x2a2 [<c10936b7>] kswapd+0x407/0x591 [<c1091667>] ? isolate_pages_global+0x0/0x18c [<c1040603>] ? autoremove_wake_function+0x0/0x33 [<c10932b0>] ? kswapd+0x0/0x591 [<c104033b>] kthread+0x69/0x6e [<c10402d2>] ? kthread+0x0/0x6e [<c1003e33>] kernel_thread_helper+0x7/0x1a This patch uses a reader/writer semaphore instead of the own lock and kills this warning. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
*	nilfs2: fix format string compile warning (ino_t)	Heiko Carstens	2009-09-14	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \|	Unlike on most other architectures ino_t is an unsigned int on s390. So add an explicit cast to avoid this compile warning: fs/nilfs2/recovery.c: In function 'recover_dsync_blocks': fs/nilfs2/recovery.c:555: warning: format '%lu' expects type 'long unsigned int', but argument 3 has type 'ino_t' Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
*	nilfs2: fix ignored error code in __nilfs_read_inode()	Ryusuke Konishi	2009-09-14	1	-1/+2
\| \| \| \| \| \| \| \| \| \|	The __nilfs_read_inode function is ignoring the error code returned from nilfs_read_inode_common(), and wrongly delivers a success code (zero) when it escapes from the function in erroneous cases. This adds the missing error handling. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
*	nilfs2: fix preempt count underflow in nilfs_btnode_prepare_change_key	Ryusuke Konishi	2009-08-31	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This will fix the following preempt count underflow reported from users with the title "[NILFS users] segctord problem" (Message-ID: <949415.6494.qm@web58808.mail.re1.yahoo.com> and Message-ID: <debc30fc0908270825v747c1734xa59126623cfd5b05@mail.gmail.com>): WARNING: at kernel/sched.c:4890 sub_preempt_count+0x95/0xa0() Hardware name: HP Compaq 6530b (KR980UT#ABC) Modules linked in: bridge stp llc bnep rfcomm l2cap xfs exportfs nilfs2 cowloop loop vboxnetadp vboxnetflt vboxdrv btusb bluetooth uvcvideo videodev v4l1_compat v4l2_compat_ioctl32 arc4 snd_hda_codec_analog ecb iwlagn iwlcore rfkill lib80211 mac80211 snd_hda_intel snd_hda_codec ehci_hcd uhci_hcd usbcore snd_hwdep snd_pcm tg3 cfg80211 psmouse snd_timer joydev libphy ohci1394 snd_page_alloc hp_accel lis3lv02d ieee1394 led_class i915 drm i2c_algo_bit video backlight output i2c_core dm_crypt dm_mod Pid: 4197, comm: segctord Not tainted 2.6.30-gentoo-r4-64 #7 Call Trace: [<ffffffff8023fa05>] ? sub_preempt_count+0x95/0xa0 [<ffffffff802470f8>] warn_slowpath_common+0x78/0xd0 [<ffffffff8024715f>] warn_slowpath_null+0xf/0x20 [<ffffffff8023fa05>] sub_preempt_count+0x95/0xa0 [<ffffffffa04ce4db>] nilfs_btnode_prepare_change_key+0x11b/0x190 [nilfs2] [<ffffffffa04d01ad>] nilfs_btree_assign_p+0x19d/0x1e0 [nilfs2] [<ffffffffa04d10ad>] nilfs_btree_assign+0xbd/0x130 [nilfs2] [<ffffffffa04cead7>] nilfs_bmap_assign+0x47/0x70 [nilfs2] [<ffffffffa04d9bc6>] nilfs_segctor_do_construct+0x956/0x20f0 [nilfs2] [<ffffffff805ac8e2>] ? _spin_unlock_irqrestore+0x12/0x40 [<ffffffff803c06e0>] ? __up_write+0xe0/0x150 [<ffffffff80262959>] ? up_write+0x9/0x10 [<ffffffffa04ce9f3>] ? nilfs_bmap_test_and_clear_dirty+0x43/0x60 [nilfs2] [<ffffffffa04cd627>] ? nilfs_mdt_fetch_dirty+0x27/0x60 [nilfs2] [<ffffffffa04db5fc>] nilfs_segctor_construct+0x8c/0xd0 [nilfs2] [<ffffffffa04dc3dc>] nilfs_segctor_thread+0x15c/0x3a0 [nilfs2] [<ffffffffa04dbe20>] ? nilfs_construction_timeout+0x0/0x10 [nilfs2] [<ffffffff80252633>] ? add_timer+0x13/0x20 [<ffffffff802370da>] ? __wake_up_common+0x5a/0x90 [<ffffffff8025e960>] ? autoremove_wake_function+0x0/0x40 [<ffffffffa04dc280>] ? nilfs_segctor_thread+0x0/0x3a0 [nilfs2] [<ffffffffa04dc280>] ? nilfs_segctor_thread+0x0/0x3a0 [nilfs2] [<ffffffff8025e556>] kthread+0x56/0x90 [<ffffffff8020cdea>] child_rip+0xa/0x20 [<ffffffff8025e500>] ? kthread+0x0/0x90 [<ffffffff8020cde0>] ? child_rip+0x0/0x20 This problem was caused due to a missing radix_tree_preload() call in the retry path of nilfs_btnode_prepare_change_key() function. Reported-by: Eric A <eric225125@yahoo.com> Reported-by: Jerome Poulin <jeromepoulin@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: Jerome Poulin <jeromepoulin@gmail.com> Cc: stable@kernel.org
*	nilfs2: fix oopses with doubly mounted snapshots	Ryusuke Konishi	2009-08-19	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	will fix kernel oopses like the following: # mount -t nilfs2 -r -o cp=20 /dev/sdb1 /test1 # mount -t nilfs2 -r -o cp=20 /dev/sdb1 /test2 # umount /test1 # umount /test2 BUG: sleeping function called from invalid context at arch/x86/mm/fault.c:1069 in_atomic(): 0, irqs_disabled(): 1, pid: 3886, name: umount.nilfs2 1 lock held by umount.nilfs2/3886: #0: (&type->s_umount_key#31){+.+...}, at: [<c10b398a>] deactivate_super+0x52/0x6c irq event stamp: 1219 hardirqs last enabled at (1219): [<c135c774>] __mutex_unlock_slowpath+0xf8/0x119 hardirqs last disabled at (1218): [<c135c6d5>] __mutex_unlock_slowpath+0x59/0x119 softirqs last enabled at (1214): [<c1033316>] __do_softirq+0x1a5/0x1ad softirqs last disabled at (1205): [<c1033354>] do_softirq+0x36/0x5a Pid: 3886, comm: umount.nilfs2 Not tainted 2.6.31-rc6 #55 Call Trace: [<c1023549>] __might_sleep+0x107/0x10e [<c13603c0>] do_page_fault+0x246/0x397 [<c136017a>] ? do_page_fault+0x0/0x397 [<c135e753>] error_code+0x6b/0x70 [<c136017a>] ? do_page_fault+0x0/0x397 [<c104f805>] ? __lock_acquire+0x91/0x12fd [<c1050a62>] ? __lock_acquire+0x12ee/0x12fd [<c1050a62>] ? __lock_acquire+0x12ee/0x12fd [<c1050b2b>] lock_acquire+0xba/0xdd [<d0d17d3f>] ? nilfs_detach_segment_constructor+0x2f/0x2fa [nilfs2] [<c135d4fe>] down_write+0x2a/0x46 [<d0d17d3f>] ? nilfs_detach_segment_constructor+0x2f/0x2fa [nilfs2] [<d0d17d3f>] nilfs_detach_segment_constructor+0x2f/0x2fa [nilfs2] [<c104ea2c>] ? mark_held_locks+0x43/0x5b [<c104ecb1>] ? trace_hardirqs_on_caller+0x10b/0x133 [<c104ece4>] ? trace_hardirqs_on+0xb/0xd [<d0d09ac1>] nilfs_put_super+0x2f/0xca [nilfs2] [<c10b3352>] generic_shutdown_super+0x49/0xb8 [<c10b33de>] kill_block_super+0x1d/0x31 [<c10e6599>] ? vfs_quota_off+0x0/0x12 [<c10b398f>] deactivate_super+0x57/0x6c [<c10c4bc3>] mntput_no_expire+0x8c/0xb4 [<c10c5094>] sys_umount+0x27f/0x2a4 [<c10c50c6>] sys_oldumount+0xd/0xf [<c10031a4>] sysenter_do_call+0x12/0x38 ... This turns out to be a bug brought by an -rc1 patch ("nilfs2: simplify remaining sget() use"). In the patch, a new "put resource" function, nilfs_put_sbinfo() was introduced to delay freeing nilfs_sb_info struct. But the nilfs_put_sbinfo() mistakenly used atomic_dec_and_test() function to check the reference count, and it caused the nilfs_sb_info was freed when user mounted a snapshot twice. This bug also suggests there was unseen memory leak in usual mount /umount operations for nilfs. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
*	nilfs2: missing a read lock for segment writer in nilfs_attach_checkpoint()	Zhang Qiang	2009-08-18	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \|	'ns_cno' of structure 'the_nilfs' must be protected from segment writer, in other words, the caller of nilfs_get_checkpoint should hold read lock for nilfs->ns_segctor_sem. This patch adds the lock/unlock operations in nilfs_attach_checkpoint() when calling nilfs_cpfile_get_checkpoint(). Signed-off-by: Zhang Qiang <zhangqiang.buaa@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
*	nilfs2: fix missing unlock in error path of nilfs_mdt_write_page	Ryusuke Konishi	2009-08-02	1	-1/+3
\| \| \| \| \| \| \|	This adds a missing unlock of nilfs->ns_writer_mutex in nilfs_mdt_write_page() function. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
*	nilfs2: fix oops due to inconsistent state in page with discrete b-tree nodes	Ryusuke Konishi	2009-08-01	1	-1/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Andrea Gelmini gave me a report that a kernel oops hit on a nilfs filesystem with a 1KB block size when doing rsync. This turned out to be caused by an inconsistency of dirty state between a page and its buffers storing b-tree node blocks. If the page had multiple buffers split over multiple logs, and if the logs were written at a time, a dirty flag remained in the page even every dirty flag in the buffers was cleared. This will fix the failure by dropping the dirty flag properly for pages with the discrete multiple b-tree nodes. Reported-by: Andrea Gelmini <andrea.gelmini@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: Andrea Gelmini <andrea.gelmini@gmail.com> Cc: stable@kernel.org
*	fs/Kconfig: move nilfs2 out	Ryusuke Konishi	2009-07-14	1	-0/+25
\| \| \| \| \| \| \| \| \|	fs/Kconfig file was split into individual fs/*/Kconfig files before nilfs was merged. I've found the current config entry of nilfs is tainting the work. Sorry, I didn't notice. This fixes the violation. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Alexey Dobriyan <adobriyan@gmail.com>
*	headers: smp_lock.h redux	Alexey Dobriyan	2009-07-12	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \|	* Remove smp_lock.h from files which don't need it (including some headers!) * Add smp_lock.h to files which do need it * Make smp_lock.h include conditional in hardirq.h It's needed only for one kernel_locked() usage which is under CONFIG_PREEMPT This will make hardirq.h inclusion cheaper for every PREEMPT=n config (which includes allmodconfig/allyesconfig, BTW) Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	nilfs2: fix disorder in cp count on error during deleting checkpoints	Jiro SEKIBA	2009-07-05	1	-3/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	This fixes a bug that checkpoint count gets wrong on errors when deleting a series of checkpoints. The count error is persistent since the checkpoint count is stored on disk. Some userland programs refer to the count via ioctl, and this bugfix is needed to prevent malfunction of such programs. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: stable@kernel.org
*	nilfs2: fix lockdep warning between regular file and inode file	Ryusuke Konishi	2009-07-05	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This will fix the following false positive of recursive locking which lockdep has detected: ============================================= [ INFO: possible recursive locking detected ] 2.6.30-nilfs #42 --------------------------------------------- nilfs_cleanerd/10607 is trying to acquire lock: (&bmap->b_sem){++++-.}, at: [<e0d025b7>] nilfs_bmap_lookup_at_level+0x1a/0x74 [nilfs2] but task is already holding lock: (&bmap->b_sem){++++-.}, at: [<e0d024e0>] nilfs_bmap_truncate+0x19/0x6a [nilfs2] other info that might help us debug this: 2 locks held by nilfs_cleanerd/10607: #0: (&nilfs->ns_segctor_sem){++++.+}, at: [<e0d0d75a>] nilfs_transaction_begin+0xb6/0x10c [nilfs2] #1: (&bmap->b_sem){++++-.}, at: [<e0d024e0>] nilfs_bmap_truncate+0x19/0x6a [nilfs2] Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
*	nilfs2: fix incorrect KERN_CRIT messages in case of write failures	Ryusuke Konishi	2009-07-05	1	-9/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In case of write-failure retries, the following KERN_CRIT level messages are mistakenly output by nilfs_dat_commit_start() function: nilfs_dat_commit_start: vbn = 408463, start = 12506, end = 18446744073709551615, pbn = 530210 nilfs_dat_commit_start: vbn = 408515, start = 12506, end = 18446744073709551615, pbn = 530211 nilfs_dat_commit_start: vbn = 408464, start = 12506, end = 18446744073709551615, pbn = 530212 ... This suppresses these messages. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: stable@kernel.org
*	nilfs2: fix hang problem of log writer which occurs after write failures	Ryusuke Konishi	2009-07-05	1	-20/+6
\| \| \| \| \| \| \| \| \| \| \| \| \|	Leandro Lucarella gave me a report that nilfs gets stuck after its write function fails. The problem turned out to be caused by bugs which leave writeback flag on pages. This fixes the problem by ensuring to clear the writeback flag in error path. Reported-by: Leandro Lucarella <llucax@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: stable@kernel.org
*	nilfs2: remove unlikely directive causing mis-conversion of error code	Ryusuke Konishi	2009-07-05	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The following error code handling in nilfs_segctor_write() function wrongly converted negative error codes to a truth value (i.e. 1): err = unlikely(err) ? : res; which originaly meant to be err = err ? : res; This mis-conversion caused that write or sync functions receive the unexpected error code. This fixes the bug by removing the unlikely directive. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: stable@kernel.org
*	switch nilfs2 to inode->i_acl	Al Viro	2009-06-24	3	-22/+0
\| \| \| \| \| \| \|	Actually, get rid of private analog, since nothing in there is using ACLs at all so far. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	Merge branch 'for-linus' of ↵	Linus Torvalds	2009-06-15	26	-858/+725
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: (22 commits) nilfs2: support contiguous lookup of blocks nilfs2: add sync_page method to page caches of meta data nilfs2: use device's backing_dev_info for btree node caches nilfs2: return EBUSY against delete request on snapshot nilfs2: modify list of unsupported features in caveats nilfs2: enable sync_page method nilfs2: set bio unplug flag for the last bio in segment nilfs2: allow future expansion of metadata read out via get info ioctl NILFS2: Pagecache usage optimization on NILFS2 nilfs2: remove nilfs_btree_operations from btree mapping nilfs2: remove nilfs_direct_operations from direct mapping nilfs2: remove bmap pointer operations nilfs2: remove useless b_low and b_high fields from nilfs_bmap struct nilfs2: remove pointless NULL check of bpop_commit_alloc_ptr function nilfs2: move get block functions in bmap.c into btree codes nilfs2: remove nilfs_bmap_delete_block nilfs2: remove nilfs_bmap_put_block nilfs2: remove header file for segment list operations nilfs2: eliminate removal list of segments nilfs2: add sufile function that can modify multiple segment usages ...
\| *	nilfs2: support contiguous lookup of blocks	Ryusuke Konishi	2009-06-10	5	-8/+150
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Although get_block() callback function can return extent of contiguous blocks with bh->b_size, nilfs_get_block() function did not support this feature. This adds contiguous lookup feature to the block mapping codes of nilfs, and allows the nilfs_get_blocks() function to return the extent information by applying the feature. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
\| *	nilfs2: add sync_page method to page caches of meta data	Ryusuke Konishi	2009-06-10	3	-3/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This applies block_sync_page() function to the sync_page method of page caches for meta data files, gc page caches, and btree node buffers. This is a companion patch of ("nilfs2: enable sync_page mothod") which applied the function for data pages. This allows lock_page() for those meta data to unplug pending bio requests. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
\| *	nilfs2: use device's backing_dev_info for btree node caches	Ryusuke Konishi	2009-06-10	5	-6/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously, default_backing_dev_info was used for the mapping of btree node caches. This uses device dependent backing_dev_info to allow detailed control of the device for the btree node pages. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
\| *	nilfs2: return EBUSY against delete request on snapshot	Ryusuke Konishi	2009-06-10	1	-12/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This helps userland programs like the rmcp command to distinguish error codes returned against a checkpoint removal request. Previously -EPERM was returned, and not discriminable from real permission errors. This also allows removal of the latest checkpoint because the deletion leads to create a new checkpoint, and thus it's harmless for the filesystem. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
\| *	nilfs2: enable sync_page method	Ryusuke Konishi	2009-06-10	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds a missing sync_page method which unplugs bio requests when waiting for page locks. This will improve read performance of nilfs. Here is a measurement result using dd command. Without this patch: # mount -t nilfs2 /dev/sde1 /test # dd if=/test/aaa of=/dev/null bs=512k 1024+0 records in 1024+0 records out 536870912 bytes (537 MB) copied, 6.00688 seconds, 89.4 MB/s With this patch: # mount -t nilfs2 /dev/sde1 /test # dd if=/test/aaa of=/dev/null bs=512k 1024+0 records in 1024+0 records out 536870912 bytes (537 MB) copied, 3.54998 seconds, 151 MB/s Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
\| *	nilfs2: set bio unplug flag for the last bio in segment	Ryusuke Konishi	2009-06-10	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This sets BIO_RW_UNPLUG flag on the last bio of each segment during write. The last bio should be unplugged immediately because the caller waits for the completion after the submission. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
\| *	nilfs2: allow future expansion of metadata read out via get info ioctl	Ryusuke Konishi	2009-06-10	7	-39/+54
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Nilfs has some ioctl commands to read out metadata from meta data files: - NILFS_IOCTL_GET_CPINFO for checkpoint file, - NILFS_IOCTL_GET_SUINFO for segment usage file, and - NILFS_IOCTL_GET_VINFO for Disk Address Transalation (DAT) file, respectively. Every routine on these metadata files is implemented so that it allows future expansion of on-disk format. But, the above ioctl commands do not support expansion even though nilfs_argv structure can handle arbitrary size for data exchanged via ioctl. This allows future expansion of the following structures which give basic format of the "get information" ioctls: - struct nilfs_cpinfo - struct nilfs_suinfo - struct nilfs_vinfo So, this introduces forward compatility of such ioctl commands. In this patch, a sanity check in nilfs_ioctl_get_info() function is changed to accept larger data structure [1], and metadata read routines are rewritten so that they become compatible for larger structures; the routines will just ignore the remaining fields which the current version of nilfs doesn't know. [1] The ioctl function already has another upper limit (PAGE_SIZE against a structure, which appears in nilfs_ioctl_wrap_copy function), and this will not cause security problem. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
\| *	NILFS2: Pagecache usage optimization on NILFS2	Hisashi Hifumi	2009-06-10	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Hi, I introduced "is_partially_uptodate" aops for NILFS2. A page can have multiple buffers and even if a page is not uptodate, some buffers can be uptodate on pagesize != blocksize environment. This aops checks that all buffers which correspond to a part of a file that we want to read are uptodate. If so, we do not have to issue actual read IO to HDD even if a page is not uptodate because the portion we want to read are uptodate. "block_is_partially_uptodate" function is already used by ext2/3/4. With the following patch random read/write mixed workloads or random read after random write workloads can be optimized and we can get performance improvement. I did a performance test using the sysbench. 1 --file-block-size=8K --file-total-size=2G --file-test-mode=rndrw --file-fsync-freq=0 --fil e-rw-ratio=1 run -2.6.30-rc5 Test execution summary: total time: 151.2907s total number of events: 200000 total time taken by event execution: 2409.8387 per-request statistics: min: 0.0000s avg: 0.0120s max: 0.9306s approx. 95 percentile: 0.0439s Threads fairness: events (avg/stddev): 12500.0000/238.52 execution time (avg/stddev): 150.6149/0.01 -2.6.30-rc5-patched Test execution summary: total time: 140.8828s total number of events: 200000 total time taken by event execution: 2240.8577 per-request statistics: min: 0.0000s avg: 0.0112s max: 0.8750s approx. 95 percentile: 0.0418s Threads fairness: events (avg/stddev): 12500.0000/218.43 execution time (avg/stddev): 140.0536/0.01 arch: ia64 pagesize: 16k Thanks. Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
\| *	nilfs2: remove nilfs_btree_operations from btree mapping	Ryusuke Konishi	2009-06-10	2	-63/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	will remove indirect function calls using nilfs_btree_operations table. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
\| *	nilfs2: remove nilfs_direct_operations from direct mapping	Ryusuke Konishi	2009-06-10	2	-52/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	will remove indirect function calls using nilfs_direct_operations table. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
\| *	nilfs2: remove bmap pointer operations	Ryusuke Konishi	2009-06-10	4	-234/+187
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously, the bmap codes of nilfs used three types of function tables. The abuse of indirect function calls decreased source readability and suffered many indirect jumps which would confuse branch prediction of processors. This eliminates one type of the function tables, nilfs_bmap_ptr_operations, which was used to dispatch low level pointer operations of the nilfs bmap. This adds a new integer variable "b_ptr_type" to nilfs_bmap struct, and uses the value to select the pointer operations. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
\| *	nilfs2: remove useless b_low and b_high fields from nilfs_bmap struct	Ryusuke Konishi	2009-06-10	6	-43/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This will cut off 16 bytes from the nilfs_bmap struct which is embedded in the on-memory inode of nilfs. The b_high field was never used, and the b_low field stores a constant value which can be determined by whether the inode uses btree for block mapping or not. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
\| *	nilfs2: remove pointless NULL check of bpop_commit_alloc_ptr function	Ryusuke Konishi	2009-06-10	2	-13/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This indirect function is set to NULL only for gc cache inodes, but the gc cache inodes never call this function. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
\| *	nilfs2: move get block functions in bmap.c into btree codes	Ryusuke Konishi	2009-06-10	3	-48/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Two get block function for btree nodes, nilfs_bmap_get_block() and nilfs_bmap_get_new_block(), are called only from the btree codes. This relocation will increase opportunities of compiler optimization. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
\| *	nilfs2: remove nilfs_bmap_delete_block	Ryusuke Konishi	2009-06-10	3	-11/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	nilfs_bmap_delete_block() is a wrapper function calling nilfs_btnode_delete(). This removes it for simplicity. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>