summaryrefslogtreecommitdiffstats
path: root/include/linux/bio.h
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'for-3.5' of ../cgroup into block/for-3.5/core-mergedTejun Heo2012-04-011-4/+5
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cgroup/for-3.5 contains the following changes which blk-cgroup needs to proceed with the on-going cleanup. * Dynamic addition and removal of cftypes to make config/stat file handling modular for policies. * cgroup removal update to not wait for css references to drain to fix blkcg removal hang caused by cfq caching cfqgs. Pull in cgroup/for-3.5 into block/for-3.5/core. This causes the following conflicts in block/blk-cgroup.c. * 761b3ef50e "cgroup: remove cgroup_subsys argument from callbacks" conflicts with blkiocg_pre_destroy() addition and blkiocg_attach() removal. Resolved by removing @subsys from all subsys methods. * 676f7c8f84 "cgroup: relocate cftype and cgroup_subsys definitions in controllers" conflicts with ->pre_destroy() and ->attach() updates and removal of modular config. Resolved by dropping forward declarations of the methods and applying updates to the relocated blkio_subsys. * 4baf6e3325 "cgroup: convert all non-memcg controllers to the new cftype interface" builds upon the previous item. Resolved by adding ->base_cftypes to the relocated blkio_subsys. Signed-off-by: Tejun Heo <tj@kernel.org>
| * Merge tag 'bug-for-3.4' of ↵Linus Torvalds2012-03-241-0/+1
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux Pull <linux/bug.h> cleanup from Paul Gortmaker: "The changes shown here are to unify linux's BUG support under the one <linux/bug.h> file. Due to historical reasons, we have some BUG code in bug.h and some in kernel.h -- i.e. the support for BUILD_BUG in linux/kernel.h predates the addition of linux/bug.h, but old code in kernel.h wasn't moved to bug.h at that time. As a band-aid, kernel.h was including <asm/bug.h> to pseudo link them. This has caused confusion[1] and general yuck/WTF[2] reactions. Here is an example that violates the principle of least surprise: CC lib/string.o lib/string.c: In function 'strlcat': lib/string.c:225:2: error: implicit declaration of function 'BUILD_BUG_ON' make[2]: *** [lib/string.o] Error 1 $ $ grep linux/bug.h lib/string.c #include <linux/bug.h> $ We've included <linux/bug.h> for the BUG infrastructure and yet we still get a compile fail! [We've not kernel.h for BUILD_BUG_ON.] Ugh - very confusing for someone who is new to kernel development. With the above in mind, the goals of this changeset are: 1) find and fix any include/*.h files that were relying on the implicit presence of BUG code. 2) find and fix any C files that were consuming kernel.h and hence relying on implicitly getting some/all BUG code. 3) Move the BUG related code living in kernel.h to <linux/bug.h> 4) remove the asm/bug.h from kernel.h to finally break the chain. During development, the order was more like 3-4, build-test, 1-2. But to ensure that git history for bisect doesn't get needless build failures introduced, the commits have been reorderd to fix the problem areas in advance. [1] https://lkml.org/lkml/2012/1/3/90 [2] https://lkml.org/lkml/2012/1/17/414" Fix up conflicts (new radeon file, reiserfs header cleanups) as per Paul and linux-next. * tag 'bug-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: kernel.h: doesn't explicitly use bug.h, so don't include it. bug: consolidate BUILD_BUG_ON with other bug code BUG: headers with BUG/BUG_ON etc. need linux/bug.h bug.h: add include of it to various implicit C users lib: fix implicit users of kernel.h for TAINT_WARN spinlock: macroize assert_spin_locked to avoid bug.h dependency x86: relocate get/set debugreg fcns to include/asm/debugreg.
| | * BUG: headers with BUG/BUG_ON etc. need linux/bug.hPaul Gortmaker2012-03-041-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a header file is making use of BUG, BUG_ON, BUILD_BUG_ON, or any other BUG variant in a static inline (i.e. not in a #define) then that header really should be including <linux/bug.h> and not just expecting it to be implicitly present. We can make this change risk-free, since if the files using these headers didn't have exposure to linux/bug.h already, they would have been causing compile failures/warnings. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
| * | fs: remove the second argument of k[un]map_atomic()Cong Wang2012-03-201-4/+4
| |/ | | | | | | | | Acked-by: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: Cong Wang <amwang@redhat.com>
* / block: implement bio_associate_current()Tejun Heo2012-03-061-0/+8
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | IO scheduling and cgroup are tied to the issuing task via io_context and cgroup of %current. Unfortunately, there are cases where IOs need to be routed via a different task which makes scheduling and cgroup limit enforcement applied completely incorrectly. For example, all bios delayed by blk-throttle end up being issued by a delayed work item and get assigned the io_context of the worker task which happens to serve the work item and dumped to the default block cgroup. This is double confusing as bios which aren't delayed end up in the correct cgroup and makes using blk-throttle and cfq propio together impossible. Any code which punts IO issuing to another task is affected which is getting more and more common (e.g. btrfs). As both io_context and cgroup are firmly tied to task including userland visible APIs to manipulate them, it makes a lot of sense to match up tasks to bios. This patch implements bio_associate_current() which associates the specified bio with %current. The bio will record the associated ioc and blkcg at that point and block layer will use the recorded ones regardless of which task actually ends up issuing the bio. bio release puts the associated ioc and blkcg. It grabs and remembers ioc and blkcg instead of the task itself because task may already be dead by the time the bio is issued making ioc and blkcg inaccessible and those are all block layer cares about. elevator_set_req_fn() is updated such that the bio elvdata is being allocated for is available to the elevator. This doesn't update block cgroup policies yet. Further patches will implement the support. -v2: #ifdef CONFIG_BLK_CGROUP added around bio->bi_ioc dereference in rq_ioc() to fix build breakage. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Kent Overstreet <koverstreet@google.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: Stop using macro stubs for the bio data integrity callsMartin K. Petersen2012-01-131-13/+53
| | | | | | | | Replace preprocessor macro stubs with real function declarations to prevent warnings when CONFIG_BLK_DEV_INTEGRITY is disabled. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* bio: change some signed vars to unsignedDan Carpenter2011-11-161-2/+2
| | | | | | | | | | | | | This is just a cleanup patch to silence a static checker warning. The problem is that we cap "nr_iovecs" so it can't be larger than "UIO_MAXIOV" but we don't check for negative values. It turns out this is prevented at other layers, but logically it doesn't make sense to have negative nr_iovecs so making it unsigned is nicer. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* include/linux/bio.h: use a static inline function for bio_integrity_clone()Stephen Rothwell2011-11-161-1/+5
| | | | | | | | | | | | | When CONFIG_BLK_DEV_INTEGRITY is not set, we get these warnings: drivers/md/dm.c: In function 'split_bvec': drivers/md/dm.c:1061:3: warning: statement with no effect drivers/md/dm.c: In function 'clone_bio': drivers/md/dm.c:1088:3: warning: statement with no effect Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: Remove the control of complete cpu from bio.Tao Ma2011-10-241-8/+0
| | | | | | | | | | | | | | | | | | | | | bio originally has the functionality to set the complete cpu, but it is broken. Chirstoph said that "This code is unused, and from the all the discussions lately pretty obviously broken. The only thing keeping it serves is creating more confusion and possibly more bugs." And Jens replied with "We can kill bio_set_completion_cpu(). I'm fine with leaving cpu control to the request based drivers, they are the only ones that can toggle the setting anyway". So this patch tries to remove all the work of controling complete cpu from a bio. Cc: Shaohua Li <shaohua.li@intel.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: biovec_slab vs. CONFIG_BLK_DEV_INTEGRITYMartin K. Petersen2011-03-081-1/+0
| | | | | | | | The block integrity subsystem no longer uses the bio_vec slabs so this code can safely be compiled in. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* block: remove REQ_HARDBARRIERChristoph Hellwig2010-11-101-4/+0
| | | | | | | | | | | | | | | | | | | | REQ_HARDBARRIER is dead now, so remove the leftovers. What's left at this point is: - various checks inside the block layer. - sanity checks in bio based drivers. - now unused bio_empty_barrier helper. - Xen blockfront use of BLKIF_OP_WRITE_BARRIER - it's dead for a while, but Xen really needs to sort out it's barrier situaton. - setting of ordered tags in uas - dead code copied from old scsi drivers. - scsi different retry for barriers - it's dead and should have been removed when flushes were converted to FS requests. - blktrace handling of barriers - removed. Someone who knows blktrace better should add support for REQ_FLUSH and REQ_FUA, though. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* block: Turn bvec_k{un,}map_irq() into static inline functionsGeert Uytterhoeven2010-10-211-2/+9
| | | | | | | | | | | Convert bvec_k{un,}map_irq() from macros to static inline functions if !CONFIG_HIGHMEM, so we can easier detect mistakes like the one fixed in 93055c31045a2d5599ec613a0c6cdcefc481a460 ("ps3disk: passing wrong variable = to bvec_kunmap_irq()") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* block/scsi: Provide a limit on the number of integrity segmentsMartin K. Petersen2010-09-101-0/+4
| | | | | | | | | | | | | | | Some controllers have a hardware limit on the number of protection information scatter-gather list segments they can handle. Introduce a max_integrity_segments limit in the block layer and provide a new scsi_host_template setting that allows HBA drivers to provide a value suitable for the hardware. Add support for honoring the integrity segment limit when merging both bios and requests. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@carl.home.kernel.dk>
* bio, fs: separate out bio_types.h and define READ/WRITE constants in terms ↵Tejun Heo2010-08-071-179/+4
| | | | | | | | | | | | | | | of BIO_RW_* flags linux/fs.h hard coded READ/WRITE constants which should match BIO_RW_* flags. This is fragile and caused breakage during BIO_RW_* flag rearrangement. The hardcoding is to avoid include dependency hell. Create linux/bio_types.h which contatins definitions for bio data structures and flags and include it from bio.h and fs.h, and make fs.h define all READ/WRITE related constants in terms of BIO_RW_* flags. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* block: introduce REQ_FLUSH flagFUJITA Tomonori2010-08-071-0/+2
| | | | | | | | | | | | | | | | | | | | | | SCSI-ml needs a way to mark a request as flush request in q->prepare_flush_fn because it needs to identify them later (e.g. in q->request_fn or prep_rq_fn). queue_flush sets REQ_HARDBARRIER in rq->cmd_flags however the block layer also sends normal REQ_TYPE_FS requests with REQ_HARDBARRIER. So SCSI-ml can't use REQ_HARDBARRIER to identify flush requests. We could change the block layer to clear REQ_HARDBARRIER bit before sending non flush requests to the lower layers. However, intorudcing the new flag looks cleaner (surely easier). Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: James Bottomley <James.Bottomley@suse.de> Cc: David S. Miller <davem@davemloft.net> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Alasdair G Kergon <agk@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* block: unify flags for struct bio and struct requestChristoph Hellwig2010-08-071-47/+78
| | | | | | | | | | | | | | Remove the current bio flags and reuse the request flags for the bio, too. This allows to more easily trace the type of I/O from the filesystem down to the block driver. There were two flags in the bio that were missing in the requests: BIO_RW_UNPLUG and BIO_RW_AHEAD. Also I've renamed two request flags that had a superflous RW in them. Note that the flags are in bio.h despite having the REQ_ name - as blkdev.h includes bio.h that is the only way to go for now. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* block: add helpers to run flush_dcache_page() against a bio and a request's ↵Ilya Loginov2009-11-261-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | pages Mtdblock driver doesn't call flush_dcache_page for pages in request. So, this causes problems on architectures where the icache doesn't fill from the dcache or with dcache aliases. The patch fixes this. The ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE symbol was introduced to avoid pointless empty cache-thrashing loops on architectures for which flush_dcache_page() is a no-op. Every architecture was provided with this flush pages on architectires where ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE is equal 1 or do nothing otherwise. See "fix mtd_blkdevs problem with caches on some architectures" discussion on LKML for more information. Signed-off-by: Ilya Loginov <isloginov@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Peter Horton <phorton@bitbox.co.uk> Cc: "Ed L. Cashin" <ecashin@coraid.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* Do not __always_inline bvec_kmap_irq() and bvec_kunmap_irq()Alberto Bertogli2009-11-021-6/+2
| | | | | | | | So remove both the comment and the inline requirement, going back to the inline hint. Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* bio: first step in sanitizing the bio->bi_rw flag testingJens Axboe2009-09-111-18/+7
| | | | | | | | Get rid of any functions that test for these bits and make callers use bio_rw_flagged() directly. Then it is at least directly apparent what variable and flag they check. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* block: make bio_rw_flagged() return a boolJens Axboe2009-09-111-1/+4
| | | | | | Makes for a saner interface, instead of returning the bit position. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* block: use the same failfast bits for bio and requestTejun Heo2009-09-111-20/+23
| | | | | | | | | | | | | | | | | | | | bio and request use the same set of failfast bits. This patch makes the following changes to simplify things. * enumify BIO_RW* bits and reorder bits such that BIOS_RW_FAILFAST_* bits coincide with __REQ_FAILFAST_* bits. * The above pushes BIO_RW_AHEAD out of sync with __REQ_FAILFAST_DEV but the matching is useless anyway. init_request_from_bio() is responsible for setting FAILFAST bits on FS requests and non-FS requests never use BIO_RW_AHEAD. Drop the code and comment from blk_rq_bio_prep(). * Define REQ_FAILFAST_MASK which is OR of all FAILFAST bits and simplify FAILFAST flags handling in init_request_from_bio(). Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* block: Create bip slabs with embedded integrity vectorsMartin K. Petersen2009-07-011-6/+16
| | | | | | | | | | | | | This patch restores stacking ability to the block layer integrity infrastructure by creating a set of dedicated bip slabs. Each bip slab has an embedded bio_vec array at the end. This cuts down on memory allocations and also simplifies the code compared to the original bvec version. Only the largest bip slab is backed by a mempool. The pool is contained in the bio_set so stacking drivers can ensure forward progress. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@carl.(none)>
* block: Add bio_list_peek()Geert Uytterhoeven2009-06-151-0/+5
| | | | | | | | | | Introduce bio_list_peek(), to obtain a pointer to the first bio on the bio_list without actually removing it from the list. This is needed when you want to serialize based on the list being empty or not. Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Acked-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* block: Use accessor functions for queue limitsMartin K. Petersen2009-05-221-1/+1
| | | | | | | | Convert all external users of queue limits to using wrapper functions instead of poking the request queue variables directly. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* block: drop request->hard_* and *nr_sectorsTejun Heo2009-05-111-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | struct request has had a few different ways to represent some properties of a request. ->hard_* represent block layer's view of the request progress (completion cursor) and the ones without the prefix are supposed to represent the issue cursor and allowed to be updated as necessary by the low level drivers. The thing is that as block layer supports partial completion, the two cursors really aren't necessary and only cause confusion. In addition, manual management of request detail from low level drivers is cumbersome and error-prone at the very least. Another interesting duplicate fields are rq->[hard_]nr_sectors and rq->{hard_cur|current}_nr_sectors against rq->data_len and rq->bio->bi_size. This is more convoluted than the hard_ case. rq->[hard_]nr_sectors are initialized for requests with bio but blk_rq_bytes() uses it only for !pc requests. rq->data_len is initialized for all request but blk_rq_bytes() uses it only for pc requests. This causes good amount of confusion throughout block layer and its drivers and determining the request length has been a bit of black magic which may or may not work depending on circumstances and what the specific LLD is actually doing. rq->{hard_cur|current}_nr_sectors represent the number of sectors in the contiguous data area at the front. This is mainly used by drivers which transfers data by walking request segment-by-segment. This value always equals rq->bio->bi_size >> 9. However, data length for pc requests may not be multiple of 512 bytes and using this field becomes a bit confusing. In general, having multiple fields to represent the same property leads only to confusion and subtle bugs. With recent block low level driver cleanups, no driver is accessing or manipulating these duplicate fields directly. Drop all the duplicates. Now rq->sector means the current sector, rq->data_len the current total length and rq->bio->bi_size the current segment length. Everything else is defined in terms of these three and available only through accessors. * blk_recalc_rq_sectors() is collapsed into blk_update_request() and now handles pc and fs requests equally other than rq->sector update. This means that now pc requests can use partial completion too (no in-kernel user yet tho). * bio_cur_sectors() is replaced with bio_cur_bytes() as block layer now uses byte count as the primary data length. * blk_rq_pos() is now guranteed to be always correct. In-block users converted. * blk_rq_bytes() is now guaranteed to be always valid as is blk_rq_sectors(). In-block users converted. * blk_rq_sectors() is now guaranteed to equal blk_rq_bytes() >> 9. More convenient one is used. * blk_rq_bytes() and blk_rq_cur_bytes() are now inlined and take const pointer to request. [ Impact: API cleanup, single way to represent one property of a request ] Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* loop: use BIO list management functionsAkinobu Mita2009-04-281-1/+1
| | | | | | | | | | Now that the bio list management stuff is generic, convert loop to use bio lists instead of its own private bio list implementation. Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* bio: fix bio_kmalloc()Tejun Heo2009-04-221-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: fix bio_kmalloc() and its destruction path bio_kmalloc() was broken in two ways. * bvec_alloc_bs() first allocates bvec using kmalloc() and then ignores it and allocates again like non-kmalloc bvecs. * bio_kmalloc_destructor() didn't check for and free bio integrity data. This patch fixes the above problems. kmalloc patch is separated out from bio_alloc_bioset() and allocates the requested number of bvecs as inline bvecs. * bio_alloc_bioset() no longer takes NULL @bs. None other than bio_kmalloc() used it and outside users can't know how it was allocated anyway. * Define and use BIO_POOL_NONE so that pool index check in bvec_free_bs() triggers if inline or kmalloc allocated bvec gets there. * Relocate destructors on top of each allocation function so that how they're used is more clear. Jens Axboe suggested allocating bvecs inline. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* block: move bio list helpers into bio.hChristoph Hellwig2009-04-151-0/+109
| | | | | | | | | It's used by DM and MD and generally useful, so move the bio list helpers into bio.h. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* block: Add flag for telling the IO schedulers NOT to anticipate more IOJens Axboe2009-04-061-8/+11
| | | | | | | | | | | | | | | By default, CFQ will anticipate more IO from a given io context if the previously completed IO was sync. This used to be fine, since the only sync IO was reads and O_DIRECT writes. But with more "normal" sync writes being used now, we don't want to anticipate for those. Add a bio/request flag that informs the IO scheduler that this is a sync request that we should not idle for. Introduce WRITE_ODIRECT specifically for O_DIRECT writes, and make sure that the other sync writes set this flag. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* block: add private bio_set for bio integrity allocationsMartin K. Petersen2009-03-241-14/+4
| | | | | | | | The integrity bio allocation needs its own bio_set to avoid violating the mempool allocation rules and risking deadlocks. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* block: Add gfp_mask parameter to bio_integrity_clone()un'ichi Nomura2009-03-141-2/+2
| | | | | | | | | | | | Stricter gfp_mask might be required for clone allocation. For example, request-based dm may clone bio in interrupt context so it has to use GFP_ATOMIC. Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Cc: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* block: fix bad definition of BIO_RW_SYNCJens Axboe2009-02-181-2/+0
| | | | | | | | We can't OR shift values, so get rid of BIO_RW_SYNC and use BIO_RW_SYNCIO and BIO_RW_UNPLUG explicitly. This brings back the behaviour from before 213d9417fec62ef4c3675621b9364a667954d4dd. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* bio.h: If they MUST be inlined, then use __always_inlineAlberto Bertogli2009-02-021-2/+4
| | | | | | | | bvec_kmap_irq() and bvec_kunmap_irq() comments say they MUST be inlined, so mark them as __always_inline. Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* Fix misleading comment in bio.hAlberto Bertogli2009-02-021-2/+2
| | | | | | | | The comment says "remember to add offset!", but the function already adds it. Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* block: add bio_rw_flagged() for testing bio->bi_rwJens Axboe2009-01-301-10/+18
| | | | | | | | | | | The existing functions for checking bio->bi_rw are badly named. So lets mirror what we do for bio->bi_flags testing, use a properly named function so that it's immediately obvious what is being tested. Maintain compatability names for the old macros, eventually we'll get rid of these. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* block: seperate bio/request unplug and sync bitsJens Axboe2009-01-301-7/+11
| | | | Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* Fix small typo in bio.h's documentationAlberto Bertogli2009-01-301-1/+1
| | | | | Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* block: Don't verify integrity metadata on read errorMartin K. Petersen2009-01-301-1/+0
| | | | | | | | | If we get an I/O error on a read request there is no point in doing a verify pass on the integrity buffer. Adjust the completion path accordingly. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* bio: add support for inlining a number of bio_vecs inside the bioJens Axboe2008-12-291-0/+12
| | | | | | | | | | | | | When we go and allocate a bio for IO, we actually do two allocations. One for the bio itself, and one for the bi_io_vec that holds the actual pages we are interested in. This feature inlines a definable amount of io vecs inside the bio itself, so we eliminate the bio_vec array allocation for IO's up to a certain size. It defaults to 4 vecs, which is typically 16k of IO. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* bio: allow individual slabs in the bio_setJens Axboe2008-12-291-1/+5
| | | | | | | | | | | | | | | Instead of having a global bio slab cache, add a reference to one in each bio_set that is created. This allows for personalized slabs in each bio_set, so that they can have bios of different sizes. This means we can personalize the bios we return. File systems may want to embed the bio inside another structure, to avoid allocation more items (and stuffing them in ->bi_private) after the get a bio. Or we may want to embed a number of bio_vecs directly at the end of a bio, to avoid doing two allocations to return a bio. This is now possible. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* bio: move the slab pointer inside the bio_setJens Axboe2008-12-291-0/+1
| | | | | | In preparation for adding differently sized bios. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* bio: only mempool back the largest bio_vec slab cacheJens Axboe2008-12-291-1/+2
| | | | | | | | | We only very rarely need the mempool backing, so it makes sense to get rid of all but one of the mempool in a bio_set. So keep the largest bio_vec count mempool so we can always honor the largest allocation, and "upgrade" callers that fail. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* block: reorder struct bio to remove padding on 64bitRichard Kennedy2008-12-291-1/+2
| | | | | | | | | Remove 8 bytes of padding from struct bio which also removes 16 bytes from struct bio_pair to make it 248 bytes. bio_pair then fits into one fewer cache lines & into a smaller slab. Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* block: Supress Buffer I/O errors when SCSI REQ_QUIET flag setKeith Mannthey2008-12-291-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow the scsi request REQ_QUIET flag to be propagated to the buffer file system layer. The basic ideas is to pass the flag from the scsi request to the bio (block IO) and then to the buffer layer. The buffer layer can then suppress needless printks. This patch declutters the kernel log by removed the 40-50 (per lun) buffer io error messages seen during a boot in my multipath setup . It is a good chance any real errors will be missed in the "noise" it the logs without this patch. During boot I see blocks of messages like " __ratelimit: 211 callbacks suppressed Buffer I/O error on device sdm, logical block 5242879 Buffer I/O error on device sdm, logical block 5242879 Buffer I/O error on device sdm, logical block 5242847 Buffer I/O error on device sdm, logical block 1 Buffer I/O error on device sdm, logical block 5242878 Buffer I/O error on device sdm, logical block 5242879 Buffer I/O error on device sdm, logical block 5242879 Buffer I/O error on device sdm, logical block 5242879 Buffer I/O error on device sdm, logical block 5242879 Buffer I/O error on device sdm, logical block 5242872 " in my logs. My disk environment is multipath fiber channel using the SCSI_DH_RDAC code and multipathd. This topology includes an "active" and "ghost" path for each lun. IO's to the "ghost" path will never complete and the SCSI layer, via the scsi device handler rdac code, quick returns the IOs to theses paths and sets the REQ_QUIET scsi flag to suppress the scsi layer messages. I am wanting to extend the QUIET behavior to include the buffer file system layer to deal with these errors as well. I have been running this patch for a while now on several boxes without issue. A few runs of bonnie++ show no noticeable difference in performance in my setup. Thanks for John Stultz for the quiet_error finalization. Submitted-by: Keith Mannthey <kmannth@us.ibm.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* bio: define __BIOVEC_PHYS_MERGEABLEJeremy Fitzhardinge2008-11-061-1/+5
| | | | | | | | | Define __BIOVEC_PHYS_MERGEABLE as the default implementation of BIOVEC_PHYS_MERGEABLE, so that its available for reuse within an arch-specific definition of BIOVEC_PHYS_MERGEABLE. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds2008-10-171-0/+7
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-linus' of git://git.kernel.dk/linux-2.6-block: block: remove __generic_unplug_device() from exports block: move q->unplug_work initialization blktrace: pass zfcp driver data blktrace: add support for driver data block: fix current kernel-doc warnings block: only call ->request_fn when the queue is not stopped block: simplify string handling in elv_iosched_store() block: fix kernel-doc for blk_alloc_devt() block: fix nr_phys_segments miscalculation bug block: add partition attribute for partition number block: add BIG FAT WARNING to CONFIG_DEBUG_BLOCK_EXT_DEVT softirq: Add support for triggering softirq work on softirqs.
| * block: fix nr_phys_segments miscalculation bugFUJITA Tomonori2008-10-171-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes the bug reported by Nikanth Karthikesan <knikanth@suse.de>: http://lkml.org/lkml/2008/10/2/203 The root cause of the bug is that blk_phys_contig_segment miscalculates q->max_segment_size. blk_phys_contig_segment checks: req->biotail->bi_size + next_req->bio->bi_size > q->max_segment_size But blk_recalc_rq_segments might expect that req->biotail and the previous bio in the req are supposed be merged into one segment. blk_recalc_rq_segments might also expect that next_req->bio and the next bio in the next_req are supposed be merged into one segment. In such case, we merge two requests that can't be merged here. Later, blk_rq_map_sg gives more segments than it should. We need to keep track of segment size in blk_recalc_rq_segments and use it to see if two requests can be merged. This patch implements it in the similar way that we used to do for hw merging (virtual merging). Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* | [SCSI] block: separate failfast into multiple bits.Mike Christie2008-10-131-9/+17
|/ | | | | | | | | | | | | | | | | | | | | | Multipath is best at handling transport errors. If it gets a device error then there is not much the multipath layer can do. It will just access the same device but from a different path. This patch breaks up failfast into device, transport and driver errors. The multipath layers (md and dm mutlipath) only ask the lower levels to fast fail transport errors. The user of failfast, read ahead, will ask to fast fail on all errors. Note that blk_noretry_request will return true if any failfast bit is set. This allows drivers that do not support the multipath failfast bits to continue to fail on any failfast error like before. Drivers like scsi that are able to fail fast specific errors can check for the specific fail fast type. In the next patch I will convert scsi. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
* block: add some comments around the bio read-write flagsJens Axboe2008-10-091-1/+11
| | | | Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* block: mark bio_split_pool staticDenis ChengRq2008-10-091-3/+1
| | | | | | | | | | | Since all bio_split calls refer the same single bio_split_pool, the bio_split function can use bio_split_pool directly instead of the mempool_t parameter; then the mempool_t parameter can be removed from bio_split param list, and bio_split_pool is only referred in fs/bio.c file, can be marked static. Signed-off-by: Denis ChengRq <crquan@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>