summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* [PATCH] Swapless page migration: add R/W migration entriesChristoph Lameter2006-06-237-32/+255
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement read/write migration ptes We take the upper two swapfiles for the two types of migration ptes and define a series of macros in swapops.h. The VM is modified to handle the migration entries. migration entries can only be encountered when the page they are pointing to is locked. This limits the number of places one has to fix. We also check in copy_pte_range and in mprotect_pte_range() for migration ptes. We check for migration ptes in do_swap_cache and call a function that will then wait on the page lock. This allows us to effectively stop all accesses to apge. Migration entries are created by try_to_unmap if called for migration and removed by local functions in migrate.c From: Hugh Dickins <hugh@veritas.com> Several times while testing swapless page migration (I've no NUMA, just hacking it up to migrate recklessly while running load), I've hit the BUG_ON(!PageLocked(p)) in migration_entry_to_page. This comes from an orphaned migration entry, unrelated to the current correctly locked migration, but hit by remove_anon_migration_ptes as it checks an address in each vma of the anon_vma list. Such an orphan may be left behind if an earlier migration raced with fork: copy_one_pte can duplicate a migration entry from parent to child, after remove_anon_migration_ptes has checked the child vma, but before it has removed it from the parent vma. (If the process were later to fault on this orphaned entry, it would hit the same BUG from migration_entry_wait.) This could be fixed by locking anon_vma in copy_one_pte, but we'd rather not. There's no such problem with file pages, because vma_prio_tree_add adds child vma after parent vma, and the page table locking at each end is enough to serialize. Follow that example with anon_vma: add new vmas to the tail instead of the head. (There's no corresponding problem when inserting migration entries, because a missed pte will leave the page count and mapcount high, which is allowed for. And there's no corresponding problem when migrating via swap, because a leftover swap entry will be correctly faulted. But the swapless method has no refcounting of its entries.) From: Ingo Molnar <mingo@elte.hu> pte_unmap_unlock() takes the pte pointer as an argument. From: Hugh Dickins <hugh@veritas.com> Several times while testing swapless page migration, gcc has tried to exec a pointer instead of a string: smells like COW mappings are not being properly write-protected on fork. The protection in copy_one_pte looks very convincing, until at last you realize that the second arg to make_migration_entry is a boolean "write", and SWP_MIGRATION_READ is 30. Anyway, it's better done like in change_pte_range, using is_write_migration_entry and make_migration_entry_read. From: Hugh Dickins <hugh@veritas.com> Remove unnecessary obfuscation from sys_swapon's range check on swap type, which blew up causing memory corruption once swapless migration made MAX_SWAPFILES no longer 2 ^ MAX_SWAPFILES_SHIFT. Signed-off-by: Hugh Dickins <hugh@veritas.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Christoph Lameter <clameter@engr.sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> From: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] page migration cleanup: move fallback handling into special functionChristoph Lameter2006-06-231-51/+39
| | | | | | | | | | Move the fallback code into a new fallback function and make the function behave like any other migration function. This requires retaking the lock if pageout() drops it. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] page migration cleanup: pass "mapping" to migration functionsChristoph Lameter2006-06-233-40/+42
| | | | | | | | | | | | | | | | | Change handling of address spaces. Pass a pointer to the address space in which the page is migrated to all migration function. This avoids repeatedly having to retrieve the address space pointer from the page and checking it for validity. The old page mapping will change once migration has gone to a certain step, so it is less confusing to have the pointer always available. Move the setting of the mapping and index for the new page into migrate_pages(). Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] page migration cleanup: extract try_to_unmap from migration functionsChristoph Lameter2006-06-231-45/+31
| | | | | | | | | | | | | | | | | | Extract try_to_unmap and rename remove_references -> move_mapping try_to_unmap() may significantly change the page state by for example setting the dirty bit. It is therefore best to unmap in migrate_pages() before calling any migration functions. migrate_page_remove_references() will then only move the new page in place of the old page in the mapping. Rename the function to migrate_page_move_mapping(). This allows us to get rid of the special unmapping for the fallback path. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] page migration cleanup: drop nr_refs in remove_references()Christoph Lameter2006-06-231-10/+11
| | | | | | | | | | | | | | | | | | | Drop nr_refs parameter from migrate_page_remove_references() The nr_refs parameter is not really useful since the number of remaining references is always 1 for anonymous pages without a mapping 2 for pages with a mapping 3 for pages with a mapping and PagePrivate set. Remove the early check for the number of references since we are checking page_mapcount() earlier. Ultimately only the refcount matters after the tree_lock has been obtained. Signed-off-by: Christoph Lameter <clameter@sgi.coim> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] page migration cleanup: remove useless definitionsChristoph Lameter2006-06-232-6/+2
| | | | | | | | | | | Remove the export for migrate_page_remove_references() and migrate_page_copy() that are unlikely to be used directly by filesystems implementing migration. The export was useful when buffer_migrate_page() lived in fs/buffer.c but it has now been moved to migrate.c in the migration reorg. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] page migration cleanup: group functionsChristoph Lameter2006-06-231-70/+72
| | | | | | | | | Reorder functions in migrate.c. Group all migration functions for struct address_space_operations together. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] page migration cleanup: rename "ignrefs" to "migration"Christoph Lameter2006-06-231-9/+9
| | | | | | | | migrate is a better name since it is only used by page migration. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] writeback: fix range handlingOGAWA Hirofumi2006-06-239-31/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a writeback_control's `start' and `end' fields are used to indicate a one-byte-range starting at file offset zero, the required values of .start=0,.end=0 mean that the ->writepages() implementation has no way of telling that it is being asked to perform a range request. Because we're currently overloading (start == 0 && end == 0) to mean "this is not a write-a-range request". To make all this sane, the patch changes range of writeback_control. So caller does: If it is calling ->writepages() to write pages, it sets range (range_start/end or range_cyclic) always. And if range_cyclic is true, ->writepages() thinks the range is cyclic, otherwise it just uses range_start and range_end. This patch does, - Add LLONG_MAX, LLONG_MIN, ULLONG_MAX to include/linux/kernel.h -1 is usually ok for range_end (type is long long). But, if someone did, range_end += val; range_end is "val - 1" u64val = range_end >> bits; u64val is "~(0ULL)" or something, they are wrong. So, this adds LLONG_MAX to avoid nasty things, and uses LLONG_MAX for range_end. - All callers of ->writepages() sets range_start/end or range_cyclic. - Fix updates of ->writeback_index. It seems already bit strange. If it starts at 0 and ended by check of nr_to_write, this last index may reduce chance to scan end of file. So, this updates ->writeback_index only if range_cyclic is true or whole-file is scanned. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Nathan Scott <nathans@sgi.com> Cc: Anton Altaparmakov <aia21@cantab.net> Cc: Steven French <sfrench@us.ibm.com> Cc: "Vladimir V. Saveliev" <vs@namesys.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] buglet in radix_tree_tag_setPeter Zijlstra2006-06-231-2/+1
| | | | | | | | | | | | | | | | | | | | | | | The comment states: 'Setting a tag on a not-present item is a BUG.' Hence if 'index' is larger than the maxindex; the item _cannot_ be presen; it should also be a BUG. Also, this allows the following statement (assume a fresh tree): radix_tree_tag_set(root, 16, 1); to fail silently, but when preceded by: radix_tree_insert(root, 32, item); it would BUG, because the height has been extended by the insert. In neither case was 16 present. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] slab: redzone double-free detectionPekka Enberg2006-06-231-9/+23
| | | | | | | | | | | | | | | | At present our slab debugging tells us that it detected a double-free or corruption - it does not distinguish between them. Sometimes it's useful to be able to differentiate between these two types of information. Add double-free detection to redzone verification when freeing an object. As explained by Manfred, when we are freeing an object, both redzones should be RED_ACTIVE. However, if both are RED_INACTIVE, we are trying to free an object that was already free'd. Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] likely cleanup: remove unlikely in sys_mprotect()Hua Zhong2006-06-231-2/+1
| | | | | | | | | With likely/unlikely profiling on my not-so-busy-typical-developmentsystem there are 5k misses vs 2k hits. So I guess we should remove the unlikely. Signed-off-by: Hua Zhong <hzhong@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] radix-tree: smallNick Piggin2006-06-231-1/+1
| | | | | | | | | | Reduce radix tree node memory usage by about a factor of 4 for small files (< 64K). There are pointer traversal and memory usage costs for large files with dense pagecache. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] radix-tree: direct dataNick Piggin2006-06-232-83/+114
| | | | | | | | | | | | | | | | | | | | | | | | | | | The ability to have height 0 radix trees (a direct pointer to the data item rather than going through a full node->slot) quietly disappeared with old-2.6-bkcvs commit ffee171812d51652f9ba284302d9e5c5cc14bdfd. On 64-bit machines this causes nearly 600 bytes to be used for every <= 4K file in pagecache. Re-introduce this feature, root tags stored in spare ->gfp_mask bits. Simplify radix_tree_delete's complex tag clearing arrangement (which would become even more complex) by just falling back to tag clearing functions (the pagecache radix-tree never uses this path anyway, so the icache savings will mean it's actually a speedup). On my 4GB G5, this saves 8MB RAM per kernel kernel source+object tree in pagecache. Pagecache lookup, insertion, and removal speed for small files will also be improved. This makes RCU radix tree harder, but it's worth it. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] change gen_pool allocator to not touch managed memoryDean Nelson2006-06-234-261/+252
| | | | | | | | | | | | | | | | | | | | | | | Modify the gen_pool allocator (lib/genalloc.c) to utilize a bitmap scheme instead of the buddy scheme. The purpose of this change is to eliminate the touching of the actual memory being allocated. Since the change modifies the interface, a change to the uncached allocator (arch/ia64/kernel/uncached.c) is also required. Both Andrey Volkov and Jes Sorenson have expressed a desire that the gen_pool allocator not write to the memory being managed. See the following: http://marc.theaimsgroup.com/?l=linux-kernel&m=113518602713125&w=2 http://marc.theaimsgroup.com/?l=linux-kernel&m=113533568827916&w=2 Signed-off-by: Dean Nelson <dcn@sgi.com> Cc: Andrey Volkov <avolkov@varma-el.com> Acked-by: Jes Sorensen <jes@trained-monkey.org> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] mm: introduce remap_vmalloc_range()Nick Piggin2006-06-232-2/+128
| | | | | | | | | Add remap_vmalloc_range, vmalloc_user, and vmalloc_32_user so that drivers can have a nice interface for remapping vmalloc memory. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Unify pxm_to_node() and node_to_pxm()Yasunori Goto2006-06-2312-71/+102
| | | | | | | | | | | | | Consolidate the various arch-specific implementations of pxm_to_node() and node_to_pxm() into a single generic version. Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Andi Kleen <ak@muc.de> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: "Brown, Len" <len.brown@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] swsusp: rework memory shrinkerRafael J. Wysocki2006-06-232-57/+172
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rework the swsusp's memory shrinker in the following way: - Simplify balance_pgdat() by removing all of the swsusp-related code from it. - Make shrink_all_memory() use shrink_slab() and a new function shrink_all_zones() which calls shrink_active_list() and shrink_inactive_list() directly for each zone in a way that's optimized for suspend. In shrink_all_memory() we try to free exactly as many pages as the caller asks for, preferably in one shot, starting from easier targets.  If slab caches are huge, they are most likely to have enough pages to reclaim.  The inactive lists are next (the zones with more inactive pages go first) etc. Each time shrink_all_memory() attempts to shrink the active and inactive lists for each zone in 5 passes.  In the first pass, only the inactive lists are taken into consideration.  In the next two passes the active lists are also shrunk, but mapped pages are not reclaimed.  In the last two passes the active and inactive lists are shrunk and mapped pages are reclaimed as well. The aim of this is to alter the reclaim logic to choose the best pages to keep on resume and improve the responsiveness of the resumed system. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Con Kolivas <kernel@kolivas.org> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] slab: stop using list_for_eachChristoph Hellwig2006-06-231-27/+11
| | | | | | | | Use the _entry variant everywhere to clean the code up a tiny bit. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] slab: clean up kmem_getpagesChristoph Hellwig2006-06-231-16/+14
| | | | | | | | | | | | | | The last ifdef addition hit the ugliness treshold on this functions, so: - rename the variable i to nr_pages so it's somewhat descriptive - remove the addr variable and do the page_address call at the very end - instead of ifdef'ing the whole alloc_pages_node call just make the __GFP_COMP addition to flags conditional - rewrite the __GFP_COMP comment to make sense Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] tightening hugetlb strict accountingChen, Kenneth W2006-06-233-138/+173
| | | | | | | | | | | | | | | | | | | | | Current hugetlb strict accounting for shared mapping always assume mapping starts at zero file offset and reserves pages between zero and size of the file. This assumption often reserves (or lock down) a lot more pages then necessary if application maps at none zero file offset. libhugetlbfs is one example that requires proper reservation on shared mapping starts at none zero offset. This patch extends the reservation and hugetlb strict accounting to support any arbitrary pair of (offset, len), resulting a much more robust and accurate scheme. More importantly, it won't lock down any hugetlb pages outside file mapping. Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> Acked-by: Adam Litke <agl@us.ibm.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: William Lee Irwin III <wli@holomorphy.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] reserve space for swap labelAndreas Dilger2006-06-231-6/+8
| | | | | | | | | | | | | | | | | Reserve space in the swap disk header for a LABEL and UUID to be specified. This has been possible with util-linux-2.12b (via e2fsprogs 1.36 libblkid), and is used by at least FC3 and later. The kernel doesn't really care about this, but the space shouldn't accidentally be used by something else either. Also make the on-disk structures be fixed-size types, instead of "int", though I don't know of any architecture in use where an "int" isn't the same size as a "__u32" (all current kernel arches have it as "unsigned int"). Signed-off-by: Andreas Dilger <adilger@shaw.ca> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] mm: fix typos in comments in mm/oom_kill.cDave Peterson2006-06-231-3/+3
| | | | | | | | This fixes a few typos in the comments in mm/oom_kill.c. Signed-off-by: David S. Peterson <dsp@llnl.gov> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] support for panic at OOMKAMEZAWA Hiroyuki2006-06-234-0/+26
| | | | | | | | | | | | | | | | | This patch adds panic_on_oom sysctl under sys.vm. When sysctl vm.panic_on_oom = 1, the kernel panics intead of killing rogue processes. And if vm.panic_on_oom is 0 the kernel will do oom_kill() in the same way as it does today. Of course, the default value is 0 and only root can modifies it. In general, oom_killer works well and kill rogue processes. So the whole system can survive. But there are environments where panic is preferable rather than kill some processes. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] squash duplicate page_to_pfn and pfn_to_pageAndy Whitcroft2006-06-232-42/+17
| | | | | | | | | | | | We have architectures where the size of page_to_pfn and pfn_to_page are significant enough to overall image size that they wish to push them out of line. However, in the process we have grown a second copy of the implementation of each of these routines for each memory model. Share the implmentation exposing it either inline or out-of-line as required. Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] wait_table and zonelist initializing for memory hotadd: update zonelistsYasunori Goto2006-06-232-5/+33
| | | | | | | | | | | | | | | | | | | | | | In current code, zonelist is considered to be build once, no modification. But MemoryHotplug can add new zone/pgdat. It must be updated. This patch modifies build_all_zonelists(). By this, build_all_zonelist() can reconfig pgdat's zonelists. To update them safety, this patch use stop_machine_run(). Other cpus don't touch among updating them by using it. In old version (V2 of node hotadd), kernel updated them after zone initialization. But present_page of its new zone is still 0, because online_page() is not called yet at this time. Build_zonelists() checks present_pages to find present zone. It was too early. So, I changed it after online_pages(). Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] wait_table and zonelist initializing for memory hotadd: wait_table ↵Yasunori Goto2006-06-231-6/+53
| | | | | | | | | | | | | | | initialization Wait_table is initialized according to zone size at boot time. But, we cannot know the maixmum zone size when memory hotplug is enabled. It can be changed.... And resizing of wait_table is hard. So kernel allocate and initialzie wait_table as its maximum size. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] wait_table and zonelist initializing for memory hotadd: add return ↵Yasunori Goto2006-06-233-5/+24
| | | | | | | | | | | | | | | | code for init_current_empty_zone When add_zone() is called against empty zone (not populated zone), we have to initialize the zone which didn't initialize at boot time. But, init_currently_empty_zone() may fail due to allocation of wait table. So, this patch is to catch its error code. Changes against wait_table is in the next patch. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] wait_table and zonelist initializing for memory hotadd: change to ↵Yasunori Goto2006-06-232-11/+11
| | | | | | | | | | | | | meminit for build_zonelist Change definitions of some functions and data from __init to __meminit. These functions and data can be used after bootup by this patch to be used for hot-add codes. Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] wait_table and zonelist initializing for memory hotadd: change name ↵Yasunori Goto2006-06-232-7/+9
| | | | | | | | | | of wait_table_size() This is just to rename from wait_table_size() to wait_table_hash_nr_entries(). Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] migration: remove unnecessary PageSwapCache checksChristoph Lameter2006-06-232-15/+0
| | | | | | | | | | Remove two unnecessary PageSwapCache checks. The page refcount is raised and therefore page migration cannot occur in both functions. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] slab: page mapping cleanupPekka Enberg2006-06-231-11/+16
| | | | | | | | | | | Clean up slab allocator page mapping a bit. The memory allocated for a slab is physically contiguous so it is okay to assume struct pages are too so kill the long-standing comment. Furthermore, rename set_slab_attr to slab_map_pages and add a comment explaining why its needed. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] PG_uncached is ia64 onlyAndrew Morton2006-06-231-1/+13
| | | | | | | | | | | | | | As Nick points out, only ia64 uses PG_uncached. So we can push it up into the higher bits of the lower half of page->flags and make room for another flag on 32-bit machines. Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Jesse Barnes <jbarnes@sgi.com> Cc: Jes Sorensen <jes@trained-monkey.org> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] slab: extract cache_free_alien from __cache_freePekka Enberg2006-06-231-35/+42
| | | | | | | | | | Move alien object freeing to cache_free_alien() to reduce #ifdef clutter in __cache_free(). Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Page Migration: Make do_swap_page redo the faultChristoph Lameter2006-06-231-7/+0
| | | | | | | | | | | | | | | | It is better to redo the complete fault if do_swap_page() finds that the page is not in PageSwapCache() because the page migration code may have replaced the swap pte already with a pte pointing to valid memory. do_swap_page() may interpret an invalid swap entry without this patch because we do not reload the pte if we are looping back. The page migration code may already have reused the swap entry referenced by our local swp_entry. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] zone handle unaligned zone boundariesAndy Whitcroft2006-06-232-8/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | The buddy allocator has a requirement that boundaries between contigious zones occur aligned with the the MAX_ORDER ranges. Where they do not we will incorrectly merge pages cross zone boundaries. This can lead to pages from the wrong zone being handed out. Originally the buddy allocator would check that buddies were in the same zone by referencing the zone start and end page frame numbers. This was removed as it became very expensive and the buddy allocator already made the assumption that zones boundaries were aligned. It is clear that not all configurations and architectures are honouring this alignment requirement. Therefore it seems safest to reintroduce support for non-aligned zone boundaries. This patch introduces a new check when considering a page a buddy it compares the zone_table index for the two pages and refuses to merge the pages where they do not match. The zone_table index is unique for each node/zone combination when FLATMEM/DISCONTIGMEM is enabled and for each section/zone combination when SPARSEMEM is enabled (a SPARSEMEM section is at least a MAX_ORDER size). Signed-off-by: Andy Whitcroft <apw@shadowen.org> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] for_each_possible_cpu: xfsKAMEZAWA Hiroyuki2006-06-232-3/+3
| | | | | | | | | | | | | | | | | | for_each_cpu() actually iterates across all possible CPUs. We've had mistakes in the past where people were using for_each_cpu() where they should have been iterating across only online or present CPUs. This is inefficient and possibly buggy. We're renaming for_each_cpu() to for_each_possible_cpu() to avoid this in the future. This patch replaces for_each_cpu with for_each_possible_cpu. in xfs. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] XFS: Use the dentry passed to statfs() to limit the scope of the resultsDavid Howells2006-06-231-1/+2
| | | | | | | | | | Enable XFS to limit the statfs() results to the project quota covering the dentry used as a base for call. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] VFS: Permit filesystem to perform statfs with a known root dentryDavid Howells2006-06-2360-144/+175
| | | | | | | | | | | | | | | | | | | | | Give the statfs superblock operation a dentry pointer rather than a superblock pointer. This complements the get_sb() patch. That reduced the significance of sb->s_root, allowing NFS to place a fake root there. However, NFS does require a dentry to use as a target for the statfs operation. This permits the root in the vfsmount to be used instead. linux/mount.h has been added where necessary to make allyesconfig build successfully. Interest has also been expressed for use with the FUSE and XFS filesystems. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Nathan Scott <nathans@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] VFS: Permit filesystem to override root dentry on mountDavid Howells2006-06-2383-431/+482
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Extend the get_sb() filesystem operation to take an extra argument that permits the VFS to pass in the target vfsmount that defines the mountpoint. The filesystem is then required to manually set the superblock and root dentry pointers. For most filesystems, this should be done with simple_set_mnt() which will set the superblock pointer and then set the root dentry to the superblock's s_root (as per the old default behaviour). The get_sb() op now returns an integer as there's now no need to return the superblock pointer. This patch permits a superblock to be implicitly shared amongst several mount points, such as can be done with NFS to avoid potential inode aliasing. In such a case, simple_set_mnt() would not be called, and instead the mnt_root and mnt_sb would be set directly. The patch also makes the following changes: (*) the get_sb_*() convenience functions in the core kernel now take a vfsmount pointer argument and return an integer, so most filesystems have to change very little. (*) If one of the convenience function is not used, then get_sb() should normally call simple_set_mnt() to instantiate the vfsmount. This will always return 0, and so can be tail-called from get_sb(). (*) generic_shutdown_super() now calls shrink_dcache_sb() to clean up the dcache upon superblock destruction rather than shrink_dcache_anon(). This is required because the superblock may now have multiple trees that aren't actually bound to s_root, but that still need to be cleaned up. The currently called functions assume that the whole tree is rooted at s_root, and that anonymous dentries are not the roots of trees which results in dentries being left unculled. However, with the way NFS superblock sharing are currently set to be implemented, these assumptions are violated: the root of the filesystem is simply a dummy dentry and inode (the real inode for '/' may well be inaccessible), and all the vfsmounts are rooted on anonymous[*] dentries with child trees. [*] Anonymous until discovered from another tree. (*) The documentation has been adjusted, including the additional bit of changing ext2_* into foo_* in the documentation. [akpm@osdl.org: convert ipath_fs, do other stuff] Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Nathan Scott <nathans@sgi.com> Cc: Roland Dreier <rolandd@cisco.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Fix cdrom being confused on using kdumpRachita Kothiyal2006-06-231-1/+4
| | | | | | | | | | | | | | | | | | I have seen the cdrom drive appearing confused on using kdump on certain x86_64 systems. During the booting up of the second kernel, the following message would keep flooding the console, and the booting would not proceed any further. hda: cdrom_pc_intr: The drive appears confused (ireason = 0x01) In this patch, whenever we are hitting a confused state in the interrupt handler with the DRQ set, we end the request and return ide_stopped. Using this I dont see the status error. Signed-off-by: Rachita Kothiyal <rachita@in.ibm.com> Acked-by: Jens Axboe <axboe@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/usb-2.6Linus Torvalds2006-06-222-8/+13
|\ | | | | | | | | | | * master.kernel.org:/pub/scm/linux/kernel/git/gregkh/usb-2.6: [PATCH] Driver core: fix locking issues with the devices that are attached to classes [PATCH] USB: get USB suspend to work again
| * [PATCH] Driver core: fix locking issues with the devices that are attached ↵Greg Kroah-Hartman2006-06-221-8/+11
| | | | | | | | | | | | | | | | to classes Doh, that was foolish... Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * [PATCH] USB: get USB suspend to work againGreg Kroah-Hartman2006-06-221-0/+2
| | | | | | | | | | | | | | | | | | | | Yeah, it's a hack, but it is only temporary until Alan's patches reworking this area make it in. We really should not care what devices below us are doing, especially when we do not really know what type of devices they are. This patch relies on the fact that the endpoint devices do not have a driver assigned to us. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* | Merge branch 'devel' of master.kernel.org:/home/rmk/linux-2.6-mmcLinus Torvalds2006-06-224-83/+74
|\ \ | | | | | | | | | | | | | | | * 'devel' of master.kernel.org:/home/rmk/linux-2.6-mmc: [ARM] 3565/1: AT91RM9200 MMC update [MMC] Convert all hosts except mmci to use data->blksz
| * | [ARM] 3565/1: AT91RM9200 MMC updateAndrew Victor2006-06-191-74/+65
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch from Andrew Victor This patch includes code cleanups and minor fixes to the AT91RM9200 MMC driver. 1. Replace calls to DBG() with pr_debug(). 2. 'host' can never be null, so don't bother checking for that case. 3. Remove SA_SAMPLE_RANDOM from request_irq(). [Patch from Matt Mackall] 4. clk_get() doesn't return NULL on error - need to test returned value with IS_ERR(). 5. Free resources if clk_get() or request_irq() fails. Signed-off-by: Andrew Victor <andrew@sanpeople.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| * | [MMC] Convert all hosts except mmci to use data->blkszRussell King2006-06-194-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The MMC specification allows non-power of two block sizes. As such, we should not pass the log2 block size to host drivers, but instead pass the byte size. However, ARM MMCI can only work with log2 block size, so continue to pass both the log2 block size and byte block size. This means that for the moment, the byte block size must remain a power of two, but this is the first stage of removing this restriction for other hosts. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
* | | Merge branch 'devel' of master.kernel.org:/home/rmk/linux-2.6-armLinus Torvalds2006-06-2247-497/+1946
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'devel' of master.kernel.org:/home/rmk/linux-2.6-arm: (21 commits) [ARM] 3629/1: S3C24XX: fix missing bracket in regs-dsc.h [ARM] 3537/1: Rework DMA-bounce locking for finer granularity [ARM] 3601/1: i.MX/MX1 DMA error handling for signaled channels only [ARM] 3597/1: ixp4xx/nslu2: Board support for new LED subsystem [ARM] 3595/1: ixp4xx/nas100d: Board support for new LED subsystem [ARM] 3626/1: ARM EABI: fix syscall restarting [ARM] 3628/1: S3C24XX: add get_rate call to struct clk [ARM] 3627/1: S3C24XX: split s3c2410 clocks from core clocks [ARM] 3613/1: S3C2410: Add sysdev and sysclass [ARM] 3624/1: Report true modem control line states [ARM] 3620/2: ixp23xx: add uengine loader support [ARM] 3618/1: add defconfig for logicpd pxa270 card engine [ARM] 3617/1: ep93xx: fix slightly incorrect timer tick rate [ARM] 3616/1: fix timer handler wrap logic for a number of platforms [ARM] 3615/1: ixp23xx: use platform devices for physmap flash [ARM] 3614/1: ep93xx: use platform devices for physmap flash [ARM] 3621/1: fix compilation breakage for pnx4008 [ARM] 3623/1: pnx4008: move GPIO-related defines to gpio.h [ARM] 3622/1: pnx4008: remove clk_use/clk_unuse [ARM] Enable VFP to be built when non-VFP capable CPUs are selected ...
| * | | [ARM] 3629/1: S3C24XX: fix missing bracket in regs-dsc.hBen Dooks2006-06-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch from Ben Dooks Fix missing bracket in include/asm-arm/arch-s3c2410/regs-dsc.h Signed-off-by: Ben Dooks <ben-linux@fluff.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| * | | [ARM] 3537/1: Rework DMA-bounce locking for finer granularityKevin Hilman2006-06-221-46/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch from Kevin Hilman This time with IRQ versions of locks. Rework also enables compatability with realtime-preemption patch. With the current locking via interrupt disabling, under RT, potentially sleeping functions can be called with interrupts disabled. Signed-off-by: Kevin Hilman <khilman@mvista.com> Signed-off-by: Deepak Saxena <dsaxena@plexity.net> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>