Commit message [Author, Age, Files, Lines]
* Merge branch 'simplify_PRT' into release [Len Brown, 2009-01-09, 4 files, -413/+168]
|\
    Conflicts:
        drivers/acpi/pci_irq.c

    Note that this merge disables
    e1d3a90846b40ad3160bf4b648d36c6badad39ac
    pci, acpi: reroute PCI interrupt to legacy boot interrupt equivalent

    Signed-off-by: Len Brown <len.brown@intel.com>
| * ACPI: simplify buffer management for acpi_pci_bind() etc. [Len Brown, 2008-12-30, 2 files, -50/+33]
    use ACPI_ALLOCATE_BUFFER to remove the allocations within
    acpi_pci_bind(), acpi_pci_unbind() and acpi_pci_bind_root().

    While there, delete some unnecessary param inits from those routines.

    Delete concept of ACPI_PATHNAME_MAX, since this was the last use.

    Signed-off-by: Len Brown <len.brown@intel.com>
| * ACPI: PCI: add HP copyright [Bjorn Helgaas, 2008-12-30, 1 file, -0/+2]
    Add HP copyright to pci_irq.c.

    Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
    Signed-off-by: Len Brown <len.brown@intel.com>
| * ACPI: PCI: whitespace and useless initialization cleanup [Bjorn Helgaas, 2008-12-30, 1 file, -19/+9]
    This patch makes function declarations consistent throughout the
    file and removes some unnecessary initializations.

    Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
    Signed-off-by: Len Brown <len.brown@intel.com>
| * ACPI: PCI: expand acpi_pci_allocate_irq() and acpi_pci_free_irq() inline [Bjorn Helgaas, 2008-12-30, 1 file, -43/+13]
    acpi_pci_allocate_irq() and acpi_pci_free_irq() are trivial and only
    used once, so just open-code them.

    Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
    Signed-off-by: Len Brown <len.brown@intel.com>
| * ACPI: PCI: simplify struct acpi_prt_entry [Bjorn Helgaas, 2008-12-30, 1 file, -20/+15]
    Remove unused "irq" field, remove unnecessary struct,
    rename "handle" to "link".

    Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
    Signed-off-by: Len Brown <len.brown@intel.com>
| * ACPI: PCI: simplify list of _PRT entries [Bjorn Helgaas, 2008-12-30, 1 file, -48/+12]
    We don't need a struct containing a count and a list_head; a simple
    list_head is sufficient. The list iterators handle empty lists fine.

    Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
    Signed-off-by: Len Brown <len.brown@intel.com>
| * ACPI: PCI: combine lookup and derive [Bjorn Helgaas, 2008-12-30, 1 file, -44/+11]
    This folds acpi_pci_irq_derive() into acpi_pci_irq_lookup() so it
    can be easily used by both acpi_pci_irq_enable() and
    acpi_pci_irq_disable().

    Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
    Signed-off-by: Len Brown <len.brown@intel.com>
| * ACPI: PCI: follow typical PCI INTx swizzling pattern [Bjorn Helgaas, 2008-12-30, 1 file, -4/+7]
    No functional change; this just uses the typical pattern of PCI INTx
    swizzling done on other architectures.

    Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
    Signed-off-by: Len Brown <len.brown@intel.com>
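The message does not spell out the swizzling pattern it refers to. A minimal sketch, assuming the conventional rotation applied when an interrupt crosses a PCI-to-PCI bridge; the name `swizzle_pin` is illustrative and is not taken from pci_irq.c:

```c
#include <assert.h>

/*
 * Hypothetical helper (not the kernel's code): rotate a 1-based PCI INTx
 * pin (1=INTA .. 4=INTD) by the device's slot number to obtain the pin
 * seen on the upstream side of a bridge.
 */
static unsigned int swizzle_pin(unsigned int slot, unsigned int pin)
{
	assert(pin >= 1 && pin <= 4);
	/* Work 0-based for the modulo, then return to the 1-based encoding. */
	return (((pin - 1) + slot) % 4) + 1;
}
```

Under this assumption a device in slot 1 asserting INTA appears upstream as INTB, and the rotation wraps around after INTD.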
| * ACPI: PCI: use positive logic to simplify code [Bjorn Helgaas, 2008-12-30, 1 file, -17/+16]
    This doesn't change anything functionally; it just changes tests so
    we test for success instead of failure. This makes the code read more
    easily and allows us to remove the "!entry" in the while loop
    condition.

    Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
    Signed-off-by: Len Brown <len.brown@intel.com>
| * ACPI: PCI: remove callback from acpi_pci_irq_lookup & acpi_pci_irq_derive [Bjorn Helgaas, 2008-12-30, 1 file, -57/+36]
    We currently pass a callback function (either acpi_pci_allocate_irq()
    or acpi_pci_free_irq()) to acpi_pci_irq_lookup() and
    acpi_pci_irq_derive().

    I think it's simpler to remove the callback and just have the
    enable/disable functions make the calls directly.

    Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
    Signed-off-by: Len Brown <len.brown@intel.com>
| * ACPI: PCI: tweak _PRT lookup debug [Bjorn Helgaas, 2008-12-30, 1 file, -6/+7]
    Print one message (either "found" or "not found") for every _PRT
    search. And add pin information to the INTx swizzling debug.

    Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
    Signed-off-by: Len Brown <len.brown@intel.com>
| * ACPI: PCI: lookup _PRT entry by PCI dev and pin, not segment/bus/dev/pin [Bjorn Helgaas, 2008-12-30, 1 file, -15/+12]
    There's no reason to pass around segment, bus, and device
    independently when we can just pass the pci_dev pointer, which
    carries all those already.

    The pci_dev contains an interrupt pin, too, but we still have to pass
    both the pci_dev and the pin because when we use a bridge to derive
    an IRQ, we need the pin from the downstream device, not the bridge.

    Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
    Signed-off-by: Len Brown <len.brown@intel.com>
| * ACPI: PCI: use 1-based encoding for _PRT quirks [Bjorn Helgaas, 2008-12-30, 1 file, -4/+6]
    Use the PCI INTx pin encoding (1=INTA, 2=INTB, etc) for _PRT quirks.
    Then we can simply compare "entry->pin == quirk->pin".

    Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
    Signed-off-by: Len Brown <len.brown@intel.com>
| * ACPI: PCI: always use the PCI INTx pin values, not the _PRT ones [Bjorn Helgaas, 2008-12-30, 1 file, -7/+8]
    This patch changes pci_irq.c to always use PCI INTx pin encodings
    instead of a mix of PCI and _PRT encodings.

    The PCI INTx pin numbers from the PCI_INTERRUPT_PIN config register
    are 0=device doesn't use interrupts, 1=INTA, ..., 4=INTD. But the
    _PRT table uses 0=INTA, ..., 3=INTD.

    This patch converts the _PRT encoding to the PCI encoding immediately
    when we add a _PRT entry to the global list. All the rest of the code
    can then use the PCI encoding consistently.

    The point of this is to make the interrupt swizzling look the same as
    on other architectures, so someday we can unify them.

    Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
    Signed-off-by: Len Brown <len.brown@intel.com>
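The conversion between the two encodings described above is a single offset. A minimal sketch, with a hypothetical function name that does not come from the patch:

```c
#include <assert.h>

/*
 * Hypothetical helper: convert a _PRT pin (0=INTA .. 3=INTD) to the PCI
 * config-space encoding (1=INTA .. 4=INTD; 0 means "no interrupt").
 */
static unsigned int prt_pin_to_pci_pin(unsigned int prt_pin)
{
	assert(prt_pin <= 3);
	return prt_pin + 1;
}
```

Doing the conversion once, when an entry is stored, means later comparisons against a device's PCI_INTERRUPT_PIN value need no adjustment.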
| * ACPI: PCI: add a helper to convert _PRT INTx pin number to name [Bjorn Helgaas, 2008-12-30, 1 file, -10/+15]
    This adds a helper function to convert INTx pin numbers from the
    _PRT (0, 1, 2, 3) to the pin name ('A', 'B', 'C', 'D').

    Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
    Signed-off-by: Len Brown <len.brown@intel.com>
| * ACPI: PCI: move struct acpi_prt_entry declaration out of public header file [Bjorn Helgaas, 2008-12-30, 2 files, -16/+16]
    The struct acpi_prt_entry is used only in pci_irq.c, so there's no
    need for the declaration to be public. This patch moves it into
    pci_irq.c.

    Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
    Signed-off-by: Len Brown <len.brown@intel.com>
| * ACPI: PCI: fix GSI/IRQ naming confusion [Bjorn Helgaas, 2008-12-30, 1 file, -10/+10]
    The interrupt numbers from _PRT entries are GSIs, not Linux IRQs.

    Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
    Signed-off-by: Len Brown <len.brown@intel.com>
| * ACPI: PCI: ignore _PRT function information [Bjorn Helgaas, 2008-12-30, 1 file, -1/+0]
    _PRT entries don't contain any useful PCI function information (the
    function part of the PCI address is supposed to be 0xffff), and we
    don't ever look at it, so this patch just removes the reference to it.

    Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
    Signed-off-by: Len Brown <len.brown@intel.com>
| * ACPI: PCI: simplify buffer management for evaluating _PRT [Bjorn Helgaas, 2008-12-30, 1 file, -40/+12]
    Previously, acpi_pci_irq_add_prt() did all its own buffer management.
    But now that we have ACPI_ALLOCATE_BUFFER, we no longer need to do
    that management. And we don't have to call
    acpi_get_irq_routing_table() twice (once to learn the size of the
    buffer needed, and again to actually get the table).

    Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
    Signed-off-by: Len Brown <len.brown@intel.com>
| * ACPI: PCI: remove unnecessary null pointer checks [Bjorn Helgaas, 2008-12-30, 1 file, -18/+0]
    Better to oops and learn about a bug than to silently cover it up.

    Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
    Signed-off-by: Len Brown <len.brown@intel.com>
| * ACPI: PCI: use conventional PCI address format [Bjorn Helgaas, 2008-12-30, 2 files, -11/+11]
    Use the conventional format for PCI addresses (%04x:%02x:%02x.%d).

    Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
    Signed-off-by: Len Brown <len.brown@intel.com>
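To illustrate what that format string produces, here is a small userspace sketch. The helper name and parameter names are assumptions for illustration; the kernel code prints through its own logging macros rather than a function like this:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: render segment:bus:device.function using the
 * conventional format named in the commit message.
 */
static int format_pci_addr(char *buf, size_t len, unsigned int seg,
			   unsigned int bus, unsigned int dev, int fn)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%04x:%02x:%02x.%d", seg, bus, dev, fn);
}
```

Segment 0, bus 0x1f, device 2, function 1 is rendered as `0000:1f:02.1`.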
* | Merge branch 'linus' into release [Len Brown, 2009-01-09, 8119 files, -315797/+1011444]
|\ \
| * \ Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 [Linus Torvalds, 2009-01-08, 39 files, -1301/+2271]
| |\ \
    * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (57 commits)
      jbd2: Fix oops in jbd2_journal_init_inode() on corrupted fs
      ext4: Remove "extents" mount option
      block: Add Kconfig help which notes that ext4 needs CONFIG_LBD
      ext4: Make printk's consistently prefixed with "EXT4-fs: "
      ext4: Add sanity checks for the superblock before mounting the filesystem
      ext4: Add mount option to set kjournald's I/O priority
      jbd2: Submit writes to the journal using WRITE_SYNC
      jbd2: Add pid and journal device name to the "kjournald2 starting" message
      ext4: Add markers for better debuggability
      ext4: Remove code to create the journal inode
      ext4: provide function to release metadata pages under memory pressure
      ext3: provide function to release metadata pages under memory pressure
      add releasepage hooks to block devices which can be used by file systems
      ext4: Fix s_dirty_blocks_counter if block allocation failed with nodelalloc
      ext4: Init the complete page while building buddy cache
      ext4: Don't allow new groups to be added during block allocation
      ext4: mark the blocks/inode bitmap beyond end of group as used
      ext4: Use new buffer_head flag to check uninit group bitmaps initialization
      ext4: Fix the race between read_inode_bitmap() and ext4_new_inode()
      ext4: code cleanup
      ...
| | * | jbd2: Fix oops in jbd2_journal_init_inode() on corrupted fs [Jan Kara, 2009-01-06, 1 file, -13/+27]
    On 32-bit system with CONFIG_LBD getblk can fail because provided
    block number is too big. Add error checks so we fail gracefully if
    getblk() returns NULL (which can also happen on memory allocation
    failures).

    Thanks to David Maciejak from Fortinet's FortiGuard Global Security
    Research Team for reporting this bug.

    http://bugzilla.kernel.org/show_bug.cgi?id=12370

    Signed-off-by: Jan Kara <jack@suse.cz>
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    cc: stable@kernel.org
| | * | ext4: Remove "extents" mount option [Theodore Ts'o, 2009-01-06, 7 files, -64/+14]
    This mount option is largely superfluous, and in fact the way it was
    implemented was buggy; if a filesystem which did not have the extents
    feature flag was mounted -o extents, the filesystem would attempt to
    create and use extents-based files even though the extents feature
    flag was not enabled. The simplest thing to do is to nuke the mount
    option entirely. It's not all that useful to force the non-creation
    of new extent-based files if the filesystem can support it.

    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | * | block: Add Kconfig help which notes that ext4 needs CONFIG_LBD [Theodore Ts'o, 2009-01-06, 1 file, -0/+6]
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    Cc: Jens Axboe <jens.axboe@oracle.com>
| | * | ext4: Make printk's consistently prefixed with "EXT4-fs: " [Theodore Ts'o, 2009-01-06, 2 files, -6/+6]
    Previously, some were "ext4: ", and some were "EXT4: "; change them
    to be consistent with most ext4 printk's, which is to use "EXT4-fs: ".

    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | * | ext4: Add sanity checks for the superblock before mounting the filesystem [Theodore Ts'o, 2009-01-06, 1 file, -10/+20]
    This avoids insane superblock configurations that could lead to
    kernel oops due to null pointer dereferences.

    http://bugzilla.kernel.org/show_bug.cgi?id=12371

    Thanks to David Maciejak at Fortinet's FortiGuard Global Security
    Research Team who discovered this bug independently (but at
    approximately the same time) as Thiemo Nagel, who submitted the patch.

    Signed-off-by: Thiemo Nagel <thiemo.nagel@ph.tum.de>
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    Cc: stable@kernel.org
| | * | ext4: Add mount option to set kjournald's I/O priority [Theodore Ts'o, 2009-01-05, 4 files, -5/+36]
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    Cc: Jens Axboe <jens.axboe@oracle.com>
| | * | jbd2: Submit writes to the journal using WRITE_SYNC [Theodore Ts'o, 2009-01-04, 1 file, -2/+2]
    Since we will be waiting for the write of the commit record to the
    journal to complete in journal_submit_commit_record(), submit it
    using WRITE_SYNC.

    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | * | jbd2: Add pid and journal device name to the "kjournald2 starting" message [Theodore Ts'o, 2009-01-03, 1 file, -2/+3]
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | * | ext4: Add markers for better debuggability [Theodore Ts'o, 2009-01-03, 3 files, -3/+116]
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | * | ext4: Remove code to create the journal inode [Theodore Ts'o, 2009-01-06, 4 files, -141/+4]
    This code has been obsolete for quite some time, since the supported
    method for adding a journal inode is to use tune2fs (or to create a
    new filesystem with a journal via mke2fs or mkfs.ext4).

    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | * | ext4: provide function to release metadata pages under memory pressure [Toshiyuki Okajima, 2009-01-05, 1 file, -0/+20]
    Pages in the page cache belonging to ext4 data files are released via
    the ext4_releasepage() function specified in the ext4 inode's
    address_space_ops. However, metadata blocks (such as indirect blocks,
    directory blocks, etc) are managed via the block device
    address_space_ops, and they can not be released by
    try_to_free_buffers() if they have a journal head attached to them.

    To address this, we supply a release_metadata function which calls
    jbd2_journal_try_to_free_buffers() function to free the metadata, and
    which is called by the block device's blkdev_releasepage() function.

    Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    Cc: linux-fsdevel@vger.kernel.org
| | * | ext3: provide function to release metadata pages under memory pressure [Toshiyuki Okajima, 2009-01-05, 1 file, -0/+21]
    Pages in the page cache belonging to ext3 data files are released via
    the ext3_releasepage() function specified in the ext3 inode's
    address_space_ops. However, metadata blocks (such as indirect blocks,
    directory blocks, etc) are managed via the block device
    address_space_ops, and they can not be released by
    try_to_free_buffers() if they have a journal head attached to them.

    To address this, we supply a try_to_free_pages() function which calls
    journal_try_to_free_buffers() function to free the metadata, and
    which is called by the block device's blkdev_releasepage() function.

    Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    Cc: linux-fsdevel@vger.kernel.org
| | * | add releasepage hooks to block devices which can be used by file systems [Theodore Ts'o, 2009-01-03, 3 files, -0/+19]
    Implement blkdev_releasepage() to release the buffer_heads and pages
    after we release private data belonging to a mounted filesystem.

    Cc: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
    Cc: linux-fsdevel@vger.kernel.org
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | * | ext4: Fix s_dirty_blocks_counter if block allocation failed with nodelalloc [Aneesh Kumar K.V, 2009-01-05, 1 file, -1/+8]
    With the nodelalloc option we need to update the dirty block counter
    on block allocation failure. This is needed because we increment the
    dirty block counter early in the block allocation phase. Without the
    patch s_dirty_blocks_counter goes wrong, so that the filesystem's
    free block count decreases incorrectly.

    Tested-by: Akira Fujita <a-fujita@rs.jp.nec.com>
    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    Cc: stable@kernel.org
| | * | ext4: Init the complete page while building buddy cache [Aneesh Kumar K.V, 2009-01-05, 1 file, -1/+2]
    We need to init the complete page during buddy cache init by setting
    the contents to '1'. Otherwise we can see the following errors after
    doing an online resize of the filesystem:

        EXT4-fs error (device sdb1): ext4_mb_mark_diskspace_used:
        Allocating block 1040385 in system zone of 127 group

    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    Cc: stable@kernel.org
| | * | ext4: Don't allow new groups to be added during block allocation [Aneesh Kumar K.V, 2009-01-05, 2 files, -3/+18]
    After we mark the blocks in the buddy cache as allocated, we need to
    ensure that we don't reinit the buddy cache until the block bitmap is
    updated. This commit achieves this by holding the group_info
    alloc_semaphore till ext4_mb_release_context

    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    Cc: stable@kernel.org
| | * | ext4: mark the blocks/inode bitmap beyond end of group as used [Aneesh Kumar K.V, 2009-01-05, 3 files, -7/+5]
    We need to mark the block/inode bitmap beyond the end of the group
    with '1'.

    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    Cc: stable@kernel.org
| | * | ext4: Use new buffer_head flag to check uninit group bitmaps initialization [Aneesh Kumar K.V, 2009-01-05, 4 files, -6/+85]
    For uninit block group, the on-disk bitmap is not initialized. That
    implies we cannot depend on the uptodate flag on the bitmap
    buffer_head to find bitmap validity. Use a new buffer_head flag which
    would be set after we properly initialize the bitmap. This also
    prevents (re-)initializing the uninit group bitmap every time we call
    ext4_read_block_bitmap().

    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    Cc: stable@kernel.org
| | * | ext4: Fix the race between read_inode_bitmap() and ext4_new_inode() [Aneesh Kumar K.V, 2009-01-05, 1 file, -60/+86]
    We need to make sure we update the inode bitmap and clear the
    EXT4_BG_INODE_UNINIT flag with sb_bgl_lock held, since
    ext4_read_inode_bitmap() looks at EXT4_BG_INODE_UNINIT to decide
    whether to initialize the inode bitmap each time it is called.
    (introduced by commit c806e68f.)

    ext4_read_inode_bitmap does:

        spin_lock(sb_bgl_lock(EXT4_SB(sb), block_group));
        if (desc->bg_flags & cpu_to_le16(EXT4_BG_INODE_UNINIT)) {
                ext4_init_inode_bitmap(sb, bh, block_group, desc);

    and ext4_new_inode does

        if (!ext4_set_bit_atomic(sb_bgl_lock(sbi, group),
                                 ino, inode_bitmap_bh->b_data))
                ......
        ...
        spin_lock(sb_bgl_lock(sbi, group));

        gdp->bg_flags &= cpu_to_le16(~EXT4_BG_INODE_UNINIT);

    i.e., on allocation we update the bitmap then we take the sb_bgl_lock
    and clear the EXT4_BG_INODE_UNINIT flag. What can happen is a
    parallel ext4_read_inode_bitmap can zero out the bitmap in between
    the above ext4_set_bit_atomic and spin_lock(sb_bgl_lock..)

    The race results in the user-visible errors below:

        EXT4-fs error (device sdb1): ext4_free_inode: bit already cleared for inode 168449
        EXT4-fs warning (device sdb1): ext4_unlink: Deleting nonexistent file ...
        EXT4-fs warning (device sdb1): ext4_rmdir: empty directory has too many links ...
        # ls -al /mnt/tmp/f/p369/d3/d6/d39/db2/dee/d10f/d3f/l71
        ls: /mnt/tmp/f/p369/d3/d6/d39/db2/dee/d10f/d3f/l71: Stale NFS file handle

    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    Cc: stable@kernel.org
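The ordering this message asks for (bitmap update and flag clear inside one critical section) can be shown with a simplified userspace model. This is a sketch under stated assumptions, not the ext4 code: `struct group`, `group_lock()`, `read_bitmap()` and `alloc_bit()` are hypothetical stand-ins, and a C11 `atomic_flag` spinlock plays the role of sb_bgl_lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of one block group; none of these names are ext4's. */
struct group {
	atomic_flag lock;         /* stands in for sb_bgl_lock */
	bool uninit;              /* stands in for EXT4_BG_INODE_UNINIT */
	unsigned char bitmap[8];  /* stands in for the inode bitmap */
};

static void group_lock(struct group *g)
{
	/* Spin until the flag was previously clear, i.e. we own the lock. */
	while (atomic_flag_test_and_set_explicit(&g->lock, memory_order_acquire))
		;
}

static void group_unlock(struct group *g)
{
	atomic_flag_clear_explicit(&g->lock, memory_order_release);
}

/* Reader side: zero the bitmap only while the group is still marked uninit. */
static void read_bitmap(struct group *g)
{
	group_lock(g);
	if (g->uninit)
		memset(g->bitmap, 0, sizeof(g->bitmap));
	group_unlock(g);
}

/*
 * Allocator side with the ordering the message asks for: the bit is set and
 * the uninit flag is cleared under the same lock, so a concurrent
 * read_bitmap() cannot wipe the bit in between. Assumes read_bitmap() has
 * already run once, as a caller must have obtained the bitmap first.
 */
static void alloc_bit(struct group *g, unsigned int bit)
{
	assert(bit < 8 * sizeof(g->bitmap));
	group_lock(g);
	g->bitmap[bit / 8] |= (unsigned char)(1u << (bit % 8));
	g->uninit = false;
	group_unlock(g);
}
```

In the buggy ordering the bit would be set before the lock is taken, leaving a window in which `read_bitmap()` still sees `uninit` and clears the freshly set bit.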
| | * | ext4: code cleanup [Aneesh Kumar K.V, 2009-01-03, 3 files, -32/+37]
    Rename some variables. We also unlock locks in the reverse order we
    acquired as a part of cleanup.

    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | * | ext4: Use high 16 bits of the block group descriptor's free counts fields [Aneesh Kumar K.V, 2009-01-05, 7 files, -62/+149]
    Rename the lower bits with suffix _lo and add helper to access the
    values. Also rename bg_itable_unused_hi to bg_pad as in e2fsprogs.

    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | * | ext4: Fix race between read_block_bitmap() and mark_diskspace_used() [Aneesh Kumar K.V, 2009-01-05, 1 file, -5/+10]
    We need to make sure we update the block bitmap and clear the
    EXT4_BG_BLOCK_UNINIT flag with sb_bgl_lock held, since
    ext4_read_block_bitmap() looks at EXT4_BG_BLOCK_UNINIT to decide
    whether to initialize the block bitmap each time it is called
    (introduced by commit c806e68f), and this can race with block
    allocations in ext4_mb_mark_diskspace_used().

    ext4_read_block_bitmap does:

        spin_lock(sb_bgl_lock(EXT4_SB(sb), block_group));
        if (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
                ext4_init_block_bitmap(sb, bh, block_group, desc);

    Now on the block allocation side we do

        mb_set_bits(sb_bgl_lock(sbi, ac->ac_b_ex.fe_group), bitmap_bh->b_data,
                    ac->ac_b_ex.fe_start, ac->ac_b_ex.fe_len);
        ....
        spin_lock(sb_bgl_lock(sbi, ac->ac_b_ex.fe_group));
        if (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
                gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);

    i.e., on allocation we update the bitmap then we take the sb_bgl_lock
    and clear the EXT4_BG_BLOCK_UNINIT flag. What can happen is a
    parallel ext4_read_block_bitmap can zero out the bitmap in between
    the above mb_set_bits and spin_lock(sb_bgl_lock..)

    The race results in the user-visible errors below:

        EXT4-fs error (device sdb1): ext4_mb_release_inode_pa: free 100, pa_free 105
        EXT4-fs error (device sdb1): mb_free_blocks: double-free of inode 0's block ..

    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    Cc: stable@kernel.org
| | * | ext4: fix BUG when calling ext4_error with locked block group [Aneesh Kumar K.V, 2009-01-05, 4 files, -64/+105]
    The mballoc code likes to call ext4_error while it is holding locked
    block groups. This can cause a scheduling in atomic context BUG. We
    can't just unlock the block group and relock it after/if ext4_error
    returns since that might result in race conditions in the case where
    the filesystem is set to continue after finding errors.

    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | * | ext4: Fix lockdep recursive locking warning [Aneesh Kumar K.V, 2008-11-23, 1 file, -1/+1]
    In ext4_mb_init_group(), if the filesystem block size is less than
    PAGE_SIZE/2, the code tries to grab alloc_sem for multiple block
    groups in a loop. We need to allow for this by using
    down_write_nested() and passing in the loop index as a lock subclass
    number. This works because no other code path needs to take multiple
    alloc_sem's.

    Note that lockdep will fail for filesystem blocksize smaller than
    PAGE_SIZE/16k. (e.g., a 1k filesystem blocksize with a 32k page
    size, or a 2k filesystem blocksize with a 64k page size, etc.)

    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | * | ext4: don't use blocks freed but not yet committed in buddy cache init [Aneesh Kumar K.V, 2009-01-05, 1 file, -22/+60]
    When we generate buddy cache (especially during resize) we need to
    make sure we don't use the blocks freed but not yet committed. This
    makes sure we have the right value of free blocks count in the group
    info and also in the bitmap. This also ensures the ordered mode
    consistency.

    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    Cc: stable@kernel.org
| | * | jbd2: Call journal commit callback without holding j_list_lock [Aneesh Kumar K.V, 2008-11-06, 3 files, -8/+11]
    Avoid freeing the transaction in __jbd2_journal_drop_transaction() so
    the journal commit callback can run without holding j_list_lock, to
    avoid lock contention on this spinlock.

    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>