linux.git - Linux kernel mainline tree

	Commit message (Collapse)	Author	Age	Files	Lines
*	[SCSI] fc_transport: Add an API to allow an LLD to create vports	Andrew Vasquez	2008-10-03	1	-1/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There's already a fc_vport_termintate() call exported by the transport. This patch adds a symmetric call to the API to allow an NPIV-capable LLD to instantiate vports sans user intervention. Additional comments/updates: Re: scsi_fc_transport.txt Add a function prototype for fc_vport_terminate similar to what's done for fc_vport_create Re: fc_vport_create I recommend we pass the channel number in fc_vport_create rather than fixing it at zero. Also, ids->vport_type should be set to FC_PORTTYPE_NPIV prior to calling fc_vport_create. The comment is also meaningless. Added-by and Signed-off-by: James Smart <james.smart@emulex.com> Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
*	[SCSI] lib: add generic helper to print sizes rounded to the correct SI range	James Bottomley	2008-10-03	1	-0/+16
\| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds the ability to print sizes in either units of 10^3 (SI) or 2^10 (Binary) units. It rounds up to three significant figures and can be used for either memory or storage capacities. Oh, and I'm fully aware that 64 bits is only 16EiB ... the Zetta and Yotta units are added for future proofing against the day we have 128 bit computers ... [fujita.tomonori@lab.ntt.co.jp: fix missed unsigned long long cast] Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
*	[SCSI] Update the SCSI state model to allow blocking in the created state	James Bottomley	2008-10-03	1	-5/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Brian King <brking@linux.vnet.ibm.com> reported that fibre channel devices can oops during scanning if their ports block (because the device goes from CREATED -> BLOCK -> RUNNING rather than CREATED -> BLOCK -> CREATED). Fix this by adding a new state: CREATED_BLOCK which can only transition back to CREATED and disallow the CREATED -> BLOCK transition. Now both the created and blocked states that the mid-layer recognises can include CREATED_BLOCK. Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
*	[SCSI] add inline functions for recognising created and blocked states	James Bottomley	2008-10-03	1	-0/+11
\| \| \| \| \| \| \|	The created and blocked states are very shortly going to correspond to mixed sdev_state states. Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
*	[SCSI] scsi_netlink: Add transport and LLD recieve and event support	James Smart	2008-10-03	1	-1/+61
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds scsi netlink recieve and event support for transport and scsi LLDD's. It is a reimplementation of the patch posted last week by David Somayajulu. http://marc.info/?l=linux-scsi&m=121745486221819&w=2 There are a few things done differently: - Transport support is included - Event delivery is included - The vendor message is now its own unique message type, considered part of the generic "SCSI Transport". - LLDD entry points are now registered rather than included in the scsi_host_template. Background: When I started to implement the event handler via template, I had to either: muck up scsi_add_host and scsi_remove_host; or have the event handler search all possible shosts. Neither was acceptable. Moving to a registration solves this, and also limits the scope of the changes to something that could be backported to a distro without breaking an already-released-distro kabi. However, I admit it isn't as elegant, as the passing of the LLDD host template in the registration and the complexity around dynamic add/remove shows. - The receive path was augmented to require a unique identifier for the LLDD before the message was allowed to be handed off to the driver. Given how quickly very fatal errors occur if there's msg mismatches (which I saw in testing my own tools :), I believe this to be a very good thing. The id plays off the vendor id scheme already introduced for the vendor unique event messages used by FC. Additionally, the id use as the basis of the registration/deregistration. - Send assist functions, for both the transport and LLDDs are included. [fujita.tomonori@lab.ntt.co.jp: fix missing cast] Signed-off-by: James Smart <james.smart@emulex.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
*	mm: tiny-shmem nommu fix	Nick Piggin	2008-10-02	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The previous patch db203d53d474aa068984e409d807628f5841da1b ("mm: tiny-shmem fix lock ordering: mmap_sem vs i_mutex") to fix the lock ordering in tiny-shmem breaks shared anonymous and IPC memory on NOMMU architectures because it was using the expanding truncate to signal ramfs to allocate a physically contiguous RAM backing the inode (otherwise it is unusable for "memory mapping" it to userspace). However do_truncate is what caused the lock ordering error, due to it taking i_mutex. In this case, we can actually just call ramfs directly to allocate memory for the mapping, rather than go via truncate. Acked-by: David Howells <dhowells@redhat.com> Acked-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	inotify: fix lock ordering wrt do_page_fault's mmap_sem	Nick Piggin	2008-10-02	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	Fix inotify lock order reversal with mmap_sem due to holding locks over copy_to_user. Signed-off-by: Nick Piggin <npiggin@suse.de> Reported-by: "Daniel J Blueman" <daniel.blueman@gmail.com> Tested-by: "Daniel J Blueman" <daniel.blueman@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6	Linus Torvalds	2008-10-01	1	-0/+3
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: af_key: Free dumping state on socket close XFRM,IPv6: initialize ip6_dst_blackhole_ops.kmem_cachep ipv6: NULL pointer dereferrence in tcp_v6_send_ack tcp: Fix NULL dereference in tcp_4_send_ack() sctp: Fix kernel panic while process protocol violation parameter iucv: Fix mismerge again. ipsec: Fix pskb_expand_head corruption in xfrm_state_check_space
\| *	sctp: Fix kernel panic while process protocol violation parameter	Wei Yongjun	2008-09-30	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since call to function sctp_sf_abort_violation() need paramter 'arg' with 'struct sctp_chunk' type, it will read the chunk type and chunk length from the chunk_hdr member of chunk. But call to sctp_sf_violation_paramlen() always with 'struct sctp_paramhdr' type's parameter, it will be passed to sctp_sf_abort_violation(). This may cause kernel panic. sctp_sf_violation_paramlen() \|-- sctp_sf_abort_violation() \|-- sctp_make_abort_violation() This patch fixed this problem. This patch also fix two place which called sctp_sf_violation_paramlen() with wrong paramter type. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	Merge branch 'timers-fixes-for-linus' of ↵	Linus Torvalds	2008-09-30	1	-4/+14
\|\ \ \| \|/ \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: hrtimer: prevent migration of per CPU hrtimers hrtimer: mark migration state hrtimer: fix migration of CB_IRQSAFE_NO_SOFTIRQ hrtimers hrtimer: migrate pending list on cpu offline Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
\| *	hrtimer: prevent migration of per CPU hrtimers	Thomas Gleixner	2008-09-29	1	-3/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Impact: per CPU hrtimers can be migrated from a dead CPU The hrtimer code has no knowledge about per CPU timers, but we need to prevent the migration of such timers and warn when such a timer is active at migration time. Explicitely mark the timers as per CPU and use a more understandable mode descriptor for the interrupts safe unlocked callback mode, which is used by hrtimer_sleeper and the scheduler code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
\| *	hrtimer: mark migration state	Thomas Gleixner	2008-09-29	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Impact: during migration active hrtimers can be seen as inactive The migration code removes the hrtimers from the queues of the dead CPU and sets the state temporary to INACTIVE. The enqueue code sets it to ACTIVE/PENDING again. Prevent that the wrong state can be seen by using a separate migration state bit. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* \|	kgdb, x86_64: fix PS CS SS registers in gdb serial	Jason Wessel	2008-09-26	1	-11/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On x86_64 the gdb serial register structure defines the PS (also known as eflags), CS and SS registers as 4 bytes entities. This patch splits the x86_64 regnames enum into a 32 and 64 version to account for the 32 bit entities in the gdb serial packets. Also the program counter is properly filled in for the sleeping threads. Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
* \|	kgdb, x86_64: gdb serial has BX and DX reversed	Jason Wessel	2008-09-26	1	-2/+2
\|/ \| \| \| \| \| \|	The BX and DX registers in the gdb serial register packet need to be flipped for gdb to receive the correct data. Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
*	Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus	Linus Torvalds	2008-09-24	1	-1/+1
\|\ \| \| \| \| \| \| \| \| \| \|	* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus: [MIPS] Fixe the definition of PTRS_PER_PGD [MIPS] au1000: Fix gpio direction
\| *	[MIPS] Fixe the definition of PTRS_PER_PGD	Jack Tan	2008-09-24	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When we use > 4KB's page size the original definition is not consistent with PGDIR_SIZE. For exeample, if we use 16KB page size the PGDIR_SHIFT is (14-2) + 14 = 26, PGDIR_SIZE is 2^26，so the PTRS_PER_PGD should be: 2^32/2^26 = 2^6 but the original definition of PTRS_PER_PGD is 4096 (PGDIR_ORDER = 0). So, this definition needs to be consistent with the PGDIR_SIZE. And the new definition is consistent with the PGD init in pagetable_init(). Signed-off-by: Dajie Tan <jiankemeng@gmail.com> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
* \|	MN10300: Move asm-arm/cnt32_to_63.h to include/linux/	David Howells	2008-09-24	1	-0/+80
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Move asm-arm/cnt32_to_63.h to include/linux/ so that MN10300 can make use of it too. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \|	Merge branch 'for-linus' of ↵	Linus Torvalds	2008-09-24	2	-2/+8
\|\ \ \| \|/ \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs: 9p: fix put_data error handling 9p: use an IS_ERR test rather than a NULL test 9p: introduce missing kfree 9p-trans_fd: fix and clean up module init/exit paths 9p-trans_fd: don't do fs segment mangling in p9_fd_poll() 9p-trans_fd: clean up p9_conn_create() 9p-trans_fd: fix trans_fd::p9_conn_destroy() 9p: implement proper trans module refcounting and unregistration
\| *	9p: implement proper trans module refcounting and unregistration	Tejun Heo	2008-09-24	2	-2/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	9p trans modules aren't refcounted nor were they unregistered properly. Fix it. * Add 9p_trans_module->owner and reference the module on each trans instance creation and put it on destruction. * Protect v9fs_trans_list with a spinlock. This isn't strictly necessary as the list is manipulated only during module loading / unloading but it's a good idea to make the API safe. * Unregister trans modules when the corresponding module is being unloaded. * While at it, kill unnecessary EXPORT_SYMBOL on p9_trans_fd_init(). Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
* \|	Merge branch 'timers-fixes-for-linus' of ↵	Linus Torvalds	2008-09-23	3	-0/+5
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: timers: fix build error in !oneshot case x86: c1e_idle: don't mark TSC unstable if CPU has invariant TSC x86: prevent C-states hang on AMD C1E enabled machines clockevents: prevent mode mismatch on cpu online clockevents: check broadcast device not tick device clockevents: prevent stale tick_next_period for onlining CPUs x86: prevent stale state of c1e_mask across CPU offline/online clockevents: prevent cpu online to interfere with nohz
\| * \|	x86: prevent C-states hang on AMD C1E enabled machines	Thomas Gleixner	2008-09-23	2	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Impact: System hang when AMD C1E machines switch into C2/C3 AMD C1E enabled systems do not work with normal ACPI C-states even if the BIOS is advertising them. Limit the C-states to C1 for the ACPI processor idle code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
\| * \|	x86: prevent stale state of c1e_mask across CPU offline/online	Thomas Gleixner	2008-09-23	1	-0/+2
\| \|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Impact: hang which happens across CPU offline/online on AMD C1E systems. When a CPU goes offline then the corresponding bit in the broadcast mask is cleared. For AMD C1E enabled CPUs we do not reenable the broadcast when the CPU comes online again as we do not clear the corresponding bit in the c1e_mask, which keeps track which CPUs have been switched to broadcast already. So on those !$@#& machines we never switch back to broadcasting after a CPU offline/online cycle. Clear the bit when the CPU plays dead. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* \|	Merge branch 'for-linus' of ↵	Linus Torvalds	2008-09-23	1	-4/+4
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: PCI: fix compiler warnings in pci_get_subsys() PCI: Fix pcie_aspm=force
\| * \|	PCI: fix compiler warnings in pci_get_subsys()	Greg KH	2008-09-16	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	pci_get_subsys() changed in 2.6.26 so that the from pointer is modified when the call is being invoked, so fix up the 'const' marking of it that the compiler is complaining about. Reported-by: Rufus & Azrael <rufus-azrael@numericable.fr> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* \| \|	smb.h: do not include linux/time.h in userspace	Kirill A. Shutemov	2008-09-23	1	-0/+2
\| \|/ \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	linux/time.h conflicts with time.h from glibc It breaks building smbmount from samba. It's regression introduced by commit 76308da (" smb.h: uses struct timespec but didn't include linux/time.h"). Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name> Cc: <stable@kernel.org> [2.6.26.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \|	Merge branch 'for-linus' of ↵	Linus Torvalds	2008-09-19	1	-0/+4
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: IPoIB: Fix deadlock on RTNL between bcast join comp and ipoib_stop() RDMA/nes: Fix client side QP destroy IB/mlx4: Fix up fast register page list format mlx4_core: Set RAE and init mtt_sz field in FRMR MPT entries
\| * \|	IB/mlx4: Fix up fast register page list format	Vladimir Sokolovsky	2008-09-15	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Byte swap the addresses in the page list for fast register work requests to big endian to match what the HCA expectx. Also, the addresses must have the "present" bit set so that the HCA knows it can access them. Otherwise the HCA will fault the first time it accesses the memory region. Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
* \| \|	warn: Turn the netdev timeout WARN_ON() into a WARN()	Arjan van de Ven	2008-09-16	1	-0/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	this patch turns the netdev timeout WARN_ON_ONCE() into a WARN_ONCE(), so that the device and driver names are inside the warning message. This helps automated tools like kerneloops.org to collect the data and do statistics, as well as making it more likely that humans cut-n-paste the important message as part of a bugreport. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \| \|	Fix PNP build failure, bugzilla #11276	David Miller	2008-09-16	1	-0/+7
\| \|/ \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This fill fix the following regression list entry: Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11276 Subject : build error: CONFIG_OPTIMIZE_INLINING=y causes gcc 4.2 to do stupid things Submitter : Randy Dunlap <randy.dunlap@oracle.com> Date : 2008-08-06 17:18 (38 days old) References : http://marc.info/?l=linux-kernel&m=121804329014332&w=4 http://lkml.org/lkml/2008/7/22/353 Handled-By : Bjorn Helgaas <bjorn.helgaas@hp.com> Patch : http://lkml.org/lkml/2008/7/22/364 with what I believe is a better fix than the one referenced in the regression entry above. These PNP header interfaces try to work in such a way that you can reference some of them even if PNP is not enabled, and the compiler was expected to optimize everything away. Which is mostly fine, except that there was one interface for which there was not provided an inline "NOP" implementation. Once we add that, all of these compile failures cannot handle any more. pnp: Provide NOP inline implementation of pnp_get_resource() when !PNP Fixes kernel bugzilla #11276. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \|	Merge branch 'upstream-linus' of ↵	Linus Torvalds	2008-09-13	1	-1/+1
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev: [libata] LBA28/LBA48 off-by-one bug in ata.h sata_inic162x: enable LED blinking ata: duplicate variable sparse warning
\| * \|	[libata] LBA28/LBA48 off-by-one bug in ata.h	Taisuke Yamada	2008-09-13	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I recently bought 3 HGST P7K500-series 500GB SATA drives and had trouble accessing the block right on the LBA28-LBA48 border. Here's how it fails (same for all 3 drives): # dd if=/dev/sdc bs=512 count=1 skip=268435455 > /dev/null dd: reading `/dev/sdc': Input/output error 0+0 records in 0+0 records out 0 bytes (0 B) copied, 0.288033 seconds, 0.0 kB/s # dmesg ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 ata1.00: BMDMA stat 0x25 ata1.00: cmd c8/00:08:f8:ff:ff/00:00:00:00:00/ef tag 0 dma 4096 in res 51/04:08:f8:ff:ff/00:00:00:00:00/ef Emask 0x1 (device error) ata1.00: status: { DRDY ERR } ata1.00: error: { ABRT } ata1.00: configured for UDMA/33 ata1: EH complete ... After some investigations, it turned out this seems to be caused by misinterpretation of the ATA specification on LBA28 access. Following part is the code in question: === include/linux/ata.h === static inline int lba_28_ok(u64 block, u32 n_block) { /* check the ending block number / return ((block + n_block - 1) < ((u64)1 << 28)) && (n_block <= 256); } HGST drive (sometimes) fails with LBA28 access of {block = 0xfffffff, n_block = 1}, and this behavior seems to be comformant. Other drives, including other HGST drives are not that strict, through. >From the ATA specification: (http://www.t13.org/Documents/UploadedDocuments/project/d1410r3b-ATA-ATAPI-6.pdf) 8.15.29 Word (61:60): Total number of user addressable sectors This field contains a value that is one greater than the total number of user addressable sectors (see 6.2). The maximum value that shall be placed in this field is 0FFFFFFFh. So the driver shouldn't use the value of 0xfffffff for LBA28 request as this exceeds maximum user addressable sector. The logical maximum value for LBA28 is 0xffffffe. The obvious fix is to cut "- 1" part, and the patch attached just do that. I've been using the patched kernel for about a month now, and the same fix is also floating on the net for some time. So I believe this fix works reliably. Just FYI, many Windows/Intel platform users also seems to be struck by this, and HGST has issued a note pointing to Intel ICH8/9 driver. "28-bit LBA command is being used to access LBAs 29-bits in length" http://www.hitachigst.com/hddt/knowtree.nsf/cffe836ed7c12018862565b000530c74/b531b8bce8745fb78825740f00580e23 Also, BSDs seems to have similar fix included sometime around ~2004, through I have not checked out exact portion of the code. Signed-off-by: Taisuke Yamada <tai@rakugaki.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* \| \|	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6	Linus Torvalds	2008-09-13	1	-1/+1
\|\ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: niu: panic on reset netlink: fix overrun in attribute iteration [Bluetooth] Fix regression from using default link policy ath9k: Assign seq# when mac80211 requests this
\| * \| \|	netlink: fix overrun in attribute iteration	Vegard Nossum	2008-09-11	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	kmemcheck reported this: kmemcheck: Caught 16-bit read from uninitialized memory (f6c1ba30) 0500110001508abf050010000500000002017300140000006f72672e66726565 i i i i i i i i i i i i i u u u u u u u u u u u u u u u u u u u ^ Pid: 3462, comm: wpa_supplicant Not tainted (2.6.27-rc3-00054-g6397ab9-dirty #13) EIP: 0060:[<c05de64a>] EFLAGS: 00010296 CPU: 0 EIP is at nla_parse+0x5a/0xf0 EAX: 00000008 EBX: fffffffd ECX: c06f16c0 EDX: 00000005 ESI: 00000010 EDI: f6c1ba30 EBP: f6367c6c ESP: c0a11e88 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 CR0: 8005003b CR2: f781cc84 CR3: 3632f000 CR4: 000006d0 DR0: c0ead9bc DR1: 00000000 DR2: 00000000 DR3: 00000000 DR6: ffff4ff0 DR7: 00000400 [<c05d4b23>] rtnl_setlink+0x63/0x130 [<c05d5f75>] rtnetlink_rcv_msg+0x165/0x200 [<c05ddf66>] netlink_rcv_skb+0x76/0xa0 [<c05d5dfe>] rtnetlink_rcv+0x1e/0x30 [<c05dda21>] netlink_unicast+0x281/0x290 [<c05ddbe9>] netlink_sendmsg+0x1b9/0x2b0 [<c05beef2>] sock_sendmsg+0xd2/0x100 [<c05bf945>] sys_sendto+0xa5/0xd0 [<c05bf9a6>] sys_send+0x36/0x40 [<c05c03d6>] sys_socketcall+0x1e6/0x2c0 [<c020353b>] sysenter_do_call+0x12/0x3f [<ffffffff>] 0xffffffff This is the line in nla_ok(): /** * nla_ok - check if the netlink attribute fits into the remaining bytes * @nla: netlink attribute * @remaining: number of bytes remaining in attribute stream / static inline int nla_ok(const struct nlattr nla, int remaining) { return remaining >= sizeof(nla) && nla->nla_len >= sizeof(nla) && nla->nla_len <= remaining; } It turns out that remaining can become negative due to alignment in nla_next(). But GCC promotes "remaining" to unsigned in the test against sizeof(*nla) above. Therefore the test succeeds, and the nla_for_each_attr() may access memory outside the received buffer. A short example illustrating this point is here: #include <stdio.h> main(void) { printf("%d\n", -1 >= sizeof(int)); } ...which prints "1". This patch adds a cast in front of the sizeof so that GCC will make a signed comparison and fix the illegal memory dereference. With the patch applied, there is no kmemcheck report. Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
* \| \| \|	memstick: fix MSProHG 8-bit interface mode support	Alex Dubov	2008-09-13	1	-48/+49
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- 8-bit interface mode never worked properly. The only adapter I have which supports the 8b mode (the Jmicron) had some problems with its clock wiring and they discovered it only now. We also discovered that ProHG media is more sensitive to the ordering of initialization commands. - Make the driver fall back to highest supported mode instead of always falling back to serial. The driver will attempt the switch to 8b mode for any new MSPro card, but not all of them support it. Previously, these new cards ended up in serial mode, which is not the best idea (they work fine with 4b, after all). - Edit some macros for better conformance to Sony documentation Signed-off-by: Alex Dubov <oakad@yahoo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \| \| \|	mm: mark the correct zone as full when scanning zonelists	Mel Gorman	2008-09-13	1	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The iterator for_each_zone_zonelist() uses a struct zoneref *z cursor when scanning zonelists to keep track of where in the zonelist it is. The zoneref that is returned corresponds to the the next zone that is to be scanned, not the current one. It was intended to be treated as an opaque list. When the page allocator is scanning a zonelist, it marks elements in the zonelist corresponding to zones that are temporarily full. As the zonelist is being updated, it uses the cursor here; if (NUMA_BUILD) zlc_mark_zone_full(zonelist, z); This is intended to prevent rescanning in the near future but the zoneref cursor does not correspond to the zone that has been found to be full. This is an easy misunderstanding to make so this patch corrects the problem by changing zoneref cursor to be the current zone being scanned instead of the next one. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: Andy Whitcroft <apw@shadowen.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: <stable@kernel.org> [2.6.26.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \| \| \|	include/linux/ioport.h: add missing macro argument for devm_release_* family	Hiroshi DOYU	2008-09-13	1	-2/+2
\| \|/ / \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	akpm: these have no callers at this time, but they shall soon, so let's get them right. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Hiroshi DOYU <Hiroshi.DOYU@nokia.com> Cc: Tony Lindgren <tony@atomide.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \| \|	Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block	Linus Torvalds	2008-09-11	1	-2/+0
\|\ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* 'for-linus' of git://git.kernel.dk/linux-2.6-block: block: disable sysfs parts of the disk command filter
\| * \| \|	block: disable sysfs parts of the disk command filter	Jens Axboe	2008-09-11	1	-2/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We still have life time issues with the sysfs command filter kobject, so disable it for 2.6.27 release. We can revisit this and make it work properly for 2.6.28, for 2.6.27 release it's too risky. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* \| \| \|	Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6	Linus Torvalds	2008-09-11	1	-0/+14
\|\ \ \ \ \| \|/ / / \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6: [SCSI] fix check of PQ and PDT bits for WLUNs [SCSI] make scsi_check_sense HARDWARE_ERROR return ADD_TO_MLQUEUE on retry [SCSI] scsi_dh: make check_sense return ADD_TO_MLQUEUE [SCSI] zfcp: Remove duplicated unlikely() macros. [SCSI] zfcp: channel cannot be detached due to refcount imbalance [SCSI] zfcp: Fix reference counter for remote ports [SCSI] zfcp: Simplify ccw notify handler [SCSI] zfcp: Correctly query end flag in gpn_ft response [SCSI] zfcp: Fix request queue locking [SCSI] sd: select CRC_T10DIF only when necessary
\| * \| \|	[SCSI] fix check of PQ and PDT bits for WLUNs	James Bottomley	2008-08-29	1	-0/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For IBM z series certain LUNs can no longer be accessed. This is because kernel version 2.6.19 a check was introduced not to create a generic SCSI device for devices that return PQ=1 and PDT=0x1f. For WLUNs (see SAM-3, p. 41ff) generic SCSI devices should be created unconditionally without looking at the PQ bit, so add a check for WLUNs in with this test. Acked-by: Martin Petermann <martin@linux.vnet.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
* \| \| \|	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6	Linus Torvalds	2008-09-09	1	-1/+2
\|\ \ \ \ \| \| \|/ / \| \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: ipv6: Fix OOPS in ip6_dst_lookup_tail(). ipsec: Restore larval states and socket policies in dump [Bluetooth] Reject L2CAP connections on an insecure ACL link [Bluetooth] Enforce correct authentication requirements [Bluetooth] Fix reference counting during ACL config stage
\| * \| \|	Merge branch 'master' of ↵	David S. Miller	2008-09-09	1	-1/+2
\| \|\ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/holtmann/bluetooth-2.6
\| \| * \| \|	[Bluetooth] Reject L2CAP connections on an insecure ACL link	Marcel Holtmann	2008-09-09	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The Security Mode 4 of the Bluetooth 2.1 specification has strict authentication and encryption requirements. It is the initiators job to create a secure ACL link. However in case of malicious devices, the acceptor has to make sure that the ACL is encrypted before allowing any kind of L2CAP connection. The only exception here is the PSM 1 for the service discovery protocol, because that is allowed to run on an insecure ACL link. Previously it was enough to reject a L2CAP connection during the connection setup phase, but with Bluetooth 2.1 it is forbidden to do any L2CAP protocol exchange on an insecure link (except SDP). The new hci_conn_check_link_mode() function can be used to check the integrity of an ACL link. This functions also takes care of the cases where Security Mode 4 is disabled or one of the devices is based on an older specification. Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
\| \| * \| \|	[Bluetooth] Enforce correct authentication requirements	Marcel Holtmann	2008-09-09	1	-1/+1
\| \| \|/ / \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With the introduction of Security Mode 4 and Simple Pairing from the Bluetooth 2.1 specification it became mandatory that the initiator requires authentication and encryption before any L2CAP channel can be established. The only exception here is PSM 1 for the service discovery protocol (SDP). It is meant to be used without any encryption since it contains only public information. This is how Bluetooth 2.0 and before handle connections on PSM 1. For Bluetooth 2.1 devices the pairing procedure differentiates between no bonding, general bonding and dedicated bonding. The L2CAP layer wrongly uses always general bonding when creating new connections, but it should not do this for SDP connections. In this case the authentication requirement should be no bonding and the just-works model should be used, but in case of non-SDP connection it is required to use general bonding. If the new connection requires man-in-the-middle (MITM) protection, it also first wrongly creates an unauthenticated link key and then later on requests an upgrade to an authenticated link key to provide full MITM protection. With Simple Pairing the link key generation is an expensive operation (compared to Bluetooth 2.0 and before) and doing this twice during a connection setup causes a noticeable delay when establishing a new connection. This should be avoided to not regress from the expected Bluetooth 2.0 connection times. The authentication requirements are known up-front and so enforce them. To fulfill these requirements the hci_connect() function has been extended with an authentication requirement parameter that will be stored inside the connection information and can be retrieved by userspace at any time. This allows the correct IO capabilities exchange and results in the expected behavior. Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
* \| / /	lib: Correct printk %pF to work on all architectures	James Bottomley	2008-09-09	2	-0/+11
\|/ / / \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It was introduced by "vsprintf: add support for '%pS' and '%pF' pointer formats" in commit 0fe1ef24f7bd0020f29ffe287dfdb9ead33ca0b2. However, the current way its coded doesn't work on parisc64. For two reasons: 1) parisc isn't in the #ifdef and 2) parisc has a different format for function descriptors Make dereference_function_descriptor() more accommodating by allowing architecture overrides. I put the three overrides (for parisc64, ppc64 and ia64) in arch/kernel/module.c because that's where the kernel internal linker which knows how to deal with function descriptors sits. Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Tony Luck <tony.luck@intel.com> Acked-by: Kyle McMartin <kyle@mcmartin.ca> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \| \|	Merge branch 'sched-fixes-for-linus' of ↵	Linus Torvalds	2008-09-08	1	-1/+1
\|\ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: arch_reinit_sched_domains() must destroy domains to force rebuild sched, cpuset: rework sched domains and CPU hotplug handling (v4)
\| * \ \	Merge branch 'sched/cpuset' into sched/urgent	Ingo Molnar	2008-09-06	1	-1/+1
\| \|\ \ \
\| \| * \| \|	sched: arch_reinit_sched_domains() must destroy domains to force rebuild	Max Krasnyansky	2008-09-06	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	What I realized recently is that calling rebuild_sched_domains() in arch_reinit_sched_domains() by itself is not enough when cpusets are enabled. partition_sched_domains() code is trying to avoid unnecessary domain rebuilds and will not actually rebuild anything if new domain masks match the old ones. What this means is that doing echo 1 > /sys/devices/system/cpu/sched_mc_power_savings on a system with cpusets enabled will not take affect untill something changes in the cpuset setup (ie new sets created or deleted). This patch fixes restore correct behaviour where domains must be rebuilt in order to enable MC powersaving flags. Test on quad-core Core2 box with both CONFIG_CPUSETS and !CONFIG_CPUSETS. Also tested on dual-core Core2 laptop. Lockdep is happy and things are working as expected. Signed-off-by: Max Krasnyansky <maxk@qualcomm.com> Tested-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* \| \| \| \|	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6	Linus Torvalds	2008-09-08	1	-0/+3
\|\ \ \ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: bridge: don't allow setting hello time to zero netns : fix kernel panic in timewait socket destruction pkt_sched: Fix qdisc state in net_tx_action() netfilter: nf_conntrack_irc: make sure string is terminated before calling simple_strtoul netfilter: nf_conntrack_gre: nf_ct_gre_keymap_flush() fixlet netfilter: nf_conntrack_gre: more locking around keymap list netfilter: nf_conntrack_sip: de-static helper pointers
\| * \| \| \| \|	netns : fix kernel panic in timewait socket destruction	Daniel Lezcano	2008-09-08	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	How to reproduce ? - create a network namespace - use tcp protocol and get timewait socket - exit the network namespace - after a moment (when the timewait socket is destroyed), the kernel panics. # BUG: unable to handle kernel NULL pointer dereference at 0000000000000007 IP: [<ffffffff821e394d>] inet_twdr_do_twkill_work+0x6e/0xb8 PGD 119985067 PUD 11c5c0067 PMD 0 Oops: 0000 [1] SMP CPU 1 Modules linked in: ipv6 button battery ac loop dm_mod tg3 libphy ext3 jbd edd fan thermal processor thermal_sys sg sata_svw libata dock serverworks sd_mod scsi_mod ide_disk ide_core [last unloaded: freq_table] Pid: 0, comm: swapper Not tainted 2.6.27-rc2 #3 RIP: 0010:[<ffffffff821e394d>] [<ffffffff821e394d>] inet_twdr_do_twkill_work+0x6e/0xb8 RSP: 0018:ffff88011ff7fed0 EFLAGS: 00010246 RAX: ffffffffffffffff RBX: ffffffff82339420 RCX: ffff88011ff7ff30 RDX: 0000000000000001 RSI: ffff88011a4d03c0 RDI: ffff88011ac2fc00 RBP: ffffffff823392e0 R08: 0000000000000000 R09: ffff88002802a200 R10: ffff8800a5c4b000 R11: ffffffff823e4080 R12: ffff88011ac2fc00 R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000000 FS: 0000000041cbd940(0000) GS:ffff8800bff839c0(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 0000000000000007 CR3: 00000000bd87c000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process swapper (pid: 0, threadinfo ffff8800bff9e000, task ffff88011ff76690) Stack: ffffffff823392e0 0000000000000100 ffffffff821e3a3a 0000000000000008 0000000000000000 ffffffff821e3a61 ffff8800bff7c000 ffffffff8203c7e7 ffff88011ff7ff10 ffff88011ff7ff10 0000000000000021 ffffffff82351108 Call Trace: <IRQ> [<ffffffff821e3a3a>] ? inet_twdr_hangman+0x0/0x9e [<ffffffff821e3a61>] ? inet_twdr_hangman+0x27/0x9e [<ffffffff8203c7e7>] ? run_timer_softirq+0x12c/0x193 [<ffffffff820390d1>] ? __do_softirq+0x5e/0xcd [<ffffffff8200d08c>] ? call_softirq+0x1c/0x28 [<ffffffff8200e611>] ? do_softirq+0x2c/0x68 [<ffffffff8201a055>] ? smp_apic_timer_interrupt+0x8e/0xa9 [<ffffffff8200cad6>] ? apic_timer_interrupt+0x66/0x70 <EOI> [<ffffffff82011f4c>] ? default_idle+0x27/0x3b [<ffffffff8200abbd>] ? cpu_idle+0x5f/0x7d Code: e8 01 00 00 4c 89 e7 41 ff c5 e8 8d fd ff ff 49 8b 44 24 38 4c 89 e7 65 8b 14 25 24 00 00 00 89 d2 48 8b 80 e8 00 00 00 48 f7 d0 <48> 8b 04 d0 48 ff 40 58 e8 fc fc ff ff 48 89 df e8 c0 5f 04 00 RIP [<ffffffff821e394d>] inet_twdr_do_twkill_work+0x6e/0xb8 RSP <ffff88011ff7fed0> CR2: 0000000000000007 This patch provides a function to purge all timewait sockets related to a network namespace. The timewait sockets life cycle is not tied with the network namespace, that means the timewait sockets stay alive while the network namespace dies. The timewait sockets are for avoiding to receive a duplicate packet from the network, if the network namespace is freed, the network stack is removed, so no chance to receive any packets from the outside world. Furthermore, having a pending destruction timer on these sockets with a network namespace freed is not safe and will lead to an oops if the timer callback which try to access data belonging to the namespace like for example in: inet_twdr_do_twkill_work -> NET_INC_STATS_BH(twsk_net(tw), LINUX_MIB_TIMEWAITED); Purging the timewait sockets at the network namespace destruction will: 1) speed up memory freeing for the namespace 2) fix kernel panic on asynchronous timewait destruction Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Acked-by: Denis V. Lunev <den@openvz.org> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>