linux.git - Linux kernel mainline tree

	Commit message (Collapse)	Author	Age	Files	Lines
*	initramfs: Fix initramfs size for 32-bit arches	Geert Uytterhoeven	2010-10-31	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit ffe8018c3424 ("initramfs: fix initramfs size calculation") broke 32-bit big-endian arches like (on ARAnyM): VFS: Cannot open root device "hda1" or unknown-block(3,1) Please append a correct "root=" boot option; here are the available partitions: fe80 1059408 nfhd8 (driver?) fe81 921600 nfhd8p1 00000000-0000-0000-0000-000000000nfhd8p1 fe82 137807 nfhd8p2 00000000-0000-0000-0000-000000000nfhd8p2 0200 3280 fd0 (driver?) 0201 3280 fd1 (driver?) 0300 1059408 hda driver: ide-gd 0301 921600 hda1 00000000-0000-0000-0000-000000000hda1 0302 137807 hda2 00000000-0000-0000-0000-000000000hda2 Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(3,1) As pointed out by Kerstin Jonsson <kerstin.jonsson@ericsson.com>, this is due to CONFIG_32BIT not being defined, so the initramfs size field is done as a 64-bit quad. On little-endian (like x86) this doesn matter, but on a big-endian machine the 32-bit reads will see the (zero) high bits. Only mips, s390, and score set CONFIG_32BIT for 32-bit builds, so fix it for all other 32-bit arches by inverting the logic and testing for CONFIG_64BIT, which should be defined on all 64-bit arches. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> [ I think we should just make it "u64" on all architectures and get rid of the whole #ifdef CONFIG_xxBIT - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6	Linus Torvalds	2010-10-30	15	-67/+128
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: isdn: mISDN: socket: fix information leak to userland netdev: can: Change mail address of Hans J. Koch pcnet_cs: add new_id net: Truncate recvfrom and sendto length to INT_MAX. RDS: Let rds_message_alloc_sgs() return NULL RDS: Copy rds_iovecs into kernel memory instead of rereading from userspace RDS: Clean up error handling in rds_cmsg_rdma_args RDS: Return -EINVAL if rds_rdma_pages returns an error net: fix rds_iovec page count overflow can: pch_can: fix section mismatch warning by using a whitelisted name can: pch_can: fix sparse warning netxen_nic: Fix the tx queue manipulation bug in netxen_nic_probe ip_gre: fix fallback tunnel setup vmxnet: trivial annotation of protocol constant vmxnet3: remove unnecessary byteswapping in BAR writing macros ipv6/udp: report SndbufErrors and RcvbufErrors phy/marvell: rename 88ec048 to 88e1318s and fix mscr1 addr
\| *	isdn: mISDN: socket: fix information leak to userland	Kulikov Vasiliy	2010-10-30	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Structure mISDN_devinfo is copied to userland with the field "name" that has the last elements unitialized. It leads to leaking of contents of kernel stack memory. Signed-off-by: Vasiliy Kulikov <segooon@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	netdev: can: Change mail address of Hans J. Koch	Hans J. Koch	2010-10-30	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	My old mail address doesn't exist anymore. This changes all occurrences to my new address. Signed-off-by: Hans J. Koch <hjk@hansjkoch.de> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	pcnet_cs: add new_id	Ken Kawasaki	2010-10-30	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	pcnet_cs: add new_id: "corega Ether CF-TD" 10Base-T PCMCIA card. Signed-off-by: Ken Kawasaki <ken_kawasaki@spring.nifty.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: Truncate recvfrom and sendto length to INT_MAX.	Linus Torvalds	2010-10-30	1	-0/+4
\| \| \| \| \| \| \| \| \| \|	Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	RDS: Let rds_message_alloc_sgs() return NULL	Andy Grover	2010-10-30	3	-0/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Even with the previous fix, we still are reading the iovecs once to determine SGs needed, and then again later on. Preallocating space for sg lists as part of rds_message seemed like a good idea but it might be better to not do this. While working to redo that code, this patch attempts to protect against userspace rewriting the rds_iovec array between the first and second accesses. The consequences of this would be either a too-small or too-large sg list array. Too large is not an issue. This patch changes all callers of message_alloc_sgs to handle running out of preallocated sgs, and fail gracefully. Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	RDS: Copy rds_iovecs into kernel memory instead of rereading from userspace	Andy Grover	2010-10-30	1	-39/+65
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Change rds_rdma_pages to take a passed-in rds_iovec array instead of doing copy_from_user itself. Change rds_cmsg_rdma_args to copy rds_iovec array once only. This eliminates the possibility of userspace changing it after our sanity checks. Implement stack-based storage for small numbers of iovecs, based on net/socket.c, to save an alloc in the extremely common case. Although this patch reduces iovec copies in cmsg_rdma_args to 1, we still do another one in rds_rdma_extra_size. Getting rid of that one will be trickier, so it'll be a separate patch. Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	RDS: Clean up error handling in rds_cmsg_rdma_args	Andy Grover	2010-10-30	1	-3/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We don't need to set ret = 0 at the end -- it's initialized to 0. Also, don't increment s_send_rdma stat if we're exiting with an error. Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	RDS: Return -EINVAL if rds_rdma_pages returns an error	Andy Grover	2010-10-30	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	rds_cmsg_rdma_args would still return success even if rds_rdma_pages returned an error (or overflowed). Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: fix rds_iovec page count overflow	Linus Torvalds	2010-10-30	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As reported by Thomas Pollet, the rdma page counting can overflow. We get the rdma sizes in 64-bit unsigned entities, but then limit it to UINT_MAX bytes and shift them down to pages (so with a possible "+1" for an unaligned address). So each individual page count fits comfortably in an 'unsigned int' (not even close to overflowing into signed), but as they are added up, they might end up resulting in a signed return value. Which would be wrong. Catch the case of tot_pages turning negative, and return the appropriate error code. Reported-by: Thomas Pollet <thomas.pollet@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	can: pch_can: fix section mismatch warning by using a whitelisted name	Marc Kleine-Budde	2010-10-30	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch fixes the following section mismatch warning: WARNING: drivers/net/can/pch_can.o(.data+0x18): Section mismatch in reference from the variable pch_can_pcidev to the variable .devinit.rodata:pch_pci_tbl The variable pch_can_pcidev references the variable __devinitconst pch_pci_tbl This is actually a false positive which is fixed by giving the offending variable a whitelisted name, it's renamed to "pch_can_pci_driver". This makes sense because the variable is of the type "struct pci_driver". Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	can: pch_can: fix sparse warning	Marc Kleine-Budde	2010-10-30	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch fixes the following sparse warning: drivers/net/can/pch_can.c:231:26: warning: incorrect type in argument 1 (different address spaces) drivers/net/can/pch_can.c:231:26: expected unsigned int [usertype] addr drivers/net/can/pch_can.c:231:26: got unsigned int [noderef] <asn:2><noident> Let pch_can_bit_{set,clear} first parameter be a void __iomem pointer. Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	netxen_nic: Fix the tx queue manipulation bug in netxen_nic_probe	Denis Kirjanov	2010-10-30	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	We should not stop the egress queue during probe because it is wrong. Signed-off-by: Denis Kirjanov <dkirjanov@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	ip_gre: fix fallback tunnel setup	Eric Dumazet	2010-10-30	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Before making the fallback tunnel visible to lookups, we should make sure it is completely setup, once ipgre_tunnel_init() had been called and tstats per_cpu pointer allocated. move rcu_assign_pointer(ign->tunnels_wc[0], tunnel); from ipgre_fb_tunnel_init() to ipgre_init_net() Based on a patch from Pavel Emelyanov Reported-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	vmxnet: trivial annotation of protocol constant	Harvey Harrison	2010-10-30	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Noticed by sparse: drivers/net/vmxnet3/vmxnet3_drv.c:876:38: warning: cast from restricted __be16 drivers/net/vmxnet3/vmxnet3_drv.c:876:38: warning: cast from restricted __be16 drivers/net/vmxnet3/vmxnet3_drv.c:876:24: warning: restricted __be16 degrades to integer Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	vmxnet3: remove unnecessary byteswapping in BAR writing macros	Harvey Harrison	2010-10-30	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	readl/writel swap to little-endian internally. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	ipv6/udp: report SndbufErrors and RcvbufErrors	Eric Dumazet	2010-10-30	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	commit a18135eb9389 (Add UDP_MIB_{SND,RCV}BUFERRORS handling.) forgot to make the necessary changes in net/ipv6/proc.c to report additional counters in /proc/net/snmp6 Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	phy/marvell: rename 88ec048 to 88e1318s and fix mscr1 addr	Cyril Chemparathy	2010-10-29	2	-10/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The marvell 88ec048's official part number is 88e1318s. This patch renames definitions in the driver to reflect this. In addition, a minor bug fix has been added to write back the MSCR1 register value properly. Signed-off-by: Cyril Chemparathy <cyril@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	nfsd4: initialize delegation pointer to lease	J. Bruce Fields	2010-10-30	1	-17/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The NFSv4 server was initializing the dp->dl_flock pointer by the somewhat ridiculous method of a locks_copy_lock callback. Now that setlease uses the passed-in lock instead of doing a copy, dl_flock no longer gets set, resulting in the lock leaking on delegation release, and later possible hangs (among other problems). So, initialize dl_flock and get rid of the callback. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \|	locks: fix setlease methods to free passed-in lock	J. Bruce Fields	2010-10-30	5	-3/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We modified setlease to require the caller to allocate the new lease in the case of creating a new lease, but forgot to fix up the filesystem methods. Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Steve French <sfrench@samba.org> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \|	locks: fix leaks on setlease errors	J. Bruce Fields	2010-10-30	1	-5/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We're depending on setlease to free the passed-in lease on failure. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \|	locks: prevent ENOMEM on lease unlock	J. Bruce Fields	2010-10-30	1	-13/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Removing a lock shouldn't require any allocations; a failure due to ENOMEM leaves the caller with a choice between retrying or giving up and leaking an unused lease. Next we should split the other lease calls into add and delete cases. I wanted to start with just the bugfix. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \|	drivers/media/IR/ir-keytable.c: fix binary search	David Härdeman	2010-10-30	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The input-large-scancode patches changed the binary search in drivers/media/IR/ir-keytable.c to use unsigned integers, but signed integers are actually necessary for the algorithm to work. Signed-off-by: David Härdeman <david@hardeman.nu> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \|	Merge branch 'for-linus' of git://git.infradead.org/users/eparis/notify	Linus Torvalds	2010-10-30	12	-64/+219
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* 'for-linus' of git://git.infradead.org/users/eparis/notify: (22 commits) Ensure FMODE_NONOTIFY is not set by userspace make fanotify_read() restartable across signals fsnotify: remove alignment padding from fsnotify_mark on 64 bit builds fs/notify/fanotify/fanotify_user.c: fix warnings fanotify: Fix FAN_CLOSE comments fanotify: do not recalculate the mask if the ignored mask changed fanotify: ignore events on directories unless specifically requested fsnotify: rename FS_IN_ISDIR to FS_ISDIR fanotify: do not send events for irregular files fanotify: limit number of listeners per user fanotify: allow userspace to override max marks fanotify: limit the number of marks in a single fanotify group fanotify: allow userspace to override max queue depth fsnotify: implement a default maximum queue depth fanotify: ignore fanotify ignore marks if open writers fanotify: allow userspace to flush all marks fsnotify: call fsnotify_parent in perm events fsnotify: correctly handle return codes from listeners fanotify: use __aligned_u64 in fanotify userspace metadata fanotify: implement fanotify listener ordering ...
\| * \|	Ensure FMODE_NONOTIFY is not set by userspace	Lino Sanfilippo	2010-10-30	1	-4/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In fsnotify_open() ensure that FMODE_NONOTIFY is never set by userspace. Also always call fsnotify_parent and fsnotify. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
\| * \|	make fanotify_read() restartable across signals	Lino Sanfilippo	2010-10-30	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In fanotify_read() return -ERESTARTSYS instead of -EINTR to make read() restartable across signals (BSD semantic). Signed-off-by: Eric Paris <eparis@redhat.com>
\| * \|	fsnotify: remove alignment padding from fsnotify_mark on 64 bit builds	Richard Kennedy	2010-10-28	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Reorder struct fsnotfiy_mark to remove 8 bytes of alignment padding on 64 bit builds. Shrinks fsnotfiy_mark to 128 bytes allowing more objects per slab in its kmem_cache and reduces the number of cachelines needed for each structure. Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk> Cc: Eric Paris <eparis@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Eric Paris <eparis@redhat.com>
\| * \|	fs/notify/fanotify/fanotify_user.c: fix warnings	Andrew Morton	2010-10-28	1	-3/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	fs/notify/fanotify/fanotify_user.c: In function 'fanotify_release': fs/notify/fanotify/fanotify_user.c:375: warning: unused variable 'lre' fs/notify/fanotify/fanotify_user.c:375: warning: unused variable 're' this is really ugly. Cc: Eric Paris <eparis@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Eric Paris <eparis@redhat.com>
\| * \|	fanotify: Fix FAN_CLOSE comments	Stefan Hajnoczi	2010-10-28	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The comments for FAN_CLOSE_WRITE and FAN_CLOSE_NOWRITE do not match FS_CLOSE_WRITE and FS_CLOSE_NOWRITE, respectively. WRITE is for writable files while NOWRITE is for non-writable files. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Eric Paris <eparis@redhat.com>
\| * \|	fanotify: do not recalculate the mask if the ignored mask changed	Eric Paris	2010-10-28	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If fanotify sets a new bit in the ignored mask it will cause the generic fsnotify layer to recalculate the real mask. This is stupid since we didn't change that part. Signed-off-by: Eric Paris <eparis@redhat.com>
\| * \|	fanotify: ignore events on directories unless specifically requested	Eric Paris	2010-10-28	3	-2/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	fanotify has a very limited number of events it sends on directories. The usefulness of these events is yet to be seen and still we send them. This is particularly painful for mount marks where one might receive many of these useless events. As such this patch will drop events on IS_DIR() inodes unless they were explictly requested with FAN_ON_DIR. This means that a mark on a directory without FAN_EVENT_ON_CHILD or FAN_ON_DIR is meaningless and will result in no events ever (although it will still be allowed since detecting it is hard) Signed-off-by: Eric Paris <eparis@redhat.com>
\| * \|	fsnotify: rename FS_IN_ISDIR to FS_ISDIR	Eric Paris	2010-10-28	3	-13/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The _IN_ in the naming is reserved for flags only used by inotify. Since I am about to use this flag for fanotify rename it to be generic like the rest. Signed-off-by: Eric Paris <eparis@redhat.com>
\| * \|	fanotify: do not send events for irregular files	Eric Paris	2010-10-28	1	-5/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	fanotify_should_send_event has a test to see if an object is a file or directory and does not send an event otherwise. The problem is that the test is actually checking if the object with a mark is a file or directory, not if the object the event happened on is a file or directory. We should check the latter. Signed-off-by: Eric Paris <eparis@redhat.com>
\| * \|	fanotify: limit number of listeners per user	Eric Paris	2010-10-28	4	-1/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	fanotify currently has no limit on the number of listeners a given user can have open. This patch limits the total number of listeners per user to 128. This is the same as the inotify default limit. Signed-off-by: Eric Paris <eparis@redhat.com>
\| * \|	fanotify: allow userspace to override max marks	Eric Paris	2010-10-28	2	-2/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some fanotify groups, especially those like AV scanners, will need to place lots of marks, particularly ignore marks. Since ignore marks do not pin inodes in cache and are cleared if the inode is removed from core (usually under memory pressure) we expose an interface for listeners, with CAP_SYS_ADMIN, to override the maximum number of marks and be allowed to set and 'unlimited' number of marks. Programs which make use of this feature will be able to OOM a machine. Signed-off-by: Eric Paris <eparis@redhat.com>
\| * \|	fanotify: limit the number of marks in a single fanotify group	Eric Paris	2010-10-28	2	-0/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There is currently no limit on the number of marks a given fanotify group can have. Since fanotify is gated on CAP_SYS_ADMIN this was not seen as a serious DoS threat. This patch implements a default of 8192, the same as inotify to work towards removing the CAP_SYS_ADMIN gating and eliminating the default DoS'able status. Signed-off-by: Eric Paris <eparis@redhat.com>
\| * \|	fanotify: allow userspace to override max queue depth	Eric Paris	2010-10-28	2	-3/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	fanotify has a defualt max queue depth. This patch allows processes which explicitly request it to have an 'unlimited' queue depth. These processes need to be very careful to make sure they cannot fall far enough behind that they OOM the box. Thus this flag is gated on CAP_SYS_ADMIN. Signed-off-by: Eric Paris <eparis@redhat.com>
\| * \|	fsnotify: implement a default maximum queue depth	Eric Paris	2010-10-28	2	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently fanotify has no maximum queue depth. Since fanotify is CAP_SYS_ADMIN only this does not pose a normal user DoS issue, but it certianly is possible that an fanotify listener which can't keep up could OOM the box. This patch implements a default 16k depth. This is the same default depth used by inotify, but given fanotify's better queue merging in many situations this queue will contain many additional useful events by comparison. Signed-off-by: Eric Paris <eparis@redhat.com>
\| * \|	fanotify: ignore fanotify ignore marks if open writers	Eric Paris	2010-10-28	1	-0/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	fanotify will clear ignore marks if a task changes the contents of an inode. The problem is with the races around when userspace finishes checking a file and when that result is actually attached to the inode. This race was described as such: Consider the following scenario with hostile processes A and B, and victim process C: 1. Process A opens new file for writing. File check request is generated. 2. File check is performed in userspace. Check result is "file has no malware". 3. The "permit" response is delivered to kernel space. 4. File ignored mark set. 5. Process A writes dummy bytes to the file. File ignored flags are cleared. 6. Process B opens the same file for reading. File check request is generated. 7. File check is performed in userspace. Check result is "file has no malware". 8. Process A writes malware bytes to the file. There is no cached response yet. 9. The "permit" response is delivered to kernel space and is cached in fanotify. 10. File ignored mark set. 11. Now any process C will be permitted to open the malware file. There is a race between steps 8 and 10 While fanotify makes no strong guarantees about systems with hostile processes there is no reason we cannot harden against this race. We do that by simply ignoring any ignore marks if the inode has open writers (aka i_writecount > 0). (We actually do not ignore ignore marks if the FAN_MARK_SURV_MODIFY flag is set) Reported-by: Vasily Novikov <vasily.novikov@kaspersky.com> Signed-off-by: Eric Paris <eparis@redhat.com>
\| * \|	fanotify: allow userspace to flush all marks	Eric Paris	2010-10-28	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	fanotify is supposed to be able to flush all marks. This is mostly useful for the AV community to flush all cached decisions on a security policy change. This functionality has existed in the kernel but wasn't correctly exposed to userspace. Signed-off-by: Eric Paris <eparis@redhat.com>
\| * \|	fsnotify: call fsnotify_parent in perm events	Eric Paris	2010-10-28	3	-11/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	fsnotify perm events do not call fsnotify parent. That means you cannot register a perm event on a directory and enforce permissions on all inodes in that directory. This patch fixes that situation. Signed-off-by: Eric Paris <eparis@redhat.com>
\| * \|	fsnotify: correctly handle return codes from listeners	Eric Paris	2010-10-28	2	-8/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When fsnotify groups return errors they are ignored. For permissions events these should be passed back up the stack, but for most events these should continue to be ignored. Signed-off-by: Eric Paris <eparis@redhat.com>
\| * \|	fanotify: use __aligned_u64 in fanotify userspace metadata	Eric Paris	2010-10-28	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently the userspace struct exposed by fanotify uses __attribute__((packed)) to make sure that alignment works on multiarch platforms. Since this causes a severe performance penalty on some platforms we are going to switch to using explicit alignment notation on the 64bit values so we don't have to use 'packed' Signed-off-by: Eric Paris <eparis@redhat.com>
\| * \|	fanotify: implement fanotify listener ordering	Eric Paris	2010-10-28	2	-2/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The fanotify listeners needs to be able to specify what types of operations they are going to perform so they can be ordered appropriately between other listeners doing other types of operations. They need this to be able to make sure that things like hierarchichal storage managers will get access to inodes before processes which need the data. This patch defines 3 possible uses which groups must indicate in the fanotify_init() flags. FAN_CLASS_PRE_CONTENT FAN_CLASS_CONTENT FAN_CLASS_NOTIF Groups will receive notification in that order. The order between 2 groups in the same class is undeterministic. FAN_CLASS_PRE_CONTENT is intended to be used by listeners which need access to the inode before they are certain that the inode contains it's final data. A hierarchical storage manager should choose to use this class. FAN_CLASS_CONTENT is intended to be used by listeners which need access to the inode after it contains its intended contents. This would be the appropriate level for an AV solution or document control system. FAN_CLASS_NOTIF is intended for normal async notification about access, much the same as inotify and dnotify. Syncronous permissions events are not permitted at this class. Signed-off-by: Eric Paris <eparis@redhat.com>
\| * \|	fsnotify: implement ordering between notifiers	Eric Paris	2010-10-28	3	-3/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	fanotify needs to be able to specify that some groups get events before others. They use this idea to make sure that a hierarchical storage manager gets access to files before programs which actually use them. This is purely infrastructure. Everything will have a priority of 0, but the infrastructure will exist for it to be non-zero. Signed-off-by: Eric Paris <eparis@redhat.com>
\| * \|	fanotify: allow fanotify to be built	Eric Paris	2010-10-28	2	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We disabled the ability to build fanotify in commit 7c5347733dcc4ba0ba. This reverts that commit and allows people to build fanotify. Signed-off-by: Eric Paris <eparis@redhat.com>
\| \| \|
\| \ \
*-. \ \	Merge branches 'perf-fixes-for-linus' and 'x86-fixes-for-linus' of ↵	Linus Torvalds	2010-10-30	12	-82/+153
\|\ \ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: jump label: Add work around to i386 gcc asm goto bug x86, ftrace: Use safe noops, drop trap test jump_label: Fix unaligned traps on sparc. jump label: Make arch_jump_label_text_poke_early() optional jump label: Fix error with preempt disable holding mutex oprofile: Remove deprecated use of flush_scheduled_work() oprofile: Fix the hang while taking the cpu offline jump label: Fix deadlock b/w jump_label_mutex vs. text_mutex jump label: Fix module __init section race * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Check irq_remapped instead of remapping_enabled in destroy_irq()
\| \| * \| \|	x86: Check irq_remapped instead of remapping_enabled in destroy_irq()	Yinghai Lu	2010-10-30	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Russ Anderson reported: \| There is a regression that is causing a NULL pointer dereference \| in free_irte when shutting down xpc. git bisect narrowed it down \| to git commit d585d06(intr_remap: Simplify the code further), which \| changed free_irte(). Reverse applying the patch fixes the problem. We need to use irq_remapped() for each irq instead of checking only intr_remapping_enabled as there might be non remapped irqs even when remapping is enabled. [ tglx: use cfg instead of retrieving it again. Massaged changelog ] Reported-bisected-and-tested-by: Russ Anderson <rja@sgi.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <4CCBD511.40607@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
\| * \| \| \|	Merge branch 'tip/perf/jump-label-2' of ↵	Ingo Molnar	2010-10-30	5744	-270289/+411314
\| \|\ \ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/urgent