Commit message  Author  Age  Files  Lines
...
| | * ieee1394: sbp2: use list_move_tail()Stefan Richter2006-12-071-2/+1
| | | | | | | | | | | | | | | | | | It's OK to reorder list_del() and sbp2util_free_command_dma() here. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
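As a rough illustration of why the reordering is harmless, the sketch below models a circular doubly linked list in userspace. The type and helper names are simplified stand-ins written for this example, not the kernel's own definitions; the point is only that a "move to tail" helper is a delete followed by an add at the tail of the other list.

```c
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* Make an empty list: the head points at itself. */
static void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* Unlink an entry from whatever list it is on. */
static void list_del_entry(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
}

/* Link an entry in just before the head, i.e. at the tail. */
static void list_add_tail_entry(struct list_head *e, struct list_head *head)
{
    e->prev = head->prev;
    e->next = head;
    head->prev->next = e;
    head->prev = e;
}

/* One call instead of two: unlink, then append to another list. */
static void list_move_tail_entry(struct list_head *e, struct list_head *head)
{
    list_del_entry(e);
    list_add_tail_entry(e, head);
}

/* Count the entries on a list (the head itself is not counted). */
static size_t list_length(const struct list_head *head)
{
    size_t n = 0;
    const struct list_head *p;

    for (p = head->next; p != head; p = p->next)
        n++;
    return n;
}
```

With two lists such as a hypothetical "in use" and "completed" list, calling `list_move_tail_entry(&cmd, &completed)` takes `cmd` off the first list and appends it to the second in a single step.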
| | * ieee1394: sbp2: more concise names for types and variablesStefan Richter2006-12-072-598/+575
| | | | | | | | | | | | | | | | | | | | | | | | | | | "struct scsi_id_instance_data" represents a logical unit. Rename it to "struct sbp2_lu", and "scsi_id" to "lu". Rename some other variables too. Wrap almost all lines after at most 80 columns. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
| | * ieee1394: sbp2: remove unused struct membersStefan Richter2006-12-072-25/+4
| | | | | | | | | | | | Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
| | * ieee1394: sbp2: proper unit in module parameter descriptionStefan Richter2006-12-071-1/+2
| | | | | | | | | | | | | | | | | | It's 2^20 bit/s, not 0.001 bit. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
| | * ieee1394: sbp2: clean up sbp2_ namespaceStefan Richter2006-12-072-133/+132
| | | | | | | | | | | | | | | | | | | | | Prepend sbp2*_ to anything globally defined in sbp2.c except for some macros. Strip sbp2_ from names of struct members. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
| | * ieee1394: sbp2: some conditions in queue_command are unlikelyStefan Richter2006-12-071-5/+5
| | | | | | | | | | | | Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
| | * ieee1394: sbp2: remove superfluous commentsStefan Richter2006-12-072-355/+109
| | | | | | | | | | | | | | | | | | And update and reformat remaining comments. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
| | * ieee1394: sbp2: delayed_work -> work_structStefan Richter2006-12-072-16/+12
| | | This work is not delayed. Also bring the code format into a state which reduces my work to merge pending sbp2 patches.
| | |
| | | Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
| | * ieee1394: sbp2: coding style of some macrosStefan Richter2006-12-071-77/+78
| | | | | | | | | | | | | | | | | | Adjust parentheses, indentation, line lengths. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
| | * ieee1394: sbp2: remove debug macrosStefan Richter2006-12-071-280/+20
| | | | | | | | | | | | | | | | | | No need to keep them in released sources. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
| | * ieee1394: sbp2: consolidate log levelsStefan Richter2006-12-071-19/+12
| | | | | | | | | | | | | | | | | | | | | | | | Replace some calls to SBP2_ERR and SBP2_WARN by SBP2_INFO. Remove logging macros SBP2_NOTICE and SBP2_WARN. Remove direct usage of HPSB_ logging macros. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
| | * ieee1394: sbp2: remove duplicate codeStefan Richter2006-12-071-11/+0
| | | The same case is handled further below in sbp2scsi_complete_command. Note, the second version behaves slightly differently but looks preferable. It's an extremely unlikely case by the way.
| | |
| | | Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
| | * ieee1394: sbp2: remove dead codeStefan Richter2006-12-071-15/+0
| | | | | | | | | | | | | | | | | | This has been within #if 0 for a long time and is wrong anyway. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
| | * ieee1394: sbp2: clean up function declarationsStefan Richter2006-12-072-104/+86
| | | | | | | | | | | | | | | | | | | | | | | | | | | Remove unnecessary function prototypes. Remove variable names from function prototypes. Move declarations from sbp2.h to sbp2.c. Move definitions of driver templates together near the top of sbp2.c. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
| | * ieee1394: sbp2: remove irritating log messageStefan Richter2006-12-071-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The queue depth can be read from /sys/bus/scsi/devices/*/queue_depth, so don't log it. And the hint about speed improvements is misleading, at least under current kernels. If serialization is switched off, read performance is typically increased by less than 10%. (I did not test write performance recently.) On the other hand, serialize_io=0 is not yet safe due to some implementation issues that are not trivial to fix. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
| | * ohci1394: shortcut irq printingAlexey Dobriyan2006-12-071-4/+2
| | | To print the irq number, there is no need to convert it to a string using %d and then print that string using %s. Just use %d.
| | |
| | | Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
| | * ieee1394: nodemgr: take it easy if bus_rescan_devices failsStefan Richter2006-12-071-3/+2
| | | | | | | | | | | | | | | | | | This happens. No need to log a BUG trace. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
| | * drivers/ieee1394/*: use kmemdup()Eric Sesterhenn2006-12-072-4/+2
| | | | | | | | | | | | | | | | | | Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
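For readers unfamiliar with the helper, kmemdup() combines an allocation and a copy that were previously written as two separate calls. The sketch below is a userspace analogue built on malloc() and memcpy(); the function name memdup_sketch is made up for this example, and the real kernel helper additionally takes allocation flags.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical userspace analogue of kmemdup(): allocate len bytes and
 * copy src into them. Returns NULL if the allocation fails.
 */
static void *memdup_sketch(const void *src, size_t len)
{
    void *p = malloc(len);

    if (p)
        memcpy(p, src, len);
    return p;
}
```

A caller replaces an open-coded allocate-then-copy pair with a single call and still has to check the result for NULL and free it when done.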
| | * ieee1394: ohci1394: proper log messages in suspend and resumeStefan Richter2006-12-071-11/+15
| | | - correct thinko in one of my last commits: cannot use PRINT macro with ohci == NULL
| | | - add log messages on ohci == NULL and on pci_enable_device != 0
| | | - update log macros from patch "revert fail on error in suspend" to use PRINT and DBGMSG where possible
| | |
| | | Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
| | * ieee1394: ohci1394: revert fail on error in suspendStefan Richter2006-12-071-2/+11
| | | | | | | | | | | | | | | | | | | | | | | | Some errors during preparation for suspended state can be skipped with a warning instead of a failure of the whole suspend transition, notably an error in pci_set_power_state. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
| | * ieee1394: only build OUI database files if config enabledRandy Dunlap2006-12-071-1/+4
| | | | | | | | | | | | | | | | | | | | | Only build IEEE1394 OUI database files if the config option is enabled. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
| | * ieee1394: fix printk format warningRandy Dunlap2006-12-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Fix printk format warning: drivers/ieee1394/nodemgr.c:364: warning: long long unsigned int format, u64 arg (arg 3) Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
| | * ieee1394: nodemgr: revise semaphore protection of driver core dataStefan Richter2006-12-071-50/+92
| | | - The list "struct class.children" is supposed to be protected by class.sem, not by class.subsys.rwsem.
| | | - nodemgr_remove_uds() iterated over nodemgr_ud_class.children without proper protection. This was never observed as a bug since the code is usually only accessed by knodemgrd. All knodemgrds are currently globally serialized. But userspace can trigger this code too by writing to /sys/bus/ieee1394/destroy_node.
| | | - Clean up access to the FireWire bus type's subsys.rwsem: Access it uniformly via ieee1394_bus_type. Shrink rwsem protected regions where possible. Expand them where necessary. The latter wasn't a problem so far because knodemgr is globally serialized.
| | |
| | | This should harden the interaction of ieee1394 with sysfs and lay ground for deserialized operation of multiple knodemgrds and for implementation of subthreads for parallelized scanning and probing.
| | |
| | | Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
| | * ieee1394: nodemgr: reflect which return values are errorsStefan Richter2006-12-071-34/+30
| | | | | | | | | | | | | | | | | | Give better names to local variables. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
| | * ieee1394: nodemgr: small fix after sysfs errors patchStefan Richter2006-12-071-1/+1
| | | | | | | | | | | | | | | | | | One hunk in "ieee1394: handle sysfs errors" was wrong. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
| | * dv1394: remove BKL contentionStefan Richter2006-12-071-11/+3
| | | | | | | | | | | | | | | | | | Purges the one remaining call to lock_kernel() from the 1394 subsystem. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
| | * video1394: remove BKL contentionDaniel Drake2006-12-071-24/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | video1394 does not need to take the BKL. The data structures shared between file_operations and interrupts are already protected through context-specific spinlocks. The only other danger is video1394_release() being called during another operation, however this cannot happen because release is only ever invoked when the last thread has closed the fd. Signed-off-by: Daniel Drake <ddrake@brontes3d.com> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
| | * video1394: small optimizations to frame retrieval codepathDaniel Drake2006-12-071-8/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | Add some GCC branch prediction optimizations to unlikely error/safety conditions in the ioctl handling code commonly called during an application's capture loop. Signed-off-by: Daniel Drake <ddrake@brontes3d.com> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
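A minimal sketch of this kind of branch hint follows. It assumes GCC or Clang, which provide __builtin_expect(); the macro name sketch_unlikely and the bounds check are invented for illustration and are not taken from video1394.

```c
/*
 * Local stand-in for the kernel's unlikely() macro, which is built on
 * the compiler's __builtin_expect(). The hint tells the compiler that
 * the condition is expected to be false, so the error path can be
 * placed out of the hot path.
 */
#define sketch_unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical validity check of the sort done on every captured frame. */
static int check_buffer_index(unsigned int index, unsigned int num_buffers)
{
    if (sketch_unlikely(index >= num_buffers))
        return -1;  /* rare error path */
    return 0;       /* common case */
}
```

The hint does not change what the function returns; it only influences code layout, so any gain depends on the compiler and the target.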
| | * ieee1394: handle sysfs errorsStefan Richter2006-12-072-49/+118
| | | | | | | | | | | | | | | | | | Handle driver core errors with as much care as appropriate. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
| | * ieee1394: coding style in hosts.cStefan Richter2006-12-071-7/+11
| | | | | | | | | | | | | | | | | | Some 80-columns pedantry, and touch up of a // comment. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
| | * ieee1394: lock smaller region by host_num_alloc mutexStefan Richter2006-12-071-4/+1
| | | | | | | | | | | | | | | | | | We need the mutex only around the iteration over existing hosts. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
| | * ieee1394: usecs_to_jiffies takes unsigned int argumentStefan Richter2006-12-071-5/+3
| | | | | | | | | | | | Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
| | * ieee1394: ohci1394: suspend/resume cosmeticsStefan Richter2006-12-071-78/+48
| | | | | | | | | | | | | | | | | | | | | Reorder the definitions of ohci1394_pci_suspend and _resume. Remove redundant comments. Beautify return statements. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
| | * ohci1394: steps to implement suspend/resumeBernhard Kaindl2006-12-071-14/+78
| | | I did a quick shot on what I described and the appended patch does the first thing needed for working suspend/resume in ohci1394, which is HW de- and re-initialisation.
| | |
| | | It works with suspend2disk on my Ricoh R5C552 IEEE 1394 Controller with the 2.6.17 kernel to the extent that if I call dvgrab --interactive after suspend2disk without unloading ohci1394, it does not lock up dvgrab with 100% CPU but properly connects to the camera, given that I first unplug and plug the camera after coming back from suspend. I guess that could be fixed by forcing a bus reset in the resume function.
| | |
| | | I cannot test suspend to RAM here at the moment and should follow the guidelines in Documentation/power/pci.txt also, so this is rather a quick report than a finished patch and there are some rough edges:
| | |
| | | With this patch, I have to unload at least some in-kernel users of ohci1394 like dv1394 or video1394 before suspending. Not doing that caused an Oops and a bad tasklet error, probably from not handling ISO tasklets during suspend/resume properly. Maybe these can be temporarily cleared or unregistered and re-registered for suspend/resume with help from the other layers or from the highlevel 1394 core, but I do not really know what these do.
| | |
| | | But this patch provides a useful base to start from and is already of much help for people who do not need dv1394 and video1394 or can unload them at least during suspend. I cannot test function with sbp2 at the moment, but raw1394 seems to work fine.
| | |
| | | Signed-off-by: Bernhard Kaindl <bk@fsfe.org>
| | |
| | | Update 1: merge with previous two ohci1394 suspend/resume patches
| | | Update 2: version for application on top of Linux 2.6.19-rc4
| | |
| | | Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
| | * ieee1394: raw1394: add comments on lock usageStefan Richter2006-12-071-5/+5
| | | | | | | | | | | | | | | | | | | | | Add a who-is-who about some locks and list heads in raw1394's struct definitions. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
| | * ieee1394: sbp2: slightly reorder sbp2scsi_abortStefan Richter2006-12-071-8/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | Put the target's fetch agent into reset state before the underlying ORB DMA is unmapped and the ->done handler is called. It is highly unlikely but the target could access that ORB right before sbp2 sends the reset request. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
| | * ieee1394: remove unused struct member from highlevel APIStefan Richter2006-12-071-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | struct hpsb_highlevel's struct module *owner is neither used by the IEEE 1394 core nor set by any of the in-tree drivers or the two out-of-tree highlevel drivers I know about (dfg1394, mem1394) --- nor is this member documented. An unscheduled removal seems acceptable. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
| * | Merge branch 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-armLinus Torvalds2006-12-07290-1982/+13137
| |\ \
| | | | * 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm: (76 commits)
| | | |   [ARM] 4002/1: S3C24XX: leave parent IRQs unmasked
| | | |   [ARM] 4001/1: S3C24XX: shorten reboot time
| | | |   [ARM] 3983/2: remove unused argument to __bug()
| | | |   [ARM] 4000/1: Osiris: add third serial port in
| | | |   [ARM] 3999/1: RX3715: suspend to RAM support
| | | |   [ARM] 3998/1: VR1000: LED platform devices
| | | |   [ARM] 3995/1: iop13xx: add iop13xx support
| | | |   [ARM] 3968/1: iop13xx: add iop13xx_defconfig
| | | |   [ARM] Update mach-types
| | | |   [ARM] Allow gcc to optimise arm_add_memory a little more
| | | |   [ARM] 3991/1: i.MX/MX1 high resolution time source
| | | |   [ARM] 3990/1: i.MX/MX1 more precise PLL decode
| | | |   [ARM] 3986/1: H1940: suspend to RAM support
| | | |   [ARM] 3985/1: ixp4xx clocksource cleanup
| | | |   [ARM] 3984/1: ixp4xx/nslu2: Fix disk LED numbering (take 2)
| | | |   [ARM] 3994/1: ixp23xx: fix handling of pci master aborts
| | | |   [ARM] 3981/1: sched_clock for PXA2xx
| | | |   [ARM] 3980/1: extend the ARM Versatile sched_clock implementation from 32 to 63 bit
| | | |   [ARM] 3979/1: extend the SA11x0 sched_clock implementation from 32 to 63 bit period
| | | |   [ARM] 3978/1: macro to provide a 63-bit value from a 32-bit hardware counter
| | | |   ...
| | | \
| | | \
| | | \
| | | \
| | | \
| | | \
| | | \
| | | \
| | | \
| | | \
| | *---------. \ [ARM] Merge individual ARM sub-treesRussell King2006-12-07290-1982/+13137
| | |\ \ \ \ \ \ \
| | | | | | | | | | Merge:
| | | | | | | | | |   Atmel AT91RM9200 and AT91SAM9260 changes
| | | | | | | | | |   General ARM developments
| | | | | | | | | |   Discontiguous memory cleanups
| | | | | | | | | |   64-bit/32-bit division and sched_clock extension patches
| | | | | | | | | |   EP93xx support changes
| | | | | | | | | |   IOP support changes
| | | | | | | | | |
| | | | | | | | | | Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| | | | | | | | * | [ARM] 3995/1: iop13xx: add iop13xx supportDan Williams2006-12-0728-4/+3324
| | | | | | | | | | The iop348 processor integrates an XScale (XSC3 512KB L2 Cache) core with a Serial Attached SCSI (SAS) controller, multi-ported DDR2 memory controller, 3 Application Direct Memory Access (DMA) controllers, a 133 MHz PCI-X interface, a x8 PCI-Express interface, and other peripherals to form a system-on-a-chip RAID subsystem engine.
| | | | | | | | | |
| | | | | | | | | | The iop342 processor replaces the SAS controller with a second XScale core for dual core embedded applications.
| | | | | | | | | |
| | | | | | | | | | The iop341 processor is the single core version of iop342.
| | | | | | | | | |
| | | | | | | | | | This patch supports the two Intel customer reference platforms iq81340mc for external storage and iq81340sc for direct attach (HBA) development.
| | | | | | | | | |
| | | | | | | | | | The developer's manual is available here: ftp://download.intel.com/design/iio/docs/31503701.pdf
| | | | | | | | | |
| | | | | | | | | | Changelog:
| | | | | | | | | | * removed virtual addresses from resource definitions
| | | | | | | | | | * cleaned up some unnecessary #include's
| | | | | | | | | |
| | | | | | | | | | Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| | | | | | | | | | Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| | | | | | | | * | [ARM] 3968/1: iop13xx: add iop13xx_defconfigDan Williams2006-12-071-0/+1134
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| | | | | | | | * | [ARM] Update mach-typesRussell King2006-12-071-5/+53
| | | |_|_|_|_|/ / | | |/| | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| | | | | | | * | [ARM] 3993/1: ep93xx: add cirrus logic edb9302a supportLennert Buytenhek2006-12-073-0/+98
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for the Cirrus Logic EDB9302A Evaluation Board. Confirmed to work by Chase Douglas. Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| | | | | | | * | [ARM] 3964/1: ep93xx: add ads sphere supportLennert Buytenhek2006-12-014-0/+99
| | | |_|_|_|/ / | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add initial board support for the ADS Sphere board. Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| | | | | | * | [ARM] 3981/1: sched_clock for PXA2xxNicolas Pitre2006-12-071-0/+60
| | | | | | | | Here's a 63-bit implementation of sched_clock() for PXA2xx. The actual period depends on the value of CLOCK_TICK_RATE and whether or not reduced scaling factors were provided for it.
| | | | | | | |
| | | | | | | | Signed-off-by: Nicolas Pitre <nico@cam.org>
| | | | | | | | Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| | | | | | * | [ARM] 3980/1: extend the ARM Versatile sched_clock implementation from 32 to 63 bit periodNicolas Pitre2006-12-071-4/+10
| | | | | | | | This provides a 63 bit clock counter guaranteed to be monotonic over a period of 35583 days instead of a clock wrap every 179 seconds, as long as sched_clock() is called at least once every 89 seconds. This should not be a problem in practice, although a kernel timer could be scheduled every 80 seconds for example simply to call sched_clock() making sure top bits are always synchronized if need be.
| | | | | | | |
| | | | | | | | Signed-off-by: Nicolas Pitre <nico@cam.org>
| | | | | | | | Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| | | | | | * | [ARM] 3979/1: extend the SA11x0 sched_clock implementation from 32 to 63 bit periodNicolas Pitre2006-12-071-4/+11
| | | | | | | | This provides a 63 bit clock counter guaranteed to be monotonic over a period of 370 days instead of a clock wrap every 19.4 minutes, as long as sched_clock() is called at least once every 9.7 minutes which shouldn't be a problem in practice.
| | | | | | | |
| | | | | | | | Signed-off-by: Nicolas Pitre <nico@cam.org>
| | | | | | | | Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| | | | | | * | [ARM] 3978/1: macro to provide a 63-bit value from a 32-bit hardware counterNicolas Pitre2006-12-071-0/+78
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is done in a completely lockless fashion. Bits 0 to 31 of the count are provided by the hardware while bits 32 to 62 are stored in memory. The top bit in memory is used to synchronize with the hardware count half-period. When the top bit of both counters (hardware and in memory) differ then the memory is updated with a new value, incrementing it when the hardware counter wraps around. Because a word store in memory is atomic then the incremented value will always be in synch with the top bit indicating to any potential concurrent reader if the value in memory is up to date or not wrt the needed increment. And any race in updating the value in memory is harmless as the same value would be stored more than once. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
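The update rule described in this message can be sketched in plain C as follows. This is a single-threaded userspace illustration, not the macro from the patch: the struct and function names are hypothetical, the caller is assumed to sample the counter at least once per half-period, and the sketch does not try to demonstrate the lockless behaviour under concurrency.

```c
#include <stdint.h>

/*
 * In-memory word for the sketch: bit 31 mirrors the top bit of the
 * hardware counter (the synchronization bit), bits 0..30 hold bits
 * 32..62 of the extended count.
 */
struct cnt63 {
    volatile uint32_t hi;
};

/* Extend a 32-bit hardware counter sample to a 63-bit value. */
static uint64_t cnt32_to_63_sketch(struct cnt63 *st, uint32_t hw)
{
    uint32_t hi = st->hi;

    if ((hi ^ hw) & 0x80000000u) {
        /*
         * Top bits differ: flip the sync bit. If the old sync bit was
         * set, the hardware counter has wrapped, so also add one to
         * the stored high part.
         */
        hi = (hi ^ 0x80000000u) + (hi >> 31);
        st->hi = hi;
    }

    /* Bit 63 only carries the sync flag, so mask it off. */
    return (((uint64_t)hi << 32) | hw) & 0x7fffffffffffffffULL;
}
```

Starting from a zeroed state, the samples 0x00000010, 0x80000000 and then 0x00000005 (after a wrap) yield 0x10, 0x80000000 and 0x100000005 respectively.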
| | | | | | * | [ARM] 3611/4: optimize do_div() when divisor is constantNicolas Pitre2006-12-071-1/+179
| | | |_|_|/ /
| | |/| | | | |
| | | | | | | On ARM all divisions have to be performed "manually". For 64-bit divisions that may take more than a hundred cycles in many cases.
| | | | | | |
| | | | | | | With 32-bit divisions gcc already uses the reciprocal of constant divisors to perform a multiplication, but not with 64-bit divisions.
| | | | | | |
| | | | | | | Since the kernel is increasingly relying upon 64-bit divisions it is worth optimizing at least those cases where the divisor is a constant. This is what this patch does using plain C code that gets optimized away at compile time.
| | | | | | |
| | | | | | | For example, despite the amount of added C code, do_div(x, 10000) now produces the following assembly code (where x is assigned to r0-r1):
| | | | | | |
| | | | | | |         adr     r4, .L0
| | | | | | |         ldmia   r4, {r4-r5}
| | | | | | |         umull   r2, r3, r4, r0
| | | | | | |         mov     r2, #0
| | | | | | |         umlal   r3, r2, r5, r0
| | | | | | |         umlal   r3, r2, r4, r1
| | | | | | |         mov     r3, #0
| | | | | | |         umlal   r2, r3, r5, r1
| | | | | | |         mov     r0, r2, lsr #11
| | | | | | |         orr     r0, r0, r3, lsl #21
| | | | | | |         mov     r1, r3, lsr #11
| | | | | | |         ...
| | | | | | | .L0:
| | | | | | |         .word   948328779
| | | | | | |         .word   879609302
| | | | | | |
| | | | | | | which is the fastest that can be done for any value of x in that case, many times faster than the __do_div64 code (except for the small x value space for which the result ends up being zero or a single bit).
| | | | | | |
| | | | | | | The fact that this code is generated inline produces a tiny increase in .text size, but not significant compared to the needed code around each __do_div64 call site this code is replacing.
| | | | | | |
| | | | | | | The algorithm used has been validated on a 16-bit scale for all possible values, and then recodified for 64-bit values. Furthermore I've been running it with the final BUG_ON() uncommented for over two months now with no problem.
| | | | | | |
| | | | | | | Note that this new code is compiled with gcc versions 4.0 or later. Earlier gcc versions proved themselves too problematic and only the original code is used with them.
| | | | | | |
| | | | | | | Signed-off-by: Nicolas Pitre <nico@cam.org>
| | | | | | | Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
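The do_div(x, 10000) listing in this message can be read as a multiplication by a fixed 64-bit reciprocal followed by a shift. The sketch below reproduces that arithmetic in C under two assumptions: the two .word constants form the low and high halves of one 64-bit multiplier (which then equals the rounded-up value of 2^75 / 10000), and the compiler offers the unsigned __int128 extension (GCC or Clang on a 64-bit host). The function name is made up for this example, and the patch itself uses its own C macros rather than this code.

```c
#include <stdint.h>

/*
 * Multiplier assembled from the two constants in the listing:
 * high word 879609302, low word 948328779.
 */
#define RECIP_10000 ((((uint64_t)879609302u) << 32) | (uint64_t)948328779u)

/* Divide by 10000 using one wide multiplication and a shift. */
static uint64_t div_by_10000(uint64_t x)
{
    /* Full 128-bit product; unsigned __int128 is a compiler extension. */
    unsigned __int128 product = (unsigned __int128)x * RECIP_10000;

    /*
     * Dropping the low 64 bits and shifting by 11 more, as the
     * assembly does, is a right shift by 75 in total.
     */
    return (uint64_t)(product >> 75);
}
```

For instance, `div_by_10000(123456789012345ULL)` gives the same quotient as the ordinary expression `123456789012345ULL / 10000`.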
| | | | | * | [ARM] Allow gcc to optimise arm_add_memory a little moreRussell King2006-12-071-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For some reason, gcc was calculating meminfo.bank[meminfo.nr_banks] repeatedly. Use a pointer to it instead. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
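The change described here amounts to taking the address of the array slot once and writing through that pointer. The sketch below shows the pattern with made-up types and names (meminfo_sketch, add_memory_sketch, a fixed bank limit of 8); it is an illustration of the idea, not the code from arch/arm.

```c
#define NR_BANKS_SKETCH 8

/* Hypothetical, simplified memory bank description. */
struct membank_sketch {
    unsigned long start;
    unsigned long size;
};

struct meminfo_sketch {
    int nr_banks;
    struct membank_sketch bank[NR_BANKS_SKETCH];
};

/* Append one bank; returns 0 on success, -1 if the table is full. */
static int add_memory_sketch(struct meminfo_sketch *mi,
                             unsigned long start, unsigned long size)
{
    struct membank_sketch *bank;

    if (mi->nr_banks >= NR_BANKS_SKETCH)
        return -1;

    /* Compute the slot address once instead of on every access. */
    bank = &mi->bank[mi->nr_banks];
    bank->start = start;
    bank->size = size;
    mi->nr_banks++;
    return 0;
}
```

Whether a compiler recomputes the indexed address on each access depends on the compiler version and on what it can prove about aliasing, so the explicit pointer mainly makes the intent unambiguous.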