summaryrefslogtreecommitdiffstats
path: root/drivers
Commit message (Collapse)AuthorAgeFilesLines
* soc/qman: export non-programmable FQD fields queryHoria Geantă2017-03-242-63/+2
| | | | | | | | | Export qman_query_fq_np() function and related structures. This will be needed in the caam/qi driver, where "queue empty" condition will be decided based on the frm_cnt. Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* soc/qman: add dedicated channel ID for CAAMHoria Geantă2017-03-241-1/+5
| | | | | | | | Add and export the ID of the channel serviced by the CAAM (Cryptographic Acceleration and Assurance Module) DCP. Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* soc/qman: export volatile dequeue related structsHoria Geantă2017-03-241-36/+0
| | | | | | | | Since qman_volatile_dequeue() is already exported, move the related structures into the public header too. Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: ccp - Enable support for AES GCM on v5 CCPsGary R Hook2017-03-245-0/+531
| | | | | | | | | A version 5 device provides the primitive commands required for AES GCM. This patch adds support for en/decryption. Signed-off-by: Gary R Hook <gary.hook@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: ccp - Enable 3DES function on v5 CCPsGary R Hook2017-03-248-2/+552
| | | | | | | Wire up support for Triple DES in ECB mode. Signed-off-by: Gary R Hook <gary.hook@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: ccp - Add SHA-2 384- and 512-bit supportGary R Hook2017-03-243-3/+99
| | | | | | | | Incorporate 384-bit and 512-bit hashing for a version 5 CCP device Signed-off-by: Gary R Hook <gary.hook@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linuxHerbert Xu2017-03-24266-2852/+5093
|\ | | | | | | Merging 4.11-rc3 to pick up md5 removal from /dev/random.
| * Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pendingLinus Torvalds2017-03-1923-546/+1267
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull SCSI target fixes from Nicholas Bellinger: "The bulk of the changes are in qla2xxx target driver code to address various issues found during Cavium/QLogic's internal testing (stable CC's included), along with a few other stability and smaller miscellaneous improvements. There are also a couple of different patch sets from Mike Christie, which have been a result of his work to use target-core ALUA logic together with tcm-user backend driver. Finally, a patch to address some long standing issues with pass-through SCSI export of TYPE_TAPE + TYPE_MEDIUM_CHANGER devices, which will make folks using physical (or virtual) magnetic tape happy" * git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (28 commits) qla2xxx: Update driver version to 9.00.00.00-k qla2xxx: Fix delayed response to command for loop mode/direct connect. qla2xxx: Change scsi host lookup method. qla2xxx: Add DebugFS node to display Port Database qla2xxx: Use IOCB interface to submit non-critical MBX. qla2xxx: Add async new target notification qla2xxx: Export DIF stats via debugfs qla2xxx: Improve T10-DIF/PI handling in driver. qla2xxx: Allow relogin to proceed if remote login did not finish qla2xxx: Fix sess_lock & hardware_lock lock order problem. qla2xxx: Fix inadequate lock protection for ABTS. qla2xxx: Fix request queue corruption. qla2xxx: Fix memory leak for abts processing qla2xxx: Allow vref count to timeout on vport delete. tcmu: Convert cmd_time_out into backend device attribute tcmu: make cmd timeout configurable tcmu: add helper to check if dev was configured target: fix race during implicit transition work flushes target: allow userspace to set state to transitioning target: fix ALUA transition timeout handling ...
| | * qla2xxx: Update driver version to 9.00.00.00-kHimanshu Madhani2017-03-181-3/+3
| | | | | | | | | | | | | | | | | | Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com> signed-off-by: Giridhar Malavali <giridhar.malavali@cavium.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
| | * qla2xxx: Fix delayed response to command for loop mode/direct connect.Quinn Tran2017-03-186-20/+81
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current driver wait for FW to be in the ready state before processing in-coming commands. For Arbitrated Loop or Point-to- Point (not switch), FW Ready state can take a while. FW will transition to ready state after all Nports have been logged in. In the mean time, certain initiators have completed the login and starts IO. Driver needs to start processing all queues if FW is already started. Signed-off-by: Quinn Tran <quinn.tran@cavium.com> Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
| | * qla2xxx: Change scsi host lookup method.Quinn Tran2017-03-187-40/+100
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For target mode, when new scsi command arrive, driver first performs a look up of the SCSI Host. The current look up method is based on the ALPA portion of the NPort ID. For Cisco switch, the ALPA can not be used as the index. Instead, the new search method is based on the full value of the Nport_ID via btree lib. Signed-off-by: Quinn Tran <quinn.tran@cavium.com> Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
| | * qla2xxx: Add DebugFS node to display Port DatabaseHimanshu Madhani2017-03-182-4/+90
| | | | | | | | | | | | | | | | | | Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com> Signed-off-by: Giridhar Malavali <giridhar.malavali@cavium.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
| | * qla2xxx: Use IOCB interface to submit non-critical MBX.Quinn Tran2017-03-186-65/+279
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Mailbox interface is currently over subscribed. We like to reserve the Mailbox interface for the chip managment and link initialization. Any non essential Mailbox command will be routed through the IOCB interface. The IOCB interface is able to absorb more commands. Following commands are being routed through IOCB interface - Get ID List (007Ch) - Get Port DB (0064h) - Get Link Priv Stats (006Dh) Signed-off-by: Quinn Tran <quinn.tran@cavium.com> Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
| | * qla2xxx: Add async new target notificationQuinn Tran2017-03-182-3/+4
| | | | | | | | | | | | | | | | | | Signed-off-by: Quinn Tran <quinn.tran@cavium.com> Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
| | * qla2xxx: Export DIF stats via debugfsAnil Gurumurthy2017-03-182-0/+27
| | | | | | | | | | | | | | | | | | Signed-off-by: Anil Gurumurthy <anil.gurumurthy@cavium.com> Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
| | * qla2xxx: Improve T10-DIF/PI handling in driver.Quinn Tran2017-03-187-251/+406
| | | | | | | | | | | | | | | | | | | | | | | | | | | Add routines to support T10 DIF tag. Signed-off-by: Quinn Tran <quinn.tran@cavium.com> Signed-off-by: Anil Gurumurthy <anil.gurumurthy@cavium.com> Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
| | * qla2xxx: Allow relogin to proceed if remote login did not finishQuinn Tran2017-03-184-8/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the remote port have started the login process, then the PLOGI and PRLI should be back to back. Driver will allow the remote port to complete the process. For the case where the remote port decide to back off from sending PRLI, this local port sets an expiration timer for the PRLI. Once the expiration time passes, the relogin retry logic is allowed to go through and perform login with the remote port. Signed-off-by: Quinn Tran <quinn.tran@qlogic.com> Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
| | * qla2xxx: Fix sess_lock & hardware_lock lock order problem.Quinn Tran2017-03-181-23/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The main lock that needs to be held for CMD or TMR submission to upper layer is the sess_lock. The sess_lock is used to serialize cmd submission and session deletion. The addition of hardware_lock being held is not necessary. This patch removes hardware_lock dependency from CMD/TMR submission. Use hardware_lock only for error response in this case. Path1 CPU0 CPU1 ---- ---- lock(&(&ha->tgt.sess_lock)->rlock); lock(&(&ha->hardware_lock)->rlock); lock(&(&ha->tgt.sess_lock)->rlock); lock(&(&ha->hardware_lock)->rlock); Path2/deadlock *** DEADLOCK *** Call Trace: dump_stack+0x85/0xc2 print_circular_bug+0x1e3/0x250 __lock_acquire+0x1425/0x1620 lock_acquire+0xbf/0x210 _raw_spin_lock_irqsave+0x53/0x70 qlt_sess_work_fn+0x21d/0x480 [qla2xxx] process_one_work+0x1f4/0x6e0 Cc: <stable@vger.kernel.org> Cc: Bart Van Assche <Bart.VanAssche@sandisk.com> Reported-by: Bart Van Assche <Bart.VanAssche@sandisk.com> Signed-off-by: Quinn Tran <quinn.tran@cavium.com> Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
| | * qla2xxx: Fix inadequate lock protection for ABTS.Quinn Tran2017-03-181-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Normally, ABTS is sent to Target Core as Task MGMT command. In the case of error, qla2xxx needs to send response, hardware_lock is required to prevent request queue corruption. Cc: <stable@vger.kernel.org> Signed-off-by: Quinn Tran <quinn.tran@cavium.com> Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
| | * qla2xxx: Fix request queue corruption.Quinn Tran2017-03-181-3/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When FW notify driver or driver detects low FW resource, driver tries to send out Busy SCSI Status to tell Initiator side to back off. During the send process, the lock was not held. Cc: <stable@vger.kernel.org> Signed-off-by: Quinn Tran <quinn.tran@qlogic.com> Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
| | * qla2xxx: Fix memory leak for abts processingQuinn Tran2017-03-181-0/+2
| | | | | | | | | | | | | | | | | | | | | Cc: <stable@vger.kernel.org> Signed-off-by: Quinn Tran <quinn.tran@cavium.com> Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
| | * qla2xxx: Allow vref count to timeout on vport delete.Joe Carnuccio2017-03-185-10/+16
| | | | | | | | | | | | | | | | | | | | | Cc: <stable@vger.kernel.org> Signed-off-by: Joe Carnuccio <joe.carnuccio@cavium.com> Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
| | * tcmu: Convert cmd_time_out into backend device attributeNicholas Bellinger2017-03-181-26/+68
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of putting cmd_time_out under ../target/core/user_0/foo/control, which has historically been used by parameters needed for initial backend device configuration, go ahead and move cmd_time_out into a backend device attribute. In order to do this, tcmu_module_init() has been updated to create a local struct configfs_attribute **tcmu_attrs, that is based upon the existing passthrough_attrib_attrs along with the new cmd_time_out attribute. Once **tcm_attrs has been setup, go ahead and point it at tcmu_ops->tb_dev_attrib_attrs so it's picked up by target-core. Also following MNC's previous change, ->cmd_time_out is stored in milliseconds but exposed via configfs in seconds. Also, note this patch restricts the modification of ->cmd_time_out to before + after the TCMU device has been configured, but not while it has active fabric exports. Cc: Mike Christie <mchristi@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
| | * tcmu: make cmd timeout configurableMike Christie2017-03-181-6/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A single daemon could implement multiple types of devices using multuple types of real devices that may not support restarting from crashes and/or handling tcmu timeouts. This makes the cmd timeout configurable, so handlers that do not support it can turn if off for now. Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
| | * tcmu: add helper to check if dev was configuredMike Christie2017-03-181-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | This adds a helper to check if the dev was configured. It will be used in the next patch to prevent updates to some config settings after the device has been setup. Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
| | * target: fix race during implicit transition work flushesMike Christie2017-03-181-9/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes the following races: 1. core_alua_do_transition_tg_pt could have read tg_pt_gp_alua_access_state and gone into this if chunk: if (!explicit && atomic_read(&tg_pt_gp->tg_pt_gp_alua_access_state) == ALUA_ACCESS_STATE_TRANSITION) { and then core_alua_do_transition_tg_pt_work could update the state. core_alua_do_transition_tg_pt would then only set tg_pt_gp_alua_pending_state and the tg_pt_gp_alua_access_state would not get updated with the second calls state. 2. core_alua_do_transition_tg_pt could be setting tg_pt_gp_transition_complete while the tg_pt_gp_transition_work is already completing. core_alua_do_transition_tg_pt then waits on the completion that will never be called. To handle these issues, we just call flush_work which will return when core_alua_do_transition_tg_pt_work has completed so there is no need to do the complete/wait. And, if core_alua_do_transition_tg_pt_work was running, instead of trying to sneak in the state change, we just schedule up another core_alua_do_transition_tg_pt_work call. Note that this does not handle a possible race where there are multiple threads call core_alua_do_transition_tg_pt at the same time. I think we need a mutex in target_tg_pt_gp_alua_access_state_store. Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
| | * target: allow userspace to set state to transitioningMike Christie2017-03-181-15/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | Userspace target_core_user handlers like tcmu-runner may want to set the ALUA state to transitioning while it does implicit transitions. This patch allows that state when set from configfs. Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
| | * target: fix ALUA transition timeout handlingMike Christie2017-03-181-15/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The implicit transition time tells initiators the min time to wait before timing out a transition. We currently schedule the transition to occur in tg_pt_gp_implicit_trans_secs seconds so there is no room for delays. If core_alua_do_transition_tg_pt_work->core_alua_update_tpg_primary_metadata needs to write out info to a remote file, then the initiator can easily time out the operation. Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
| | * target: Use system workqueue for ALUA transitionsMike Christie2017-03-181-5/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If tcmu-runner is processing a STPG and needs to change the kernel's ALUA state then we cannot use the same work queue for task management requests and ALUA transitions, because we could deadlock. The problem occurs when a STPG times out before tcmu-runner is able to call into target_tg_pt_gp_alua_access_state_store-> core_alua_do_port_transition -> core_alua_do_transition_tg_pt -> queue_work. In this case, the tmr is on the work queue waiting for the STPG to complete, but the STPG transition is now queued behind the waiting tmr. Note: This bug will also be fixed by this patch: http://www.spinics.net/lists/target-devel/msg14560.html which switches the tmr code to use the system workqueues. For both, I am not sure if we need a dedicated workqueue since it is not a performance path and I do not think we need WQ_MEM_RECLAIM to make forward progress to free up memory like the block layer does. Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
| | * target: fail ALUA transitions for pscsiMike Christie2017-03-181-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We do not setup the LU group for pscsi devices, so if you write a state to alua_access_state that will cause a transition you will get a NULL pointer dereference. This patch will fail attempts to try and transition the path for backend devices that set the TRANSPORT_FLAG_PASSTHROUGH_ALUA flag. Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
| | * target: allow ALUA setup for some passthrough backendsMike Christie2017-03-183-6/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch allows passthrough backends to use the core/base LIO ALUA setup and state checks, but still handle the execution of commands. This will allow the target_core_user module to execute STPG and RTPG in userspace, and not have to duplicate the ALUA state checks, path information (needed so we can check if command is executable on specific paths) and setup (rtslib sets/updates the configfs ALUA interface like it does for iblock or file). For STPG, the target_core_user userspace daemon, tcmu-runner will still execute the STPG, and to update the core/base LIO state it will use the existing configfs interface. For RTPG, tcmu-runner will loop over configfs and/or cache the state. Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
| | * tcmu: return on first Opt parse failureMike Christie2017-03-181-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | We only were returing failure if the last opt to be parsed failed. This has a return failure when we first detect a failure. Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
| | * tcmu: allow hw_max_sectors greater than 128Mike Christie2017-03-181-19/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | tcmu hard codes the hw_max_sectors to 128 which is a litle small. Userspace uses the max_sectors to report the optimal IO size and some initiators perform better with larger IOs (open-iscsi seems to do better with 256 to 512 depending on the test). (Fix do not display hw max sectors twice - MNC) Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
| | * target: Drop pointless tfo->check_stop_free checkNicholas Bellinger2017-03-182-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All in-tree fabric drivers provide a tfo->check_stop_free(), so there is no need to do the extra check within existing transport_cmd_check_stop_to_fabric() code. Just to be sure, add a check in target_fabric_tf_ops_check() to notify any out-of-tree drivers that might be missing it. Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
| | * target: Fix VERIFY_16 handling in sbc_parse_cdbMax Lohrmann2017-03-071-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As reported by Max, the Windows 2008 R2 chkdsk utility expects VERIFY_16 to be supported, and does not handle the returned CHECK_CONDITION properly, resulting in an infinite loop. The kernel will log huge amounts of this error: kernel: TARGET_CORE[iSCSI]: Unsupported SCSI Opcode 0x8f, sending CHECK_CONDITION. Signed-off-by: Max Lohrmann <post@wickenrode.com> Cc: <stable@vger.kernel.org> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
| | * target/pscsi: Fix TYPE_TAPE + TYPE_MEDIMUM_CHANGER exportNicholas Bellinger2017-03-071-35/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following fixes a divide by zero OOPs with TYPE_TAPE due to pscsi_tape_read_blocksize() failing causing a zero sd->sector_size being propigated up via dev_attrib.hw_block_size. It also fixes another long-standing bug where TYPE_TAPE and TYPE_MEDIMUM_CHANGER where using pscsi_create_type_other(), which does not call scsi_device_get() to take the device reference. Instead, rename pscsi_create_type_rom() to pscsi_create_type_nondisk() and use it for all cases. Finally, also drop a dump_stack() in pscsi_get_blocks() for non TYPE_DISK, which in modern target-core can get invoked via target_sense_desc_format() during CHECK_CONDITION. Reported-by: Malcolm Haak <insanemal@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
| * | Merge branch 'libnvdimm-fixes' of ↵Linus Torvalds2017-03-191-3/+30
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull device-dax fixes from Dan Williams: "The device-dax driver was not being careful to handle falling back to smaller fault-granularity sizes. The driver already fails fault attempts that are smaller than the device's alignment, but it also needs to handle the cases where a larger page mapping could be established. For simplicity of the immediate fix the implementation just signals VM_FAULT_FALLBACK until fault-size == device-alignment. One fix is for -stable to address pmd-to-pte fallback from the original implementation, another fix is for the new (introduced in 4.11-rc1) pud-to-pmd regression, and a typo fix comes along for the ride. These have received a build success notification from the kbuild robot" * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: device-dax: fix debug output typo device-dax: fix pud fault fallback handling device-dax: fix pmd/pte fault fallback handling
| | * | device-dax: fix debug output typoDave Jiang2017-03-101-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The debug output for return the return data of pgoff_to_phys() in the fault handlers has 'phys' and 'pgoff' incorrectly swapped. Reported-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| | * | device-dax: fix pud fault fallback handlingDave Jiang2017-03-101-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Jeff Moyer reports: With a device dax alignment of 4KB or 2MB, I get sigbus when running the attached fio job file for the current kernel (4.11.0-rc1+). If I specify an alignment of 1GB, it works. I turned on debug output, and saw that it was failing in the huge fault code. dax dax1.0: dax_open dax dax1.0: dax_mmap dax dax1.0: dax_dev_huge_fault: fio: write (0x7f08f0a00000 - dax dax1.0: __dax_dev_pud_fault: phys_to_pgoff(0xffffffffcf60) dax dax1.0: dax_release fio config for reproduce: [global] ioengine=dev-dax direct=0 filename=/dev/dax0.0 bs=2m [write] rw=write [read] stonewall rw=read The driver fails to fallback when taking a fault that is larger than the device alignment, or handling a larger fault when a smaller mapping is already established. While we could support larger mappings for a device with a smaller alignment, that change is too large for the immediate fix. The simplest change is to force fallback until the fault size matches the alignment. Reported-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| | * | device-dax: fix pmd/pte fault fallback handlingDave Jiang2017-03-101-0/+15
| | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Jeff Moyer reports: With a device dax alignment of 4KB or 2MB, I get sigbus when running the attached fio job file for the current kernel (4.11.0-rc1+). If I specify an alignment of 1GB, it works. I turned on debug output, and saw that it was failing in the huge fault code. dax dax1.0: dax_open dax dax1.0: dax_mmap dax dax1.0: dax_dev_huge_fault: fio: write (0x7f08f0a00000 - dax dax1.0: __dax_dev_pud_fault: phys_to_pgoff(0xffffffffcf60 dax dax1.0: dax_release fio config for reproduce: [global] ioengine=dev-dax direct=0 filename=/dev/dax0.0 bs=2m [write] rw=write [read] stonewall rw=read The driver fails to fallback when taking a fault that is larger than the device alignment, or handling a larger fault when a smaller mapping is already established. While we could support larger mappings for a device with a smaller alignment, that change is too large for the immediate fix. The simplest change is to force fallback until the fault size matches the alignment. Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap") Cc: <stable@vger.kernel.org> Reported-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| * | Merge tag 'pm-4.11-rc3' of ↵Linus Torvalds2017-03-172-36/+36
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix a few more intel_pstate issues and one small issue in the cpufreq core. Specifics: - Fix breakage in the intel_pstate's debugfs interface for PID controller tuning (Rafael Wysocki) - Fix computations related to P-state limits in intel_pstate to avoid excessive rounding errors leading to visible inaccuracies (Srinivas Pandruvada, Rafael Wysocki) - Add a missing newline to a message printed by one function in the cpufreq core and clean up that function (Rafael Wysocki)" * tag 'pm-4.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpufreq: Fix and clean up show_cpuinfo_cur_freq() cpufreq: intel_pstate: Avoid percentages in limits-related computations cpufreq: intel_pstate: Correct frequency setting in the HWP mode cpufreq: intel_pstate: Update pid_params.sample_rate_ns in pid_param_set()
| | * \ Merge branches 'pm-cpufreq-fixes' and 'intel_pstate-fixes'Rafael J. Wysocki2017-03-181-33/+31
| | |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * pm-cpufreq-fixes: cpufreq: Fix and clean up show_cpuinfo_cur_freq() * intel_pstate-fixes: cpufreq: intel_pstate: Avoid percentages in limits-related computations cpufreq: intel_pstate: Correct frequency setting in the HWP mode cpufreq: intel_pstate: Update pid_params.sample_rate_ns in pid_param_set()
| | | * | cpufreq: intel_pstate: Avoid percentages in limits-related computationsRafael J. Wysocki2017-03-151-28/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, intel_pstate_update_perf_limits() first converts the policy minimum and maximum limits into percentages of the maximum turbo frequency (rounding up to an integer) and then converts these percentages to fractions (by using fixed-point arithmetic to divide them by 100). That introduces a rounding error unnecessarily, because the fractions can be obtained by carrying out fixed-point divisions directly on the input numbers. Rework the computations in intel_pstate_hwp_set() to use fractions instead of percentages (and drop redundant local variables from there) and modify intel_pstate_update_perf_limits() to compute the fractions directly and percentages out of them. While at it, introduce percent_ext_fp() for converting percentages to fractions (with extended number of fraction bits) and use it in the computations. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | | * | cpufreq: intel_pstate: Correct frequency setting in the HWP modeSrinivas Pandruvada2017-03-141-7/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the functions intel_pstate_hwp_set(), min/max range from HWP capability MSR along with max_perf_pct and min_perf_pct, is used to set the HWP request MSR. In some cases this doesn't result in the correct HWP max/min in HWP request. For example: In the following case: HWP capabilities from MSR 0x771 0x70a1220 Here cpufreq min/max frequencies from above MSR dump are 700MHz and 3.2GHz respectively. This will result in hwp_min = 0x07 hwp_max = 0x20 To limit max frequency to 2GHz: perf_limits->max_perf_pct = 63 (2GHz as a percent of 3.2GHz rounded up) With the current calculation: adj_range = max_perf_pct * range / 100; adj_range = 63 * (32 - 7) / 100 adj_range = 15 max = hw_min + adj_range; max = 7 + 15 = 22 This will result in HWP request of 0x160f, which will result in a frequency cap of 2.2GHz not 2GHz. The problem with the above calculation is that hwp_min of 7 is treated as 0% in the range. But max_perf_pct is calculated with respect to minimum as 0 and max as 3.2GHz or hwp_max, so adding hwp_min to it will result in more than the desired. Since the min_perf_pct and max_perf_pct is already a percent of max frequency or hwp_max, this min/max HWP request value can be calculated directly applying these percentage to hwp_max. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | | * | cpufreq: intel_pstate: Update pid_params.sample_rate_ns in pid_param_set()Rafael J. Wysocki2017-03-131-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix the debugfs interface for PID tuning to actually update pid_params.sample_rate_ns on PID parameters updates, as changing pid_params.sample_rate_ms via debugfs has no effect now. Fixes: a4675fbc4a7a (cpufreq: intel_pstate: Replace timers with utilization update callbacks) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
| | * | | cpufreq: Fix and clean up show_cpuinfo_cur_freq()Rafael J. Wysocki2017-03-161-3/+5
| | |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a missing newline in show_cpuinfo_cur_freq(), so add it, but while at it clean that function up somewhat too. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: All applicable <stable@vger.kernel.org>
| * | | Merge branch 'x86-acpi-for-linus' of ↵Linus Torvalds2017-03-173-127/+64
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 acpi fixes from Thomas Gleixner: "This update deals with the fallout of the recent work to make cpuid/node mappings persistent. It turned out that the boot time ACPI based mapping tripped over ACPI inconsistencies and caused regressions. It's partially reverted and the fragile part replaced by an implementation which makes the mapping persistent when a CPU goes online for the first time" * 'x86-acpi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: acpi/processor: Check for duplicate processor ids at hotplug time acpi/processor: Implement DEVICE operator for processor enumeration x86/acpi: Restore the order of CPU IDs Revert"x86/acpi: Enable MADT APIs to return disabled apicids" Revert "x86/acpi: Set persistent cpuid <-> nodeid mapping when booting"
| | * | | acpi/processor: Check for duplicate processor ids at hotplug timeDou Liyang2017-03-111-3/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The check for duplicate processor ids happens at boot time based on the ACPI table contents, but the final sanity checks for a processor happen at hotplug time. At hotplug time, where the physical information is available, which might differ from the ACPI table information, a check for duplicate processor ids is missing. Add it to the hotplug checks and rename the function so it better reflects its purpose. Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Tested-by: Xiaolong Ye <xiaolong.ye@intel.com> Cc: rjw@rjwysocki.net Cc: linux-acpi@vger.kernel.org Cc: guzheng1@huawei.com Cc: izumi.taku@jp.fujitsu.com Cc: lenb@kernel.org Link: http://lkml.kernel.org/r/1488528147-2279-6-git-send-email-douly.fnst@cn.fujitsu.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| | * | | acpi/processor: Implement DEVICE operator for processor enumerationDou Liyang2017-03-111-7/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ACPI allows to declare processors either with the PROCESSOR or with the DEVICE operator. The current implementation handles only the PROCESSOR operator. On a system which uses the DEVICE operator for processor enumeration the evaluation fails. Check for the ACPI type of the ACPI handle and evaluate PROCESSOR and DEVICE types separately. Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Tested-by: Xiaolong Ye <xiaolong.ye@intel.com> Cc: rjw@rjwysocki.net Cc: linux-acpi@vger.kernel.org Cc: guzheng1@huawei.com Cc: izumi.taku@jp.fujitsu.com Cc: lenb@kernel.org Link: http://lkml.kernel.org/r/1488528147-2279-5-git-send-email-douly.fnst@cn.fujitsu.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| | * | | Revert"x86/acpi: Enable MADT APIs to return disabled apicids"Dou Liyang2017-03-111-38/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Revert: 8ad893faf2ea ("x86/acpi: Enable MADT APIs to return disabled apicids") Remove the leftovers of the boot time 'cpuid <-> nodeid' mapping approach. Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Tested-by: Xiaolong Ye <xiaolong.ye@intel.com> Cc: rjw@rjwysocki.net Cc: linux-acpi@vger.kernel.org Cc: guzheng1@huawei.com Cc: izumi.taku@jp.fujitsu.com Cc: lenb@kernel.org Link: http://lkml.kernel.org/r/1488528147-2279-3-git-send-email-douly.fnst@cn.fujitsu.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>