summaryrefslogtreecommitdiffstats
path: root/drivers/misc/habanalabs
Commit message (Collapse)AuthorAgeFilesLines
* habanalabs: fix device IRQ unmasking for BE hostBen Segal2019-09-061-9/+24
| | | | | | | | | | | | | | | | | [ Upstream commit b421d83a3947369fd5718824aecfaebe1efbf7ed ] When unmasking IRQs inside the ASIC, the driver passes an array of all the IRQ to unmask. The ASIC's CPU is working in LE so when running in a BE host, the driver needs to do the proper endianness swapping when preparing this array. In addition, this patch also fixes the endianness of a couple of kernel log debug messages that print values of packets Signed-off-by: Ben Segal <bpsegal20@gmail.com> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* habanalabs: fix endianness handling for internal QMAN submissionOded Gabbay2019-09-064-15/+17
| | | | | | | | | | | | | | | | | | [ Upstream commit b9040c99414ba5b85090595a61abc686a5dbb388 ] The PQs of internal H/W queues (QMANs) can be located in different memory areas for different ASICs. Therefore, when writing PQEs, we need to use the correct function according to the location of the PQ. e.g. if the PQ is located in the device's memory (SRAM or DRAM), we need to use memcpy_toio() so it would work in architectures that have separate address ranges for IO memory. This patch makes the code that writes the PQE to be ASIC-specific so we can handle this properly per ASIC. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Tested-by: Ben Segal <bpsegal20@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* habanalabs: fix completion queue handling when host is BEBen Segal2019-09-061-14/+13
| | | | | | | | | | | | | [ Upstream commit 4e87334a0ef43663019dbaf3638ad10fd8c3320c ] This patch fix the CQ irq handler to work in hosts with BE architecture. It adds the correct endian-swapping macros around the relevant memory accesses. Signed-off-by: Ben Segal <bpsegal20@gmail.com> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* habanalabs: fix endianness handling for packets from userBen Segal2019-09-062-13/+32
| | | | | | | | | | | | | | | | [ Upstream commit 213ad5ad016a0da975b35f54e8cd236c3b04724b ] Packets that arrive from the user and need to be parsed by the driver are assumed to be in LE format. This patch fix all the places where the code handles these packets and use the correct endianness macros to handle them, as the driver handles the packets in CPU format (LE or BE depending on the arch). Signed-off-by: Ben Segal <bpsegal20@gmail.com> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* habanalabs: fix DRAM usage accounting on context tear downTomer Tayar2019-09-061-0/+2
| | | | | | | | | | | | | [ Upstream commit c8113756ba27298d6e95403c087dc5881b419a99 ] The patch fix the DRAM usage accounting by adding a missing update of the DRAM memory consumption, when a context is being torn down without an organized release of the allocated memory. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* habanalabs: fix F/W download in BE architectureBen Segal2019-08-291-17/+2
| | | | | | | | | | | | | | | | | | | | [ Upstream commit 75035fe22b808a520e1d712ebe913684ba406e01 ] writeX macros might perform byte-swapping in BE architectures. As our F/W is in LE format, we need to make sure no byte-swapping will occur. There is a standard kernel function (called memcpy_toio) for copying data to I/O area which is used in a lot of drivers to download F/W to PCIe adapters. That function also makes sure the data is copied "as-is", without byte-swapping. This patch use that function to copy the F/W to the GOYA ASIC instead of writeX macros. Signed-off-by: Ben Segal <bpsegal20@gmail.com> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* habanalabs: use u64_to_user_ptr() for reading user pointersArnd Bergmann2019-06-201-1/+1
| | | | | | | | | | | | | | | | We cannot cast a 64-bit integer to a pointer on 32-bit architectures without a warning: drivers/misc/habanalabs/habanalabs_ioctl.c: In function 'debug_coresight': drivers/misc/habanalabs/habanalabs_ioctl.c:143:23: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast] input = memdup_user((const void __user *) args->input_ptr, Use the macro that was defined for this purpose. Fixes: 315bc055ed56 ("habanalabs: add new IOCTL for debug, tracing and profiling") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
* Merge tag 'char-misc-5.2-rc4' of ↵Linus Torvalds2019-06-089-59/+65
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver fixes from Greg KH: "Here are some small char and misc driver fixes for 5.2-rc4 to resolve a number of reported issues. The most "notable" one here is the kernel headers in proc^Wsysfs fixes. Those changes move the header file info into sysfs and fixes the build issues that you reported. Other than that, a bunch of small habanalabs driver fixes, some fpga driver fixes, and a few other tiny driver fixes. All of these have been in linux-next for a while with no reported issues" * tag 'char-misc-5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: habanalabs: Read upper bits of trace buffer from RWPHI habanalabs: Fix virtual address access via debugfs for 2MB pages fpga: zynqmp-fpga: Correctly handle error pointer habanalabs: fix bug in checking huge page optimization habanalabs: Avoid using a non-initialized MMU cache mutex habanalabs: fix debugfs code uapi/habanalabs: add opcode for enable/disable device debug mode habanalabs: halt debug engines on user process close test_firmware: Use correct snprintf() limit genwqe: Prevent an integer overflow in the ioctl parport: Fix mem leak in parport_register_dev_model fpga: dfl: expand minor range when registering chrdev region fpga: dfl: Add lockdep classes for pdata->lock fpga: dfl: afu: Pass the correct device to dma_mapping_error() fpga: stratix10-soc: fix use-after-free on s10_init() w1: ds2408: Fix typo after 49695ac46861 (reset on output_write retry with readback) kheaders: Do not regenerate archive if config is not changed kheaders: Move from proc to sysfs lkdtm/bugs: Adjust recursion test to avoid elision lkdtm/usercopy: Moves the KERNEL_DS test to non-canonical
| * habanalabs: Read upper bits of trace buffer from RWPHITomer Tayar2019-06-041-2/+12
| | | | | | | | | | | | | | | | | | | | | | | | The trace buffer address is 40 bits wide. The end of the buffer is set in the RWP register (lower 32 bits), and in the RWPHI register (upper 8 bits). Currently only the lower 32 bits are read, and this patch fixes it and concatenates the upper 8 bits to the output address. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
| * habanalabs: Fix virtual address access via debugfs for 2MB pagesTomer Tayar2019-06-031-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | The debugfs interface for accessing DRAM virtual addresses currently uses the 12 LSBs of a virtual address as an offset. However, it should use the 20 LSBs in case the device MMU page size is 2MB instead of 4KB. This patch fixes the offset calculation to be based on the page size. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
| * habanalabs: fix bug in checking huge page optimizationOded Gabbay2019-05-281-6/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fix a bug in the mmu code that checks whether we can use huge page mappings for host pages. The code is supposed to enable huge page mappings only if ALL DMA addresses are aligned to 2MB AND the number of pages in each DMA chunk is a modulo of the number of pages in 2MB. However, the code ignored the first requirement for the first DMA chunk. This patch fix that issue by making sure the requirement of address alignment is validated against all DMA chunks. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
| * habanalabs: Avoid using a non-initialized MMU cache mutexTomer Tayar2019-05-242-7/+3
| | | | | | | | | | | | | | | | | | | | The MMU cache mutex is used in the ASIC hw_init() functions, but it is initialized only later in hl_mmu_init(). This patch prevents it by moving the initialization to the device_early_init() function. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
| * habanalabs: fix debugfs codeJann Horn2019-05-241-42/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes multiple things in the habanalabs debugfs code, in particular: - mmu_write() was unnecessarily verbose, copying around between multiple buffers - mmu_write() could write a user-specified, unbounded amount of userspace memory into a kernel buffer (out-of-bounds write) - multiple debugfs read handlers ignored the user-supplied count, potentially corrupting out-of-bounds userspace data - hl_device_read() was unnecessarily verbose - hl_device_write() could read uninitialized stack memory - multiple debugfs read handlers copied terminating null characters to userspace Signed-off-by: Jann Horn <jannh@google.com> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Cc: stable@vger.kernel.org
| * habanalabs: halt debug engines on user process closeOmer Shpigelman2019-05-245-1/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fix a potential bug where a user's process has closed unexpectedly without disabling the debug engines. In that case, the debug engines might continue running but because the user's MMU mappings are going away, we will get page fault errors. This behavior is also opposed to the general rule where nothing runs on the device after the user process closes. The patch stops the debug H/W engines upon process termination and thus makes sure nothing runs on the device after the process goes away. Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
* | treewide: Add SPDX license identifier - Makefile/KconfigThomas Gleixner2019-05-213-0/+3
|/ | | | | | | | | | | | | | Add SPDX license identifiers to all Make/Kconfig files which: - Have no license information of any form These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* habanalabs: Update CPU DMA memory label nameTomer Tayar2019-05-021-2/+2
| | | | | | | | | The CPU accessible DMA memory is general and not used only for PQ. Accordingly, this patch renames the "free_cpu_pq_dma_mem" label with "free_cpu_dma_mem". Signed-off-by: Tomer Tayar <ttayar@habana.ai> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
* habanalabs: Update CPU DMA pool label nameTomer Tayar2019-05-021-2/+2
| | | | | | | | | The CPU accessible DMA pool is general and not used only for PQ. Accordingly, this patch rename the "free_cpu_pq_pool" label with "free_cpu_accessible_dma_pool". Signed-off-by: Tomer Tayar <ttayar@habana.ai> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
* habanalabs: increase timeout if working with simulatorDalit Ben Zoor2019-04-302-2/+13
| | | | | | | | | | | Where there is a spike in the CPU consumption, it may cause random failures in the C/I since the KMD timeout for CPU and/or QMAN0 jobs expires and it stops communicating to the simulator. This commit fixes it by increasing timeout on polling functions if working with simulator. Signed-off-by: Dalit Ben Zoor <dbenzoor@habana.ai> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
* habanalabs: remove condition that is always trueDalit Ben Zoor2019-05-011-25/+23
| | | | | | | | | | After removing the parsing of the command submission when doing memset of the device memory, goya_validate_dma_pkt_host is never called by the kernel, so there is no need to check context id. Signed-off-by: Dalit Ben Zoor <dbenzoor@habana.ai> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
* habanalabs: remove redundant member from parser structDalit Ben Zoor2019-05-013-5/+1
| | | | | | | | | | use_virt_addr member was used for telling whether to treat the addresses in the CB as virtual during parsing. We disabled it only when calling the parser from the driver memset device function, and since this call had been removed, it should always be enabled. Signed-off-by: Dalit Ben Zoor <dbenzoor@habana.ai> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
* habanalabs: Manipulate DMA addresses in ASIC functionsTomer Tayar2019-05-016-56/+72
| | | | | | | | | | | | | | | | Routing device accesses to the host memory requires the usage of a base offset, which is canceled by the iATU just before leaving the device. The value of the base offset might be distinctive between different ASIC types. The manipulation of the addresses is currently used throughout the driver code, and one should be aware to it whenever providing a host memory address to the device. This patch removes this manipulation from the driver common code, and moves it to the ASIC specific functions that are responsible for host memory allocation/mapping. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
* habanalabs: rename functions to improve code readabilityOded Gabbay2019-05-015-36/+38
| | | | | | | | | | | This patch renames four functions in the ASIC-specific functions section, so it will be easier to differentiate them from the generic kernel functions with the same name. This will help in future code reviews, to make sure we don't use the kernel functions directly. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
* habanalabs: remove call to cs_parser()Dalit Ben Zoor2019-04-301-24/+3
| | | | | | | | | | There is no need to parse the command submission when doing memset of the device memory using the DMA engine because only the driver calls the memset function and therefore, the CS is trusted and doesn't require validation and patching. Signed-off-by: Dalit Ben Zoor <dbenzoor@habana.ai> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
* habanalabs: Use single pool for CPU accessible host memoryTomer Tayar2019-04-284-22/+48
| | | | | | | | | | | | | The device's CPU accessible memory on host is managed in a dedicated pool, except for 2 regions - Primary Queue (PQ) and Event Queue (EQ) - which are allocated from generic DMA pools. Due to address length limitations of the CPU, the addresses of all these memory regions must have the same MSBs starting at bit 40. This patch modifies the allocation of the PQ and EQ to be also from the dedicated pool, to ensure compliance with the limitation. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
* habanalabs: return old dram bar address upon changeOded Gabbay2019-04-283-35/+35
| | | | | | | | | | | | | This patch changes the ASIC interface function that changes the DRAM bar window. The change is to return the old address that the DRAM bar pointed to instead of an error code. This simplifies the code that use this function (mainly in debugfs) to restore the bar to the old setting. This is also needed for easier support in future ASICs. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
* habanalabs: rename restore to ctx_switch when appropriateOded Gabbay2019-04-254-21/+22
| | | | | | | | | | | | This patch only does renaming of certain variables and structure members, and their accompanied comments. This is done to better reflect the actions these variables and members represent. There is no functional change in this patch. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
* habanalabs: use ASIC functions interface for rreg/wregOded Gabbay2019-04-223-44/+71
| | | | | | | | | | | | | | | | | | | | | This patch slightly changes the macros of RREG32 and WREG32, which are used when reading or writing from registers. Instead of directly calling a function in the common code from these macros, the new code calls a function from the ASIC functions interface. This change allows us to share much more code between real ASICs and simulators, which in turn reduces the maintenance burden and the chances for forgetting to port code between the ASIC files. The patch also implements the hl_poll_timeout macro, instead of calling the generic readl_poll_timeout macro. This is required to allow use of this macro in the simulator files. As a result from this change, more functions in goya.c are shared with the simulator and therefore, should not be defined as static. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
* uapi/habanalabs: add missing fields in bmon paramsOded Gabbay2019-04-211-4/+12
| | | | | | | | | | | This patch adds missing fields of start address 0 and 1 in the bmon parameter structure that is received from the user in the debug IOCTL. Without these fields, the functionality of the bmon trace is broken, because there is no configuration of the base address of the filter of the bus monitor. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
* habanalabs: re-factor goya_parse_cb_no_ext_queue()Oded Gabbay2019-04-211-22/+21
| | | | | | | | | | This patch re-factors goya_parse_cb_no_ext_queue() to make it more readable by inverting the check inside the first if statement so the bulk of the function won't be inside an if statement. The patch also fixes a spelling error in the name of the function. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
* habanalabs: Cancel pr_fmt() definition dependency on includes orderTomer Tayar2019-04-103-2/+4
| | | | | | | | | | | | | pr_fmt() should be defined before including linux/printk.h, either directly or indirectly, in order to avoid redefinition of the macro. Currently the macro definition is in habanalabs.h, which is included in many files, and that makes the addition/reorder of includes to be prone to compilation errors. This patch cancels this dependency by defining the macro only in the few source files that use it. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
* Merge 5.1-rc6 into char-misc-nextGreg Kroah-Hartman2019-04-211-5/+4
|\ | | | | | | | | | | | | We want the fixes, and this resolves a merge error in the fastrpc driver. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * habanalabs: remove low credit limit of DMA #0Oded Gabbay2019-03-241-5/+4
| | | | | | | | | | | | | | | | Because DMA #0 is now used by the user, remove the limitation of credits from this channel. Without this patch, this channel is pretty much unusable due to its very low bandwidth configuration. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
* | habanalabs: prevent device PTE read/write during hard-resetOded Gabbay2019-04-061-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During hard-reset, contexts are closed as part of the tear-down process. After a context is closed, the driver cleans up the page tables of that context in the device's DRAM. This action is both dangerous and unnecessary. It is unnecessary, because the device is going through a hard-reset, which means the device's DRAM contents are no longer valid and the device's MMU is being reset. It is dangerous, because if the hard-reset came as a result of a PCI freeze, this action may cause the entire host machine to hang. Therefore, prevent all device PTE updates when a hard-reset operation is pending. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
* | habanalabs: improve IOCTLs behavior when disabled or resetOded Gabbay2019-04-064-6/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch makes some improvement in how IOCTLs behave when the device is disabled or under reset. The new code checks, at the start of every IOCTL, if the device is disabled or in reset. If so, it prints an appropriate kernel message and returns -EBUSY to user-space. In addition, the code modifies the location of where the hard_reset_pending flag is being set or cleared: 1. It is now cleared immediately after the reset *tear-down* flow is finished but before the re-initialization flow begins. 2. It is being set in the remove function of the device, to make the behavior the same with the hard-reset flow There are two exceptions to the disable or in reset check: 1. The HL_INFO_DEVICE_STATUS opcode in the INFO IOCTL. This opcode allows the user to inquire about the status of the device, whether it is operational, in reset or malfunction (disabled). If the driver will block this IOCTL, the user won't be able to retrieve the status in case of malfunction or in reset. 2. The WAIT_FOR_CS IOCTL. This IOCTL allows the user to inquire about the status of a CS. We want to allow the user to continue to do so, even if we started a soft-reset process because it will allow the user to get the correct error code for each CS he submitted. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
* | habanalabs: all FD must be closed before removing deviceOded Gabbay2019-04-061-5/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes a bug in the implementation of the function that removes the device. The bug can happen when the device is removed but not the driver itself (e.g. remove by the OS due to PCI freeze in Power architecture). In that case, there maybe open users that are calling IOCTLs while the device is removed. This is a possible race condition that the driver must handle. Otherwise, a kernel panic may occur. This race is prevented in the hard-reset flow, because the driver makes sure the users are closed before continuing with the hard-reset. This race can not occur when the driver itself is removed because the OS makes sure all the file descriptors are closed. The fix is to make sure the open users close their file descriptors and if they don't (after a certain amount of time), the driver sends them a SIGKILL, because the remove of the device can't be stopped. The patch re-uses the same code that is called from the hard-reset flow. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
* | habanalabs: split mmu/no-mmu code paths in memory ioctlOded Gabbay2019-04-041-85/+92
| | | | | | | | | | | | | | | | To make the memory ioctl code more readable, this patch moves the legacy/debug code path of mmu-disabled to a separate function, which is called (if necessary) from the main memory ioctl function. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
* | habanalabs: ASIC_AUTO_DETECT enum value is redundantOded Gabbay2019-04-042-7/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch removes the enum value of ASIC_AUTO_DETECT because we can use the validity of the pdev variable to know whether we have a real device or a simulator. For a real device, we detect the asic type from the device ID while for a simulator, the simulator code calls create_hdev() with the specified ASIC type. Set ASIC_INVALID as the first option in the enum to make sure that no other enum value will receive the value 0 (which indicates a non-existing entry in the simulator array). Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
* | habanalabs: refactoring in goya.cOded Gabbay2019-04-023-81/+88
| | | | | | | | | | | | | | | | | | | | This patch does some refactoring in goya.c to make code more reusable between goya code and the goya simulator code (which is not upstreamed). In addition, the patch removes some dead functions from goya.c which are not used by the current upstream code Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
* | habanalabs: add goya implementation for debug configurationOmer Shpigelman2019-04-016-6/+1122
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds the ASIC-specific function for GOYA to configure the coresight components. Most of the components have an enabled/disabled flag, depending on whether the user wants to enable the component or disable it. For some of the components, such as ETR and SPMU, the user can also request to read values from them. Those values are needed for the user to parse the trace data. The ETR configuration is also checked for security purposes, to make sure the trace data is written to the device's DRAM. Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
* | habanalabs: add new IOCTL for debug, tracing and profilingOmer Shpigelman2019-04-016-2/+154
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Habanalabs ASICs use the ARM coresight infrastructure to support debug, tracing and profiling of neural networks topologies. Because the coresight is configured using register writes and reads, and some of the registers hold sensitive information (e.g. the address in the device's DRAM where the trace data is written to), the user must go through the kernel driver to configure this mechanism. This patch implements the common code of the IOCTL and calls the ASIC-specific function for the actual H/W configuration. The IOCTL supports configuration of seven coresight components: ETR, ETF, STM, FUNNEL, BMON, SPMU and TIMESTAMP The user specifies which component he wishes to configure and provides a pointer to a structure (located in its process space) that contains the relevant configuration. The common code copies the relevant data from the user-space to kernel space and then calls the ASIC-specific function to do the H/W configuration. After the configuration is done, which is usually composed of several IOCTL calls depending on what the user wanted to trace, the user can start executing the topology. The trace data will be written to the user's area in the device's DRAM. After the tracing operation is complete, and user will call the IOCTL again to disable the tracing operation. The user also need to read values from registers for some of the components (e.g. the size of the trace data in the device's DRAM). In that case, the user will provide a pointer to an "output" structure in user-space, which the IOCTL code will fill according the to selected component. Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
* | habanalabs: remove extra semicolonOded Gabbay2019-04-021-1/+1
| | | | | | | | | | | | | | This patch removes an extra ; after the closing brackets of a while loop. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
* | habanalabs: prevent CPU soft lockup on PalladiumOded Gabbay2019-03-311-2/+9
| | | | | | | | | | | | | | | | | | | | Unmapping ptes in the device MMU on Palladium can take a long time, which can cause a kernel BUG of CPU soft lockup. This patch minimize the chances for this bug by sleeping a little between unmapping ptes. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
* | habanalabs: remove trailing blank line from EOFOded Gabbay2019-03-3191-91/+0
| | | | | | | | | | | | | | | | GIT does not like extra blank lines at the end of the file, so this patch removes those lines. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
* | habanalabs: improve error messagesOded Gabbay2019-03-272-2/+3
| | | | | | | | | | | | | | This patch improves two error messages to help the user to better understand what error occurred. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
* | habanalabs: add device status option to INFO IOCTLDalit Ben Zoor2019-03-243-0/+35
| | | | | | | | | | | | | | | | | | | | This patch adds a new opcode to INFO IOCTL that returns the device status. This will allow users to query the device status in order to avoid sending command submissions while device is in reset. Signed-off-by: Dalit Ben Zoor <dbenzoor@habana.ai> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
* | habanalabs: allow user to modify TPC clock relaxation valueDalit Ben Zoor2019-03-211-2/+12
| | | | | | | | | | | | | | | | | | | | | | This patch allows the user to modify the TPC PLL clock relaxation value on-the-fly in order to reduce power consumption. To enable this, the patch removes the protection from the specific register that controls this behavior. Signed-off-by: Dalit Ben Zoor <dbenzoor@habana.ai> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
* | habanalabs: set new golden value to tpc clock relaxationDalit Ben Zoor2019-03-201-0/+2
| | | | | | | | | | | | | | | | On init or context switch, set TPC clock relaxation counter register to a golden value. Signed-off-by: Dalit Ben Zoor <dbenzoor@habana.ai> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
* | habanalabs: never fail hard reset of deviceOded Gabbay2019-03-171-10/+9
| | | | | | | | | | | | | | | | | | | | Hard-reset of our device should never fail, due to dangers of permanent damage to the H/W. This patch removes the last place in the reset path where the driver might exit before doing the actual reset. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
* | habanalabs: keep track of the device's dma maskOded Gabbay2019-03-074-39/+65
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch refactors the code that is responsible to set the DMA mask for the device. Upon each change of the dma mask, the driver will save the new value that was set. This is needed in order to make sure we don't try to increase the mask a second time, in case we failed in the first time. This is especially relevant for Power machines, as that may cause a change in configuration of the TVT which will break the device. Goya will first try to set the device's dma mask to 39 bits, so that the memory that is allocated on the host machine for communication with the device's cpu will be in a bus address which is lower then 39 bits. Later, Goya will try to increase that mask to 48 bits, but only if setting the mask to 39 bits was successful. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
* | habanalabs: add MMU shadow mappingOmer Shpigelman2019-02-243-284/+356
| | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds shadow mapping to the MMU module. The shadow mapping allows traversing the page table in host memory rather reading each PTE from the device memory. It brings better performance and avoids reading from invalid device address upon PCI errors. Only at the end of map/unmap flow, writings to the device are performed in order to sync the H/W page tables with the shadow ones. Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>