unicorn

mirror of https://github.com/yuzu-emu/unicorn.git synced 2024-10-20 20:58:29 +02:00

Author	SHA1	Message	Date
Emilio G. Cota	cb92eea81a	target-arm: emulate aarch64's LL/SC using cmpxchg helpers Emulating LL/SC with cmpxchg is not correct, since it can suffer from the ABA problem. Portable parallel code, however, is written assuming only cmpxchg--and not LL/SC--is available. This means that in practice emulating LL/SC with cmpxchg is a viable alternative. The appended emulates LL/SC pairs in aarch64 with cmpxchg helpers. This works in both user and system mode. In usermode, it avoids pausing all other CPUs to perform the LL/SC pair. The subsequent performance and scalability improvement is significant, as the plots below show. They plot the throughput of atomic_add-bench compiled for ARM and executed on a 64-core x86 machine. Hi-res plots: http://imgur.com/a/JVc8Y atomic_add-bench: 1000000 ops/thread, [0,1] range 18 ++---------+----------+---------+----------+----------+----------+---++ +cmpxchg +-E--+ + + + + + \| 16 ++master +-H--+ ++ \|\| \| 14 ++ ++ \| \| \| 12 ++\| ++ \| \| \| 10 ++++ ++ 8 ++E ++ \|+++ \| 6 ++ \| ++ \| \| \| 4 ++ \| ++ \| \| \| 2 +H++E+--- ++ + \| +E++----+E+---+--+E+----++E+------+E+------+E++----+E+---+--+E\| 0 ++H-H----H-+-----H----+---------+----------+----------+----------+---++ 0 10 20 30 40 50 60 Number of threads atomic_add-bench: 1000000 ops/thread, [0,2] range 18 ++---------+----------+---------+----------+----------+----------+---++ +cmpxchg +-E--+ + + + + + \| 16 ++master +-H--+ ++ \| \| \| 14 ++E ++ \| \| \| 12 ++\| ++ \|+++ \| 10 ++ \| ++ 8 ++ \| ++ \| \| \| 6 ++ \| ++ \| \| \| 4 ++ \| ++ \| +E+--- \| 2 +H+ +E+-----+++ +++ +++ ---+E+-----+E+------+++ +++ + +E+---+--+E+----++E+------+E+--- ++++ +++ + +E\| 0 ++H-H----H-+-----H----+---------+----------+----------+----------+---++ 0 10 20 30 40 50 60 Number of threads atomic_add-bench: 1000000 ops/thread, [0,128] range 70 ++---------+----------+---------+----------+----------+----------+---++ +cmpxchg +-E--+ + + + + + \| 60 ++master +-H--+ +++ ---+E+-----+E+------+E+ \| +E+------E-------+E+--- \| \| --- +++ \| 50 ++ +++--- ++ \| -+E+ \| 40 ++ +++---- ++ \| E- \| \| --\| \| 30 ++ -- +++ ++ \| +E+ \| 20 ++E+ ++ \|E+ \| \| \| 10 ++ ++ + + + + + + + \| 0 +HH-H----H-+-----H----+---------+----------+----------+----------+---++ 0 10 20 30 40 50 60 Number of threads atomic_add-bench: 1000000 ops/thread, [0,1024] range 160 ++---------+---------+----------+---------+----------+----------+---++ +cmpxchg +-E--+ + + + + + \| 140 ++master +-H--+ +++ +++ \| -+E+-----+E+-------E\| 120 ++ +++ ---- +++ \| +++ ----E-- \| 100 ++ --E--- +++ ++ \| +++ ---- +++ \| 80 ++ --E-- ++ \| ---- +++ \| \| -+E+ \| 60 ++ ---- +++ ++ \| +E+- \| 40 ++ -- ++ \| +E+ \| 20 +EE+ ++ +++ + + + + + + \| 0 +HH-H---H--+-----H---+----------+---------+----------+----------+---++ 0 10 20 30 40 50 60 Number of threads Backports commit 1dd089d0eec060dcd8478735114d98421d414805 from qemu	2018-02-28 00:21:27 -05:00
Aleksandar Markovic	6eb4fa54f6	softfloat: Implement run-time-configurable meaning of signaling NaN bit This patch modifies SoftFloat library so that it can be configured in run-time in relation to the meaning of signaling NaN bit, while, at the same time, strictly preserving its behavior on all existing platforms. Background: In floating-point calculations, there is a need for denoting undefined or unrepresentable values. This is achieved by defining certain floating-point numerical values to be NaNs (which stands for "not a number"). For additional reasons, virtually all modern floating-point unit implementations use two kinds of NaNs: quiet and signaling. The binary representations of these two kinds of NaNs, as a rule, differ only in one bit (that bit is, traditionally, the first bit of mantissa). Up to 2008, standards for floating-point did not specify all details about binary representation of NaNs. More specifically, the meaning of the bit that is used for distinguishing between signaling and quiet NaNs was not strictly prescribed. (IEEE 754-2008 was the first floating-point standard that defined that meaning clearly, see [1], p. 35) As a result, different platforms took different approaches, and that presented considerable challenge for multi-platform emulators like QEMU. Mips platform represents the most complex case among QEMU-supported platforms regarding signaling NaN bit. Up to the Release 6 of Mips architecture, "1" in signaling NaN bit denoted signaling NaN, which is opposite to IEEE 754-2008 standard. From Release 6 on, Mips architecture adopted IEEE standard prescription, and "0" denotes signaling NaN. On top of that, Mips architecture for SIMD (also known as MSA, or vector instructions) also specifies signaling bit in accordance to IEEE standard. MSA unit can be implemented with both pre-Release 6 and Release 6 main processor units. QEMU uses SoftFloat library to implement various floating-point-related instructions on all platforms. The current QEMU implementation allows for defining meaning of signaling NaN bit during build time, and is implemented via preprocessor macro called SNAN_BIT_IS_ONE. On the other hand, the change in this patch enables SoftFloat library to be configured in run-time. This configuration is meant to occur during CPU initialization, at the moment when it is definitely known what desired behavior for particular CPU (or any additional FPUs) is. The change is implemented so that it is consistent with existing implementation of similar cases. This means that structure float_status is used for passing the information about desired signaling NaN bit on each invocation of SoftFloat functions. The additional field in float_status is called snan_bit_is_one, which supersedes macro SNAN_BIT_IS_ONE. IMPORTANT: This change is not meant to create any change in emulator behavior or functionality on any platform. It just provides the means for SoftFloat library to be used in a more flexible way - in other words, it will just prepare SoftFloat library for usage related to Mips platform and its specifics regarding signaling bit meaning, which is done in some of subsequent patches from this series. Further break down of changes: 1) Added field snan_bit_is_one to the structure float_status, and correspondent setter function set_snan_bit_is_one(). 2) Constants <float16\|float32\|float64\|floatx80\|float128>_default_nan (used both internally and externally) converted to functions <float16\|float32\|float64\|floatx80\|float128>_default_nan(float_status). This is necessary since they are dependent on signaling bit meaning. At the same time, for the sake of code cleanup and simplicity, constants <floatx80\|float128>_default_nan_<low\|high> (used only internally within SoftFloat library) are removed, as not needed. 3) Added a float_status argument to SoftFloat library functions XXX_is_quiet_nan(XXX a_), XXX_is_signaling_nan(XXX a_), XXX_maybe_silence_nan(XXX a_). This argument must be present in order to enable correct invocation of new version of functions XXX_default_nan(). (XXX is <float16\|float32\|float64\|floatx80\|float128> here) 4) Updated code for all platforms to reflect changes in SoftFloat library. This change is twofolds: it includes modifications of SoftFloat library functions invocations, and an addition of invocation of function set_snan_bit_is_one() during CPU initialization, with arguments that are appropriate for each particular platform. It was established that all platforms zero their main CPU data structures, so snan_bit_is_one(0) in appropriate places is not added, as it is not needed. [1] "IEEE Standard for Floating-Point Arithmetic", IEEE Computer Society, August 29, 2008. Backports commit af39bc8c49224771ec0d38f1b693ea78e221d7bc from qemu	2018-02-24 20:27:12 -05:00
Paolo Bonzini	9485b7c2e1	cpu: move exec-all.h inclusion out of cpu.h exec-all.h contains TCG-specific definitions. It is not needed outside TCG-specific files such as translate.c, exec.c or *helper.c. One generic function had snuck into include/exec/exec-all.h; move it to include/qom/cpu.h. Backports commit 63c915526d6a54a95919ebece83fa9ca631b2508 from qemu	2018-02-24 02:39:08 -05:00
Peter Maydell	e1925bb5fb	target-arm: Move aarch64_cpu_do_interrupt() to helper.c Move the aarch64_cpu_do_interrupt() function to helper.c. We want to be able to call this from code that isn't AArch64-only, and the move allows us to avoid awkward #ifdeffery at the callsite. Backports commit f3a9b6945cbbb23f3a70da14e9ffdf1e60c580a8 from qemu	2018-02-18 22:23:06 -05:00
Peter Maydell	cd5c4037ac	target-arm: Clean up includes Clean up includes so that osdep.h is included first and headers which it implies are not included manually. This commit was created with scripts/clean-includes. Backports commit 74c21bd07491739c6e56bcb1f962e4df730e77f3 from qemu	2018-02-17 21:09:32 -05:00
Edgar E. Iglesias	fa908ea3d3	target-arm: Log the target EL when taking exceptions Log the target EL when taking exceptions. This is useful when debugging guest SW or QEMU itself while transitioning through the various ELs. Backports commit dbc29a868cf5b7e6fa7bb2e6c4f188b9470779c5 from qemu	2018-02-17 15:23:36 -05:00
Peter Maydell	1b88e0e8c8	target-arm: Wire up HLT 0xf000 as the A64 semihosting instruction For the A64 instruction set, the semihosting call instruction is 'HLT 0xf000'. Wire this up to call do_arm_semihosting() if semihosting is enabled. Backports commit 8012c84ff92a36d05dfe61af9b24dd01a7ea25e4 from qemu	2018-02-17 15:23:34 -05:00
Soren Brinkmann	fd2ac3058f	target-arm: A64: Print ELR when taking exceptions When taking an exception print the content of the exception link register. This is useful especially for synchronous exceptions because in that case this registers holds the address of the instruction that generated the exception. Backports commit b21ab1fc217b4a2b8f2f85d16bdd8510a7817a34 from qemu	2018-02-17 15:23:13 -05:00
Greg Bellows	5ad81f095a	target-arm: Update interrupt handling to use target EL Updated the interrupt handling to utilize and report through the target EL exception field. This includes consolidating and cleaning up code where needed. Target EL is now calculated once in arm_cpu_exec_interrupt() and do_interrupt was updated to use the target_el exception field. The necessary code from arm_excp_target_el() was merged in where needed and the function removed. Backports commit 012a906b19e99b126403ff4a257617dab9b34163 from qemu	2018-02-12 22:42:37 -05:00
Peter Maydell	d723e590f2	target-arm: Store SPSR_EL1 state in banked_spsr[1] (SPSR_svc) The AArch64 SPSR_EL1 register is architecturally mandated to be mapped to the AArch32 SPSR_svc register. This means its state should live in QEMU's env->banked_spsr[1] field. Correct the various places in the code that incorrectly put it in banked_spsr[0]. Backports commit 7847f9ea9fce15a9ecfb62ab72c1e84ff516b0db from qemu	2018-02-12 16:36:44 -05:00
Greg Bellows	8612f1d3e7	target-arm: Add 32/64-bit register sync Add AArch32 to AArch64 register sychronization functions. Replace manual register synchronization with new functions in aarch64_cpu_do_interrupt() and HELPER(exception_return)(). Backports commit ce02049dbf1828b4bc77d921b108a9d84246e5aa from qemu	2018-02-12 14:57:20 -05:00
Peter Maydell	1c275134ae	target-arm: Squash input denormals in FRECPS and FRSQRTS The helper functions for FRECPS and FRSQRTS have special case handling that includes checks for zero inputs, so squash input denormals if necessary before those checks. This fixes incorrect output when the FPCR DZ bit is set to enable squashing of input denormals. Backports commit a8eb6e19991d1a7a6a7b04ac447548d30d75eb4a from qemu	2018-02-12 11:04:28 -05:00
Lioncash	3791fc69fd	target-arm: Use new revbit functions Backports commit 42fedbca8f5b54324ed89be3484d4a3dc9946387 from qemu	2018-02-11 02:57:55 -05:00
Lioncash	0f453b0595	target/arm: Add aa{32, 64}_vfp_{dreg, qreg} helpers Backports commit 9a2b5256ea1f68c89d5da4b54f180f576c2c82d6 from qemu	2018-02-07 10:09:26 -05:00
Lioncash	438e2836e0	helper_a64: Fix CRC32's implementation	2018-01-29 09:24:36 -05:00
xorstream	770c5616e2	Automated leading tab to spaces conversion.	2017-01-21 12:28:22 +11:00
pancake	fe96e8325b	Remove unused zlib dependency	2016-06-15 09:24:16 +02:00
Nguyen Anh Quynh	344d016104	import	2015-08-21 15:04:50 +08:00

18 Commits