unicorn

mirror of https://github.com/yuzu-emu/unicorn.git synced 2024-10-20 14:58:16 +02:00

Author	SHA1	Message	Date
Richard Henderson	091b4fa1ff	tcg/i386: Move TCG_REG_CALL_STACK from define to enum Backports commit 66c0285df4270d184afce5ac8b97ac175c89562f from qemu	2018-12-18 05:13:47 -05:00
Richard Henderson	f3a8a4a306	tcg/i386: Always use %ebp for TCG_AREG0 For x86_64, this can remove a REX prefix resulting in smaller code when manipulating globals of type i32, as we move them between backing store via cpu_env, aka TCG_AREG0. Backports commit 5740d9f714835964873325d1210b26811252843f from qemu	2018-12-18 05:13:05 -05:00
Roman Kapl	33e69342e3	tcg/i386: fix vector operations on 32-bit hosts The TCG backend uses LOWREGMASK to get the low 3 bits of register numbers. This was defined as a no-op for 32-bit x86, with the assumption that we have eight registers anyway. This assumption is not true once we have xmm regs. Since LOWREGMASK was a no-op, xmm register indices were wrong in opcodes and overflowed into other opcode fields, wreaking havoc. To trigger these problems, you can try running the "movi d8, #0x0" AArch64 instruction on 32-bit x86. "vpxor %xmm0, %xmm0, %xmm0" should be generated, but instead TCG generated "vpxor %xmm0, %xmm0, %xmm2". Fixes: 770c2fc7bb ("Add vector operations") Backports commit 93bf9a42733321fb632bcb9eafd049ef0e3d9417 from qemu	2018-10-02 04:22:35 -04:00
Richard Henderson	a4c2dbef3e	tcg/i386: Mark xmm registers call-clobbered When host vector registers and operations were introduced, I failed to mark the registers call clobbered as required by the ABI. Fixes: 770c2fc7bb7 Backports commit 672189cd586ea38a2c1d8ab91eb1f9dcff5ceb05 from qemu	2018-07-23 20:00:26 -04:00
John Arbuckle	22c3206738	tcg/i386: Use byte form of xgetbv instruction The assembler in most versions of Mac OS X is pretty old and does not support the xgetbv instruction. To go around this problem, the raw encoding of the instruction is used instead. Backports commit 1019242af11400252f6735ca71a35f81ac23a66d from qemu	2018-06-28 13:23:32 -05:00
Richard Henderson	33f7f6f09a	tcg/i386: Fix dup_vec in non-AVX2 codepath The VPUNPCKLD* instructions are all "non-destructive source", indicated by "NDS" in the encoding string in the x86 ISA manual. This means that they take two source operands, one of which is encoded in the VEX.vvvv field. We were incorrectly treating them as if they were destructive-source and passing 0 as the 'v' argument of tcg_out_vex_modrm(). This meant we were always using %xmm0 as one of the source operands, causing incorrect results if the register allocator happened to want to use something else. For instance the input AArch64 insn: DUP v26.16b, w21 which becomes TCG IR ops: dup_vec v128,e8,tmp2,x21 st_vec v128,e8,tmp2,env,$0xa40 was assembled to: 0x607c568c: c4 c1 7a 7e 86 e8 00 00 vmovq 0xe8(%r14), %xmm0 0x607c5694: 00 0x607c5695: c5 f9 60 c8 vpunpcklbw %xmm0, %xmm0, %xmm1 0x607c5699: c5 f9 61 c9 vpunpcklwd %xmm1, %xmm0, %xmm1 0x607c569d: c5 f9 70 c9 00 vpshufd $0, %xmm1, %xmm1 0x607c56a2: c4 c1 7a 7f 8e 40 0a 00 vmovdqu %xmm1, 0xa40(%r14) 0x607c56aa: 00 when the vpunpcklwd insn should be "%xmm1, %xmm1, %xmm1". This resulted in our incorrectly setting the output vector to q26=0000320000003200:0000320000003200 when given an input of x21 == 0000000002803200 rather than the expected all-zeroes. Pass the correct source register number to tcg_out_vex_modrm() for these insns. Backports commit 7eb30ef0ba2eb59e7430d4848ae8d4bf4e50f768 from qemu	2018-05-11 11:22:38 -04:00
Lioncash	6bdfeb35ec	tcg/i386: Perform comparison pass against qemu Ensures formatting and code are consistent.	2018-03-20 06:29:06 -04:00
Richard Henderson	2310bd4887	tcg/i386: Support INDEX_op_dup2_vec for -m32 Unknown why -m32 was passing with gcc but not clang; it should have failed for both. This would be used for tcg_gen_dup_i64_vec, and visible with the right TB and an aarch64 guest. Backports commit 7f34ed4bcdfda55f978f51aadca64aa970c9f4b6 from qemu	2018-03-17 20:22:24 -04:00
Lioncash	b28c64ed34	tcg/i386: Amend bad merge	2018-03-12 10:11:03 -04:00
Richard Henderson	a16ee979fc	tcg/i386: Always use TZCNT when available I think this is cleaner than sometimes using BSF. Backports commit 39f099ec9d6d420b6fe6f7f4f8ed80ae29c65ff2 from qemu	2018-03-12 05:11:42 -04:00
Richard Henderson	7e327aaf84	util: Introduce include/qemu/cpuid.h Clang 3.9 passes the CONFIG_AVX2_OPT configure test. However, the supplied <cpuid.h> does not contain the bit_AVX2 define that we use when detecting whether the routine can be enabled. Introduce a qemu-specific header that uses the compiler's definition of __cpuid et al, but supplies any missing bit_* definitions needed. This avoids introducing any extra ifdefs to util/bufferiszero.c, and allows quite a few to be removed from tcg/i386/tcg-target.inc.c. Backports commit 5dd8990841a9e331d9d4838a116291698208cbb6 from qemu	2018-03-09 12:12:00 -05:00
Richard Henderson	b3e89e9996	tcg/i386: Add vector operations The x86 vector instruction set is extremely irregular. With newer editions, Intel has filled in some of the blanks. However, we don't get many 64-bit operations until SSE4.2, introduced in 2009. The subsequent edition was for AVX1, introduced in 2011, which added three-operand addressing, and adjusts how all instructions should be encoded. Given the relatively narrow 2 year window between possible to support and desirable to support, and to vastly simplify code maintenance, I am only planning to support AVX1 and later cpus. Backports commit 770c2fc7bb70804ae9869995fd02dadd6d7656ac from qemu	2018-03-07 08:07:40 -05:00
Emilio G. Cota	3cf23eb256	tcg/i386: constify tcg_target_callee_save_regs Backports commit e268f4c036d2b47a4f8bf293c1371b328e03ca04 from qemu	2018-03-05 02:08:02 -05:00
Richard Henderson	fc8b4316a9	tcg: Remove tcg_regset_set32 It's not even clear what the interface REG and VAL32 were supposed to mean. All uses had REG = 0 and VAL32 was the bitset assigned to the destination. Backports commit f46934df662182097dce07d57ec00f37e4d2abf1 from qemu	2018-03-04 23:42:59 -05:00
Richard Henderson	49d09d6888	tcg: Remove tcg_regset_clear Backports commit ccb1bb66ea2a42e773bfa04178d8b383ff86d4d8 from qemu	2018-03-04 23:24:45 -05:00
Richard Henderson	b96f53e8a3	tcg/i386: Store out-of-range call targets in constant pool Already it saves 2 bytes per call, but also the constant pool entry may well be shared across multiple calls. Backports commit 4e45f23943c0bb91588627de3801826546155ad8 from qemu	2018-03-04 22:22:49 -05:00
Richard Henderson	f96514a99c	tcg: Rearrange ldst label tracking Dispense with TCGBackendData, as it has never been used for more than holding a single pointer. Use a define in the cpu/tcg-target.h to signal requirement for TCGLabelQemuLdst, so that we can drop the no-op tcg-be-null.h stubs. Rename tcg-be-ldst.h to tcg-ldst.inc.c. Backports commit 659ef5cbb893872d25e9d95191cc23b16546c8a1 from qemu	2018-03-04 22:13:13 -05:00
Richard Henderson	31b8b67cd3	tcg: Move USE_DIRECT_JUMP discriminator to tcg/cpu/tcg-target.h Replace the USE_DIRECT_JUMP ifdef with a TCG_TARGET_HAS_direct_jump boolean test. Replace the tb_set_jmp_target1 ifdef with an unconditional function tb_target_set_jmp_target. While we're touching all backends, add a parameter for tb->tc_ptr; we're going to need it shortly for some backends. Move tb_set_jmp_target and tb_add_jump from exec-all.h to cpu-exec.c. Backports commit a85833933628384d74ec412024d55cf012640287 from qemu	2018-03-04 21:52:35 -05:00
Emilio G. Cota	e4dfb7f807	tcg/i386: implement goto_ptr Backports commit 5cb4ef80f65252dd85b86fa7f3c985015423d670 from qemu	2018-03-02 21:08:38 -05:00
Emilio G. Cota	8f4f15e5f5	tcg: Introduce goto_ptr opcode and tcg_gen_lookup_and_goto_ptr Instead of exporting goto_ptr directly to TCG frontends, export tcg_gen_lookup_and_goto_ptr(), which calls goto_ptr with the pointer returned by the lookup_tb_ptr() helper. This is the only use case we have for goto_ptr and lookup_tb_ptr, so having this function is very convenient. Furthermore, it trivially allows us to avoid calling the lookup helper if goto_ptr is not implemented by the backend. Backports commit cedbcb01529cb6cf9a2289cdbebbc63f6149fc18 from qemu	2018-03-02 21:05:18 -05:00
Alex Bennée	caba238b5a	tcg: enable MTTCG by default for ARM on x86 hosts This enables the multi-threaded system emulation by default for ARMv7 and ARMv8 guests using the x86_64 TCG backend. This is because on the guest side: - The ARM translate.c/translate-64.c have been converted to - use MTTCG safe atomic primitives - emit the appropriate barrier ops - The ARM machine has been updated to - hold the BQL when modifying shared cross-vCPU state - defer powerctl changes to async safe work All the host backends support the barrier and atomic primitives but need to provide same-or-better support for normal load/store operations. Backports commit ca759f9e387db87e1719911f019bc60c74be9ed8 from qemu	2018-03-02 10:32:47 -05:00
Richard Henderson	4bec129626	tcg/i386: Handle ctpop opcode Backports commit 993508e43e6d180e9ba9b747a9657eac69aec5bb from qemu	2018-03-01 18:49:43 -05:00
Richard Henderson	5f6e7bbdbd	tcg: Add opcode for ctpop The number of actual invocations of ctpop itself does not warrant an opcode, but it is very helpful for POWER7 to use in generating an expansion for ctz. Backports commit a768e4e99247911f00c5c0267c12d4e207d5f6cc from qemu	2018-03-01 18:26:41 -05:00
Richard Henderson	246d891668	tcg/i386: Handle ctz and clz opcodes Backports commit bbf25f90ba802a286fd72be9175a860ae5fec726 from qemu	2018-03-01 16:56:08 -05:00
Richard Henderson	73ab332185	tcg/i386: Allow bmi2 shiftx to have non-matching operands Previously we could not have different constraints for different ISA levels, which prevented us from eliding the matching constraint for shifts. We do now have to make sure that the operands match for constant shifts. We can also handle some small left shifts via lea. Backports commit 6a5aed4bdc7078838a8098336588d56c9ce09d1d from qemu	2018-03-01 16:45:04 -05:00
Richard Henderson	9e3feebbfb	tcg/i386: Hoist common arguments in tcg_out_op Backports commit 42d5b514928a8a0d2f55a4c243d1333f9675815b from qemu	2018-03-01 16:42:30 -05:00
Richard Henderson	142ca07077	tcg/i386: Fully convert tcg_target_op_def Use a switch instead of searching a table. Share constraints between 32-bit and 64-bit, when at all possible. Backports commit cd26449a505f808e479af4fdd539e05767e09c06 from qemu	2018-03-01 16:32:31 -05:00
Richard Henderson	2cf34e1b55	tcg: Add clz and ctz opcodes Backports commit 0e28d0063bbd9e59a981ea2d20f82f30c5d956a8 from qemu	2018-03-01 16:04:11 -05:00
Richard Henderson	3f38611159	tcg: Pass the opcode width to target_parse_constraint This will let us choose how to interpret a given constraint depending on whether the opcode is 32- or 64-bit. Which will let us share more constraint combinations between opcodes. At the same time, change the interface to return the advanced pointer instead of passing it in/out by reference. Backports commit 069ea736b50b75fdec99c9b8cc603b97bd98419e from qemu	2018-03-01 15:45:40 -05:00
Richard Henderson	b8c93597b4	tcg: Transition flat op_defs array to a target callback This will allow the target to tailor the constraints to the auto-detected ISA extensions. Backports commit f69d277ece43c42c7ab0144c2ff05ba740f6706b from qemu	2018-03-01 15:40:11 -05:00
Richard Henderson	7a7a5c640d	tcg/i386: Implement field extraction opcodes Backports commit 78fdbfb94616f0391834d2eccabd16ea29e37da5 from qemu	2018-03-01 13:35:41 -05:00
Richard Henderson	8e0585dcb1	tcg: Add field extraction primitives Adds tcg_gen_extract_* and tcg_gen_sextract_* for extraction of fixed position bitfields, much like we already have for deposit. Backports commit 7ec8bab3deae643b1ce579c2d65a244f30708330 from qemu	2018-03-01 13:21:30 -05:00
Richard Henderson	2ab4b8fa4d	tcg/i386: Extend TARGET_PAGE_MASK to the proper type TARGET_PAGE_MASK, as defined, has type "int". We need to extend that to the proper target width before oring in an "unsigned". Backports commit ebb90a005da67147245cd38fb04a965a87a961b7 from qemu	2018-02-26 03:32:38 -05:00
Pranith Kumar	d49bd55f52	tcg/i386: Add support for fence Generate a 'lock orl $0,0(%esp)' instruction for ordering instead of mfence which has similar ordering semantics. Backports commit a7d00d4effb58889ac6df64f98ac50c9d1594149 from qemu	2018-02-26 03:10:58 -05:00
Richard Henderson	91f5cf0417	tcg: Support arbitrary size + alignment Previously we allowed fully unaligned operations, but not operations that are aligned but with less alignment than the operation size. In addition, arm32, ia64, mips, and sparc had been omitted from the previous overalignment patch, which would have led to that alignment being enforced. Backports commit 85aa80813dd9f5c1f581c743e45678a3bee220f8 from qemu	2018-02-26 02:47:26 -05:00
Markus Armbruster	25ec9ab016	tcg: Clean up tcg-target.h header guards These use guard symbols like TCG_TARGET_$target. scripts/clean-header-guards.pl doesn't like them because they don't match their file name (they should, to make guard collisions less likely). Clean them up: use guard symbol $target_TCG_TARGET_H for tcg/$target/tcg-target.h. Backports commit 14e54f8ecfe9c5e17348f456781344737ed10b3b from qemu	2018-02-25 04:15:08 -05:00
Sergey Sorokin	e4d123caa9	tcg: Improve the alignment check infrastructure Some architectures (e.g. ARMv8) need the address to be aligned to a size larger than the size of the memory access. The current costless alignment check implementation in QEMU is enough to support such a check, but we need to support specifying an alignment size. Backports commit 1f00b27f17518a1bcb4cedca49eaec96a4d560bd from qemu	2018-02-25 02:23:28 -05:00
Richard Henderson	23586e2674	tcg: Optimize spills of constants While we can store constants via constraints on INDEX_op_st_i32 et al, we weren't able to spill constants to backing store. Add a new backend interface, tcg_out_sti, which may store the constant (and is allowed to fail). Rearrange the temp_* helpers so that we only attempt to directly store a constant when the temp is becoming dead/free. Backports commit 59d7c14eeff8d2ad7f61aed86ce5a176113bc153 from qemu	2018-02-25 01:45:29 -05:00
Sergey Fedorov	e60c24cecf	tcg: Clean up direct block chaining data fields Briefly describe in a comment how direct block chaining is done. It should help in understanding of the following data fields. Rename some fields in TranslationBlock and TCGContext structures to better reflect their purpose (dropping excessive 'tb_' prefix in TranslationBlock but keeping it in TCGContext): tb_next_offset => jmp_reset_offset tb_jmp_offset => jmp_insn_offset tb_next => jmp_target_addr jmp_next => jmp_list_next jmp_first => jmp_list_first Avoid using a magic constant as an invalid offset which is used to indicate that there's no n-th jump generated. Backports commit f309101c26b59641fc1aa8fb2a98a5441cdaea03 from qemu	2018-02-23 21:28:19 -05:00
Sergey Fedorov	5eb2d6618f	tcg/i386: Make direct jump patching thread-safe Ensure direct jump patching in i386 is atomic by: * naturally aligning a location of direct jump address; * using atomic_read()/atomic_set() for code patching. Backports commit 0d07abf05e98903c7faf204a9a90f7d45b7554dc from qemu	2018-02-23 21:28:17 -05:00
Aurelien Jarno	6060ab6596	tcg: check for CONFIG_DEBUG_TCG instead of NDEBUG Check for CONFIG_DEBUG_TCG instead of NDEBUG, drop now useless code. Backports commit 8d8fdbae010aa75a23f0307172e81034125aba6e from qemu	2018-02-23 13:55:21 -05:00
Aurelien Jarno	355ed7cd08	tcg: use tcg_debug_assert instead of assert (fix performance regression) The TCG code is quite performance sensitive, but at the same time can also be quite tricky. That is why there are asserts that can be enabled with the --enable-debug-tcg configure option. This used to work the following way: | #include "config.h" | | ... | | #if !defined(CONFIG_DEBUG_TCG) && !defined(NDEBUG) | /* define it to suppress various consistency checks (faster) */ | #define NDEBUG | #endif | | ... | | #include <assert.h> Since commit 757e725b (tcg: Clean up includes) "config.h" has been replaced by "qemu/osdep.h" which itself includes <assert.h>. As a consequence the assertions are always enabled, even when using --disable-debug-tcg, causing a performance regression, especially on targets with many registers. For instance on qemu-system-ppc the speed difference is about 15%. tcg_debug_assert is controlled directly by CONFIG_DEBUG_TCG and is already used in some places. This patch replaces all the calls to assert with calls to tcg_debug_assert. Backports commit eabb7b91b36b202b4dac2df2d59d698e3aff197a from qemu	2018-02-23 13:52:13 -05:00
Peter Maydell	764c2d09e5	tcg: Remove unnecessary osdep.h includes from tcg-target.inc.c Commit 757e725b58c57d added a number of #include "qemu/osdep.h" files to the tcg-target.c files (as they were named at the time). These are unnecessary because these files are not standalone C files, and the tcg/tcg.c file which includes them will have already included osdep.h on their behalf. Remove the unneeded include directives. Backports commit c3b7f66800fbf9f47fddbcf2e2cd30ea932e0aae from qemu	2018-02-20 20:41:00 -05:00
Peter Maydell	7784a25470	tcg: Rename tcg-target.c to tcg-target.inc.c Rename the per-architecture tcg-target.c files to tcg-target.inc.c. This makes it clearer that they are not intended to be standalone C files, but are instead #included into another source file. Backports commit ce151109813e2770fd3cee2f37bfa2cdd01a12b9 from qemu	2018-02-20 20:39:57 -05:00
Peter Maydell	4ca19f2cd6	tcg: Clean up includes Clean up includes so that osdep.h is included first and headers which it implies are not included manually. This commit was created with scripts/clean-includes. Backports commit 757e725b58c57d3ebb66a31fd2210df977a12154 from qemu	2018-02-19 01:04:30 -05:00
Aurelien Jarno	11cfddad05	tcg/i386: use softmmu fast path for unaligned accesses Softmmu unaligned load/stores currently go through the slow path for two reasons: - to support unaligned access on hosts with strict alignment - to correctly handle accesses crossing pages. x86 is only concerned by the second reason. Unaligned accesses are avoided by compilers, but are not uncommon. We therefore would like to see them going through the fast path, if they don't cross pages. For that we can use the fact that two adjacent TLB entries can't contain the same page. Therefore accessing the TLB entry corresponding to the first byte, but comparing its content to page address of the last byte ensures that we don't cross pages. We can do this check without adding more instructions in the TLB code (but increasing its length by one byte) by using the LEA instruction to combine the existing move with the size addition. On an x86-64 host, this gives a 3% boot time improvement for a powerpc guest and 4% for an x86-64 guest. Backports commit 8cc580f6a0d8c0e2f590c1472cf5cd8e51761760 from qemu	2018-02-17 15:23:33 -05:00
Paolo Bonzini	b34c233c2f	tcg: add TCG_TARGET_TLB_DISPLACEMENT_BITS This will be used to size the TLB when more than 8 MMU modes are used by the target. Limitations come from the limited size of the immediate fields (which sometimes, as in the case of Aarch64, extend to instructions that shift the immediate). Backports commit 006f8638c62bca2b0caf609485f47fa5e14d8a3c from qemu	2018-02-13 08:28:29 -05:00
Richard Henderson	58e939b91f	tcg: Split trunc_shr_i32 opcode into extr[lh]_i64_i32 Rather than allow arbitrary shift+trunc, only concern ourselves with low and high parts. This is all that was being used anyway. Backports commit 609ad70562793937257c89d07bf7c1370b9fc9aa from qemu	2018-02-10 23:00:45 -05:00
Aurelien Jarno	f279c93768	tcg: implement real ext_i32_i64 and extu_i32_i64 ops Implement real ext_i32_i64 and extu_i32_i64 ops. They ensure that a 32-bit value is always converted to a 64-bit value and not propagated through the register allocator or the optimizer. Backports commit 4f2331e5b67af8172419eb1c8db510b497b30a7b from qemu	2018-02-10 22:45:13 -05:00
Aurelien Jarno	80223e7ad5	tcg: rename trunc_shr_i32 into trunc_shr_i64_i32 The op is sometimes named trunc_shr_i32 and sometimes trunc_shr_i64_i32, and the name in the README doesn't match the name offered to the frontends. Always use the long name to make it clear it is a size changing op. Backports commit 0632e555fc4d281d69cb08d98d500d96185b041f from qemu	2018-02-10 22:29:30 -05:00

64 Commits