unicorn

mirror of https://github.com/yuzu-emu/unicorn.git synced 2024-10-19 23:28:24 +02:00

Author	SHA1	Message	Date
Emilio G. Cota	0567c69235	tcg: Drop nargs from tcg_op_insert_{before,after} It's unused since 75e8b9b7aa0b95a761b9add7e2f09248b101a392. Backports commit ac1043f6d607aaac206c8aac42bc32f634f59395 from qemu	2018-12-18 06:00:13 -05:00
Richard Henderson	fdb3d6488e	tcg/optimize: Optimize bswap Somehow we forgot these operations, once upon a time. This will allow immediate stores to have their bswap optimized away. Backports commit 6498594c8eda83c5f5915afc34bd03396f8de6df from qemu	2018-12-18 05:49:29 -05:00
Richard Henderson	9e8c8a617b	tcg/optimize: Do not skip default processing of dup_vec If we do not optimize away dup_vec, we must mark its output as changed. Backports commit 1fb57da72ae0886eba1234a2d98ddd10e88a9efc from qemu	2018-08-09 00:53:07 -04:00
Lioncash	99dbbf1571	tcg/optimize: Perform comparison pass with qemu Keeps formatting and code synced	2018-03-12 18:06:29 -04:00
Richard Henderson	7f55d6ed69	tcg/optimize: Handle vector opcodes during optimize Trivial move and constant propagation. Some identity and constant function folding, but nothing that requires knowledge of the size of the vector element. Backports commit 170ba88f45bd7b1c5593021ed8e174f663b0bd1a from qemu	2018-03-06 16:10:09 -05:00
Richard Henderson	140058221d	tcg: Generalize TCGOp parameters We had two fields specific to INDEX_op_call. Rename these and add some macros so that the fields may be reused for other opcodes. Backports commit cd9090aa9dbba30db8aec9a2fc103aaf1ab0f5a7 from qemu	2018-03-05 16:53:50 -05:00
Richard Henderson	7fe5f620df	tcg: Dynamically allocate TCGOps With no fixed array allocation, we can't overflow a buffer. This will be important as optimizations related to host vectors may expand the number of ops used. Use QTAILQ to link the ops together. Backports commit 15fa08f8451babc88d733bd411d4c94976f9d0f8 from qemu	2018-03-05 16:34:40 -05:00
Richard Henderson	ab9df6244c	tcg: Use offsets not indices for TCGv_* Using the offset of a temporary, relative to TCGContext, rather than its index means that we don't use 0. That leaves offset 0 free for a NULL representation without having to leave index 0 unused. Backports commit e89b28a63501c0ad6d2501fe851d0c5202055e70 from qemu	2018-03-05 10:12:08 -05:00
Richard Henderson	9f8c6a456b	tcg: Use per-temp state data in optimize While we're touching many of the lines anyway, adjust the naming of the functions to better distinguish when "TCGArg" vs "TCGTemp" should be used. Backports commit 6349039d0b06eda59820629b934944246b14a1c1 from qemu	2018-03-05 08:24:06 -05:00
Richard Henderson	010ded3088	tcg: Add temp_global bit to TCGTemp This avoids needing to test the index of a temp against nb_globals. Backports commit fa477d25470187030614288d35bc734edffa41ee from qemu	2018-03-05 07:21:10 -05:00
Richard Henderson	a9c46ad7a0	tcg: Introduce arg_temp Backports commit 434391390ba99996af1591b427a73b3f5c05065e from qemu	2018-03-05 07:17:44 -05:00
Richard Henderson	845cfc2ae9	tcg: Propagate args to op->args in optimizer Backports commit acd937019bdaf933fcf1a7b57679ba07119c89b7 from qemu	2018-03-05 06:56:06 -05:00
Richard Henderson	eb488f5bd6	tcg: Merge opcode arguments into TCGOp Rather than have a separate buffer of 10*max_ops entries, give each opcode 10 entries. The result is actually a bit smaller and should have slightly more cache locality. Backports commit 75e8b9b7aa0b95a761b9add7e2f09248b101a392 from qemu	2018-03-05 04:45:20 -05:00
Richard Henderson	5f6e7bbdbd	tcg: Add opcode for ctpop The number of actual invocations of ctpop itself does not warrant an opcode, but it is very helpful for POWER7 to use in generating an expansion for ctz. Backports commit a768e4e99247911f00c5c0267c12d4e207d5f6cc from qemu	2018-03-01 18:26:41 -05:00
Richard Henderson	2cf34e1b55	tcg: Add clz and ctz opcodes Backports commit 0e28d0063bbd9e59a981ea2d20f82f30c5d956a8 from qemu	2018-03-01 16:04:11 -05:00
Richard Henderson	199b3859c4	tcg/optimize: Fold movcond 0/1 into setcond Backports commit 333b21b809fc80ce67c8f6a7d1c7cc66437d9791 from qemu	2018-03-01 14:41:38 -05:00
Richard Henderson	8e0585dcb1	tcg: Add field extraction primitives Adds tcg_gen_extract_* and tcg_gen_sextract_* for extraction of fixed position bitfields, much like we already have for deposit. Backports commit 7ec8bab3deae643b1ce579c2d65a244f30708330 from qemu	2018-03-01 13:21:30 -05:00
Alex Bennée	bf72733576	tcg/optimize: move default return out of if statement This is to appease sanitizer builds which complain that: "error: control reaches end of non-void function" Backports commit 550276ae0a88851edda2cb7fcdd64256dbb8e314 from qemu	2018-02-26 05:05:21 -05:00
Pranith Kumar	16d71f0f10	tcg: Optimize fence instructions This commit optimizes fence instructions. Two optimizations are currently implemented: (1) removing unnecessary duplicate fence instructions, and (2) merging weaker fences into a stronger fence. [rth: Merge tcg_optimize_mb back into tcg_optimize, so that we only loop over the opcode stream once. Merge "unrelated" weaker barriers into one stronger barrier.] Backports commit 34f939218ce78163171addd63750e1e0300376ab from qemu	2018-02-26 03:29:59 -05:00
Richard Henderson	ede1cae3dc	tcg: Lower indirect registers in a separate pass Rather than rely on recursion during the middle of register allocation, lower indirect registers to loads and stores off the indirect base into plain temps. For an x86_64 host, with sufficient registers, this results in identical code, modulo the actual register assignments. For an i686 host, with insufficient registers, this means that temps can be (temporarily) spilled to the stack in order to satisfy an allocation. This as opposed to the possibility of not being able to spill, to allocate a register for the indirect base, in order to perform a spill. Backports commit 5a18407f55ade924aa6397c9a043a9ffd59645fe from qemu	2018-02-25 22:32:28 -05:00
Richard Henderson	1547048a22	tcg: Reorg TCGOp chaining Instead of using -1 as end of chain, use 0, and link through the 0 entry as a fully circular double-linked list. Backports commit dcb8e75870e2de199db853697f8839cb603beefe from qemu	2018-02-25 21:44:50 -05:00
Paolo Bonzini	58693409ea	exec: extract exec/tb-context.h TCG backends do not need most of exec-all.h; extract what they actually need to a separate file or move it directly to tcg.h. The next patch will stop including exec-all.h from everywhere. Backports commit 00f6da6a1a5d1ce085334eccbb50ec899ceed513 from qemu	2018-02-24 02:09:58 -05:00
Paolo Bonzini	37f26922dd	qemu-common: push cpu.h inclusion out of qemu-common.h Backports commit 33c11879fd422b759483ed25fef133ea900ea8d7 from qemu	2018-02-24 01:50:56 -05:00
Aurelien Jarno	355ed7cd08	tcg: use tcg_debug_assert instead of assert (fix performance regression) The TCG code is quite performance sensitive, but at the same time can also be quite tricky. That is why there are asserts that can be enabled with the --enable-debug-tcg configure option. This used to work the following way: \| #include "config.h" \| \| ... \| \| #if !defined(CONFIG_DEBUG_TCG) && !defined(NDEBUG) \| /* define it to suppress various consistency checks (faster) */ \| #define NDEBUG \| #endif \| \| ... \| \| #include <assert.h> Since commit 757e725b (tcg: Clean up includes) "config.h" has been replaced by "qemu/osdep.h" which itself includes <assert.h>. As a consequence the assertions are always enabled, even when using --disable-debug-tcg, causing a performance regression, especially on targets with many registers. For instance on qemu-system-ppc the speed difference is about 15%. tcg_debug_assert is controlled directly by CONFIG_DEBUG_TCG and is already used in some places. This patch replaces all the calls to assert with calls to tcg_debug_assert. Backports commit eabb7b91b36b202b4dac2df2d59d698e3aff197a from qemu	2018-02-23 13:52:13 -05:00
Peter Maydell	4ca19f2cd6	tcg: Clean up includes Clean up includes so that osdep.h is included first and headers which it implies are not included manually. This commit was created with scripts/clean-includes. Backports commit 757e725b58c57d3ebb66a31fd2210df977a12154 from qemu	2018-02-19 01:04:30 -05:00
Lioncash	fb2fe4580f	optimize: Add missing extrh/extrl case	2018-02-11 02:57:55 -05:00
Richard Henderson	58e939b91f	tcg: Split trunc_shr_i32 opcode into extr[lh]_i64_i32 Rather than allow arbitrary shift+trunc, only concern ourselves with low and high parts. This is all that was being used anyway. Backports commit 609ad70562793937257c89d07bf7c1370b9fc9aa from qemu	2018-02-10 23:00:45 -05:00
Aurelien Jarno	4bd3d5005e	tcg/optimize: add optimizations for ext_i32_i64 and extu_i32_i64 ops They behave the same as ext32s_i64 and ext32u_i64 from the constant folding and zero propagation point of view, except that they can't be replaced by a mov, so we don't compute the affected value. Backports commit 8bcb5c8f34f9215d4f88f388c7ff14c9bd5cecd3 from qemu	2018-02-10 22:47:26 -05:00
Aurelien Jarno	80223e7ad5	tcg: rename trunc_shr_i32 into trunc_shr_i64_i32 The op is sometimes named trunc_shr_i32 and sometimes trunc_shr_i64_i32, and the name in the README doesn't match the name offered to the frontends. Always use the long name to make it clear it is a size changing op. Backports commit 0632e555fc4d281d69cb08d98d500d96185b041f from qemu	2018-02-10 22:29:30 -05:00
Aurelien Jarno	5f0920ad0f	tcg/optimize: allow constant to have copies Now that copies and constants are tracked separately, we can allow constants to have copies, deferring the choice to use a register or a constant to the register allocation pass. This prevents this kind of regular constant reloading: -OUT: [size=338] +OUT: [size=298] mov -0x4(%r14),%ebp test %ebp,%ebp jne 0x7ffbe9cb0ed6 mov $0x40002219f8,%rbp mov %rbp,(%r14) - mov $0x40002219f8,%rbp mov $0x4000221a20,%rbx mov %rbp,(%rbx) mov $0x4000000000,%rbp mov %rbp,(%r14) - mov $0x4000000000,%rbp mov $0x4000221d38,%rbx mov %rbp,(%rbx) mov $0x40002221a8,%rbp mov %rbp,(%r14) - mov $0x40002221a8,%rbp mov $0x4000221d40,%rbx mov %rbp,(%rbx) mov $0x4000019170,%rbp mov %rbp,(%r14) - mov $0x4000019170,%rbp mov $0x4000221d48,%rbx mov %rbp,(%rbx) mov $0x40000049ee,%rbp mov %rbp,0x80(%r14) mov %r14,%rdi callq 0x7ffbe99924d0 mov $0x4000001680,%rbp mov %rbp,0x30(%r14) mov 0x10(%r14),%rbp mov $0x4000001680,%rbp mov %rbp,0x30(%r14) mov 0x10(%r14),%rbp shl $0x20,%rbp mov (%r14),%rbx mov %ebx,%ebx mov %rbx,(%r14) or %rbx,%rbp mov %rbp,0x10(%r14) mov %rbp,0x90(%r14) mov 0x60(%r14),%rbx mov %rbx,0x38(%r14) mov 0x28(%r14),%rbx mov $0x4000220e60,%r12 mov %rbx,(%r12) mov $0x40002219c8,%rbx mov %rbp,(%rbx) mov 0x20(%r14),%rbp sub $0x8,%rbp mov $0x4000004a16,%rbx mov %rbx,0x0(%rbp) mov %rbp,0x20(%r14) mov $0x19,%ebp mov %ebp,0xa8(%r14) mov $0x4000015110,%rbp mov %rbp,0x80(%r14) xor %eax,%eax jmpq 0x7ffbebcae426 lea -0x5f6d72a(%rip),%rax # 0x7ffbe3d437b3 jmpq 0x7ffbebcae426 Backports commit 299f80130401153af1a6ddb3cc011781bcd47600 from qemu	2018-02-10 22:18:03 -05:00
Aurelien Jarno	59909fe549	tcg/optimize: track const/copy status separately Instead of using an enum which could be either a copy or a const, track them separately. This will be used in the next patch. Constants are tracked through a bool. Copies are tracked by initializing a temp's next_copy and prev_copy to itself, which simplifies the code a bit. Backports commit b41059dd9deec367a4ccd296659f0bc5de2dc705 from qemu	2018-02-10 22:15:43 -05:00
Aurelien Jarno	134a7dfe82	tcg/optimize: add temp_is_const and temp_is_copy functions Add two accessor functions temp_is_const and temp_is_copy, to make the code more readable and make code changes easier. Backports commit d9c769c60948815ee03b2684b1c1c68ee4375149 from qemu	2018-02-10 22:07:02 -05:00
Aurelien Jarno	b450b79622	tcg/optimize: optimize temps tracking The tcg_temp_info structure uses 24 bytes per temp. Now that we emulate vector registers on most guests, it's not uncommon to have more than 100 used temps. This means we have to initialize more than 2kB at least twice per TB, often more when there are a few goto_tb. Instead use a TCGTempSet bit array to track which temps are in use in the current basic block. This means there are only around 16 bytes to initialize. This improves the boot time of a MIPS guest on an x86-64 host by around 7% and moves tcg_optimize out from the top of the profiler list. Backports commit 1208d7dd5fddc1fbd98de800d17429b4e5578848 from qemu	2018-02-10 21:51:46 -05:00
Aurelien Jarno	5f67ab74e7	tcg/optimize: fix constant signedness By convention, on a 64-bit host TCG internally stores 32-bit constants as sign-extended. This is not the case in the optimizer when a 32-bit constant is folded. This doesn't seem to have more consequences than suboptimal code generation. For instance the x86 backend assumes sign-extended constants, and in some rare cases uses a 32-bit unsigned immediate 0xffffffff instead of an 8-bit signed immediate 0xff for the constant -1. This is with a ppc guest: before ------ ---- 0x9f29cc movi_i32 tmp1,$0xffffffff movi_i32 tmp2,$0x0 add2_i32 tmp0,CA,CA,tmp2,r6,tmp2 add2_i32 tmp0,CA,tmp0,CA,tmp1,tmp2 mov_i32 r10,tmp0 0x7fd8c7dfe90c: xor %ebp,%ebp 0x7fd8c7dfe90e: mov %ebp,%r11d 0x7fd8c7dfe911: mov 0x18(%r14),%r9d 0x7fd8c7dfe915: add %r9d,%r10d 0x7fd8c7dfe918: adc %ebp,%r11d 0x7fd8c7dfe91b: add $0xffffffff,%r10d 0x7fd8c7dfe922: adc %ebp,%r11d 0x7fd8c7dfe925: mov %r11d,0x134(%r14) 0x7fd8c7dfe92c: mov %r10d,0x28(%r14) after ----- ---- 0x9f29cc movi_i32 tmp1,$0xffffffffffffffff movi_i32 tmp2,$0x0 add2_i32 tmp0,CA,CA,tmp2,r6,tmp2 add2_i32 tmp0,CA,tmp0,CA,tmp1,tmp2 mov_i32 r10,tmp0 0x7f37010d490c: xor %ebp,%ebp 0x7f37010d490e: mov %ebp,%r11d 0x7f37010d4911: mov 0x18(%r14),%r9d 0x7f37010d4915: add %r9d,%r10d 0x7f37010d4918: adc %ebp,%r11d 0x7f37010d491b: add $0xffffffffffffffff,%r10d 0x7f37010d491f: adc %ebp,%r11d 0x7f37010d4922: mov %r11d,0x134(%r14) 0x7f37010d4929: mov %r10d,0x28(%r14) Backports commit 29f3ff8d6cbc28f79933aeaa25805408d0984a8f from qemu	2018-02-10 21:40:20 -05:00
Aurelien Jarno	e273acf87a	tcg/optimize: fix tcg_opt_gen_movi Due to a copy&paste, the new op value is tested against mov_i32 instead of movi_i32. The test is therefore always false. Fix that. Backports commit 961521261a3d600b0695b2e6d2b0f490076f7e90 from qemu	2018-02-10 21:38:09 -05:00
Aurelien Jarno	42dd2addbe	tcg/optimize: rename tcg_constant_folding The tcg_constant_folding function ends up doing all the optimizations (which is a good thing to avoid looping on all ops multiple times), so make it clear and just rename it to tcg_optimize. Backports commit 36e60ef6ac5d8a262d0fbeedfdb2b588514cb1ea from qemu	2018-02-10 21:36:34 -05:00
Aurelien Jarno	7b0055d742	tcg/optimize: fold constant test in tcg_opt_gen_mov Most of the calls to tcg_opt_gen_mov are preceded by a test to check if the source temp is a constant. Fold that into the tcg_opt_gen_mov function. Backports commit 97a79eb70dd35a24fda87d86196afba5e6f21c5d from qemu	2018-02-10 21:34:00 -05:00
Aurelien Jarno	517fac57c3	tcg/optimize: fold temp copies test in tcg_opt_gen_mov Each call to tcg_opt_gen_mov is preceded by a test to check if the source and destination temps are copies. Fold that into the tcg_opt_gen_mov function. Backports commit 5365718a9afeeabde3784d82a542f8ad909b18cf from qemu	2018-02-10 21:27:06 -05:00
Aurelien Jarno	d21f474c39	tcg/optimize: remove opc argument from tcg_opt_gen_mov We can get the opcode using the TCGOp pointer. It needs to be dereferenced, but that is done anyway a few lines below to write the new value. Backports commit 8d6a91602ea824ef4435ea38fd475387eecc098c from qemu	2018-02-10 21:23:34 -05:00
Aurelien Jarno	0fd0afad13	tcg/optimize: remove opc argument from tcg_opt_gen_movi We can get the opcode using the TCGOp pointer. It needs to be dereferenced, but that is done anyway a few lines below to write the new value. Backports commit ebd27391b00cdafc81e0541a940686137b3b48df from qemu	2018-02-10 21:21:13 -05:00
Richard Henderson	6234d07489	tcg: Merge memop and mmu_idx parameters to qemu_ld/st At the tcg opcode level, not at the tcg-op.h generator level. This requires minor changes through all of the tcg backends, but none of the cpu translators. Backports commit 59227d5d45bb3c31dc2118011691c35b3c00879c from qemu	2018-02-10 19:01:49 -05:00
Richard Henderson	7532c92358	tcg/optimize: Handle or r,a,a with constant a Backports commit 2374c4b8375072da1f401c6daccc68ae76c73e63 from qemu	2018-02-09 14:56:12 -05:00
Richard Henderson	70f28c8bd5	tcg: Implement insert_op_before Rather than reserving space in the op stream for optimization, let the optimizer add ops as necessary. Backports commit a4ce099a7a4b4734c372f6bf28f3362e370f23c1 from qemu	2018-02-09 13:11:50 -05:00
Richard Henderson	4fcaabf38c	tcg: Remove opcodes instead of noping them out With the linked list scheme we need not leave nops in the stream that we need to process later. Backports commit 0c627cdca20155753a536c51385abb73941a59a0 from qemu	2018-02-09 13:03:58 -05:00
Lioncash	0273e6ae18	tcg: Put opcodes in a linked list The previous setup required ops and args to be completely sequential, and was error prone when it came to both iteration and optimization.	2018-02-09 12:54:05 -05:00
xorstream	1aeaf5c40d	This code should now build the x86_x64-softmmu part 2.	2017-01-19 22:50:28 +11:00
Nguyen Anh Quynh	344d016104	import	2015-08-21 15:04:50 +08:00

47 Commits
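
The fence optimization described in commit 16d71f0f10 (dropping duplicate fences and merging weaker fences into one stronger fence) can be illustrated with a small standalone sketch. This is not the TCG implementation: the `Op` struct, the `OP_*` and `MO_*` constants, and `merge_barriers` are hypothetical names invented for the example, and it assumes that a barrier's strength is a bitmask of ordering constraints, so that OR-ing two masks yields a barrier at least as strong as either one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical op kinds and ordering bits; these are NOT the TCG definitions. */
enum { OP_OTHER, OP_BARRIER };
enum { MO_LD_LD = 1, MO_LD_ST = 2, MO_ST_LD = 4, MO_ST_ST = 8 };

typedef struct {
    int kind;        /* OP_OTHER or OP_BARRIER */
    unsigned order;  /* ordering bits, meaningful only for barriers */
} Op;

/*
 * Merge each run of adjacent barriers into a single barrier whose
 * ordering mask is the union of the run. Compacts the array in place
 * in one pass and returns the new number of ops.
 */
static size_t merge_barriers(Op *ops, size_t n)
{
    size_t out = 0;

    assert(ops != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (ops[i].kind == OP_BARRIER && out > 0 &&
            ops[out - 1].kind == OP_BARRIER) {
            /* Duplicate or weaker fence: fold it into the previous one. */
            ops[out - 1].order |= ops[i].order;
            continue;
        }
        ops[out++] = ops[i];
    }
    return out;
}
```

In this sketch two identical adjacent barriers collapse into one (the duplicate case), while a load-load barrier followed by a store-store barrier becomes a single barrier carrying both bits (the merge case). Any non-barrier op ends the run, which is a deliberately conservative simplification.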