Commit Graph

64 Commits

Author SHA1 Message Date
Richard Henderson
091b4fa1ff
tcg/i386: Move TCG_REG_CALL_STACK from define to enum
Backports commit 66c0285df4270d184afce5ac8b97ac175c89562f from qemu
2018-12-18 05:13:47 -05:00
Richard Henderson
f3a8a4a306
tcg/i386: Always use %ebp for TCG_AREG0
For x86_64, this can remove a REX prefix resulting in smaller code
when manipulating globals of type i32, as we move them between backing
store via cpu_env, aka TCG_AREG0.

Backports commit 5740d9f714835964873325d1210b26811252843f from qemu
2018-12-18 05:13:05 -05:00
Roman Kapl
33e69342e3
tcg/i386: fix vector operations on 32-bit hosts
The TCG backend uses LOWREGMASK to get the low 3 bits of register numbers.
This was defined as no-op for 32-bit x86, with the assumption that we have
eight registers anyway. This assumption is not true once we have xmm regs.

Since LOWREGMASK was a no-op, xmm register indidices were wrong in opcodes
and have overflown into other opcode fields, wreaking havoc.

To trigger these problems, you can try running the "movi d8, #0x0" AArch64
instruction on 32-bit x86. "vpxor %xmm0, %xmm0, %xmm0" should be generated,
but instead TCG generated "vpxor %xmm0, %xmm0, %xmm2".

Fixes: 770c2fc7bb ("Add vector operations")

Backports commit 93bf9a42733321fb632bcb9eafd049ef0e3d9417 from qemu
2018-10-02 04:22:35 -04:00
Richard Henderson
a4c2dbef3e
tcg/i386: Mark xmm registers call-clobbered
When host vector registers and operations were introduced, I failed
to mark the registers call clobbered as required by the ABI.

Fixes: 770c2fc7bb7

Backports commit 672189cd586ea38a2c1d8ab91eb1f9dcff5ceb05 from qemu
2018-07-23 20:00:26 -04:00
John Arbuckle
22c3206738
tcg/i386: Use byte form of xgetbv instruction
The assembler in most versions of Mac OS X is pretty old and does not
support the xgetbv instruction. To go around this problem, the raw
encoding of the instruction is used instead.

Backports commit 1019242af11400252f6735ca71a35f81ac23a66d from qemu
2018-06-28 13:23:32 -05:00
Richard Henderson
33f7f6f09a
tcg/i386: Fix dup_vec in non-AVX2 codepath
The VPUNPCKLD* instructions are all "non-destructive source",
indicated by "NDS" in the encoding string in the x86 ISA manual.
This means that they take two source operands, one of which is
encoded in the VEX.vvvv field. We were incorrectly treating them
as if they were destructive-source and passing 0 as the 'v'
argument of tcg_out_vex_modrm(). This meant we were always
using %xmm0 as one of the source operands, causing incorrect
results if the register allocator happened to want to use
something else. For instance the input AArch64 insn:
DUP v26.16b, w21
which becomes TCG IR ops:
dup_vec v128,e8,tmp2,x21
st_vec v128,e8,tmp2,env,$0xa40
was assembled to:
0x607c568c: c4 c1 7a 7e 86 e8 00 00 vmovq 0xe8(%r14), %xmm0
0x607c5694: 00
0x607c5695: c5 f9 60 c8 vpunpcklbw %xmm0, %xmm0, %xmm1
0x607c5699: c5 f9 61 c9 vpunpcklwd %xmm1, %xmm0, %xmm1
0x607c569d: c5 f9 70 c9 00 vpshufd $0, %xmm1, %xmm1
0x607c56a2: c4 c1 7a 7f 8e 40 0a 00 vmovdqu %xmm1, 0xa40(%r14)
0x607c56aa: 00

when the vpunpcklwd insn should be "%xmm1, %xmm1, %xmm1".
This resulted in our incorrectly setting the output vector to
q26=0000320000003200:0000320000003200
when given an input of x21 == 0000000002803200
rather than the expected all-zeroes.

Pass the correct source register number to tcg_out_vex_modrm()
for these insns.

Backports commit 7eb30ef0ba2eb59e7430d4848ae8d4bf4e50f768 from qemu
2018-05-11 11:22:38 -04:00
Lioncash
6bdfeb35ec
tcg/i386: Perform comparison pass against qemu
Ensures formatting and code are consistent.
2018-03-20 06:29:06 -04:00
Richard Henderson
2310bd4887
tcg/i386: Support INDEX_op_dup2_vec for -m32
Unknown why -m32 was passing with gcc but not clang; it should have
failed for both. This would be used for tcg_gen_dup_i64_vec, and
visible with the right TB and an aarch64 guest.

Backports commit 7f34ed4bcdfda55f978f51aadca64aa970c9f4b6 from qemu
2018-03-17 20:22:24 -04:00
Lioncash
b28c64ed34
tcg/i386: Amend bad merge 2018-03-12 10:11:03 -04:00
Richard Henderson
a16ee979fc
tcg/i386: Always use TZCNT when available
I think this is cleaner than sometimes using BSF.

Backports commit 39f099ec9d6d420b6fe6f7f4f8ed80ae29c65ff2 from qemu
2018-03-12 05:11:42 -04:00
Richard Henderson
7e327aaf84
util: Introduce include/qemu/cpuid.h
Clang 3.9 passes the CONFIG_AVX2_OPT configure test. However, the
supplied <cpuid.h> does not contain the bit_AVX2 define that we use
when detecting whether the routine can be enabled.

Introduce a qemu-specific header that uses the compiler's definition
of __cpuid et al, but supplies any missing bit_* definitions needed.
This avoids introducing any extra ifdefs to util/bufferiszero.c, and
allows quite a few to be removed from tcg/i386/tcg-target.inc.c.

Backports commit 5dd8990841a9e331d9d4838a116291698208cbb6 from qemu
2018-03-09 12:12:00 -05:00
Richard Henderson
b3e89e9996
tcg/i386: Add vector operations
The x86 vector instruction set is extremely irregular. With newer
editions, Intel has filled in some of the blanks. However, we don't
get many 64-bit operations until SSE4.2, introduced in 2009.

The subsequent edition was for AVX1, introduced in 2011, which added
three-operand addressing, and adjusts how all instructions should be
encoded.

Given the relatively narrow 2 year window between possible to support
and desirable to support, and to vastly simplify code maintainence,
I am only planning to support AVX1 and later cpus.

Backports commit 770c2fc7bb70804ae9869995fd02dadd6d7656ac from qemu
2018-03-07 08:07:40 -05:00
Emilio G. Cota
3cf23eb256
tcg/i386: constify tcg_target_callee_save_regs
Backports commit e268f4c036d2b47a4f8bf293c1371b328e03ca04 from qemu
2018-03-05 02:08:02 -05:00
Richard Henderson
fc8b4316a9
tcg: Remove tcg_regset_set32
It's not even clear what the interface REG and VAL32 were supposed to mean.
All uses had REG = 0 and VAL32 was the bitset assigned to the destination.

Backports commit f46934df662182097dce07d57ec00f37e4d2abf1 from qemu
2018-03-04 23:42:59 -05:00
Richard Henderson
49d09d6888
tcg: Remove tcg_regset_clear
Backports commit ccb1bb66ea2a42e773bfa04178d8b383ff86d4d8 from qemu
2018-03-04 23:24:45 -05:00
Richard Henderson
b96f53e8a3
tcg/i386: Store out-of-range call targets in constant pool
Already it saves 2 bytes per call, but also the constant pool
entry may well be shared across multiple calls.

Backports commit 4e45f23943c0bb91588627de3801826546155ad8 from qemu
2018-03-04 22:22:49 -05:00
Richard Henderson
f96514a99c
tcg: Rearrange ldst label tracking
Dispense with TCGBackendData, as it has never been used for more than
holding a single pointer. Use a define in the cpu/tcg-target.h to
signal requirement for TCGLabelQemuLdst, so that we can drop the no-op
tcg-be-null.h stubs. Rename tcg-be-ldst.h to tcg-ldst.inc.c.

Backports commit 659ef5cbb893872d25e9d95191cc23b16546c8a1 from qemu
2018-03-04 22:13:13 -05:00
Richard Henderson
31b8b67cd3
tcg: Move USE_DIRECT_JUMP discriminator to tcg/cpu/tcg-target.h
Replace the USE_DIRECT_JUMP ifdef with a TCG_TARGET_HAS_direct_jump
boolean test. Replace the tb_set_jmp_target1 ifdef with an unconditional
function tb_target_set_jmp_target.

While we're touching all backends, add a parameter for tb->tc_ptr;
we're going to need it shortly for some backends.

Move tb_set_jmp_target and tb_add_jump from exec-all.h to cpu-exec.c.

Backports commit a85833933628384d74ec412024d55cf012640287 from qemu
2018-03-04 21:52:35 -05:00
Emilio G. Cota
e4dfb7f807
tcg/i386: implement goto_ptr
Backports commit 5cb4ef80f65252dd85b86fa7f3c985015423d670 from qemu
2018-03-02 21:08:38 -05:00
Emilio G. Cota
8f4f15e5f5
tcg: Introduce goto_ptr opcode and tcg_gen_lookup_and_goto_ptr
Instead of exporting goto_ptr directly to TCG frontends, export
tcg_gen_lookup_and_goto_ptr(), which calls goto_ptr with the pointer
returned by the lookup_tb_ptr() helper. This is the only use case
we have for goto_ptr and lookup_tb_ptr, so having this function is
very convenient. Furthermore, it trivially allows us to avoid calling
the lookup helper if goto_ptr is not implemented by the backend.

Backports commit cedbcb01529cb6cf9a2289cdbebbc63f6149fc18 from qemu
2018-03-02 21:05:18 -05:00
Alex Bennée
caba238b5a
tcg: enable MTTCG by default for ARM on x86 hosts
This enables the multi-threaded system emulation by default for ARMv7
and ARMv8 guests using the x86_64 TCG backend. This is because on the
guest side:

- The ARM translate.c/translate-64.c have been converted to
- use MTTCG safe atomic primitives
- emit the appropriate barrier ops
- The ARM machine has been updated to
- hold the BQL when modifying shared cross-vCPU state
- defer powerctl changes to async safe work

All the host backends support the barrier and atomic primitives but
need to provide same-or-better support for normal load/store
operations.

Backports commit ca759f9e387db87e1719911f019bc60c74be9ed8 from qemu
2018-03-02 10:32:47 -05:00
Richard Henderson
4bec129626
tcg/i386: Handle ctpop opcode
Backports commit 993508e43e6d180e9ba9b747a9657eac69aec5bb from qemu
2018-03-01 18:49:43 -05:00
Richard Henderson
5f6e7bbdbd
tcg: Add opcode for ctpop
The number of actual invocations of ctpop itself does not warrent
an opcode, but it is very helpful for POWER7 to use in generating
an expansion for ctz.

Backports commit a768e4e99247911f00c5c0267c12d4e207d5f6cc from qemu
2018-03-01 18:26:41 -05:00
Richard Henderson
246d891668
tcg/i386: Handle ctz and clz opcodes
Backports commit bbf25f90ba802a286fd72be9175a860ae5fec726 from qemu
2018-03-01 16:56:08 -05:00
Richard Henderson
73ab332185
tcg/i386: Allow bmi2 shiftx to have non-matching operands
Previously we could not have different constraints for different ISA levels,
which prevented us from eliding the matching constraint for shifts.

We do now have to make sure that the operands match for constant shifts.
We can also handle some small left shifts via lea.

Backports commit 6a5aed4bdc7078838a8098336588d56c9ce09d1d from qemu
2018-03-01 16:45:04 -05:00
Richard Henderson
9e3feebbfb
tcg/i386: Hoist common arguments in tcg_out_op
Backports commit 42d5b514928a8a0d2f55a4c243d1333f9675815b from qemu
2018-03-01 16:42:30 -05:00
Richard Henderson
142ca07077
tcg/i386: Fuly convert tcg_target_op_def
Use a switch instead of searching a table. Share constraints between
32-bit and 64-bit, when at all possible.

Backports commit cd26449a505f808e479af4fdd539e05767e09c06 from qemu
2018-03-01 16:32:31 -05:00
Richard Henderson
2cf34e1b55
tcg: Add clz and ctz opcodes
Backports commit 0e28d0063bbd9e59a981ea2d20f82f30c5d956a8 from qemu
2018-03-01 16:04:11 -05:00
Richard Henderson
3f38611159
tcg: Pass the opcode width to target_parse_constraint
This will let us choose how to interpret a given constraint
depending on whether the opcode is 32- or 64-bit. Which will
let us share more constraint combinations between opcodes.

At the same time, change the interface to return the advanced
pointer instead of passing it in/out by reference.

Backports commit 069ea736b50b75fdec99c9b8cc603b97bd98419e from qemu
2018-03-01 15:45:40 -05:00
Richard Henderson
b8c93597b4
tcg: Transition flat op_defs array to a target callback
This will allow the target to tailor the constraints to the
auto-detected ISA extensions.

Backports commit f69d277ece43c42c7ab0144c2ff05ba740f6706b from qemu
2018-03-01 15:40:11 -05:00
Richard Henderson
7a7a5c640d
tcg/i386: Implement field extraction opcodes
Backports commit 78fdbfb94616f0391834d2eccabd16ea29e37da5 from qemu
2018-03-01 13:35:41 -05:00
Richard Henderson
8e0585dcb1
tcg: Add field extraction primitives
Adds tcg_gen_extract_* and tcg_gen_sextract_* for extraction of
fixed position bitfields, much like we already have for deposit.

Backports commit 7ec8bab3deae643b1ce579c2d65a244f30708330 from qemu
2018-03-01 13:21:30 -05:00
Richard Henderson
2ab4b8fa4d
tcg/i386: Extend TARGET_PAGE_MASK to the proper type
TARGET_PAGE_MASK, as defined, has type "int". We need to extend
that to the proper target width before oring in an "unsigned".

Backports commit ebb90a005da67147245cd38fb04a965a87a961b7 from qemu
2018-02-26 03:32:38 -05:00
Pranith Kumar
d49bd55f52
tcg/i386: Add support for fence
Generate a 'lock orl $0,0(%esp)' instruction for ordering instead of
mfence which has similar ordering semantics.

Backports commit a7d00d4effb58889ac6df64f98ac50c9d1594149 from qemu
2018-02-26 03:10:58 -05:00
Richard Henderson
91f5cf0417
tcg: Support arbitrary size + alignment
Previously we allowed fully unaligned operations, but not operations
that are aligned but with less alignment than the operation size.

In addition, arm32, ia64, mips, and sparc had been omitted from the
previous overalignment patch, which would have led to that alignment
being enforced.

Backports commit 85aa80813dd9f5c1f581c743e45678a3bee220f8 from qemu
2018-02-26 02:47:26 -05:00
Markus Armbruster
25ec9ab016
tcg: Clean up tcg-target.h header guards
These use guard symbols like TCG_TARGET_$target.
scripts/clean-header-guards.pl doesn't like them because they don't
match their file name (they should, to make guard collisions less
likely).

Clean them up: use guard symbol $target_TCG_TARGET_H for
tcg/$target/tcg-target.h.

Backports commit 14e54f8ecfe9c5e17348f456781344737ed10b3b from qemu
2018-02-25 04:15:08 -05:00
Sergey Sorokin
e4d123caa9
tcg: Improve the alignment check infrastructure
Some architectures (e.g. ARMv8) need the address which is aligned
to a size more than the size of the memory access.
To support such check it's enough the current costless alignment
check implementation in QEMU, but we need to support
an alignment size specifying.

Backports commit 1f00b27f17518a1bcb4cedca49eaec96a4d560bd from qemu
2018-02-25 02:23:28 -05:00
Richard Henderson
23586e2674
tcg: Optimize spills of constants
While we can store constants via constrants on INDEX_op_st_i32 et al,
we weren't able to spill constants to backing store.

Add a new backend interface, tcg_out_sti, which may store the constant
(and is allowed to fail). Rearrange the temp_* helpers so that we only
attempt to directly store a constant when the temp is becoming dead/free.

Backports commit 59d7c14eeff8d2ad7f61aed86ce5a176113bc153 from qemu
2018-02-25 01:45:29 -05:00
Sergey Fedorov
e60c24cecf
tcg: Clean up direct block chaining data fields
Briefly describe in a comment how direct block chaining is done. It
should help in understanding of the following data fields.

Rename some fields in TranslationBlock and TCGContext structures to
better reflect their purpose (dropping excessive 'tb_' prefix in
TranslationBlock but keeping it in TCGContext):
tb_next_offset => jmp_reset_offset
tb_jmp_offset => jmp_insn_offset
tb_next => jmp_target_addr
jmp_next => jmp_list_next
jmp_first => jmp_list_first

Avoid using a magic constant as an invalid offset which is used to
indicate that there's no n-th jump generated.

Backports commit f309101c26b59641fc1aa8fb2a98a5441cdaea03 from qemu
2018-02-23 21:28:19 -05:00
Sergey Fedorov
5eb2d6618f
tcg/i386: Make direct jump patching thread-safe
Ensure direct jump patching in i386 is atomic by:
* naturally aligning a location of direct jump address;
* using atomic_read()/atomic_set() for code patching.

Backports commit 0d07abf05e98903c7faf204a9a90f7d45b7554dc from qemu
2018-02-23 21:28:17 -05:00
Aurelien Jarno
6060ab6596
tcg: check for CONFIG_DEBUG_TCG instead of NDEBUG
Check for CONFIG_DEBUG_TCG instead of NDEBUG, drop now useless code.

Backports commit 8d8fdbae010aa75a23f0307172e81034125aba6e from qemu
2018-02-23 13:55:21 -05:00
Aurelien Jarno
355ed7cd08
tcg: use tcg_debug_assert instead of assert (fix performance regression)
The TCG code is quite performance sensitive, but at the same time can
also be quite tricky. That is why asserts that can be enabled with the
--enable-debug-tcg configure option.

This used to work the following way:

| #include "config.h"
|
| ...
|
| #if !defined(CONFIG_DEBUG_TCG) && !defined(NDEBUG)
| /* define it to suppress various consistency checks (faster) */
| #define NDEBUG
| #endif
|
| ...
|
| #include <assert.h>

Since commit 757e725b (tcg: Clean up includes) "config.h" as been
replaced by "qemu/osdep.h" which itself includes <assert.h>. As a
consequence the assertions are always enabled, even when using
--disable-debug-tcg, causing a performance regression, especially on
targets with many registers. For instance on qemu-system-ppc the
speed difference is about 15%.

tcg_debug_assert is controlled directly by CONFIG_DEBUG_TCG and already
uses in some places. This patch replaces all the calls to assert into
calss to tcg_debug_assert.

Backports commit eabb7b91b36b202b4dac2df2d59d698e3aff197a from qemu
2018-02-23 13:52:13 -05:00
Peter Maydell
764c2d09e5
tcg: Remove unnecessary osdep.h includes from tcg-target.inc.c
Commit 757e725b58c57d added a number of #include "qemu/osdep.h"
files to the tcg-target.c files (as they were named at the time).
These are unnecessary because these files are not standalone C
files, and the tcg/tcg.c file which includes them will have
already included osdep.h on their behalf. Remove the unneeded
include directives.

Backports commit c3b7f66800fbf9f47fddbcf2e2cd30ea932e0aae from qemu
2018-02-20 20:41:00 -05:00
Peter Maydell
7784a25470
tcg: Rename tcg-target.c to tcg-target.inc.c
Rename the per-architecture tcg-target.c files to tcg-target.inc.c.
This makes it clearer that they are not intended to be standalone
C files, but are instead #included into another source file.

Backports commit ce151109813e2770fd3cee2f37bfa2cdd01a12b9 from qemu
2018-02-20 20:39:57 -05:00
Peter Maydell
4ca19f2cd6
tcg: Clean up includes
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.

This commit was created with scripts/clean-includes.

Backports commit 757e725b58c57d3ebb66a31fd2210df977a12154 from qemu
2018-02-19 01:04:30 -05:00
Aurelien Jarno
11cfddad05
tcg/i386: use softmmu fast path for unaligned accesses
Softmmu unaligned load/stores currently goes through through the slow
path for two reasons:
  - to support unaligned access on host with strict alignement
  - to correctly handle accesses crossing pages

x86 is only concerned by the second reason. Unaligned accesses are
avoided by compilers, but are not uncommon. We therefore would like
to see them going through the fast path, if they don't cross pages.

For that we can use the fact that two adjacent TLB entries can't contain
the same page. Therefore accessing the TLB entry corresponding to the
first byte, but comparing its content to page address of the last byte
ensures that we don't cross pages. We can do this check without adding
more instructions in the TLB code (but increasing its length by one
byte) by using the LEA instruction to combine the existing move with the
size addition.

On an x86-64 host, this gives a 3% boot time improvement for a powerpc
guest and 4% for an x86-64 guest.

Backports commit 8cc580f6a0d8c0e2f590c1472cf5cd8e51761760 from qemu
2018-02-17 15:23:33 -05:00
Paolo Bonzini
b34c233c2f
tcg: add TCG_TARGET_TLB_DISPLACEMENT_BITS
This will be used to size the TLB when more than 8 MMU modes are
used by the target. Limitations come from the limited size of
the immediate fields (which sometimes, as in the case of Aarch64,
extend to instructions that shift the immediate).

Backports commit 006f8638c62bca2b0caf609485f47fa5e14d8a3c from qemu
2018-02-13 08:28:29 -05:00
Richard Henderson
58e939b91f
tcg: Split trunc_shr_i32 opcode into extr[lh]_i64_i32
Rather than allow arbitrary shift+trunc, only concern ourselves
with low and high parts. This is all that was being used anyway.

Backports commit 609ad70562793937257c89d07bf7c1370b9fc9aa from qemu
2018-02-10 23:00:45 -05:00
Aurelien Jarno
f279c93768
tcg: implement real ext_i32_i64 and extu_i32_i64 ops
Implement real ext_i32_i64 and extu_i32_i64 ops. They ensure that a
32-bit value is always converted to a 64-bit value and not propagated
through the register allocator or the optimizer.

Backports commit 4f2331e5b67af8172419eb1c8db510b497b30a7b from qemu
2018-02-10 22:45:13 -05:00
Aurelien Jarno
80223e7ad5
tcg: rename trunc_shr_i32 into trunc_shr_i64_i32
The op is sometimes named trunc_shr_i32 and sometimes trunc_shr_i64_i32,
and the name in the README doesn't match the name offered to the
frontends.

Always use the long name to make it clear it is a size changing op.

Backports commit 0632e555fc4d281d69cb08d98d500d96185b041f from qemu
2018-02-10 22:29:30 -05:00