Commit Graph

339 Commits

Author SHA1 Message Date
Richard Henderson
5a23a1f020
target/arm: Convert SETEND
Backports commit 48c04a5dfaf2c08e00b659a22c502ec098999cf1 from qemu
2019-11-28 02:47:26 -05:00
Richard Henderson
55390520d6
target/arm: Convert CPS (privileged)
Backports commit 52f83b9c68bdde8d3da5f49292d3561bd474651d from qemu
2019-11-28 02:47:19 -05:00
Richard Henderson
15558cfa7e
target/arm: Convert Clear-Exclusive, Barriers
Backports commit 519b84711ea36bb8a339096e207b6b9b65cd2051 from qemu
2019-11-28 02:47:11 -05:00
Richard Henderson
eff475c9a9
target/arm: Convert RFE and SRS
Backports commit 885782a78c6d05932c5d8f463dc50fc8312e3eb9 from qemu
2019-11-28 02:47:04 -05:00
Richard Henderson
87ff6a8bdf
target/arm: Convert SVC
Backports commit 542f5188a14758d64f7504580a9bd3cae973f546 from qemu
2019-11-28 02:46:55 -05:00
Richard Henderson
a119870e57
target/arm: Convert B, BL, BLX (immediate)
Backports commit 360144f3b99f9a626ffcc6b9d76537e3a3e0e708 from qemu
2019-11-28 02:46:47 -05:00
Richard Henderson
ed9b8ad2ea
target/arm: Diagnose base == pc for LDM/STM
We have been using store_reg and not store_reg_for_load when writing
back a loaded value into the base register. At first glance this is
incorrect when base == pc, however that case is UNPREDICTABLE.

Backports commit b0e382b8cf365fed8b8c43482029ac7655961a85 from qemu
2019-11-28 02:46:40 -05:00
Richard Henderson
1a0986ee25
target/arm: Diagnose too few registers in list for LDM/STM
This has been a TODO item for quite a while. The minimum bit
count for A32 and T16 is 1, and for T32 is 2.

Backports commit 4b222545dbf30b60c033e1cd6eddda612575fd8c from qemu
2019-11-28 02:46:33 -05:00
Richard Henderson
fc81b12631
target/arm: Diagnose writeback register in list for LDM for v7
Prior to v7, for the A32 encoding, this operation wrote an UNKNOWN
value back to the base register. Starting in v7 this is UNPREDICTABLE.

Backports commit 3949f4675d13c587078f8f423845a3a537a22595 from qemu
2019-11-28 02:46:24 -05:00
Richard Henderson
a501800ba6
target/arm: Convert LDM, STM
This includes a minor bug fix to LDM (user), which requires
bit 21 to be 0, which means no writeback.

Backports commit c5c426d4c680f908a1e262091a17b088b5709200 from qemu
2019-11-28 02:46:04 -05:00
Richard Henderson
e4ca88f9d6
target/arm: Convert MOVW, MOVT
Backports commit 8f4451274b7010c1f50e0baa5bb608f19f02b90f from qemu
2019-11-28 02:46:04 -05:00
Richard Henderson
b35749e239
target/arm: Convert Signed multiply, signed and unsigned divide
Backports commit 2c7c4e090409189488149869797da4acf895bad0 from qemu
2019-11-28 02:45:33 -05:00
Richard Henderson
987641cf10
target/arm: Convert packing, unpacking, saturation, and reversal
Backports commit 46497f6af73bb33c1064d43a28a48cbb4d233a23 from qemu
2019-11-28 02:44:55 -05:00
Richard Henderson
83cced6170
target/arm: Convert Parallel addition and subtraction
Backports commit adf1a5662a47d5b5b96f4f1e440e34c26b14a154 from qemu
2019-11-28 02:44:20 -05:00
Richard Henderson
21df423e47
target/arm: Convert USAD8, USADA8, SBFX, UBFX, BFC, BFI, UDF
In op_bfx, note that tcg_gen_{,s}extract_i32 already checks
for width == 32, so we don't need to special case that here.

Backports commit 86d21e4b509a2835ed79f234f476a4c5191d435b from qemu
2019-11-28 02:44:20 -05:00
Richard Henderson
dbcc67ab20
target/arm: Diagnose UNPREDICTABLE ldrex/strex cases
Backports commit af2882289951e58363d714afd16f80050685fa29 from qemu
2019-11-28 02:44:20 -05:00
Richard Henderson
3ac019eb98
target/arm: Convert Synchronization primitives
Backports commit 1efdd407a25f617129e2e0d5c009c07cbe847990 from qemu
2019-11-28 02:44:18 -05:00
Richard Henderson
c794962c42
target/arm: Convert load/store (register, immediate, literal)
Backports commit 5e291fe16846d216d5a69569b1c59f497dff96e4 from qemu
2019-11-28 02:42:01 -05:00
Richard Henderson
d5d98450f3
target/arm: Convert T32 ADDW/SUBW
Backports commit 145952e87fb86aaa9434d768c31eedbd323f7157 from qemu
2019-11-28 02:42:01 -05:00
Richard Henderson
7b9025910d
target/arm: Convert the rest of A32 Miscelaneous instructions
Backports commit 2cde9ea57dbc4cdee3677a1a335574537810fe2e from qemu
2019-11-28 02:42:01 -05:00
Richard Henderson
be2a259d3c
target/arm: Convert ERET
Pass the T5 encoding of SUBS PC, LR, #IMM through the normal SUBS path
to make it clear exactly what's happening -- we hit ALUExceptionReturn
along that path.

Backports commit ef11bc3c461e2c650e8bef552146a4b08f81884e from qemu
2019-11-28 02:42:00 -05:00
Richard Henderson
74040da34c
target/arm: Convert CLZ
Document our choice about the T32 CONSTRAINED UNPREDICTABLE behaviour.
This matches the undocumented choice made by the legacy decoder.

Backports commit 4c97f5b2f0fa9b37f9ff497f15411d809e6fd098 from qemu
2019-11-28 02:42:00 -05:00
Richard Henderson
94968602b8
target/arm: Convert BX, BXJ, BLX (register)
Backports commit 4ed95abd700e43dee8e032f754b53bec2b047f75 from qemu
2019-11-28 02:42:00 -05:00
Richard Henderson
831e17d970
target/arm: Convert Cyclic Redundancy Check
Backports commit 6c35d53f1bde7fe327c074473c3048d6e6f15e95 from qemu
2019-11-28 02:42:00 -05:00
Richard Henderson
fdd135c7d2
target/arm: Convert MRS/MSR (banked, register)
The m-profile and a-profile decodings overlap. Only return false
for the case of wrong profile; handle UNDEFINED for permission failure
directly. This ensures that we don't accidentally pass an insn that
applies to the wrong profile.

Backports commit d0b26644502103ca97093ef67749812dc1df7eea from qemu
2019-11-28 02:42:00 -05:00
Richard Henderson
571d879c49
target/arm: Convert MSR (immediate) and hints
Backports commit 6313059623dc512308681ba160ed862ac387e2fb from qemu
2019-11-28 02:41:59 -05:00
Richard Henderson
a011318794
target/arm: Simplify op_smlawx for SMLAW*
By shifting the 16-bit input left by 16, we can align the desired
portion of the 48-bit product and use tcg_gen_muls2_i32.

Backports commit 485b607d4f393e0de92c922806a68aef22340c98 from qemu
2019-11-28 02:40:01 -05:00
Richard Henderson
201be7b8b1
target/arm: Simplify op_smlaxxx for SMLAL*
Since all of the inputs and outputs are i32, dispense with
the intermediate promotion to i64 and use tcg_gen_add2_i32.

Backports commit ea96b374641bc429269096d88d4e91ee544273e9 from qemu
2019-11-28 02:40:00 -05:00
Richard Henderson
543b598d45
target/arm: Convert Halfword multiply and multiply accumulate
Backports commit 26c6923de7131fa1cf223ab67131d1992dc17001 from qemu
2019-11-28 02:40:00 -05:00
Richard Henderson
44416a6794
target/arm: Convert Saturating addition and subtraction
Backports commit 6d0730a82417e3a4a1911eb8e0246f3ba996f932 from qemu
2019-11-28 02:40:00 -05:00
Richard Henderson
45566b2780
target/arm: Simplify UMAAL
Since all of the inputs and outputs are i32, dispense with
the intermediate promotion to i64 and use tcg_gen_mulu2_i32
and tcg_gen_add2_i32.

Backports commit 2409d56454f0d028619fb1002eda86bf240906dd from qemu
2019-11-28 02:40:00 -05:00
Richard Henderson
5e5ae4c0d0
target/arm: Convert multiply and multiply accumulate
Backports commit bd92fe353bda4412ffc46c0f7415207a684b45f2 from qemu
2019-11-28 02:40:00 -05:00
Richard Henderson
677cf191d2
target/arm: Convert Data Processing (immediate)
Convert the modified immediate form of the data processing insns.
For A32, we can finally remove any code that was intertwined with
the register and register-shifted-register forms.

Backports commit 581c6ebd17c8f56ad52772216e6c6d8cc8997e8b from qemu
2019-11-28 02:39:16 -05:00
Richard Henderson
1b21ced6a1
target/arm: Convert Data Processing (reg-shifted-reg)
Convert the register shifted by register form of the data
processing insns. For A32, we cannot yet remove any code
because the legacy decoder intertwines the immediate form.

Backports commit 5be2c12337f4cbdbda4efe6ab485350f730faaad from qemu
2019-11-28 02:39:16 -05:00
Richard Henderson
e151696a65
target/arm: Convert Data Processing (register)
Convert the register shifted by immediate form of the data
processing insns. For A32, we cannot yet remove any code
because the legacy decoder intertwines the reg-shifted-reg
and immediate forms.

Backports commit 25ae32c558182c07fc6ad01b936e9151cbf00c44 from qemu
2019-11-28 02:38:58 -05:00
Richard Henderson
9fc793b566
target/arm: Add stubs for aa32 decodetree
Add the infrastructure that will become the new decoder.
No instructions adjusted so far.

Backports commit 51409b9e8cfe997b1ac3365df7400e0c6e844437 from qemu
2019-11-28 02:38:49 -05:00
Richard Henderson
6ec6c71d50
target/arm: Use store_reg_from_load in thumb2 code
This function already includes the test for an interworking write
to PC from a load. Change the T32 LDM implementation to match the
A32 LDM implementation.

For LDM, the reordering of the tests does not change valid
behaviour because the only case that differs is has rn == 15,
which is UNPREDICTABLE.

Backports commit 69be3e13764111737e1a7a13bb0c231e4d5be756 from qemu
2019-11-28 02:38:42 -05:00
Richard Henderson
46a8dfff59
target/arm: Fix SMMLS argument order
The previous simplification got the order of operands to the
subtraction wrong. Since the 64-bit product is the subtrahend,
we must use a 64-bit subtract to properly compute the borrow
from the low-part of the product.

Fixes: 5f8cd06ebcf5 ("target/arm: Simplify SMMLA, SMMLAR, SMMLS, SMMLSR")

Backports commit e0a0c8322b8ebcdad674f443a3e86db8708d6738 from qemu
2019-11-20 17:24:44 -05:00
Peter Maydell
56b54f361e
target/arm: Allow ARMCPRegInfo read/write functions to throw exceptions
Currently the only part of an ARMCPRegInfo which is allowed to cause
a CPU exception is the access function, which returns a value indicating
that some flavour of UNDEF should be generated.

For the ATS system instructions, we would like to conditionally
generate exceptions as part of the writefn, because some faults
during the page table walk (like external aborts) should cause
an exception to be raised rather than returning a value.

There are several ways we could do this:
* plumb the GETPC() value from the top level set_cp_reg/get_cp_reg
helper functions through into the readfn and writefn hooks
* add extra readfn_with_ra/writefn_with_ra hooks that take the GETPC()
value
* require the ATS instructions to provide a dummy accessfn,
which serves no purpose except to cause the code generation
to emit TCG ops to sync the CPU state
* add an ARM_CP_ flag to mark the ARMCPRegInfo as possibly
throwing an exception in its read/write hooks, and make the
codegen sync the CPU state before calling the hooks if the
flag is set

This patch opts for the last of these, as it is fairly simple
to implement and doesn't require invasive changes like updating
the readfn/writefn hook function prototype signature.

Backports commit 37ff584c15bc3e1dd2c26b1998f00ff87189538c from qemu
2019-11-20 17:24:37 -05:00
Richard Henderson
87c06b7fae
target/arm: Factor out unallocated_encoding for aarch32
Make this a static function private to translate.c.
Thus we can use the same idiom between aarch64 and aarch32
without actually sharing function implementations.

Backports commit 1ce21ba1eaf08b22da5925f3e37fc0b4322da858 from qemu
2019-11-18 23:51:45 -05:00
Richard Henderson
1f59a43544
Revert "target/arm: Use unallocated_encoding for aarch32"
Despite the fact that the text for the call to gen_exception_insn
is identical for aarch64 and aarch32, the implementation inside
gen_exception_insn is totally different.

This fixes exceptions raised from aarch64.

This reverts commit fb2d3c9a9a.
2019-11-18 23:49:47 -05:00
Richard Henderson
9d2a3064af
target/arm: Use tcg_gen_extrh_i64_i32 to extract the high word
Separate shift + extract low will result in one extra insn
for hosts like RISC-V, MIPS, and Sparc.

Backports commit 664b7e3b97d6376f3329986c465b3782458b0f8b from qemu
2019-11-18 20:36:19 -05:00
Richard Henderson
93c016a3e7
target/arm: Simplify SMMLA, SMMLAR, SMMLS, SMMLSR
All of the inputs to these instructions are 32-bits. Rather than
extend each input to 64-bits and then extract the high 32-bits of
the output, use tcg_gen_muls2_i32 and other 32-bit generator functions.

Backports commit 5f8cd06ebcf57420be8fea4574de2e074de46709 from qemu
2019-11-18 20:31:12 -05:00
Richard Henderson
4a1cc16eef
target/arm: Use tcg_gen_rotri_i32 for gen_swap_half
Rotate is the more compact and obvious way to swap 16-bit
elements of a 32-bit word.

Backports commit adefba76e8bf10dfb342094d2f5debfeedb1a74d from qemu
2019-11-18 20:27:12 -05:00
Richard Henderson
751ab7b24b
target/arm: Use ror32 instead of open-coding the operation
The helper function is more documentary, and also already
handles the case of rotate by zero.

Backports commit dd861b3f29be97a9e3cdb9769dcbc0c7d7825185 from qemu
2019-11-18 20:25:51 -05:00
Richard Henderson
df4c773ed2
target/arm: Remove redundant shift tests
The immediate shift generator functions already test for,
and eliminate, the case of a shift by zero.

Backports commit 464eaa9571fae5867d9aea7d7209c091c8a50223 from qemu
2019-11-18 20:24:39 -05:00
Richard Henderson
4dd30ebfbd
target/arm: Use tcg_gen_deposit_i32 for PKHBT, PKHTB
Use deposit as the composit operation to merge the
bits from the two inputs.

Backports commit d1f8755fc93911f5b27246b1da794542d222fa1b from qemu
2019-11-18 20:22:00 -05:00
Richard Henderson
25ccd28e78
target/arm: Use tcg_gen_extract_i32 for shifter_out_im
Extract is a compact combination of shift + and.

Backports commit 191f4bfe8d6cf0c7d5cd7f84cd7076e32e3745dd from qemu
2019-11-18 20:19:40 -05:00
Richard Henderson
3d3d56056b
target/arm: Remove helper_double_saturate
Replace x = double_saturate(y) with x = add_saturate(y, y).
There is no need for a separate more specialized helper.

Backports commit 640581a06d14e2d0d3c3ba79b916de6bc43578b0 from qemu
2019-11-18 20:13:21 -05:00
Richard Henderson
fb2d3c9a9a
target/arm: Use unallocated_encoding for aarch32
Promote this function from aarch64 to fully general use.
Use it to unify the code sequences for generating illegal
opcode exceptions.

Backports commit 3cb36637157088892e9e33ddb1034bffd1251d3b from qemu
2019-11-18 20:10:50 -05:00
Richard Henderson
d562bea784
target/arm: Remove offset argument to gen_exception_bkpt_insn
Unlike the other more generic gen_exception{,_internal}_insn
interfaces, breakpoints always refer to the current instruction.

Backports commit 06bcbda3f64d464b6ecac789bce4bd69f199cd68 from qemu
2019-11-18 20:05:45 -05:00
Richard Henderson
f19b4df20d
target/arm: Replace offset with pc in gen_exception_internal_insn
The offset is variable depending on the instruction set.
Passing in the actual value is clearer in intent.

Backpors commit aee828e7541a5895669ade3a4b6978382b6b094a from qemu
2019-11-18 20:05:23 -05:00
Richard Henderson
00fbadf637
target/arm: Replace s->pc with s->base.pc_next
We must update s->base.pc_next when we return from the translate_insn
hook to the main translator loop. By incrementing s->base.pc_next
immediately after reading the insn word, "pc_next" contains the address
of the next instruction throughout translation.

All remaining uses of s->pc are referencing the address of the next insn,
so this is now a simple global replacement. Remove the "s->pc" field.

Backports commit a04159166b880b505ccadc16f2fe84169806883d from qemu
2019-11-18 17:32:53 -05:00
Richard Henderson
7d1fcef722
target/arm: Remove redundant s->pc & ~1
The thumb bit has already been removed from s->pc, and is always even.

Backports commit 4818c3743b0e0095fdcecd24457da9b3443730ab from qemu
2019-11-18 17:32:53 -05:00
Richard Henderson
a2e60445de
target/arm: Introduce add_reg_for_lit
Provide a common routine for the places that require ALIGN(PC, 4)
as the base address as opposed to plain PC. The two are always
the same for A32, but the difference is meaningful for thumb mode.

Backports commit 16e0d8234ef9291747332d2c431e46808a060472 from qemu
2019-11-18 17:32:49 -05:00
Richard Henderson
1c0914e58c
target/arm: Introduce read_pc
We currently have 3 different ways of computing the architectural
value of "PC" as seen in the ARM ARM.

The value of s->pc has been incremented past the current insn,
but that is all. Thus for a32, PC = s->pc + 4; for t32, PC = s->pc;
for t16, PC = s->pc + 2. These differing computations make it
impossible at present to unify the various code paths.

With the newly introduced s->pc_curr, we can compute the correct
value for all cases, using the formula given in the ARM ARM.

This changes the behaviour for load_reg() and load_reg_var()
when called with reg==15 from a 32-bit Thumb instruction:
previously they would have returned the incorrect value
of pc_curr + 6, and now they will return the architecturally
correct value of PC, which is pc_curr + 4. This will not
affect well-behaved guest software, because all of the places
we call these functions from T32 code are instructions where
using r15 is UNPREDICTABLE. Using the architectural PC value
here is more consistent with the T16 and A32 behaviour.

Backports commit fdbcf6329d0c2984c55d7019419a72bf8e583c36 from qemu
2019-11-18 17:04:50 -05:00
Richard Henderson
0048f3e887
target/arm: Introduce pc_curr
Add a new field to retain the address of the instruction currently
being translated. The 32-bit uses are all within subroutines used
by a32 and t32. This will become less obvious when t16 support is
merged with a32+t32, and having a clear definition will help.

Convert aarch64 as well for consistency. Note that there is one
instance of a pre-assert fprintf that used the wrong value for the
address of the current instruction.

Backports commit 43722a6d4f0c92f7e7e1e291580039b0f9789df1 from qemu
2019-11-18 16:58:40 -05:00
Richard Henderson
1aa3c685a8
target/arm: Pass in pc to thumb_insn_is_16bit
This function is used in two different contexts, and it will be
clearer if the function is given the address to which it applies.

Backports commit 331b1ca616cb708db30dab68e3262d286e687f24 from qemu
2019-11-18 16:52:35 -05:00
Peter Maydell
c61e22627d
target/arm: Fix routing of singlestep exceptions
When generating an architectural single-step exception we were
routing it to the "default exception level", which is to say
the same exception level we execute at except that EL0 exceptions
go to EL1. This is incorrect because the debug exception level
can be configured by the guest for situations such as single
stepping of EL0 and EL1 code by EL2.

We have to track the target debug exception level in the TB
flags, because it is dependent on CPU state like HCR_EL2.TGE
and MDCR_EL2.TDE. (That we were previously calling the
arm_debug_target_el() function to determine dc->ss_same_el
is itself a bug, though one that would only have manifested
as incorrect syndrome information.) Since we are out of TB
flag bits unless we want to expand into the cs_base field,
we share some bits with the M-profile only HANDLER and
STACKCHECK bits, since only A-profile has this singlestep.

Fixes: https://bugs.launchpad.net/qemu/+bug/1838913

Backports commit 8bd587c1066f4456ddfe611b571d9439a947d74c from qemu
2019-11-18 16:50:15 -05:00
Peter Maydell
3f531fac61
target/arm: Factor out 'generate singlestep exception' function
Factor out code to 'generate a singlestep exception', which is
currently repeated in four places.

To do this we need to also pull the identical copies of the
gen-exception() function out of translate-a64.c and translate.c
into translate.h.

(There is a bug in the code: we're taking the exception to the wrong
target EL. This will be simpler to fix if there's only one place to
do it.)

Backports commit c1d5f50f094ab204accfacc2ee6aafc9601dd5c4 from qemu
2019-11-18 16:47:08 -05:00
Peter Maydell
3fc86e1901
target/arm: Don't abort on M-profile exception return in linux-user mode
An attempt to do an exception-return (branch to one of the magic
addresses) in linux-user mode for M-profile should behave like
a normal branch, because linux-user mode is always going to be
in 'handler' mode. This used to work, but we broke it when we added
support for the M-profile security extension in commit d02a8698d7ae2bfed.

In that commit we allowed even handler-mode calls to magic return
values to be checked for and dealt with by causing an
EXCP_EXCEPTION_EXIT exception to be taken, because this is
needed for the FNC_RETURN return-from-non-secure-function-call
handling. For system mode we added a check in do_v7m_exception_exit()
to make any spurious calls from Handler mode behave correctly, but
forgot that linux-user mode would also be affected.

How an attempted return-from-non-secure-function-call in linux-user
mode should be handled is not clear -- on real hardware it would
result in return to secure code (not to the Linux kernel) which
could then handle the error in any way it chose. For QEMU we take
the simple approach of treating this erroneous return the same way
it would be handled on a CPU without the security extensions --
treat it as a normal branch.

The upshot of all this is that for linux-user mode we should never
do any of the bx_excret magic, so the code change is simple.

This ought to be a weird corner case that only affects broken guest
code (because Linux user processes should never be attempting to do
exception returns or NS function returns), except that the code that
assigns addresses in RAM for the process and stack in our linux-user
code does not attempt to avoid this magic address range, so
legitimate code attempting to return to a trampoline routine on the
stack can fall into this case. This change fixes those programs,
but we should also look at restricting the range of memory we
use for M-profile linux-user guests to the area that would be
real RAM in hardware.

Backports commit 9027d3fba605d8f6093342ebe4a1da450d374630 from qemu
2019-11-18 16:30:43 -05:00
Peter Maydell
0d89bce217
target/arm: Execute Thumb instructions when their condbits are 0xf
Thumb instructions in an IT block are set up to be conditionally
executed depending on a set of condition bits encoded into the IT
bits of the CPSR/XPSR. The architecture specifies that if the
condition bits are 0b1111 this means "always execute" (like 0b1110),
not "never execute"; we were treating it as "never execute". (See
the ConditionHolds() pseudocode in both the A-profile and M-profile
Arm ARM.)

This is a bit of an obscure corner case, because the only legal
way to get to an 0b1111 set of condbits is to do an exception
return which sets the XPSR/CPSR up that way. An IT instruction
which encodes a condition sequence that would include an 0b1111 is
UNPREDICTABLE, and for v8A the CONSTRAINED UNPREDICTABLE choices
for such an IT insn are to NOP, UNDEF, or treat 0b1111 like 0b1110.
Add a comment noting that we take the latter option.

Backports commit 5529de1e5512c05276825fa8b922147663fd6eac from qemu
2019-08-08 18:07:57 -04:00
Philippe Mathieu-Daudé
f77b60d7e9
target/arm: Fix coding style issues
Since we'll move this code around, fix its style first.

Backports commit 9798ac7162c8a720c5d28f4d1fc9e03c7ab4f015 from qemu
2019-08-08 15:05:57 -04:00
Lioncash
76d33b34e1
target/arm: Fix bad patch merge in arm_tr_init_disas_context 2019-08-08 14:37:38 -04:00
Peter Maydell
318a1ddf39
target/arm: Remove unused cpu_F0s, cpu_F0d, cpu_F1s, cpu_F1d
Remove the now unused TCG globals cpu_F0s, cpu_F0d, cpu_F1s, cpu_F1d.

cpu_M0 is still used by the iwmmxt code, and cpu_V0 and
cpu_V1 are used by both iwmmxt and Neon.

Backports commit d9eea52c67c04c58ecceba6ffe5a93d1d02051fa from qemu
2019-06-25 18:45:53 -05:00
Peter Maydell
74168c20f2
target/arm: Stop using deprecated functions in NEON_2RM_VCVT_F32_F16
Remove some old constructns from NEON_2RM_VCVT_F16_F32 code:
* don't use CPU_F0s
* don't use tcg_gen_st_f32

Backports commit b66f6b9981004bbf120b8d17c20f92785179bdf2 from qemu
2019-06-25 18:43:40 -05:00
Peter Maydell
8ae25f6e4c
target/arm: stop using deprecated functions in NEON_2RM_VCVT_F16_F32
Remove some old constructs from NEON_2RM_VCVT_F16_F32 code:
* don't use cpu_F0s
* don't use tcg_gen_ld_f32

Backports commit 58f2682eee738e8890f9cfe858e0f4f68b00d45d from qemu
2019-06-25 18:39:43 -05:00
Peter Maydell
d419fbc270
target/arm: Stop using cpu_F0s in Neon VCVT fixed-point ops
Stop using cpu_F0s in the Neon VCVT fixed-point operations.

Backports commit c253dd7832bc6b4e140a0da56410a9336cce05bc from qemu
2019-06-25 18:35:33 -05:00
Peter Maydell
46216ae382
target/arm: Stop using cpu_F0s for Neon f32/s32 VCVT
Stop using cpu_F0s for the Neon f32/s32 VCVT operations.
Since this is the last user of cpu_F0s in the Neon 2rm-op
loop, we can remove the handling code for it too.

Backports commit 60737ed5785b9c1c6f1c85575dfdd1e9eec91878 from qemu
2019-06-25 18:32:32 -05:00
Peter Maydell
2fbe9c1d1d
target/arm: Stop using cpu_F0s for NEON_2RM_VRECPE_F and NEON_2RM_VRSQRTE_F
Stop using cpu_F0s for NEON_2RM_VRECPE_F and NEON_2RM_VRSQRTE_F.

Backports commit 9a011fece7201f8e268c982df8c7836f3335bbe6 from qemu
2019-06-25 18:29:22 -05:00
Peter Maydell
f82ea34369
target/arm: Stop using cpu_F0s for NEON_2RM_VCVT[ANPM][US]
Stop using cpu_F0s for the NEON_2RM_VCVT[ANPM][US] ops.

Backports commit 30bf0a018f6c706913c8c0ea57b386907f4229be from qemu
2019-06-25 18:28:03 -05:00
Peter Maydell
0d4535bf16
target/arm: Stop using cpu_F0s for NEON_2RM_VRINT*
Switch NEON_2RM_VRINT* away from using cpu_F0s.

Backports commit 3b52ad1fae804acdc2fdc41b418a65249beae430 from qemu
2019-06-25 18:26:24 -05:00
Peter Maydell
a62cbc7ac5
target/arm: Stop using cpu_F0s for NEON_2RM_VNEG_F
Switch NEON_2RM_VABS_F away from using cpu_F0s.

Backports commit cedcc96fc7c8e520a190a010ac97dbb53e57d7d2 from qemu
2019-06-25 18:24:01 -05:00
Peter Maydell
63d7f92eba
target/arm: Stop using cpu_F0s for NEON_2RM_VABS_F
Where Neon instructions are floating point operations, we
mostly use the old VFP utility functions like gen_vfp_abs()
which work on the TCG globals cpu_F0s and cpu_F1s. The
Neon for-each-element loop conditionally loads the inputs
into either a plain old TCG temporary for most operations
or into cpu_F0s for float operations, and similarly stores
back either cpu_F0s or the temporary.

Switch NEON_2RM_VABS_F away from using cpu_F0s, and
update neon_2rm_is_float_op() accordingly.

Backports commit fd8a68cdcf81d70eebf866a132e9780d4108da9c from qemu
2019-06-25 18:22:05 -05:00
Peter Maydell
1a0d31c05e
target/arm: Convert float-to-integer VCVT insns to decodetree
Convert the float-to-integer VCVT instructions to decodetree.
Since these are the last unconverted instructions, we can
delete the old decoder structure entirely now.

Backports commit 3111bfc2da6ba0c8396dc97ca479942d711c6146 from qemu
2019-06-13 19:40:02 -04:00
Peter Maydell
f6c67559d4
target/arm: Convert VCVT fp/fixed-point conversion insns to decodetree
Convert the VCVT (between floating-point and fixed-point) instructions
to decodetree.

Backports commit e3d6f4290c788e850c64815f0b3e331600a4bcc0 from qemu
2019-06-13 19:35:51 -04:00
Peter Maydell
c66d477359
target/arm: Convert VJCVT to decodetree
Convert the VJCVT instruction to decodetree.

Backports commit 92073e947487e2109f3dfebfeaa48d6323cbd981 from qemu
2019-06-13 19:31:35 -04:00
Peter Maydell
7be9e6f9b4
target/arm: Convert integer-to-float insns to decodetree
Convert the VCVT integer-to-float instructions to decodetree.

Backports commit 8fc9d8918cde342c71923e361b9f2193e36ed18b from qemu
2019-06-13 19:20:41 -04:00
Peter Maydell
e0e4f99103
target/arm: Convert double-single precision conversion insns to decodetree
Convert the VCVT double/single precision conversion insns to decodetree.

Backports commit 6ed7e49c3693ed8411773c4880f42b2932beb12d from qemu
2019-06-13 19:18:01 -04:00
Peter Maydell
ab9d0235ed
target/arm: Convert VFP round insns to decodetree
Convert the VFP round-to-integer instructions VRINTR, VRINTZ and
VRINTX to decodetree.

These instructions were only introduced as part of the "VFP misc"
additions in v8A, so we check this. The old decoder's implementation
was incorrectly providing them even for v7A CPUs.

Backports commit e25155f55dc4abb427a88dfe58bbbc550fe7d643 from qemu
2019-06-13 19:15:05 -04:00
Peter Maydell
9e842a0f2a
target/arm: Convert the VCVT-to-f16 insns to decodetree
Convert the VCVTT and VCVTB instructions which convert from
f32 and f64 to f16 to decodetree.

Since we're no longer constrained to the old decoder's style
using cpu_F0s and cpu_F0d we can perform a direct 16 bit
store of the right half of the input single-precision register
rather than doing a load/modify/store sequence on the full
32 bits.

Backports commit cdfd14e86ab0b1ca29a702d13a8e4af2e902a9bf from qemu
2019-06-13 19:03:59 -04:00
Peter Maydell
7d927b2d0e
target/arm: Convert the VCVT-from-f16 insns to decodetree
Convert the VCVTT, VCVTB instructions that deal with conversion
from half-precision floats to f32 or 64 to decodetree.

Since we're no longer constrained to the old decoder's style
using cpu_F0s and cpu_F0d we can perform a direct 16 bit
load of the right half of the input single-precision register
rather than loading the full 32 bits and then doing a
separate shift or sign-extension.

Backports commit b623d803dda805f07aadcbf098961fde27315c19 from qemu
2019-06-13 19:00:23 -04:00
Peter Maydell
e6cc2616d2
target/arm: Convert VFP comparison insns to decodetree
Convert the VFP comparison instructions to decodetree.

Note that comparison instructions should not honour the VFP
short-vector length and stride information: they are scalar-only
operations. This applies to all the 2-operand instructions except
for VMOV, VABS, VNEG and VSQRT. (In the old decoder this is
implemented via the "if (op == 15 && rn > 3) { veclen = 0; }" check.)

Backports commit 386bba2368842fc74388a3c1651c6c0c0c70adbd from qemu
2019-06-13 18:55:53 -04:00
Peter Maydell
a75a3e321f
target/arm: Convert VMOV (register) to decodetree
Backports commit 17552b979ebb9848a534c25ebed18a1072710058 from qemu
2019-06-13 18:49:49 -04:00
Peter Maydell
ee30962891
target/arm: Convert VSQRT to decodetree
Convert the VSQRT instruction to decodetree.

Backports commit b8474540cbce4e2fa45010416375d1bcbe86dc15 from qemu
2019-06-13 18:47:32 -04:00
Peter Maydell
7aea3da6b7
target/arm: Convert VNEG to decodetree
Convert the VNEG instruction to decodetree.

Backports commit 1882651afdb0ca44f0631192fbe65a71c660d809 from qemu
2019-06-13 18:43:50 -04:00
Peter Maydell
1032d86ad3
target/arm: Convert VABS to decodetree
Convert the VFP VABS instruction to decodetree.

Unlike the 3-op versions, we don't pass fpst to the VFPGen2OpSPFn or
VFPGen2OpDPFn because none of the operations which use this format
and support short vectors will need it.

Backports commit 90287e22c987e9840704345ed33d237cbe759dd9 from qemu
2019-06-13 18:41:43 -04:00
Peter Maydell
7a16bc6876
target/arm: Convert VMOV (imm) to decodetree
Convert the VFP VMOV (immediate) instruction to decodetree.

Backports commit b518c753f0b94e14e01e97b4ec42c100dafc0cc2 from qemu
2019-06-13 18:37:58 -04:00
Peter Maydell
0ebb6b8b90
target/arm: Convert VFP fused multiply-add insns to decodetree
Convert the VFP fused multiply-add instructions (VFNMA, VFNMS,
VFMA, VFMS) to decodetree.

Note that in the old decode structure we were implementing
these to honour the VFP vector stride/length. These instructions
were introduced in VFPv4, and in the v7A architecture they
are UNPREDICTABLE if the vector stride or length are non-zero.
In v8A they must UNDEF if stride or length are non-zero, like
all VFP instructions; we choose to UNDEF always.

Backports commit d4893b01d23060845ee3855bc96626e16aad9ab5 from qemu
2019-06-13 18:24:36 -04:00
Peter Maydell
321bcc822b
target/arm: Convert VDIV to decodetree
Convert the VDIV instruction to decodetree.

Backports commit 519ee7ae31e050eb0ff9ad35c213f0bd7ab1c03e from qemu
2019-06-13 18:19:47 -04:00
Peter Maydell
76c74bc657
target/arm: Convert VSUB to decodetree
Convert the VSUB instruction to decodetree.

Backports commit 8fec9a119264b7936503abce3c106fad7e3ccb76 from qemu.
2019-06-13 18:18:00 -04:00
Peter Maydell
f56f0342ad
target/arm: Convert VADD to decodetree
Convert the VADD instruction to decodetree.

Backports commit ce28b303716e7eca3f3765bf6776d722ebbe1122 from qemu
2019-06-13 18:15:52 -04:00
Peter Maydell
06584edf61
target/arm: Convert VNMUL to decodetree
Convert the VNMUL instruction to decodetree.

Backports commit 43c4be1236c105090d134540da1036073d157cd4 from qemu
2019-06-13 18:14:16 -04:00
Peter Maydell
2c5e102017
target/arm: Convert VMUL to decodetree
Convert the VMUL instruction to decodetree.

Backports commit 88c5188ced60e9f2b8cc3af3b9bc4a8031c8c996 from qemu
2019-06-13 18:12:03 -04:00
Peter Maydell
b26b6a12a2
target/arm: Convert VFP VNMLA to decodetree
Convert the VFP VNMLA instruction to decodetree.

Backports commit 8a483533adc1bdc2decb8f456dbe930a2d245a8b from qemu
2019-06-13 18:09:57 -04:00
Peter Maydell
638b90de31
target/arm: Convert VFP VNMLS to decodetree
Convert the VFP VNMLS instruction to decodetree.

Backports commit c54a416cc6d60efbc79dd37aaf0c8918c05b5815 from qemu
2019-06-13 18:06:59 -04:00
Peter Maydell
67ad40ffa4
target/arm: Convert VFP VMLS to decodetree
Convert the VFP VMLS instruction to decodetree.

Backports commit e7258280d46af4ab6a0cc93ccfe8f6614defb4b7 from qemu
2019-06-13 18:02:37 -04:00
Peter Maydell
edf81eb214
target/arm: Convert VFP VMLA to decodetree
Convert the VFP VMLA instruction to decodetree.

This is the first of the VFP 3-operand data processing instructions,
so we include in this patch the code which loops over the elements
for an old-style VFP vector operation. The existing code to do this
looping uses the deprecated cpu_F0s/F0d/F1s/F1d TCG globals; since
we are going to be converting instructions one at a time anyway
we can take the opportunity to make the new loop use TCG temporaries,
which means we can do that conversion one operation at a time
rather than needing to do it all in one go.

We include an UNDEF check which was missing in the old code:
short-vector operations (with stride or length non-zero) were
deprecated in v7A and must UNDEF in v8A, so if the MVFR0 FPShVec
field does not indicate that support for short vectors is present
we UNDEF the operations that would use them. (This is a change
of behaviour for Cortex-A7, Cortex-A15 and the v8 CPUs, which
previously were all incorrectly allowing short-vector operations.)

Note that the conversion fixes a bug in the old code for the
case of VFP short-vector "mixed scalar/vector operations". These
happen where the destination register is in a vector bank but
but the second operand is in a scalar bank. For example
vmla.f64 d10, d1, d16 with length 2 stride 2
is equivalent to the pair of scalar operations
vmla.f64 d10, d1, d16
vmla.f64 d8, d3, d16
where the destination and first input register cycle through
their vector but the second input is scalar (d16). In the
old decoder the gen_vfp_F1_mul() operation uses cpu_F1{s,d}
as a temporary output for the multiply, which trashes the
second input operand. For the fully-scalar case (where we
never do a second iteration) and the fully-vector case
(where the loop loads the new second input operand) this
doesn't matter, but for the mixed scalar/vector case we
will end up using the wrong value for later loop iterations.
In the new code we use TCG temporaries and so avoid the bug.
This bug is present for all the multiply-accumulate insns
that operate on short vectors: VMLA, VMLS, VNMLA, VNMLS.

Note 2: the expression used to calculate the next register
number in the vector bank is not in fact correct; we leave
this behaviour unchanged from the old decoder and will
fix this bug later in the series.

Backports commit 266bd25c485597c94209bfdb3891c1d0c573c164 from qemu
2019-06-13 17:59:16 -04:00
Peter Maydell
93fe4cbe9e
target/arm: Remove VLDR/VSTR/VLDM/VSTM use of cpu_F0s and cpu_F0d
Expand out the sequences in the new decoder VLDR/VSTR/VLDM/VSTM trans
functions which perform the memory accesses by going via the TCG
globals cpu_F0s and cpu_F0d, to use local TCG temps instead.

Backports commit 3993d0407dff7233e42f2251db971e126a0497e9 from qemu
2019-06-13 17:31:28 -04:00
Peter Maydell
ff7042567e
target/arm: Convert the VFP load/store multiple insns to decodetree
Convert the VFP load/store multiple insns to decodetree.
This includes tightening up the UNDEF checking for pre-VFPv3
CPUs which only have D0-D15 : they now UNDEF for any access
to D16-D31, not merely when the smallest register in the
transfer list is in D16-D31.

This conversion does not try to share code between the single
precision and the double precision versions; this looks a bit
duplicative of code, but it leaves the door open for a future
refactoring which gets rid of the use of the "F0" registers
by inlining the various functions like gen_vfp_ld() and
gen_mov_F0_reg() which are hiding "if (dp) { ... } else { ... }"
conditionalisation.

Backports commit fa288de272c5c8a66d5eb683b123706a52bc7ad6 from qemu
2019-06-13 17:26:52 -04:00