unicorn

mirror of https://github.com/yuzu-emu/unicorn.git synced 2024-10-22 05:08:17 +02:00

Author	SHA1	Message	Date
Richard Henderson	5a23a1f020	target/arm: Convert SETEND Backports commit 48c04a5dfaf2c08e00b659a22c502ec098999cf1 from qemu	2019-11-28 02:47:26 -05:00
Richard Henderson	55390520d6	target/arm: Convert CPS (privileged) Backports commit 52f83b9c68bdde8d3da5f49292d3561bd474651d from qemu	2019-11-28 02:47:19 -05:00
Richard Henderson	15558cfa7e	target/arm: Convert Clear-Exclusive, Barriers Backports commit 519b84711ea36bb8a339096e207b6b9b65cd2051 from qemu	2019-11-28 02:47:11 -05:00
Richard Henderson	eff475c9a9	target/arm: Convert RFE and SRS Backports commit 885782a78c6d05932c5d8f463dc50fc8312e3eb9 from qemu	2019-11-28 02:47:04 -05:00
Richard Henderson	87ff6a8bdf	target/arm: Convert SVC Backports commit 542f5188a14758d64f7504580a9bd3cae973f546 from qemu	2019-11-28 02:46:55 -05:00
Richard Henderson	a119870e57	target/arm: Convert B, BL, BLX (immediate) Backports commit 360144f3b99f9a626ffcc6b9d76537e3a3e0e708 from qemu	2019-11-28 02:46:47 -05:00
Richard Henderson	ed9b8ad2ea	target/arm: Diagnose base == pc for LDM/STM We have been using store_reg and not store_reg_from_load when writing back a loaded value into the base register. At first glance this is incorrect when base == pc, however that case is UNPREDICTABLE. Backports commit b0e382b8cf365fed8b8c43482029ac7655961a85 from qemu	2019-11-28 02:46:40 -05:00
Richard Henderson	1a0986ee25	target/arm: Diagnose too few registers in list for LDM/STM This has been a TODO item for quite a while. The minimum bit count for A32 and T16 is 1, and for T32 is 2. Backports commit 4b222545dbf30b60c033e1cd6eddda612575fd8c from qemu	2019-11-28 02:46:33 -05:00
Richard Henderson	fc81b12631	target/arm: Diagnose writeback register in list for LDM for v7 Prior to v7, for the A32 encoding, this operation wrote an UNKNOWN value back to the base register. Starting in v7 this is UNPREDICTABLE. Backports commit 3949f4675d13c587078f8f423845a3a537a22595 from qemu	2019-11-28 02:46:24 -05:00
Richard Henderson	a501800ba6	target/arm: Convert LDM, STM This includes a minor bug fix to LDM (user), which requires bit 21 to be 0, which means no writeback. Backports commit c5c426d4c680f908a1e262091a17b088b5709200 from qemu	2019-11-28 02:46:04 -05:00
Richard Henderson	e4ca88f9d6	target/arm: Convert MOVW, MOVT Backports commit 8f4451274b7010c1f50e0baa5bb608f19f02b90f from qemu	2019-11-28 02:46:04 -05:00
Richard Henderson	b35749e239	target/arm: Convert Signed multiply, signed and unsigned divide Backports commit 2c7c4e090409189488149869797da4acf895bad0 from qemu	2019-11-28 02:45:33 -05:00
Richard Henderson	987641cf10	target/arm: Convert packing, unpacking, saturation, and reversal Backports commit 46497f6af73bb33c1064d43a28a48cbb4d233a23 from qemu	2019-11-28 02:44:55 -05:00
Richard Henderson	83cced6170	target/arm: Convert Parallel addition and subtraction Backports commit adf1a5662a47d5b5b96f4f1e440e34c26b14a154 from qemu	2019-11-28 02:44:20 -05:00
Richard Henderson	21df423e47	target/arm: Convert USAD8, USADA8, SBFX, UBFX, BFC, BFI, UDF In op_bfx, note that tcg_gen_{,s}extract_i32 already checks for width == 32, so we don't need to special case that here. Backports commit 86d21e4b509a2835ed79f234f476a4c5191d435b from qemu	2019-11-28 02:44:20 -05:00
Richard Henderson	dbcc67ab20	target/arm: Diagnose UNPREDICTABLE ldrex/strex cases Backports commit af2882289951e58363d714afd16f80050685fa29 from qemu	2019-11-28 02:44:20 -05:00
Richard Henderson	3ac019eb98	target/arm: Convert Synchronization primitives Backports commit 1efdd407a25f617129e2e0d5c009c07cbe847990 from qemu	2019-11-28 02:44:18 -05:00
Richard Henderson	c794962c42	target/arm: Convert load/store (register, immediate, literal) Backports commit 5e291fe16846d216d5a69569b1c59f497dff96e4 from qemu	2019-11-28 02:42:01 -05:00
Richard Henderson	d5d98450f3	target/arm: Convert T32 ADDW/SUBW Backports commit 145952e87fb86aaa9434d768c31eedbd323f7157 from qemu	2019-11-28 02:42:01 -05:00
Richard Henderson	7b9025910d	target/arm: Convert the rest of A32 Miscellaneous instructions Backports commit 2cde9ea57dbc4cdee3677a1a335574537810fe2e from qemu	2019-11-28 02:42:01 -05:00
Richard Henderson	be2a259d3c	target/arm: Convert ERET Pass the T5 encoding of SUBS PC, LR, #IMM through the normal SUBS path to make it clear exactly what's happening -- we hit ALUExceptionReturn along that path. Backports commit ef11bc3c461e2c650e8bef552146a4b08f81884e from qemu	2019-11-28 02:42:00 -05:00
Richard Henderson	74040da34c	target/arm: Convert CLZ Document our choice about the T32 CONSTRAINED UNPREDICTABLE behaviour. This matches the undocumented choice made by the legacy decoder. Backports commit 4c97f5b2f0fa9b37f9ff497f15411d809e6fd098 from qemu	2019-11-28 02:42:00 -05:00
Richard Henderson	94968602b8	target/arm: Convert BX, BXJ, BLX (register) Backports commit 4ed95abd700e43dee8e032f754b53bec2b047f75 from qemu	2019-11-28 02:42:00 -05:00
Richard Henderson	831e17d970	target/arm: Convert Cyclic Redundancy Check Backports commit 6c35d53f1bde7fe327c074473c3048d6e6f15e95 from qemu	2019-11-28 02:42:00 -05:00
Richard Henderson	fdd135c7d2	target/arm: Convert MRS/MSR (banked, register) The m-profile and a-profile decodings overlap. Only return false for the case of wrong profile; handle UNDEFINED for permission failure directly. This ensures that we don't accidentally pass an insn that applies to the wrong profile. Backports commit d0b26644502103ca97093ef67749812dc1df7eea from qemu	2019-11-28 02:42:00 -05:00
Richard Henderson	571d879c49	target/arm: Convert MSR (immediate) and hints Backports commit 6313059623dc512308681ba160ed862ac387e2fb from qemu	2019-11-28 02:41:59 -05:00
Richard Henderson	a011318794	target/arm: Simplify op_smlawx for SMLAW* By shifting the 16-bit input left by 16, we can align the desired portion of the 48-bit product and use tcg_gen_muls2_i32. Backports commit 485b607d4f393e0de92c922806a68aef22340c98 from qemu	2019-11-28 02:40:01 -05:00
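The shift trick described in the SMLAW* entry above can be checked with plain integer arithmetic. The following is a minimal sketch in Python, not the TCG code itself: `to_signed`, `smulw_reference` and `smulw_shifted` are hypothetical names introduced here, and the accumulate step is left out so that only the multiply is modelled.

```python
MASK32 = 0xFFFFFFFF

def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def smulw_reference(rn, half):
    # 32-bit x 16-bit signed multiply gives a 48-bit product; keep bits [47:16].
    product = to_signed(rn, 32) * to_signed(half, 16)
    return (product >> 16) & MASK32

def smulw_shifted(rn, half):
    # Move the 16-bit operand into the top half of a 32-bit word, then take
    # the high 32 bits of the 64-bit signed product.
    shifted = to_signed(half << 16, 32)
    product = to_signed(rn, 32) * shifted
    return (product >> 32) & MASK32
```

Both functions return the same 32-bit value because multiplying by `half << 16` scales the product by 2^16, so bits [63:32] of the wider product are bits [47:16] of the original one.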
Richard Henderson	201be7b8b1	target/arm: Simplify op_smlaxxx for SMLAL* Since all of the inputs and outputs are i32, dispense with the intermediate promotion to i64 and use tcg_gen_add2_i32. Backports commit ea96b374641bc429269096d88d4e91ee544273e9 from qemu	2019-11-28 02:40:00 -05:00
Richard Henderson	543b598d45	target/arm: Convert Halfword multiply and multiply accumulate Backports commit 26c6923de7131fa1cf223ab67131d1992dc17001 from qemu	2019-11-28 02:40:00 -05:00
Richard Henderson	44416a6794	target/arm: Convert Saturating addition and subtraction Backports commit 6d0730a82417e3a4a1911eb8e0246f3ba996f932 from qemu	2019-11-28 02:40:00 -05:00
Richard Henderson	45566b2780	target/arm: Simplify UMAAL Since all of the inputs and outputs are i32, dispense with the intermediate promotion to i64 and use tcg_gen_mulu2_i32 and tcg_gen_add2_i32. Backports commit 2409d56454f0d028619fb1002eda86bf240906dd from qemu	2019-11-28 02:40:00 -05:00
Richard Henderson	5e5ae4c0d0	target/arm: Convert multiply and multiply accumulate Backports commit bd92fe353bda4412ffc46c0f7415207a684b45f2 from qemu	2019-11-28 02:40:00 -05:00
Richard Henderson	677cf191d2	target/arm: Convert Data Processing (immediate) Convert the modified immediate form of the data processing insns. For A32, we can finally remove any code that was intertwined with the register and register-shifted-register forms. Backports commit 581c6ebd17c8f56ad52772216e6c6d8cc8997e8b from qemu	2019-11-28 02:39:16 -05:00
Richard Henderson	1b21ced6a1	target/arm: Convert Data Processing (reg-shifted-reg) Convert the register shifted by register form of the data processing insns. For A32, we cannot yet remove any code because the legacy decoder intertwines the immediate form. Backports commit 5be2c12337f4cbdbda4efe6ab485350f730faaad from qemu	2019-11-28 02:39:16 -05:00
Richard Henderson	e151696a65	target/arm: Convert Data Processing (register) Convert the register shifted by immediate form of the data processing insns. For A32, we cannot yet remove any code because the legacy decoder intertwines the reg-shifted-reg and immediate forms. Backports commit 25ae32c558182c07fc6ad01b936e9151cbf00c44 from qemu	2019-11-28 02:38:58 -05:00
Richard Henderson	9fc793b566	target/arm: Add stubs for aa32 decodetree Add the infrastructure that will become the new decoder. No instructions adjusted so far. Backports commit 51409b9e8cfe997b1ac3365df7400e0c6e844437 from qemu	2019-11-28 02:38:49 -05:00
Richard Henderson	6ec6c71d50	target/arm: Use store_reg_from_load in thumb2 code This function already includes the test for an interworking write to PC from a load. Change the T32 LDM implementation to match the A32 LDM implementation. For LDM, the reordering of the tests does not change valid behaviour because the only case that differs has rn == 15, which is UNPREDICTABLE. Backports commit 69be3e13764111737e1a7a13bb0c231e4d5be756 from qemu	2019-11-28 02:38:42 -05:00
Richard Henderson	46a8dfff59	target/arm: Fix SMMLS argument order The previous simplification got the order of operands to the subtraction wrong. Since the 64-bit product is the subtrahend, we must use a 64-bit subtract to properly compute the borrow from the low-part of the product. Fixes: 5f8cd06ebcf5 ("target/arm: Simplify SMMLA, SMMLAR, SMMLS, SMMLSR") Backports commit e0a0c8322b8ebcdad674f443a3e86db8708d6738 from qemu	2019-11-20 17:24:44 -05:00
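The need for a full 64-bit subtract in the SMMLS entry above can be seen with a small model. This is a sketch under stated assumptions: SMMLS is taken to compute bits [63:32] of `(Ra << 32) - Rn * Rm`, and `smmls_no_borrow` is a hypothetical shortcut written only for contrast. It is not the code that the fix replaced.

```python
MASK32 = 0xFFFFFFFF

def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def smmls(ra, rn, rm):
    # Subtract the whole 64-bit product from Ra:0, then keep the high word.
    product = to_signed(rn, 32) * to_signed(rm, 32)
    return (((to_signed(ra, 32) << 32) - product) >> 32) & MASK32

def smmls_no_borrow(ra, rn, rm):
    # Hypothetical shortcut: subtract only the high word of the product,
    # which drops the borrow generated by a non-zero low word.
    product = to_signed(rn, 32) * to_signed(rm, 32)
    return (to_signed(ra, 32) - (product >> 32)) & MASK32
```

With `ra = 0, rn = 1, rm = 1` the product is 1, so the 64-bit subtraction borrows into the high word and the two functions disagree; when the low word of the product is zero they agree.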
Peter Maydell	56b54f361e	target/arm: Allow ARMCPRegInfo read/write functions to throw exceptions Currently the only part of an ARMCPRegInfo which is allowed to cause a CPU exception is the access function, which returns a value indicating that some flavour of UNDEF should be generated. For the ATS system instructions, we would like to conditionally generate exceptions as part of the writefn, because some faults during the page table walk (like external aborts) should cause an exception to be raised rather than returning a value. There are several ways we could do this: * plumb the GETPC() value from the top level set_cp_reg/get_cp_reg helper functions through into the readfn and writefn hooks * add extra readfn_with_ra/writefn_with_ra hooks that take the GETPC() value * require the ATS instructions to provide a dummy accessfn, which serves no purpose except to cause the code generation to emit TCG ops to sync the CPU state * add an ARM_CP_ flag to mark the ARMCPRegInfo as possibly throwing an exception in its read/write hooks, and make the codegen sync the CPU state before calling the hooks if the flag is set This patch opts for the last of these, as it is fairly simple to implement and doesn't require invasive changes like updating the readfn/writefn hook function prototype signature. Backports commit 37ff584c15bc3e1dd2c26b1998f00ff87189538c from qemu	2019-11-20 17:24:37 -05:00
Richard Henderson	87c06b7fae	target/arm: Factor out unallocated_encoding for aarch32 Make this a static function private to translate.c. Thus we can use the same idiom between aarch64 and aarch32 without actually sharing function implementations. Backports commit 1ce21ba1eaf08b22da5925f3e37fc0b4322da858 from qemu	2019-11-18 23:51:45 -05:00
Richard Henderson	1f59a43544	Revert "target/arm: Use unallocated_encoding for aarch32" Despite the fact that the text for the call to gen_exception_insn is identical for aarch64 and aarch32, the implementation inside gen_exception_insn is totally different. This fixes exceptions raised from aarch64. This reverts commit `fb2d3c9a9a`.	2019-11-18 23:49:47 -05:00
Richard Henderson	9d2a3064af	target/arm: Use tcg_gen_extrh_i64_i32 to extract the high word Separate shift + extract low will result in one extra insn for hosts like RISC-V, MIPS, and Sparc. Backports commit 664b7e3b97d6376f3329986c465b3782458b0f8b from qemu	2019-11-18 20:36:19 -05:00
Richard Henderson	93c016a3e7	target/arm: Simplify SMMLA, SMMLAR, SMMLS, SMMLSR All of the inputs to these instructions are 32-bits. Rather than extend each input to 64-bits and then extract the high 32-bits of the output, use tcg_gen_muls2_i32 and other 32-bit generator functions. Backports commit 5f8cd06ebcf57420be8fea4574de2e074de46709 from qemu	2019-11-18 20:31:12 -05:00
Richard Henderson	4a1cc16eef	target/arm: Use tcg_gen_rotri_i32 for gen_swap_half Rotate is the more compact and obvious way to swap 16-bit elements of a 32-bit word. Backports commit adefba76e8bf10dfb342094d2f5debfeedb1a74d from qemu	2019-11-18 20:27:12 -05:00
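As an illustration of the gen_swap_half entry above, the sketch below models a 32-bit rotate in Python and compares it with an open-coded halfword swap. The helper names are hypothetical and only the arithmetic is modelled, not the TCG op.

```python
MASK32 = 0xFFFFFFFF

def ror32(x, n):
    """Rotate a 32-bit value right by n bits (n is taken modulo 32)."""
    n %= 32
    x &= MASK32
    return ((x >> n) | (x << (32 - n))) & MASK32

def swap_half_open_coded(x):
    # Two shifts and an OR: move each 16-bit half into the other position.
    x &= MASK32
    return ((x >> 16) | (x << 16)) & MASK32

def swap_half_rotate(x):
    # A rotate by 16 performs the same exchange in one operation.
    return ror32(x, 16)
```

For example, `swap_half_rotate(0x12345678)` gives `0x56781234`, the same value as the open-coded form.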
Richard Henderson	751ab7b24b	target/arm: Use ror32 instead of open-coding the operation The helper function is more documentary, and also already handles the case of rotate by zero. Backports commit dd861b3f29be97a9e3cdb9769dcbc0c7d7825185 from qemu	2019-11-18 20:25:51 -05:00
Richard Henderson	df4c773ed2	target/arm: Remove redundant shift tests The immediate shift generator functions already test for, and eliminate, the case of a shift by zero. Backports commit 464eaa9571fae5867d9aea7d7209c091c8a50223 from qemu	2019-11-18 20:24:39 -05:00
Richard Henderson	4dd30ebfbd	target/arm: Use tcg_gen_deposit_i32 for PKHBT, PKHTB Use deposit as the composite operation to merge the bits from the two inputs. Backports commit d1f8755fc93911f5b27246b1da794542d222fa1b from qemu	2019-11-18 20:22:00 -05:00
Richard Henderson	25ccd28e78	target/arm: Use tcg_gen_extract_i32 for shifter_out_im Extract is a compact combination of shift + and. Backports commit 191f4bfe8d6cf0c7d5cd7f84cd7076e32e3745dd from qemu	2019-11-18 20:19:40 -05:00
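The statement in the shifter_out_im entry above, that extract combines a shift with an AND, can be sketched as follows. `extract32` here is a plain Python model with hypothetical naming, and `single_bit` is an assumed example of pulling out a one-bit field.

```python
def extract32(value, ofs, length):
    """Return the `length`-bit field of a 32-bit value starting at bit `ofs`."""
    assert 0 <= ofs < 32 and 0 < length <= 32 - ofs
    return (value >> ofs) & ((1 << length) - 1)

def single_bit(value, bit):
    # A one-bit field: shift the wanted bit down to position 0 and mask with 1.
    return extract32(value, bit, 1)
```

For instance, `extract32(0xABCD1234, 8, 8)` yields `0x12`, which is `(0xABCD1234 >> 8) & 0xFF`.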
Richard Henderson	3d3d56056b	target/arm: Remove helper_double_saturate Replace x = double_saturate(y) with x = add_saturate(y, y). There is no need for a separate more specialized helper. Backports commit 640581a06d14e2d0d3c3ba79b916de6bc43578b0 from qemu	2019-11-18 20:13:21 -05:00
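The equivalence used in the helper_double_saturate entry above can be checked numerically. The sketch below assumes signed 32-bit saturation and leaves out any saturation flag; `double_saturate` is a hypothetical threshold-based version written for comparison, not the removed helper.

```python
INT32_MIN = -(1 << 31)
INT32_MAX = (1 << 31) - 1

def add_saturate(a, b):
    """Signed 32-bit addition that clamps to the representable range."""
    return max(INT32_MIN, min(INT32_MAX, a + b))

def double_saturate(y):
    # Hypothetical specialised form: compare against the doubling thresholds.
    if y >= 0x40000000:
        return INT32_MAX
    if y <= -0x40000000:
        return INT32_MIN
    return y * 2
```

For every signed 32-bit input the two agree, which is why a separate doubling helper adds nothing over `add_saturate(y, y)`.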
Richard Henderson	fb2d3c9a9a	target/arm: Use unallocated_encoding for aarch32 Promote this function from aarch64 to fully general use. Use it to unify the code sequences for generating illegal opcode exceptions. Backports commit 3cb36637157088892e9e33ddb1034bffd1251d3b from qemu	2019-11-18 20:10:50 -05:00
Richard Henderson	d562bea784	target/arm: Remove offset argument to gen_exception_bkpt_insn Unlike the other more generic gen_exception{,_internal}_insn interfaces, breakpoints always refer to the current instruction. Backports commit 06bcbda3f64d464b6ecac789bce4bd69f199cd68 from qemu	2019-11-18 20:05:45 -05:00
Richard Henderson	f19b4df20d	target/arm: Replace offset with pc in gen_exception_internal_insn The offset is variable depending on the instruction set. Passing in the actual value is clearer in intent. Backports commit aee828e7541a5895669ade3a4b6978382b6b094a from qemu	2019-11-18 20:05:23 -05:00
Richard Henderson	00fbadf637	target/arm: Replace s->pc with s->base.pc_next We must update s->base.pc_next when we return from the translate_insn hook to the main translator loop. By incrementing s->base.pc_next immediately after reading the insn word, "pc_next" contains the address of the next instruction throughout translation. All remaining uses of s->pc are referencing the address of the next insn, so this is now a simple global replacement. Remove the "s->pc" field. Backports commit a04159166b880b505ccadc16f2fe84169806883d from qemu	2019-11-18 17:32:53 -05:00
Richard Henderson	7d1fcef722	target/arm: Remove redundant s->pc & ~1 The thumb bit has already been removed from s->pc, and is always even. Backports commit 4818c3743b0e0095fdcecd24457da9b3443730ab from qemu	2019-11-18 17:32:53 -05:00
Richard Henderson	a2e60445de	target/arm: Introduce add_reg_for_lit Provide a common routine for the places that require ALIGN(PC, 4) as the base address as opposed to plain PC. The two are always the same for A32, but the difference is meaningful for thumb mode. Backports commit 16e0d8234ef9291747332d2c431e46808a060472 from qemu	2019-11-18 17:32:49 -05:00
Richard Henderson	1c0914e58c	target/arm: Introduce read_pc We currently have 3 different ways of computing the architectural value of "PC" as seen in the ARM ARM. The value of s->pc has been incremented past the current insn, but that is all. Thus for a32, PC = s->pc + 4; for t32, PC = s->pc; for t16, PC = s->pc + 2. These differing computations make it impossible at present to unify the various code paths. With the newly introduced s->pc_curr, we can compute the correct value for all cases, using the formula given in the ARM ARM. This changes the behaviour for load_reg() and load_reg_var() when called with reg==15 from a 32-bit Thumb instruction: previously they would have returned the incorrect value of pc_curr + 6, and now they will return the architecturally correct value of PC, which is pc_curr + 4. This will not affect well-behaved guest software, because all of the places we call these functions from T32 code are instructions where using r15 is UNPREDICTABLE. Using the architectural PC value here is more consistent with the T16 and A32 behaviour. Backports commit fdbcf6329d0c2984c55d7019419a72bf8e583c36 from qemu	2019-11-18 17:04:50 -05:00
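The three per-encoding computations listed in the read_pc entry above collapse into one rule once the address of the current instruction is known. The sketch below models this in Python. The table and function names are hypothetical, and the unified rule is assumed to be the architectural one: the current instruction address plus 8 in A32 and plus 4 in Thumb.

```python
# Instruction length in bytes, i.e. how far s->pc had advanced past pc_curr.
INSN_SIZE = {"a32": 4, "t32": 4, "t16": 2}
# Extra adjustment the old code applied on top of s->pc, per the entry above.
OLD_ADJUST = {"a32": 4, "t32": 0, "t16": 2}

def old_pc(pc_curr, encoding):
    s_pc = pc_curr + INSN_SIZE[encoding]
    return s_pc + OLD_ADJUST[encoding]

def read_pc(pc_curr, thumb):
    # Single formula based on the address of the current instruction.
    return pc_curr + (4 if thumb else 8)
```

With `pc_curr = 0x1000`, both approaches give `0x1008` for A32 and `0x1004` for T32 and T16.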
Richard Henderson	0048f3e887	target/arm: Introduce pc_curr Add a new field to retain the address of the instruction currently being translated. The 32-bit uses are all within subroutines used by a32 and t32. This will become less obvious when t16 support is merged with a32+t32, and having a clear definition will help. Convert aarch64 as well for consistency. Note that there is one instance of a pre-assert fprintf that used the wrong value for the address of the current instruction. Backports commit 43722a6d4f0c92f7e7e1e291580039b0f9789df1 from qemu	2019-11-18 16:58:40 -05:00
Richard Henderson	1aa3c685a8	target/arm: Pass in pc to thumb_insn_is_16bit This function is used in two different contexts, and it will be clearer if the function is given the address to which it applies. Backports commit 331b1ca616cb708db30dab68e3262d286e687f24 from qemu	2019-11-18 16:52:35 -05:00
Peter Maydell	c61e22627d	target/arm: Fix routing of singlestep exceptions When generating an architectural single-step exception we were routing it to the "default exception level", which is to say the same exception level we execute at except that EL0 exceptions go to EL1. This is incorrect because the debug exception level can be configured by the guest for situations such as single stepping of EL0 and EL1 code by EL2. We have to track the target debug exception level in the TB flags, because it is dependent on CPU state like HCR_EL2.TGE and MDCR_EL2.TDE. (That we were previously calling the arm_debug_target_el() function to determine dc->ss_same_el is itself a bug, though one that would only have manifested as incorrect syndrome information.) Since we are out of TB flag bits unless we want to expand into the cs_base field, we share some bits with the M-profile only HANDLER and STACKCHECK bits, since only A-profile has this singlestep. Fixes: https://bugs.launchpad.net/qemu/+bug/1838913 Backports commit 8bd587c1066f4456ddfe611b571d9439a947d74c from qemu	2019-11-18 16:50:15 -05:00
Peter Maydell	3f531fac61	target/arm: Factor out 'generate singlestep exception' function Factor out code to 'generate a singlestep exception', which is currently repeated in four places. To do this we need to also pull the identical copies of the gen_exception() function out of translate-a64.c and translate.c into translate.h. (There is a bug in the code: we're taking the exception to the wrong target EL. This will be simpler to fix if there's only one place to do it.) Backports commit c1d5f50f094ab204accfacc2ee6aafc9601dd5c4 from qemu	2019-11-18 16:47:08 -05:00
Peter Maydell	3fc86e1901	target/arm: Don't abort on M-profile exception return in linux-user mode An attempt to do an exception-return (branch to one of the magic addresses) in linux-user mode for M-profile should behave like a normal branch, because linux-user mode is always going to be in 'handler' mode. This used to work, but we broke it when we added support for the M-profile security extension in commit d02a8698d7ae2bfed. In that commit we allowed even handler-mode calls to magic return values to be checked for and dealt with by causing an EXCP_EXCEPTION_EXIT exception to be taken, because this is needed for the FNC_RETURN return-from-non-secure-function-call handling. For system mode we added a check in do_v7m_exception_exit() to make any spurious calls from Handler mode behave correctly, but forgot that linux-user mode would also be affected. How an attempted return-from-non-secure-function-call in linux-user mode should be handled is not clear -- on real hardware it would result in return to secure code (not to the Linux kernel) which could then handle the error in any way it chose. For QEMU we take the simple approach of treating this erroneous return the same way it would be handled on a CPU without the security extensions -- treat it as a normal branch. The upshot of all this is that for linux-user mode we should never do any of the bx_excret magic, so the code change is simple. This ought to be a weird corner case that only affects broken guest code (because Linux user processes should never be attempting to do exception returns or NS function returns), except that the code that assigns addresses in RAM for the process and stack in our linux-user code does not attempt to avoid this magic address range, so legitimate code attempting to return to a trampoline routine on the stack can fall into this case. This change fixes those programs, but we should also look at restricting the range of memory we use for M-profile linux-user guests to the area that would be real RAM in hardware. Backports commit 9027d3fba605d8f6093342ebe4a1da450d374630 from qemu	2019-11-18 16:30:43 -05:00
Peter Maydell	0d89bce217	target/arm: Execute Thumb instructions when their condbits are 0xf Thumb instructions in an IT block are set up to be conditionally executed depending on a set of condition bits encoded into the IT bits of the CPSR/XPSR. The architecture specifies that if the condition bits are 0b1111 this means "always execute" (like 0b1110), not "never execute"; we were treating it as "never execute". (See the ConditionHolds() pseudocode in both the A-profile and M-profile Arm ARM.) This is a bit of an obscure corner case, because the only legal way to get to an 0b1111 set of condbits is to do an exception return which sets the XPSR/CPSR up that way. An IT instruction which encodes a condition sequence that would include an 0b1111 is UNPREDICTABLE, and for v8A the CONSTRAINED UNPREDICTABLE choices for such an IT insn are to NOP, UNDEF, or treat 0b1111 like 0b1110. Add a comment noting that we take the latter option. Backports commit 5529de1e5512c05276825fa8b922147663fd6eac from qemu	2019-08-08 18:07:57 -04:00
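The condition-bit rule in the entry above (0b1111 means "always", like 0b1110) can be modelled with a short function patterned on the ConditionHolds() pseudocode it cites. This is a sketch in Python: the function name and argument order are assumptions, and the flags are passed as booleans.

```python
def condition_holds(cond, n, z, c, v):
    """Evaluate a 4-bit ARM condition code against the N, Z, C, V flags."""
    base = (cond >> 1) & 0b111
    if base == 0b000:
        result = z                    # EQ / NE
    elif base == 0b001:
        result = c                    # CS / CC
    elif base == 0b010:
        result = n                    # MI / PL
    elif base == 0b011:
        result = v                    # VS / VC
    elif base == 0b100:
        result = c and not z          # HI / LS
    elif base == 0b101:
        result = n == v               # GE / LT
    elif base == 0b110:
        result = n == v and not z     # GT / LE
    else:
        result = True                 # AL
    # The low bit inverts the test, except for 0b1111, which stays "always".
    if (cond & 1) and cond != 0b1111:
        result = not result
    return bool(result)
```

Without the `cond != 0b1111` exclusion the low bit would invert "always" into "never", which is the behaviour the commit corrects.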
Philippe Mathieu-Daudé	f77b60d7e9	target/arm: Fix coding style issues Since we'll move this code around, fix its style first. Backports commit 9798ac7162c8a720c5d28f4d1fc9e03c7ab4f015 from qemu	2019-08-08 15:05:57 -04:00
Lioncash	76d33b34e1	target/arm: Fix bad patch merge in arm_tr_init_disas_context	2019-08-08 14:37:38 -04:00
Peter Maydell	318a1ddf39	target/arm: Remove unused cpu_F0s, cpu_F0d, cpu_F1s, cpu_F1d Remove the now unused TCG globals cpu_F0s, cpu_F0d, cpu_F1s, cpu_F1d. cpu_M0 is still used by the iwmmxt code, and cpu_V0 and cpu_V1 are used by both iwmmxt and Neon. Backports commit d9eea52c67c04c58ecceba6ffe5a93d1d02051fa from qemu	2019-06-25 18:45:53 -05:00
Peter Maydell	74168c20f2	target/arm: Stop using deprecated functions in NEON_2RM_VCVT_F32_F16 Remove some old constructs from NEON_2RM_VCVT_F32_F16 code: * don't use cpu_F0s * don't use tcg_gen_st_f32 Backports commit b66f6b9981004bbf120b8d17c20f92785179bdf2 from qemu	2019-06-25 18:43:40 -05:00
Peter Maydell	8ae25f6e4c	target/arm: stop using deprecated functions in NEON_2RM_VCVT_F16_F32 Remove some old constructs from NEON_2RM_VCVT_F16_F32 code: * don't use cpu_F0s * don't use tcg_gen_ld_f32 Backports commit 58f2682eee738e8890f9cfe858e0f4f68b00d45d from qemu	2019-06-25 18:39:43 -05:00
Peter Maydell	d419fbc270	target/arm: Stop using cpu_F0s in Neon VCVT fixed-point ops Stop using cpu_F0s in the Neon VCVT fixed-point operations. Backports commit c253dd7832bc6b4e140a0da56410a9336cce05bc from qemu	2019-06-25 18:35:33 -05:00
Peter Maydell	46216ae382	target/arm: Stop using cpu_F0s for Neon f32/s32 VCVT Stop using cpu_F0s for the Neon f32/s32 VCVT operations. Since this is the last user of cpu_F0s in the Neon 2rm-op loop, we can remove the handling code for it too. Backports commit 60737ed5785b9c1c6f1c85575dfdd1e9eec91878 from qemu	2019-06-25 18:32:32 -05:00
Peter Maydell	2fbe9c1d1d	target/arm: Stop using cpu_F0s for NEON_2RM_VRECPE_F and NEON_2RM_VRSQRTE_F Stop using cpu_F0s for NEON_2RM_VRECPE_F and NEON_2RM_VRSQRTE_F. Backports commit 9a011fece7201f8e268c982df8c7836f3335bbe6 from qemu	2019-06-25 18:29:22 -05:00
Peter Maydell	f82ea34369	target/arm: Stop using cpu_F0s for NEON_2RM_VCVT[ANPM][US] Stop using cpu_F0s for the NEON_2RM_VCVT[ANPM][US] ops. Backports commit 30bf0a018f6c706913c8c0ea57b386907f4229be from qemu	2019-06-25 18:28:03 -05:00
Peter Maydell	0d4535bf16	target/arm: Stop using cpu_F0s for NEON_2RM_VRINT* Switch NEON_2RM_VRINT* away from using cpu_F0s. Backports commit 3b52ad1fae804acdc2fdc41b418a65249beae430 from qemu	2019-06-25 18:26:24 -05:00
Peter Maydell	a62cbc7ac5	target/arm: Stop using cpu_F0s for NEON_2RM_VNEG_F Switch NEON_2RM_VNEG_F away from using cpu_F0s. Backports commit cedcc96fc7c8e520a190a010ac97dbb53e57d7d2 from qemu	2019-06-25 18:24:01 -05:00
Peter Maydell	63d7f92eba	target/arm: Stop using cpu_F0s for NEON_2RM_VABS_F Where Neon instructions are floating point operations, we mostly use the old VFP utility functions like gen_vfp_abs() which work on the TCG globals cpu_F0s and cpu_F1s. The Neon for-each-element loop conditionally loads the inputs into either a plain old TCG temporary for most operations or into cpu_F0s for float operations, and similarly stores back either cpu_F0s or the temporary. Switch NEON_2RM_VABS_F away from using cpu_F0s, and update neon_2rm_is_float_op() accordingly. Backports commit fd8a68cdcf81d70eebf866a132e9780d4108da9c from qemu	2019-06-25 18:22:05 -05:00
Peter Maydell	1a0d31c05e	target/arm: Convert float-to-integer VCVT insns to decodetree Convert the float-to-integer VCVT instructions to decodetree. Since these are the last unconverted instructions, we can delete the old decoder structure entirely now. Backports commit 3111bfc2da6ba0c8396dc97ca479942d711c6146 from qemu	2019-06-13 19:40:02 -04:00
Peter Maydell	f6c67559d4	target/arm: Convert VCVT fp/fixed-point conversion insns to decodetree Convert the VCVT (between floating-point and fixed-point) instructions to decodetree. Backports commit e3d6f4290c788e850c64815f0b3e331600a4bcc0 from qemu	2019-06-13 19:35:51 -04:00
Peter Maydell	c66d477359	target/arm: Convert VJCVT to decodetree Convert the VJCVT instruction to decodetree. Backports commit 92073e947487e2109f3dfebfeaa48d6323cbd981 from qemu	2019-06-13 19:31:35 -04:00
Peter Maydell	7be9e6f9b4	target/arm: Convert integer-to-float insns to decodetree Convert the VCVT integer-to-float instructions to decodetree. Backports commit 8fc9d8918cde342c71923e361b9f2193e36ed18b from qemu	2019-06-13 19:20:41 -04:00
Peter Maydell	e0e4f99103	target/arm: Convert double-single precision conversion insns to decodetree Convert the VCVT double/single precision conversion insns to decodetree. Backports commit 6ed7e49c3693ed8411773c4880f42b2932beb12d from qemu	2019-06-13 19:18:01 -04:00
Peter Maydell	ab9d0235ed	target/arm: Convert VFP round insns to decodetree Convert the VFP round-to-integer instructions VRINTR, VRINTZ and VRINTX to decodetree. These instructions were only introduced as part of the "VFP misc" additions in v8A, so we check this. The old decoder's implementation was incorrectly providing them even for v7A CPUs. Backports commit e25155f55dc4abb427a88dfe58bbbc550fe7d643 from qemu	2019-06-13 19:15:05 -04:00
Peter Maydell	9e842a0f2a	target/arm: Convert the VCVT-to-f16 insns to decodetree Convert the VCVTT and VCVTB instructions which convert from f32 and f64 to f16 to decodetree. Since we're no longer constrained to the old decoder's style using cpu_F0s and cpu_F0d we can perform a direct 16 bit store of the right half of the input single-precision register rather than doing a load/modify/store sequence on the full 32 bits. Backports commit cdfd14e86ab0b1ca29a702d13a8e4af2e902a9bf from qemu	2019-06-13 19:03:59 -04:00
Peter Maydell	7d927b2d0e	target/arm: Convert the VCVT-from-f16 insns to decodetree Convert the VCVTT, VCVTB instructions that deal with conversion from half-precision floats to f32 or f64 to decodetree. Since we're no longer constrained to the old decoder's style using cpu_F0s and cpu_F0d we can perform a direct 16 bit load of the right half of the input single-precision register rather than loading the full 32 bits and then doing a separate shift or sign-extension. Backports commit b623d803dda805f07aadcbf098961fde27315c19 from qemu	2019-06-13 19:00:23 -04:00
Peter Maydell	e6cc2616d2	target/arm: Convert VFP comparison insns to decodetree Convert the VFP comparison instructions to decodetree. Note that comparison instructions should not honour the VFP short-vector length and stride information: they are scalar-only operations. This applies to all the 2-operand instructions except for VMOV, VABS, VNEG and VSQRT. (In the old decoder this is implemented via the "if (op == 15 && rn > 3) { veclen = 0; }" check.) Backports commit 386bba2368842fc74388a3c1651c6c0c0c70adbd from qemu	2019-06-13 18:55:53 -04:00
Peter Maydell	a75a3e321f	target/arm: Convert VMOV (register) to decodetree Backports commit 17552b979ebb9848a534c25ebed18a1072710058 from qemu	2019-06-13 18:49:49 -04:00
Peter Maydell	ee30962891	target/arm: Convert VSQRT to decodetree Convert the VSQRT instruction to decodetree. Backports commit b8474540cbce4e2fa45010416375d1bcbe86dc15 from qemu	2019-06-13 18:47:32 -04:00
Peter Maydell	7aea3da6b7	target/arm: Convert VNEG to decodetree Convert the VNEG instruction to decodetree. Backports commit 1882651afdb0ca44f0631192fbe65a71c660d809 from qemu	2019-06-13 18:43:50 -04:00
Peter Maydell	1032d86ad3	target/arm: Convert VABS to decodetree Convert the VFP VABS instruction to decodetree. Unlike the 3-op versions, we don't pass fpst to the VFPGen2OpSPFn or VFPGen2OpDPFn because none of the operations which use this format and support short vectors will need it. Backports commit 90287e22c987e9840704345ed33d237cbe759dd9 from qemu	2019-06-13 18:41:43 -04:00
Peter Maydell	7a16bc6876	target/arm: Convert VMOV (imm) to decodetree Convert the VFP VMOV (immediate) instruction to decodetree. Backports commit b518c753f0b94e14e01e97b4ec42c100dafc0cc2 from qemu	2019-06-13 18:37:58 -04:00
Peter Maydell	0ebb6b8b90	target/arm: Convert VFP fused multiply-add insns to decodetree Convert the VFP fused multiply-add instructions (VFNMA, VFNMS, VFMA, VFMS) to decodetree. Note that in the old decode structure we were implementing these to honour the VFP vector stride/length. These instructions were introduced in VFPv4, and in the v7A architecture they are UNPREDICTABLE if the vector stride or length are non-zero. In v8A they must UNDEF if stride or length are non-zero, like all VFP instructions; we choose to UNDEF always. Backports commit d4893b01d23060845ee3855bc96626e16aad9ab5 from qemu	2019-06-13 18:24:36 -04:00
Peter Maydell	321bcc822b	target/arm: Convert VDIV to decodetree Convert the VDIV instruction to decodetree. Backports commit 519ee7ae31e050eb0ff9ad35c213f0bd7ab1c03e from qemu	2019-06-13 18:19:47 -04:00
Peter Maydell	76c74bc657	target/arm: Convert VSUB to decodetree Convert the VSUB instruction to decodetree. Backports commit 8fec9a119264b7936503abce3c106fad7e3ccb76 from qemu	2019-06-13 18:18:00 -04:00
Peter Maydell	f56f0342ad	target/arm: Convert VADD to decodetree Convert the VADD instruction to decodetree. Backports commit ce28b303716e7eca3f3765bf6776d722ebbe1122 from qemu	2019-06-13 18:15:52 -04:00
Peter Maydell	06584edf61	target/arm: Convert VNMUL to decodetree Convert the VNMUL instruction to decodetree. Backports commit 43c4be1236c105090d134540da1036073d157cd4 from qemu	2019-06-13 18:14:16 -04:00
Peter Maydell	2c5e102017	target/arm: Convert VMUL to decodetree Convert the VMUL instruction to decodetree. Backports commit 88c5188ced60e9f2b8cc3af3b9bc4a8031c8c996 from qemu	2019-06-13 18:12:03 -04:00
Peter Maydell	b26b6a12a2	target/arm: Convert VFP VNMLA to decodetree Convert the VFP VNMLA instruction to decodetree. Backports commit 8a483533adc1bdc2decb8f456dbe930a2d245a8b from qemu	2019-06-13 18:09:57 -04:00
Peter Maydell	638b90de31	target/arm: Convert VFP VNMLS to decodetree Convert the VFP VNMLS instruction to decodetree. Backports commit c54a416cc6d60efbc79dd37aaf0c8918c05b5815 from qemu	2019-06-13 18:06:59 -04:00
Peter Maydell	67ad40ffa4	target/arm: Convert VFP VMLS to decodetree Convert the VFP VMLS instruction to decodetree. Backports commit e7258280d46af4ab6a0cc93ccfe8f6614defb4b7 from qemu	2019-06-13 18:02:37 -04:00
Peter Maydell	edf81eb214	target/arm: Convert VFP VMLA to decodetree Convert the VFP VMLA instruction to decodetree. This is the first of the VFP 3-operand data processing instructions, so we include in this patch the code which loops over the elements for an old-style VFP vector operation. The existing code to do this looping uses the deprecated cpu_F0s/F0d/F1s/F1d TCG globals; since we are going to be converting instructions one at a time anyway we can take the opportunity to make the new loop use TCG temporaries, which means we can do that conversion one operation at a time rather than needing to do it all in one go. We include an UNDEF check which was missing in the old code: short-vector operations (with stride or length non-zero) were deprecated in v7A and must UNDEF in v8A, so if the MVFR0 FPShVec field does not indicate that support for short vectors is present we UNDEF the operations that would use them. (This is a change of behaviour for Cortex-A7, Cortex-A15 and the v8 CPUs, which previously were all incorrectly allowing short-vector operations.) Note that the conversion fixes a bug in the old code for the case of VFP short-vector "mixed scalar/vector operations". These happen where the destination register is in a vector bank but the second operand is in a scalar bank. For example vmla.f64 d10, d1, d16 with length 2 stride 2 is equivalent to the pair of scalar operations vmla.f64 d10, d1, d16 and vmla.f64 d8, d3, d16 where the destination and first input register cycle through their vector but the second input is scalar (d16). In the old decoder the gen_vfp_F1_mul() operation uses cpu_F1{s,d} as a temporary output for the multiply, which trashes the second input operand. For the fully-scalar case (where we never do a second iteration) and the fully-vector case (where the loop loads the new second input operand) this doesn't matter, but for the mixed scalar/vector case we will end up using the wrong value for later loop iterations.
In the new code we use TCG temporaries and so avoid the bug. This bug is present for all the multiply-accumulate insns that operate on short vectors: VMLA, VMLS, VNMLA, VNMLS. Note 2: the expression used to calculate the next register number in the vector bank is not in fact correct; we leave this behaviour unchanged from the old decoder and will fix this bug later in the series. Backports commit 266bd25c485597c94209bfdb3891c1d0c573c164 from qemu	2019-06-13 17:59:16 -04:00
Peter Maydell	93fe4cbe9e	target/arm: Remove VLDR/VSTR/VLDM/VSTM use of cpu_F0s and cpu_F0d Expand out the sequences in the new decoder VLDR/VSTR/VLDM/VSTM trans functions which perform the memory accesses by going via the TCG globals cpu_F0s and cpu_F0d, to use local TCG temps instead. Backports commit 3993d0407dff7233e42f2251db971e126a0497e9 from qemu	2019-06-13 17:31:28 -04:00
Peter Maydell	ff7042567e	target/arm: Convert the VFP load/store multiple insns to decodetree Convert the VFP load/store multiple insns to decodetree. This includes tightening up the UNDEF checking for pre-VFPv3 CPUs which only have D0-D15: they now UNDEF for any access to D16-D31, not merely when the smallest register in the transfer list is in D16-D31. This conversion does not try to share code between the single precision and the double precision versions; this looks a bit duplicative of code, but it leaves the door open for a future refactoring which gets rid of the use of the "F0" registers by inlining the various functions like gen_vfp_ld() and gen_mov_F0_reg() which are hiding "if (dp) { ... } else { ... }" conditionalisation. Backports commit fa288de272c5c8a66d5eb683b123706a52bc7ad6 from qemu	2019-06-13 17:26:52 -04:00

339 Commits