Commit Graph

83 Commits

Author SHA1 Message Date
Emilio G. Cota
8552d95c52
exec-all: extract tb->tc_* into a separate struct tc_tb
In preparation for adding tc.size, so that we can keep track of TBs
using the binary search tree implementation from glib.
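
Roughly, the extracted struct looks like this (a sketch only; struct and
field names are assumed from the series, not part of the original message):

struct tb_tc {
    void *ptr;     /* pointer to the translated code */
    size_t size;   /* to be added next, for the glib binary search tree */
};

/* TranslationBlock then embeds it, so tb->tc_ptr becomes tb->tc.ptr */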

Backports commit e7e168f41364c6e83d0f75fc1b3ce7f9c41ccf76 from qemu
2018-03-05 02:57:22 -05:00
Emilio G. Cota
b4a7d8b773
exec-all: bring tb->invalid into tb->cflags
This gets rid of a hole in struct TranslationBlock.
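
A sketch of the idea, assuming one otherwise-unused cflags bit is reserved
for the flag (bit value and helper name assumed):

#define CF_INVALID 0x80000000   /* TB is stale; setters hold tb_lock */

static inline bool tb_is_invalid(TranslationBlock *tb)
{
    return atomic_read(&tb->cflags) & CF_INVALID;
}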

Backports commit 84f1c148da2b35fbb5a436597872765257e8914e from qemu
2018-03-05 02:46:21 -05:00
Emilio G. Cota
210d13ec49
tcg: consolidate TB lookups in tb_lookup__cpu_state
This avoids duplicating code. cpu_exec_step will also use the
new common function once we integrate parallel_cpus into tb->cflags.

Note that in this commit we also fix a race, described by Richard Henderson
during review. Think of this scenario with threads A and B:

(A) Lookup succeeds for TB in hash without tb_lock
(B) Sets the TB's tb->invalid flag
(B) Removes the TB from tb_htable
(B) Clears all CPU's tb_jmp_cache
(A) Store TB into local tb_jmp_cache

Given that order of events, (A) will keep executing that invalid TB until
another flush of its tb_jmp_cache happens, which in theory might never happen.
We can fix this by checking the tb->invalid flag every time we look up a TB
from tb_jmp_cache, so that in the above scenario, next time we try to find
that TB in tb_jmp_cache, we won't, and will therefore be forced to look it
up in tb_htable.
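
A sketch of the resulting lookup helper with that check in place (shape and
helper names assumed; unrelated fields omitted):

static inline TranslationBlock *tb_lookup__cpu_state(CPUState *cpu,
                                                     target_ulong *pc,
                                                     target_ulong *cs_base,
                                                     uint32_t *flags)
{
    CPUArchState *env = (CPUArchState *)cpu->env_ptr;
    TranslationBlock *tb;
    uint32_t hash;

    cpu_get_tb_cpu_state(env, pc, cs_base, flags);
    hash = tb_jmp_cache_hash_func(*pc);
    tb = atomic_rcu_read(&cpu->tb_jmp_cache[hash]);
    if (likely(tb &&
               tb->pc == *pc &&
               tb->cs_base == *cs_base &&
               tb->flags == *flags &&
               !atomic_read(&tb->invalid))) {   /* the fix: reject stale TBs */
        return tb;
    }
    tb = tb_htable_lookup(cpu, *pc, *cs_base, *flags);
    if (tb == NULL) {
        return NULL;
    }
    atomic_set(&cpu->tb_jmp_cache[hash], tb);
    return tb;
}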

Performance-wise, I measured a small improvement when booting debian-arm.
Note that inlining pays off:

Performance counter stats for 'taskset -c 0 qemu-system-arm \
-machine type=virt -nographic -smp 1 -m 4096 \
-netdev user,id=unet,hostfwd=tcp::2222-:22 \
-device virtio-net-device,netdev=unet \
-drive file=jessie.qcow2,id=myblock,index=0,if=none \
-device virtio-blk-device,drive=myblock \
-kernel kernel.img -append console=ttyAMA0 root=/dev/vda1 \
-name arm,debug-threads=on -smp 1' (10 runs):

Before:
18714.917392 task-clock # 0.952 CPUs utilized ( +- 0.95% )
23,142 context-switches # 0.001 M/sec ( +- 0.50% )
1 CPU-migrations # 0.000 M/sec
10,558 page-faults # 0.001 M/sec ( +- 0.95% )
53,957,727,252 cycles # 2.883 GHz ( +- 0.91% ) [83.33%]
24,440,599,852 stalled-cycles-frontend # 45.30% frontend cycles idle ( +- 1.20% ) [83.33%]
16,495,714,424 stalled-cycles-backend # 30.57% backend cycles idle ( +- 0.95% ) [66.66%]
76,267,572,582 instructions # 1.41 insns per cycle
12,692,186,323 branches # 678.186 M/sec ( +- 0.92% ) [83.35%]
263,486,879 branch-misses # 2.08% of all branches ( +- 0.73% ) [83.34%]

19.648474449 seconds time elapsed ( +- 0.82% )

After, w/ inline (this patch):
18471.376627 task-clock # 0.955 CPUs utilized ( +- 0.96% )
23,048 context-switches # 0.001 M/sec ( +- 0.48% )
1 CPU-migrations # 0.000 M/sec
10,708 page-faults # 0.001 M/sec ( +- 0.81% )
53,208,990,796 cycles # 2.881 GHz ( +- 0.98% ) [83.34%]
23,941,071,673 stalled-cycles-frontend # 44.99% frontend cycles idle ( +- 0.95% ) [83.34%]
16,161,773,848 stalled-cycles-backend # 30.37% backend cycles idle ( +- 0.76% ) [66.67%]
75,786,269,766 instructions # 1.42 insns per cycle
12,573,617,143 branches # 680.708 M/sec ( +- 1.34% ) [83.33%]
260,235,550 branch-misses # 2.07% of all branches ( +- 0.66% ) [83.33%]

19.340502161 seconds time elapsed ( +- 0.56% )

After, w/o inline:
18791.253967 task-clock # 0.954 CPUs utilized ( +- 0.78% )
23,230 context-switches # 0.001 M/sec ( +- 0.42% )
1 CPU-migrations # 0.000 M/sec
10,563 page-faults # 0.001 M/sec ( +- 1.27% )
54,168,674,622 cycles # 2.883 GHz ( +- 0.80% ) [83.34%]
24,244,712,629 stalled-cycles-frontend # 44.76% frontend cycles idle ( +- 1.37% ) [83.33%]
16,288,648,572 stalled-cycles-backend # 30.07% backend cycles idle ( +- 0.95% ) [66.66%]
77,659,755,503 instructions # 1.43 insns per cycle
12,922,780,045 branches # 687.702 M/sec ( +- 1.06% ) [83.34%]
261,962,386 branch-misses # 2.03% of all branches ( +- 0.71% ) [83.35%]

19.700174670 seconds time elapsed ( +- 0.56% )

Backports commit f6bb84d53110398f4899c19dab4e0fe9908ec060 from qemu
2018-03-05 02:42:46 -05:00
Emilio G. Cota
921c71a9ee
cpu-exec: rename have_tb_lock to acquired_tb_lock in tb_find
Reusing the have_tb_lock name, which is also defined in translate-all.c,
makes code review unnecessarily hard.

Avoid potential confusion by renaming the local have_tb_lock variable
to something else.

Backports commit 841710c78e022cbc1f798cced035789725702dac from qemu
2018-03-05 02:09:42 -05:00
Richard Henderson
31b8b67cd3
tcg: Move USE_DIRECT_JUMP discriminator to tcg/cpu/tcg-target.h
Replace the USE_DIRECT_JUMP ifdef with a TCG_TARGET_HAS_direct_jump
boolean test. Replace the tb_set_jmp_target1 ifdef with an unconditional
function tb_target_set_jmp_target.

While we're touching all backends, add a parameter for tb->tc_ptr;
we're going to need it shortly for some backends.

Move tb_set_jmp_target and tb_add_jump from exec-all.h to cpu-exec.c.
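
The shape of the change, sketched with assumed field names (the real field
layout differs between QEMU versions):

static void tb_set_jmp_target(TranslationBlock *tb, int n, uintptr_t addr)
{
    if (TCG_TARGET_HAS_direct_jump) {
        uintptr_t tc_ptr = (uintptr_t)tb->tc_ptr;
        /* patch the jump instruction inside the generated code */
        tb_target_set_jmp_target(tc_ptr, tc_ptr + tb->jmp_insn_offset[n], addr);
    } else {
        /* record the destination; generated code loads it and jumps indirectly */
        tb->jmp_target_addr[n] = addr;
    }
}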

Backports commit a85833933628384d74ec412024d55cf012640287 from qemu
2018-03-04 21:52:35 -05:00
Emilio G. Cota
8f4f15e5f5
tcg: Introduce goto_ptr opcode and tcg_gen_lookup_and_goto_ptr
Instead of exporting goto_ptr directly to TCG frontends, export
tcg_gen_lookup_and_goto_ptr(), which calls goto_ptr with the pointer
returned by the lookup_tb_ptr() helper. This is the only use case
we have for goto_ptr and lookup_tb_ptr, so having this function is
very convenient. Furthermore, it trivially allows us to avoid calling
the lookup helper if goto_ptr is not implemented by the backend.
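
A sketch of the exported front-end helper (argument list and env/opcode
plumbing assumed):

void tcg_gen_lookup_and_goto_ptr(TCGv addr)
{
    if (TCG_TARGET_HAS_goto_ptr) {
        TCGv_ptr ptr = tcg_temp_new_ptr();
        gen_helper_lookup_tb_ptr(ptr, cpu_env, addr);   /* lookup_tb_ptr() helper */
        tcg_gen_op1i(INDEX_op_goto_ptr, GET_TCGV_PTR(ptr));
        tcg_temp_free_ptr(ptr);
    } else {
        tcg_gen_exit_tb(0);   /* backend lacks goto_ptr: plain exit */
    }
}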

Backports commit cedbcb01529cb6cf9a2289cdbebbc63f6149fc18 from qemu
2018-03-02 21:05:18 -05:00
Paolo Bonzini
11709d0afa
cpu-exec: remove unnecessary check of cpu->exit_request
The cpu->exit_request check in cpu_loop_exec_tb is unnecessary,
because cpu->tcg_exit_req is always set after cpu->exit_request.
So let the TB exit and we will pick up the exit request later
in cpu_handle_interrupt.

Backports commit 55ac0a9bf4e1b1adfc7d73586a7aa085f58c9851 from qemu
2018-03-02 11:21:35 -05:00
Alex Bennée
d56a4b0be4
tcg: handle EXCP_ATOMIC exception for system emulation
The patch enables handling atomic code in the guest. This would preferably
be done in cpu_handle_exception(), but the current assumptions regarding
when we can execute atomic sections cause a deadlock.

The current mechanism discards the flags which were set in atomic
execution. We ensure they are properly saved by calling the
cc->cpu_exec_enter/leave() functions around the loop.

As we are running cpu_exec_step_atomic() from the outermost loop, we
need to avoid an abort() when single-stepping over atomic code, since the
debug-exception longjmp would otherwise land on the sigsetjmp in
cpu_exec(). We do this by setting a new jmp_env so that it jumps back
here on an exception.
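
Sketched, the mechanism looks roughly like this (simplified; hook and helper
names assumed):

void cpu_exec_step_atomic(CPUState *cpu)
{
    CPUClass *cc = CPU_GET_CLASS(cpu);

    start_exclusive();                /* stop the world */
    parallel_cpus = false;
    if (sigsetjmp(cpu->jmp_env, 0) == 0) {
        cc->cpu_exec_enter(cpu);      /* keep flags set during atomic execution */
        cpu_exec_step(cpu);
        cc->cpu_exec_exit(cpu);
    } else {
        /* a debug exception while single-stepping lands here instead of on
           cpu_exec()'s setjmp, so we do not abort() */
    }
    parallel_cpus = true;
    end_exclusive();
}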

Backports relevant parts of commit 08e73c48b053566bfe0c994f154f73991cd0ff0e from qemu
2018-03-02 09:56:43 -05:00
Alex Bennée
6760605e1c
tcg: enable thread-per-vCPU
There are a couple of changes that occur at the same time here:

- introduce a single vCPU qemu_tcg_cpu_thread_fn

One of these is spawned per vCPU with its own Thread and Condition
variables. qemu_tcg_rr_cpu_thread_fn is the new name for the old
single threaded function.

- the TLS current_cpu variable is now live for the lifetime of MTTCG
vCPU threads. This is for future work where async jobs need to know
the vCPU context they are operating in.

The user can now switch on multi-thread behaviour and spawn a thread
per vCPU. For a simple kvm-unit-test like:

./arm/run ./arm/locking-test.flat -smp 4 -accel tcg,thread=multi

this will now use 4 vCPU threads and produce an expected FAIL (instead of the
unexpected PASS), as the default mode of the test has no protection when
incrementing a shared variable.

We enable the parallel_cpus flag to ensure we generate correct barrier
and atomic code if supported by the front and backends. This doesn't
automatically enable MTTCG until default_mttcg_enabled() is updated to
check the configuration is supported.

Backports relevant parts of commit 372579427a5040a26dfee78464b50e2bdf27ef26
2018-03-02 09:43:14 -05:00
Alex Bennée
632b853761
tcg: remove global exit_request
There are now only two uses of the global exit_request left.

The first ensures we exit the run_loop when we first start to process
pending work and in the kick handler. This is just as easily done by
setting the first_cpu->exit_request flag.

The second use is in the round robin kick routine. The global
exit_request ensured every vCPU would set its local exit_request and
cause a full exit of the loop. Now that the iothread isn't held while
running, we can just rely on the kick handler to push us out as intended.

We lightly re-factor the main vCPU thread to ensure that cpu->exit_request
causes us to exit the main loop and process any IO requests that might
come along. As a cpu->exit_request may legitimately get squashed
while processing the EXCP_INTERRUPT exception, we also check
cpu->queued_work_first to ensure queued work is expedited as soon as
possible.

Backports commit e5143e30fb87fbf179029387f83f98a5a9b27f19 from qemu
2018-03-02 09:38:08 -05:00
Alex Bennée
4d90497d14
tcg: rename tcg_current_cpu to tcg_current_rr_cpu
..and make the definition local to cpus. In preparation for MTTCG the
concept of a global tcg_current_cpu will no longer make sense. However
we still need to keep track of it in the single-threaded case to be able
to exit quickly when required.

qemu_cpu_kick_no_halt() moves and becomes qemu_cpu_kick_rr_cpu() to
emphasise its use-case. qemu_cpu_kick now kicks the relevant cpu as
well as qemu_kick_rr_cpu() which will become a no-op in MTTCG.

For the time being the setting of the global exit_request remains.

Backports commit 791158d93b27f22a17c2ada06621831d54f09a2c from qemu

Also atomically sets the unicorn equivalents
2018-03-02 09:28:51 -05:00
Paolo Bonzini
e66da21a56
cpu-exec: remove outermost infinite loop
Reorganize the sigsetjmp so that the restart case falls through
to cpu_handle_exception and the execution loop.

Backports commit 4515e58d60dc3aac53dbd5e53e4c3bec126967d8 from qemu
2018-03-02 08:13:43 -05:00
Paolo Bonzini
af524401ad
cpu-exec: avoid repeated sigsetjmp on interrupts
The sigsetjmp only needs to be prepared once for the whole execution
of cpu_exec. This patch takes care of the "== 0" side, using a
nested loop so that cpu_handle_interrupt goes straight back to
cpu_handle_exception without doing another sigsetjmp.
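
A sketch of the resulting nesting (function names as used elsewhere in this
series; details assumed):

    if (sigsetjmp(cpu->jmp_env, 0) != 0) {
        /* reached only via siglongjmp; fall through into the loops below */
    }

    /* prepared once; interrupts loop back here without re-arming sigsetjmp */
    while (!cpu_handle_exception(cpu, &ret)) {
        TranslationBlock *last_tb = NULL;
        int tb_exit = 0;

        while (!cpu_handle_interrupt(cpu, &last_tb)) {
            TranslationBlock *tb = tb_find(cpu, last_tb, tb_exit);
            cpu_loop_exec_tb(cpu, tb, &last_tb, &tb_exit);
        }
    }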

Backports commit a42cf3f3f266a97ceb13e8b99bc7b13f7bf4192a from qemu
2018-03-02 08:09:50 -05:00
Paolo Bonzini
28b615a8b7
cpu-exec: avoid cpu_loop_exit in cpu_handle_interrupt
The siglongjmp goes straight back to the beginning of cpu_exec's
outermost loop. We do not need a siglongjmp, we can simply
leave the inner TB execution loop.

Backports commit 209b71b60ef3341246038e1c926c3b704969cdd3 from qemu
2018-03-02 08:03:18 -05:00
Paolo Bonzini
b39acfc3c6
cpu-exec: tighten barrier on TCG_EXIT_REQUESTED
This seems to have worked just fine so far on weakly-ordered
architectures, but I don't see anything that prevents the
reordering from:

store 1 to exit_request
store 1 to tcg_exit_req
load tcg_exit_req
store 0 to tcg_exit_req
load exit_request
store 0 to exit_request
store 1 to exit_request
store 1 to tcg_exit_req

to this:

store 1 to exit_request
store 1 to tcg_exit_req
load tcg_exit_req
load exit_request
store 1 to exit_request
store 1 to tcg_exit_req
store 0 to tcg_exit_req
store 0 to exit_request

therefore losing a request. It's possible that other memory barriers
(e.g. in rcu_read_unlock) are hiding it, but better safe than
sorry.

Backports commit a70fe14b7dddcb944fbd6c9f3739cd3a22089af5 from qemu
2018-03-02 08:01:08 -05:00
Paolo Bonzini
560515941a
target-i386: correctly propagate retaddr into SVM helpers
Commit 2afbdf8 ("target-i386: exception handling for memory helpers",
2015-09-15) changed tlb_fill's cpu_restore_state+raise_exception_err
to raise_exception_err_ra. After this change, the cpu_restore_state
and raise_exception_err's cpu_loop_exit are merged into
raise_exception_err_ra's cpu_loop_exit_restore.

This actually fixed some bugs, but when SVM is enabled there is a
second path from raise_exception_err_ra to cpu_loop_exit. This is
the VMEXIT path, and now cpu_vmexit is called without a
cpu_restore_state before.

The fix is to pass the retaddr to cpu_vmexit (via
cpu_svm_check_intercept_param). All helpers can now use GETPC() to pass
the correct retaddr, too.
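
A sketch of the calling pattern, with GETPC() capturing the helper's return
address (prototypes assumed):

void helper_vmrun(CPUX86State *env, int aflag, int next_eip_addend)
{
    cpu_svm_check_intercept_param(env, SVM_EXIT_VMRUN, 0, GETPC());
    /* ... set up and enter the guest ... */
}

void cpu_vmexit(CPUX86State *env, uint32_t exit_code, uint64_t exit_info_1,
                uintptr_t retaddr)
{
    if (retaddr) {
        cpu_restore_state(CPU(x86_env_get_cpu(env)), retaddr);  /* before exiting */
    }
    /* ... perform the #VMEXIT ... */
}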

Backports commit 823fb688ebc52a7d79c1308acb28c92b56820167 from qemu
2018-03-01 09:31:16 -05:00
Paolo Bonzini
9404dbf74e
cpu-exec: fix icount out-of-bounds access
When icount is active, tb_add_jump is surprisingly called with an
out of bounds basic block index. I have no idea how that can work,
but it does not seem like a good idea. Clear *last_tb for all
TB_EXIT_ICOUNT_EXPIRED cases, even when all you have to do is
refill icount_extra.

Backports commit d8dea6fbcbed177ca5d23ab77b3834a9437f0e88 from qemu
2018-03-01 09:17:26 -05:00
Richard Henderson
e35aacd5ae
tcg: Add EXCP_ATOMIC
When we cannot emulate an atomic operation within a parallel
context, this exception allows us to stop the world and try
again in a serial context.

Backports commit fdbc2b5722f6092e47181a947c90fd4bdcc1c121 from qemu

Also backports parts of commit 02d57ea115b7669f588371c86484a2e8ebc369be
2018-02-27 11:57:58 -05:00
Alex Bennée
d4cb954102
cpu: atomically modify cpu->exit_request
ThreadSanitizer picks up potential races although we already use
barriers to ensure things are in the correct order when processing exit
requests. For true C11 defined behaviour across threads we need to use
relaxed atomic_set/atomic_read semantics to reassure tsan.
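
In practice this amounts to changes of the following shape (a sketch, not the
full diff):

/* before: plain accesses that tsan reports as racy */
cpu->exit_request = 1;
if (cpu->exit_request) { /* ... */ }

/* after: relaxed atomics; ordering is still provided by the existing barriers */
atomic_set(&cpu->exit_request, 1);
if (atomic_read(&cpu->exit_request)) { /* ... */ }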

Backports commit 027d9a7d2911e993cdcbd21c7c35d1dd058f05bb from qemu
2018-02-26 05:11:18 -05:00
Sergey Fedorov
58ff618708
tcg: rename tb_find_physical()
In fact, this function does not exactly perform a lookup by physical
address, as described in the comment on get_page_addr_code(). Thus it may
be a bit confusing to have "physical" in its name. So rename it
to tb_htable_lookup() to better reflect its actual functionality.

Backports commit b34de45fc40d01c14b31d3a682e284180a2ed8c5 from qemu
2018-02-26 02:07:06 -05:00
Sergey Fedorov
ab0c87bc6f
tcg: Merge tb_find_slow() and tb_find_fast()
These functions are not too big and can be merged together. This makes
the locking scheme clearer and easier to follow.

Backports commit bd2710d5da06ad7706d4864f65b3f0c9f7cb4d7f from qemu
2018-02-26 02:05:19 -05:00
Sergey Fedorov
9b6f287488
tcg: Avoid bouncing tb_lock between tb_gen_code() and tb_add_jump()
Backports commit 74d356dd48b64eaa2a6104ac1493ca64cb31fa16 from qemu
2018-02-26 02:01:40 -05:00
Alex Bennée
09c3ef656e
tcg: cpu-exec: remove tb_lock from the hot-path
Lock contention in the hot path of moving between existing patched
TranslationBlocks is the main drag in multithreaded performance. This
patch pushes the tb_lock() usage down to the two places that really need
it:

- code generation (tb_gen_code)
- jump patching (tb_add_jump)

The rest of the code doesn't really need to hold a lock as it is either
using per-CPU structures, atomically updated or designed to be used in
concurrent read situations (qht_lookup).
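
Sketched, the slow-path shape after this change is roughly (names and the
re-check detail assumed):

    tb = tb_find_physical(cpu, pc, cs_base, flags);       /* lock-free qht_lookup */
    if (!tb) {
        tb_lock();                                         /* (1) code generation */
        tb = tb_find_physical(cpu, pc, cs_base, flags);    /* re-check under lock */
        if (!tb) {
            tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
        }
        tb_unlock();
    }
    if (last_tb) {
        tb_lock();                                         /* (2) jump patching */
        tb_add_jump(last_tb, tb_exit, tb);
        tb_unlock();
    }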

To keep things simple I removed the #ifdef CONFIG_USER_ONLY stuff as the
locks become NOPs anyway until the MTTCG work is completed.

Backports commit 518615c6503ad78d3bb67ddf1cd848c4a41de02e from qemu
2018-02-26 01:58:33 -05:00
Paolo Bonzini
30845ae475
tcg: Prepare TB invalidation for lockless TB lookup
When invalidating a translation block, set an invalid flag into the
TranslationBlock structure first. It is also necessary to check whether
the target TB is still valid after acquiring 'tb_lock' but before calling
tb_add_jump() since TB lookup is to be performed out of 'tb_lock' in
future. Note that we don't have to check 'last_tb'; an already invalidated
TB will not be executed anyway and it is thus safe to patch it.
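
A sketch of the re-check before patching (fragment; exact placement assumed):

    tb_lock();
    if (!tb->invalid) {           /* re-validate under tb_lock before patching */
        tb_add_jump(last_tb, tb_exit, tb);
    }
    tb_unlock();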

Backports commit 6d21e4208f382dd8ca1f7995a6dd9ea7ca281163 from qemu
2018-02-26 01:48:13 -05:00
Sergey Fedorov
c0dda5fbe9
tcg: Prepare safe access to tb_flushed out of tb_lock
Ensure atomicity and ordering of CPU's 'tb_flushed' access for future
translation block lookup out of 'tb_lock'.

This field can only be touched from another thread by tb_flush() in user
mode emulation. So the only accesses that need to be sequentially atomic are:
* a single write in tb_flush();
* reads/writes out of 'tb_lock'.

In future, before enabling MTTCG in system mode, tb_flush() must be safe
and this field becomes unnecessary.

Backports commit 118b07308a8cedc16ef63d7ab243a95f1701db40 from qemu
2018-02-25 23:33:58 -05:00
Sergey Fedorov
9eb02a540d
tcg: Prepare safe tb_jmp_cache lookup out of tb_lock
Ensure atomicity of CPU's 'tb_jmp_cache' access for future translation
block lookup out of 'tb_lock'.

Note that this patch does *not* make CPU's TLB invalidation safe if it
is done from some other thread while the CPU is in its execution loop.

Backports commit 89a16b1e4294e3664667a151c2f70c84dfac6fd9 from qemu
2018-02-25 23:29:18 -05:00
Sergey Fedorov
371101a184
tcg: Pass last_tb by value to tb_find_fast()
This is a small clean up. tb_find_fast() is a final consumer of this
variable, so there is no need to pass it by reference. 'last_tb' is always
updated by subsequent cpu_loop_exec_tb() in cpu_exec().

This change also simplifies calling cpu_exec_nocache() in
cpu_handle_exception().

Backports commit 4b7e69509df2fcbfdab8c62c294dbfcfdab8a6e1 from qemu
2018-02-25 23:23:22 -05:00
Emilio G. Cota
ae3e22a689
tb hash: hash phys_pc, pc, and flags with xxhash
For some workloads such as arm bootup, tb_phys_hash is performance-critical.
This is due to the high frequency of accesses to the hash table, originating
from (frequent) TLB flushes that wipe out the cpu-private tb_jmp_cache's.
More info:
https://lists.nongnu.org/archive/html/qemu-devel/2016-03/msg05098.html

To dig further into this I modified an arm image booting debian jessie to
immediately shut down after boot. Analysis revealed that quite a bit of time
is unnecessarily spent in tb_phys_hash: the cause is poor hashing that
results in very uneven loading of chains in the hash table's buckets;
the longest observed chain had ~550 elements.

The appended patch addresses this with two changes:

1) Use xxhash as the hash table's hash function. xxhash is a fast,
high-quality hashing function.

2) Feed the hashing function with not just tb_phys, but also pc and flags.

This improves performance over using just tb_phys for hashing, since that
resulted in some hash buckets having many TBs while others got very few;
with these changes, the longest observed chain on a single hash bucket is
brought down from ~550 to ~40.
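
A sketch of the new hash function, assuming an xxhash-based mixing helper as
added by this series:

static inline uint32_t tb_hash_func(tb_page_addr_t phys_pc, target_ulong pc,
                                    uint32_t flags)
{
    /* mix all three inputs through xxhash instead of hashing phys_pc alone */
    return tb_hash_func5(phys_pc, pc, flags);
}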

Tests show that the other element checked for in tb_find_physical,
cs_base, is always a match when tb_phys+pc+flags are a match,
so hashing cs_base is wasteful. It could be that this is an ARM-only
thing, though. UPDATE:
On Tue, Apr 05, 2016 at 08:41:43 -0700, Richard Henderson wrote:
> The cs_base field is only used by i386 (in 16-bit modes), and sparc (for a TB
> consisting of only a delay slot).
> It may well still turn out to be reasonable to ignore cs_base for hashing.

BTW, after this change the hash table should not be called "tb_phys_hash"
anymore; this is addressed later in this series.

This change gives consistent bootup time improvements. I tested two
host machines:
- Intel Xeon E5-2690: 11.6% less time
- Intel i7-4790K: 19.2% less time

Increasing the number of hash buckets yields further improvements. However,
using a larger, fixed number of buckets can degrade performance for other
workloads that do not translate as many blocks (600K+ for debian-jessie arm
bootup). This is dealt with later in this series.

Backports commit 42bd32287f3a18d823f2258b813824a39ed7c6d9 from qemu
2018-02-24 18:00:14 -05:00
Sergey Fedorov
3a9c5e7509
cpu-exec: Fix direct jump to TB spanning page
It is not safe to make a direct jump to a TB spanning two pages in
system emulation because the mapping for the second page can get changed
but we don't take care of direct jumps in this case.

However in user mode emulation, this is not the case because there's
only static address translation and TBs are always invalidated properly.

Backports commit c88c67e58b61618a904d2333ceebefc3c852d32e from qemu
2018-02-24 03:24:53 -05:00
Paolo Bonzini
9485b7c2e1
cpu: move exec-all.h inclusion out of cpu.h
exec-all.h contains TCG-specific definitions. It is not needed outside
TCG-specific files such as translate.c, exec.c or *helper.c.

One generic function had snuck into include/exec/exec-all.h; move it to
include/qom/cpu.h.

Backports commit 63c915526d6a54a95919ebece83fa9ca631b2508 from qemu
2018-02-24 02:39:08 -05:00
Paolo Bonzini
2f4ae94b5c
target-i386: make cpu-qom.h not target specific
Make X86CPU an opaque type within cpu-qom.h, and move all definitions of
private methods, as well as all type definitions that require knowledge
of the layout to cpu.h. This helps making files independent of NEED_CPU_H
if they only need to pass around CPU pointers.

Backports commit 4da6f8d954429c0cd1471d25cb9dbe909607374e from qemu
2018-02-24 00:55:22 -05:00
Sergey Fedorov
eab60b7c77
cpu-exec: Clean up 'interrupt_request' reloading in cpu_handle_interrupt()
Backports commit 8b1fe3f439eaa2f0a6ee7737942bb6c405725867 from qemu
2018-02-24 00:27:05 -05:00
Sergey Fedorov
b4b7b88f69
cpu-exec: Remove unused 'x86_cpu' and 'env' from cpu_exec()
Backports commit ba048a4ae15ba0f70c6dcb12ee05db120408de78 from qemu
2018-02-24 00:16:40 -05:00
Sergey Fedorov
aefb8935a9
cpu-exec: Move TB execution stuff out of cpu_exec()
Simplify cpu_exec() by extracting TB execution code outside of
cpu_exec() into a new static inline function cpu_loop_exec_tb().

Backports commit 928de9ee14b0b63ee9f9275732ed3e1c8b5f4790 from qemu
2018-02-24 00:15:24 -05:00
Sergey Fedorov
d4ef96abf2
cpu-exec: Move interrupt handling out of cpu_exec()
Simplify cpu_exec() by extracting interrupt handling code outside of
cpu_exec() into a new static inline function cpu_handle_interrupt().

Backports commit c385e6e49763c6dd5dbbd90fadde95d986f8bd38 from qemu
2018-02-24 00:09:06 -05:00
Sergey Fedorov
c1b52a4387
cpu-exec: Move exception handling out of cpu_exec()
Simplify cpu_exec() by extracting exception handling code out of
cpu_exec() into a new static inline function cpu_handle_exception().
Also make cpu_handle_debug_exception() inline as it is used only once.

Backports commit ea284766ec6b9f1712369249566b4c372f3cec8b from qemu
2018-02-24 00:03:37 -05:00
Sergey Fedorov
fc3d135dac
cpu-exec: Move halt handling out of cpu_exec()
Simplify cpu_exec() by extracting CPU halt state handling code out of
cpu_exec() into a new static inline function cpu_handle_halt().
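
The extracted helper is roughly of this shape (a sketch; the x86 APIC special
case is omitted):

static inline bool cpu_handle_halt(CPUState *cpu)
{
    if (cpu->halted) {
        if (!cpu_has_work(cpu)) {
            return true;          /* still halted: leave cpu_exec() */
        }
        cpu->halted = 0;          /* work pending: wake up and continue */
    }
    return false;
}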

Backports commit 8b2d34e997371c9729a0f41e3cc624d4300bbe78 from qemu
2018-02-23 23:53:20 -05:00
Lioncash
88d00a75ca
cpu-exec: move cpu_exec to the bottom of the file
Remove forward declarations
2018-02-23 23:50:28 -05:00
Sergey Fedorov
0088ca994f
cpu-exec: Remove relic orphaned comment
This comment should have been deleted by commit 0ac087f1f3ae ("removed
unused code") but somehow it is still here. There's no point in keeping it.

Backports commit c6f0d9f84c43ae973270df1a77482466558ee487 from qemu
2018-02-23 23:47:05 -05:00
Sergey Fedorov
1a768018c2
tcg: Remove needless CPUState::current_tb
This field was used for telling cpu_interrupt() to unlink a chain of TBs
being executed when it worked that way. Now cpu_interrupt() doesn't do
this anymore, so we don't need this field anymore.

Backports commit 3213525f8ab48742db09dab18cb9ae6f36a6c921 from qemu
2018-02-23 23:45:42 -05:00
Sergey Fedorov
73c75b4cf7
cpu-exec: Move TB chaining into tb_find_fast()
Move the tb_add_jump() call and surrounding code from cpu_exec() into
tb_find_fast(). That simplifies cpu_exec() a little by hiding the direct
chaining optimization details inside tb_find_fast(). It also allows moving
the tb_lock()/tb_unlock() pair into tb_find_fast(), putting it closer
to tb_find_slow(), which also manipulates the lock.

Backports commit a0522c7a55cc8ac76d82884cf8e52f76daa664cc from qemu
2018-02-23 23:38:57 -05:00
Sergey Fedorov
ba9a237586
tcg: Rework tb_invalidated_flag
'tb_invalidated_flag' was meant to catch two events:
* some TB has been invalidated by tb_phys_invalidate();
* the whole translation buffer has been flushed by tb_flush().

Then it was checked:
* in cpu_exec() to ensure that the last executed TB can be safely
linked to directly call the next one;
* in cpu_exec_nocache() to decide if the original TB should be provided
for further possible invalidation along with the temporarily
generated TB.

It is always safe to patch an invalidated TB since it is not going to be
used anyway. It is also safe to call tb_phys_invalidate() for an already
invalidated TB. Thus, setting this flag in tb_phys_invalidate() is
simply unnecessary. Moreover, it can prevent proper linking of TBs if an
arbitrary TB has been invalidated. So just don't touch it
in tb_phys_invalidate().

If this flag is only used to catch whether tb_flush() has been called
then rename it to 'tb_flushed'. Declare it as 'bool' and stick to using
only 'true' and 'false' to set its value. Also, instead of setting it in
tb_gen_code(), just after tb_flush() has been called, do it right inside
of tb_flush().

In cpu_exec(), this flag is used to track if tb_flush() has been called
and have made 'next_tb' (a reference to the last executed TB) invalid
for linking it to directly call the next TB. tb_flush() can be called
during the CPU execution loop from tb_gen_code(), during TB execution or
by another thread while 'tb_lock' is released. Catch translation buffer
flushes reliably by resetting this flag once before the first TB lookup
and each time we find it set before trying to add a direct jump. Don't
touch it in tb_find_physical().
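
In cpu_exec() the flag is then used roughly like this (a sketch; the exact
placement differs in the real patch):

    cpu->tb_flushed = false;                 /* reset before the first lookup */
    /* ... each iteration ... */
    tb = tb_find_fast(cpu, &last_tb, tb_exit);
    if (cpu->tb_flushed) {
        cpu->tb_flushed = false;             /* a flush happened: last_tb is stale */
        last_tb = NULL;
    }
    if (last_tb) {
        tb_add_jump(last_tb, tb_exit, tb);   /* only link when no flush intervened */
    }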

Each vCPU has its own execution loop in multithreaded mode and thus
should have its own copy of the flag, so that it can reset it along with its
own 'next_tb' without affecting any other vCPU execution thread. So make this
flag per-vCPU and move it to CPUState.

In cpu_exec_nocache(), we only need to check if tb_flush() has been
called from tb_gen_code() called by cpu_exec_nocache() itself. To do
this reliably, preserve the old value of the flag, reset it before
calling tb_gen_code(), check afterwards, and combine the saved value
back to the flag.

This patch is based on the patch "tcg: move tb_invalidated_flag to
CPUState" from Paolo Bonzini <pbonzini@redhat.com>.

Backports commit 6f789be56d3f38e9214dafcfab3bf9be7191f370 from qemu
2018-02-23 23:34:51 -05:00
Sergey Fedorov
c9700af2bd
tcg: Clean up from 'next_tb'
The value returned from tcg_qemu_tb_exec() is the value passed to the
corresponding tcg_gen_exit_tb() at translation time of the last TB
attempted to execute. It is a little confusing to store it in a variable
named 'next_tb'. In fact, it is a combination of 4-byte aligned pointer
and additional information in its two least significant bits. Break it
down right away into two variables named 'last_tb' and 'tb_exit' which
are a pointer to the last TB attempted to execute and the TB exit
reason, correspondingly. This simplifies the code and improves its
readability.

Correct a misleading documentation comment for tcg_qemu_tb_exec() and
fix logging in cpu_tb_exec(). Also rename a misleading 'next_tb' in
another couple of places.
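
The decoding itself is a two-line affair (a sketch of the idea):

    uintptr_t ret = tcg_qemu_tb_exec(env, tb->tc_ptr);
    TranslationBlock *last_tb = (TranslationBlock *)(ret & ~TB_EXIT_MASK);
    int tb_exit = ret & TB_EXIT_MASK;    /* TB_EXIT_IDX0, TB_EXIT_IDX1, ... */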

Backports commit 819af24b9c1e95e6576f1cefd32f4d6bf56dfa56 from qemu
2018-02-23 23:29:04 -05:00
Sergey Fedorov
73c59faad5
tcg: Clean up direct block chaining safety checks
We don't take care of direct jumps when the address mapping changes. Thus we
must be sure to generate direct jumps so that they always stay valid
even if the address mapping changes. Luckily, we only allow executing a
TB if it was generated from the pages which match the current mapping.

Document tcg_gen_goto_tb() declaration and note the reason for
destination PC limitations.

Some targets with variable-length instructions allow a TB to straddle a
page boundary. However, we make sure that both of the TB's pages match the
current address mapping when looking up TBs, so it is safe to do direct
jumps into both pages. Correct the checks for some of those targets.

Given that, we can safely patch a TB which spans two pages. Remove the
unnecessary check in cpu_exec() and allow such TBs to be patched.

Backports commit 5b053a4a28278bca606eeff7d1c0730df1b047e9 from qemu
2018-02-23 22:26:00 -05:00
Emilio G. Cota
170f6e0b3b
tb: consistently use uint32_t for tb->flags
We are inconsistent with the type of tb->flags: usage varies loosely
between int and uint64_t. Settle to uint32_t everywhere, which is
superior to both: at least one target (aarch64) uses the most significant
bit in the u32, and uint64_t is wasteful.

Compile-tested for all targets.

Backports commit 89fee74a0f066dfd73830a7b5fa137e87888c870 from qemu
2018-02-23 21:28:11 -05:00
Alex Bennée
3da7d9d9ae
qemu-log: dfilter-ise exec, out_asm, op and opt_op
This ensures the code generation debug code will honour -dfilter if set.
For the "exec" tracing I've added a new inline macro for efficiency's
sake.

Backports commit d977e1c2dbc9e63454b2000f91954d02543bf43b from qemu
2018-02-22 10:06:19 -05:00
Peter Maydell
3f5e36e15f
qemu-log: Improve the exec TB execution logging
Improve the TB execution logging so that it is easier to identify
what is happening from trace logs:
* move the "Trace" logging of executed TBs into cpu_tb_exec()
so that it is emitted if and only if we actually execute a TB,
and for consistency with the CPU state logging
* log when we link two TBs together via tb_add_jump()
* log when cpu_tb_exec() returns early from a chain of TBs

The new style logging looks like this:

Trace 0x7fb7cc822ca0 [ffffffc0000dce00]
Linking TBs 0x7fb7cc822ca0 [ffffffc0000dce00] index 0 -> 0x7fb7cc823110 [ffffffc0000dce10]
Trace 0x7fb7cc823110 [ffffffc0000dce10]
Trace 0x7fb7cc823420 [ffffffc000302688]
Trace 0x7fb7cc8234a0 [ffffffc000302698]
Trace 0x7fb7cc823520 [ffffffc0003026a4]
Trace 0x7fb7cc823560 [ffffffc0000dce44]
Linking TBs 0x7fb7cc823560 [ffffffc0000dce44] index 1 -> 0x7fb7cc8235d0 [ffffffc0000dce70]
Trace 0x7fb7cc8235d0 [ffffffc0000dce70]
Stopped execution of TB chain before 0x7fb7cc8235d0 [ffffffc0000dce70]
Trace 0x7fb7cc8235d0 [ffffffc0000dce70]
Trace 0x7fb7cc822fd0 [ffffffc0000dd52c]

Backports commit 1a830635229e14c403600167823ea6b3b79d3097 from qemu
2018-02-22 09:40:11 -05:00
Peter Maydell
293266a9d8
exec: Clean up includes
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.

This commit was created with scripts/clean-includes.

Backports commit 7b31bbc2e68605ab2f10dc609dd54cf4c7b5f49a from qemu
2018-02-19 00:49:55 -05:00
Paolo Bonzini
3907ea1a3b
cpu-exec: Fix compiler warning (-Werror=clobbered)
Reloading of local variables after sigsetjmp is only needed for some
buggy compilers.

The code which should reload these variables causes compiler warnings
with gcc 4.7 when compiler optimizations are enabled:

cpu-exec.c:204:15: error:
variable ‘cpu’ might be clobbered by ‘longjmp’ or ‘vfork’ [-Werror=clobbered]
cpu-exec.c:207:15: error:
variable ‘cc’ might be clobbered by ‘longjmp’ or ‘vfork’ [-Werror=clobbered]
cpu-exec.c:202:28: error:
argument ‘env’ might be clobbered by ‘longjmp’ or ‘vfork’ [-Werror=clobbered]

Now this code is only used for compilers which need it
(and gcc 4.5.x, x > 0 which does not need it but won't give warnings).

There were bug reports for clang and gcc 4.5.0, while gcc 4.5.1
was reported to work fine without the reload code. For clang it
is not clear which versions are affected, so simply keep the status quo
for all clang compilations. This can be improved later.

Backports commit 0448f5f8b816923b198ab6c32286fd1f3b2f3e45 from qemu
2018-02-17 15:24:15 -05:00
Richard Henderson
e9e8833da4
cpu-exec: Add nochain debug flag
Respect it to avoid linking TBs together.

Backports commit 89a82cd4b6a90fe117fa715e2abe51d5c607560c from qemu
2018-02-17 15:24:04 -05:00