unicorn

mirror of https://github.com/yuzu-emu/unicorn.git synced 2024-10-20 20:38:19 +02:00


Emilio G. Cota 3dc16ebca3 target-i386: remove helper_lock()	2018-02-27 23:43:22 -05:00

It's been superseded by the atomic helpers.

The use of the atomic helpers provides a significant performance and scalability improvement. Below is the result of running the atomic_add-test microbenchmark with:

    $ x86_64-linux-user/qemu-x86_64 tests/atomic_add-bench -o 5000000 -r $r -n $n

, where $n is the number of threads and $r is the allowed range for the additions.

The scenarios measured are:
- atomic: implements x86' ADDL with the atomic_add helper (i.e. this patchset)
- cmpxchg: implement x86' ADDL with a TCG loop using the cmpxchg helper
- master: before this patchset

Results sorted in ascending range, i.e. descending degree of contention. Y axis is Throughput in Mops/s. Tests are run on an AMD machine with 64 Opteron 6376 cores.

[Five ASCII throughput plots omitted: atomic_add-bench, 5000000 ops/thread, for ranges [0,1], [0,2], [0,8], [0,128] and [0,1024]. X axis: number of threads (0 to 60); Y axis: throughput in Mops/s; series: atomic (E), cmpxchg (H), master (N).]

hi-res: http://imgur.com/a/fMRmq

For master I stopped measuring master after 8 threads, because there is little point in measuring the well-known performance collapse of a contended lock.

Backports commit 37b995f6e7a1cb6fa378c5cd4217b9dd9e1fc98b from qemu
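The three scenarios named in the commit message (atomic, cmpxchg, master) can be illustrated with a minimal C11 sketch. This is not code from QEMU or unicorn: the function names add_atomic, add_cmpxchg and add_locked and the global_lock variable are hypothetical, and the "master" variant assumes the old behaviour serialized on a single global lock, as the remark about a contended lock suggests.

```c
#include <stdatomic.h>
#include <stdint.h>

/* "atomic" scenario: a single atomic read-modify-write,
 * comparable to what a locked ADDL does on the host. */
static uint32_t add_atomic(_Atomic uint32_t *p, uint32_t v)
{
    return atomic_fetch_add(p, v) + v;
}

/* "cmpxchg" scenario: emulate the add with a compare-and-swap
 * retry loop. Under contention the loop may spin several times. */
static uint32_t add_cmpxchg(_Atomic uint32_t *p, uint32_t v)
{
    uint32_t old = atomic_load(p);
    while (!atomic_compare_exchange_weak(p, &old, old + v)) {
        /* On failure 'old' is refreshed with the current value; retry. */
    }
    return old + v;
}

/* "master" scenario (assumption): every locked instruction takes
 * one global spinlock, so all threads serialize on it. */
static atomic_flag global_lock = ATOMIC_FLAG_INIT;

static uint32_t add_locked(uint32_t *p, uint32_t v)
{
    while (atomic_flag_test_and_set(&global_lock)) {
        /* spin until the lock is free */
    }
    uint32_t result = (*p += v);
    atomic_flag_clear(&global_lock);
    return result;
}
```

All three return the updated value, so they are interchangeable for a caller. What differs is how they behave when many threads target the same address, which is what the benchmark ranges vary.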
..
arch_memory_mapping.c
bpt_helper.c	cpu-exec: Rename cpu_resume_from_signal() to cpu_loop_exit_noexc()	2018-02-24 17:25:28 -05:00
cc_helper_template.h
cc_helper.c	target-i386: Perform set/reset_inhibit_irq inline	2018-02-20 13:34:47 -05:00
cpu-qom.h	target-i386: List CPU models using subclass list	2018-02-26 08:17:04 -05:00
cpu.c	target-i386: Don't use cpu->migratable when filtering features	2018-02-26 09:51:14 -05:00
cpu.h	target-i386: Move xsave component mask to features array	2018-02-26 04:45:35 -05:00
excp_helper.c	cpu: move exec-all.h inclusion out of cpu.h	2018-02-24 02:39:08 -05:00
fpu_helper.c	target-i386: Use struct X86XSaveArea in fpu_helper.c	2018-02-26 03:38:53 -05:00
helper.c	cpus: pass CPUState to run_on_cpu helpers	2018-02-26 04:54:55 -05:00
helper.h	target-i386: remove helper_lock()	2018-02-27 23:43:22 -05:00
int_helper.c	cpu: move exec-all.h inclusion out of cpu.h	2018-02-24 02:39:08 -05:00
Makefile.objs	target-i386: Enable control registers for MPX	2018-02-20 13:27:46 -05:00
mem_helper.c	target-i386: remove helper_lock()	2018-02-27 23:43:22 -05:00
misc_helper.c	cpu: move exec-all.h inclusion out of cpu.h	2018-02-24 02:39:08 -05:00
mpx_helper.c	cpu: move exec-all.h inclusion out of cpu.h	2018-02-24 02:39:08 -05:00
ops_sse_header.h
ops_sse.h
seg_helper.c	target-i386: Fixed syscall posssible segfault	2018-02-26 02:36:09 -05:00
shift_helper_template.h
smm_helper.c	target-i386: Include log.h in smm_helper	2018-02-24 03:06:07 -05:00
svm_helper.c	cpu: move exec-all.h inclusion out of cpu.h	2018-02-24 02:39:08 -05:00
svm.h	Clean up ill-advised or unusual header guards	2018-02-25 04:22:46 -05:00
TODO
topology.h	pc: Add x86_topo_ids_from_apicid()	2018-02-25 20:31:36 -05:00
translate.c	target-i386: remove helper_lock()	2018-02-27 23:43:22 -05:00
unicorn.c	qemu-common: push cpu.h inclusion out of qemu-common.h	2018-02-24 01:50:56 -05:00
unicorn.h