unicorn/qemu/include/exec
Emilio G. Cota 1677898a09
cputlb: read CPUTLBEntry.addr_write atomically
Updates can come from other threads, so readers that do not
take tlb_lock must use atomic_read to avoid undefined
behaviour (UB).
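
The pattern described above can be sketched in portable C11. This is an illustration only, not the QEMU code: `DemoTLBEntry`, `demo_entry_init`, `demo_entry_update` and `demo_entry_read` are hypothetical names, the lock is a minimal spinlock standing in for tlb_lock, and a relaxed C11 atomic load is assumed to play the role of atomic_read.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a TLB entry; not QEMU's CPUTLBEntry layout. */
typedef struct {
    atomic_bool lock;            /* minimal spinlock, stands in for tlb_lock */
    _Atomic uint64_t addr_write; /* field that readers load without the lock */
} DemoTLBEntry;

static void demo_entry_init(DemoTLBEntry *e)
{
    assert(e != NULL);
    atomic_init(&e->lock, false);
    atomic_init(&e->addr_write, UINT64_MAX); /* UINT64_MAX marks "invalid" here */
}

/* Writer: updates are serialized by the lock. The store is still atomic so
 * that a concurrent lock-free reader never observes a torn value. */
static void demo_entry_update(DemoTLBEntry *e, uint64_t addr)
{
    while (atomic_exchange_explicit(&e->lock, true, memory_order_acquire)) {
        /* spin until the lock is released */
    }
    atomic_store_explicit(&e->addr_write, addr, memory_order_relaxed);
    atomic_store_explicit(&e->lock, false, memory_order_release);
}

/* Reader: does not take the lock, so it must use an atomic load. A plain
 * load racing with the writer's store would be a data race, i.e. UB. */
static uint64_t demo_entry_read(DemoTLBEntry *e)
{
    return atomic_load_explicit(&e->addr_write, memory_order_relaxed);
}
```

A relaxed load gives no ordering guarantees with respect to other fields; it only guarantees that the value read is one that was actually stored.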

This completes the conversion to tlb_lock. This conversion results
on average in no performance loss, as the following experiments
(run on an Intel i7-6700K CPU @ 4.00GHz) show.

1. aarch64 bootup+shutdown test:

- Before:
 Performance counter stats for 'taskset -c 0 ../img/aarch64/die.sh' (10 runs):

       7487.087786  task-clock (msec)  #    0.998 CPUs utilized    ( +- 0.12% )
    31,574,905,303  cycles             #    4.217 GHz              ( +- 0.12% )
    57,097,908,812  instructions       #     1.81 insns per cycle  ( +- 0.08% )
    10,255,415,367  branches           # 1369.747 M/sec            ( +- 0.08% )
       173,278,962  branch-misses      #    1.69% of all branches  ( +- 0.18% )

       7.504481349 seconds time elapsed                            ( +- 0.14% )

- After:
 Performance counter stats for 'taskset -c 0 ../img/aarch64/die.sh' (10 runs):

       7462.441328  task-clock (msec)  #    0.998 CPUs utilized    ( +- 0.07% )
    31,478,476,520  cycles             #    4.218 GHz              ( +- 0.07% )
    57,017,330,084  instructions       #     1.81 insns per cycle  ( +- 0.05% )
    10,251,929,667  branches           # 1373.804 M/sec            ( +- 0.05% )
       173,023,787  branch-misses      #    1.69% of all branches  ( +- 0.11% )

       7.474970463 seconds time elapsed                            ( +- 0.07% )
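
To compare the two runs, the relative change of any counter can be computed from the before and after values. The helper below is a small sketch; `rel_change_pct` is a hypothetical name and not part of perf or QEMU.

```c
#include <assert.h>

/* Relative change from `before` to `after`, in percent.
 * A negative result means the value decreased. */
static double rel_change_pct(double before, double after)
{
    assert(before != 0.0); /* avoid dividing by zero */
    return (after - before) / before * 100.0;
}
```

For the elapsed times above, rel_change_pct(7.504481349, 7.474970463) is about -0.39, i.e. the run after the conversion is roughly 0.4% shorter, which agrees with the claim of no performance loss.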

2. SPEC06int:
SPEC06int (test set)
[Bar chart. Y axis: speedup over master, from 0.75 to 1.15. Series: tlb-lock-v2 (mutex) and tlb-lock-v3 (spinlock). X axis, with labels truncated as in the source: 400.perlben, 401.bzip2, 403.gcc, 429.m, 445.gob, 456.hmme, 45, 462.libqua, 464.h26, 471.omnet, 473, 483.xalancbmk, geomean.]

png: https://imgur.com/a/BHzpPTW

Notes:
- tlb-lock-v2 corresponds to an implementation with a mutex.
- tlb-lock-v3 corresponds to the current implementation, i.e.
a spinlock and a single lock acquisition in tlb_set_page_with_attrs.
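
The idea of a single lock acquisition can be sketched as follows: the eviction of the old entry and the installation of the new one happen under one lock/unlock pair instead of two. This is a simplified illustration under assumptions, not the body of tlb_set_page_with_attrs; `SketchTLB`, `sketch_tlb_init`, `sketch_tlb_refill`, `sketch_read` and `SKETCH_INVALID` are hypothetical names, and the spinlock is a minimal C11 one.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_INVALID UINT64_MAX /* hypothetical marker for an empty slot */

/* Hypothetical, heavily simplified TLB with one main slot and one victim slot. */
typedef struct {
    atomic_bool lock;             /* minimal spinlock, stands in for tlb_lock */
    _Atomic uint64_t main_addr;   /* slot being refilled */
    _Atomic uint64_t victim_addr; /* slot that receives the evicted entry */
} SketchTLB;

static void sketch_tlb_init(SketchTLB *tlb)
{
    assert(tlb != NULL);
    atomic_init(&tlb->lock, false);
    atomic_init(&tlb->main_addr, SKETCH_INVALID);
    atomic_init(&tlb->victim_addr, SKETCH_INVALID);
}

/* Refill the main slot. One lock acquisition covers both steps:
 * (1) move a valid, different old entry to the victim slot,
 * (2) install the new entry. */
static void sketch_tlb_refill(SketchTLB *tlb, uint64_t new_addr)
{
    while (atomic_exchange_explicit(&tlb->lock, true, memory_order_acquire)) {
        /* spin until the lock is released */
    }
    uint64_t old = atomic_load_explicit(&tlb->main_addr, memory_order_relaxed);
    if (old != SKETCH_INVALID && old != new_addr) {
        atomic_store_explicit(&tlb->victim_addr, old, memory_order_relaxed);
    }
    atomic_store_explicit(&tlb->main_addr, new_addr, memory_order_relaxed);
    atomic_store_explicit(&tlb->lock, false, memory_order_release);
}

/* Lock-free read of a slot, as a reader that does not hold the lock would do. */
static uint64_t sketch_read(_Atomic uint64_t *slot)
{
    return atomic_load_explicit(slot, memory_order_relaxed);
}
```

Taking the lock once per refill keeps the uncontended cost to a single atomic exchange and a single release store per call.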

Backports commit 403f290c0603f35f2d09c982bf5549b6d0803ec1 from qemu
2018-10-23 15:37:43 -04:00
address-spaces.h     Clean up header guards that don't match their file name  2018-02-25 04:18:42 -05:00
cpu_ldst_template.h  cputlb: read CPUTLBEntry.addr_write atomically  2018-10-23 15:37:43 -04:00
cpu_ldst.h           cputlb: read CPUTLBEntry.addr_write atomically  2018-10-23 15:37:43 -04:00
cpu-all.h            tcg: Define and use new tlb_hit() and tlb_hit_page() functions  2018-07-03 19:21:36 -04:00
cpu-common.h         cpu: Introduce a wrapper for tlb_flush() that can be used in common code  2018-03-03 21:24:55 -05:00
cpu-defs.h           cpu-defs.h: Document CPUIOTLBEntry 'addr' field  2018-06-15 12:07:39 -04:00
cputlb.h             exec: Drop unnecessary code for unicorn  2018-03-12 10:11:46 -04:00
exec-all.h           exec: introduce tlb_init  2018-10-23 14:41:29 -04:00
gen-icount.h         tcg: Pass tb and index to tcg_gen_exit_tb separately  2018-06-07 11:56:32 -04:00
helper-gen.h         target/arm: Implement SVE Integer Multiply-Add Group  2018-05-20 04:35:36 -04:00
helper-head.h        tcg: Fix helper function vs host abi for float16  2018-06-02 10:10:12 -04:00
helper-proto.h       tcg: Allow 6 arguments to TCG helpers  2018-03-17 18:29:04 -04:00
helper-tcg.h         tcg: Allow 6 arguments to TCG helpers  2018-03-17 18:29:04 -04:00
hwaddr.h             qemu-common: push cpu.h inclusion out of qemu-common.h  2018-02-24 01:50:56 -05:00
ioport.h             hw: remove pio_addr_t  2018-02-24 02:43:16 -05:00
memattrs.h           memory.h: Move MemTxResult type to memattrs.h  2018-03-04 13:10:47 -05:00
memory-internal.h    memory: Rename mem_begin/mem_commit/mem_add helpers  2018-03-11 21:36:50 -04:00
memory.h             memory: Remove old_mmio accessors  2018-10-04 04:45:30 -04:00
ram_addr.h           exec: Drop unnecessary code for unicorn  2018-03-12 10:11:46 -04:00
ramlist.h            memory: RCU ram_list.dirty_memory[] for safe RAM hotplug  2018-02-22 15:38:03 -05:00
semihost.h           exec: Add semihosting stubs  2018-02-17 15:23:33 -05:00
tb-context.h         tcg: allocate TB structs before the corresponding translated code  2018-03-03 17:05:49 -05:00
tb-hash-xx.h         Clean up ill-advised or unusual header guards  2018-02-25 04:22:46 -05:00
tb-hash.h            tb-hash: improve tb_jmp_cache hash function in user mode  2018-03-03 14:11:29 -05:00
tb-lookup.h          exec-all: bring tb->invalid into tb->cflags  2018-03-05 02:46:21 -05:00
translator.h         translator: merge max_insns into DisasContextBase  2018-05-11 13:59:17 -04:00