Commit Graph

24 Commits

Author SHA1 Message Date
Richard Henderson
fbf91a6535
cpu: Replace ENV_GET_CPU with env_cpu
Now that we have both ArchCPU and CPUArchState, we can define
this generically instead of via macro in each target's cpu.h.

Backports commit 29a0af618ddd21f55df5753c3e16b0625f534b3c from qemu
2019-06-12 11:16:16 -04:00
Lioncash
5ab9723787
cputlb: Filter flushes on already clean tlbs
Especially for guests with large numbers of tlbs, like ARM or PPC,
we may well not use all of them in between flush operations.
Remember which tlbs have been used since the last flush, and
avoid any useless flushing.

Backports much of 3d1523ced6060cdfe9e768a814d064067ccabfe5 from qemu
along with a bunch of updating changes.
2019-06-10 20:42:15 -04:00
Richard Henderson
2a4a7b9391
tcg: Use tlb_fill probe from tlb_vaddr_to_host
Most of the existing users would continue around a loop which
would fault the tlb entry in via a normal load/store.

But for AArch64 SVE we have an existing emulation bug wherein we
would mark the first element of a no-fault vector load as faulted
(within the FFR, not via exception) just because we did not have
its address in the TLB. Now we can properly only mark it as faulted
if there really is no valid, readable translation, while still not
raising an exception. (Note that beyond the first element of the
vector, the hardware may report a fault for any reason whatsoever;
with at least one element loaded, forward progress is guaranteed.)

Backports commit 4811e9095c0491bc6f5450e5012c9c4796b9e59d from qemu
2019-05-16 18:27:03 -04:00
Richard Henderson
dab0061a0d
tcg: Use CPUClass::tlb_fill in cputlb.c
We can now use the CPUClass hook instead of a named function.

Create a static tlb_fill function to avoid other changes within
cputlb.c. This also isolates the asserts within. Remove the
named tlb_fill function from all of the targets.

Backports commit c319dc13579a92937bffe02ad2c9f1a550e73973 from qemu
2019-05-16 17:35:37 -04:00
Richard Henderson
9a02741c13
cputlb: Do unaligned store recursion to outermost function
This is less tricky than for loads, because we always fall
back to single byte stores to implement unaligned stores.

Backports commit 4601f8d10d7628bcaf2a8179af36e04b42879e91 from qemu
2019-05-14 07:45:15 -04:00
Richard Henderson
bcab6f1719
cputlb: Do unaligned load recursion to outermost function
If we attempt to recurse from load_helper back to load_helper,
even via intermediary, we do not get all of the constants
expanded away as desired.

But if we recurse back to the original helper (or a shim that
has a consistent function signature), the operands are folded
away as desired.

Backports commit 2dd926067867c2dd19e66d31a7990e8eea7258f6 from qemu
2019-05-14 07:43:31 -04:00
Richard Henderson
f12f36aebd
cputlb: Drop attribute flatten
Going to approach this problem via __attribute__((always_inline))
instead, but full conversion will take several steps.

Backports commit fc1bc777910dc14a3db4e2ad66f3e536effc297d from qemu
2019-05-14 07:33:39 -04:00
Richard Henderson
7991cd601f
cputlb: Move TLB_RECHECK handling into load/store_helper
Having this in io_readx/io_writex meant that we forgot to
re-compute index after tlb_fill. It also means we can use
the normal aligned memory load path. It also fixes a bug
in that we had cached a use of index across a tlb_fill.

Backports commit f1be36969de2fb9b6b64397db1098f115210fcd9 from qemu
2019-05-14 07:28:15 -04:00
Alex Bennée
ccee796272
accel/tcg: demacro cputlb
Instead of expanding a series of macros to generate the load/store
helpers we move stuff into common functions and rely on the compiler
to eliminate the dead code for each variant.

Backports commit eed5664238ea5317689cf32426d9318686b2b75c from qemu
2019-05-14 07:28:11 -04:00
Shahab Vahedi
7f59d62f4a
cputlb: Fix io_readx() to respect the access_type
This change adapts io_readx() to its input access_type. Currently
io_readx() treats any memory access as a read, although it has an
input argument "MMUAccessType access_type". This results in:

1) Calling the tlb_fill() only with MMU_DATA_LOAD
2) Considering only entry->addr_read as the tlb_addr

Buglink: https://bugs.launchpad.net/qemu/+bug/1825359

Backports commit ef5dae6805cce7b59d129d801bdc5db71bcbd60d from qemu
2019-04-30 10:11:11 -04:00
Lioncash
5daabe55a4
cputlb: Synchronize with qemu
Synchronizes the code with Qemu to reduce a few differences.
2019-04-26 15:48:45 -04:00
Emilio G. Cota
f31764dd5b
cputlb: update TLB entry/index after tlb_fill
We are failing to take into account that tlb_fill() can cause a
TLB resize, which renders prior TLB entry pointers/indices stale.
Fix it by re-doing the TLB entry lookups immediately after tlb_fill.

Fixes: 86e1eff8bc ("tcg: introduce dynamic TLB sizing", 2019-01-28)

Backports commit 6d967cb86d5b4a60ba15b497126b621ce9ca6609 from qemu
2019-02-12 11:48:48 -05:00
Thomas Huth
85bc48fecd
tcg: Fix LGPL version number
It's either "GNU *Library* General Public version 2" or "GNU Lesser
General Public version *2.1*", but there was no "version 2.0" of the
"Lesser" library. So assume that version 2.1 is meant here.

Backports commit fb0343d5b4dd4b9b9e96e563d913a3e0c709fe4e from qemu
2019-02-03 17:55:28 -05:00
Peter Maydell
1301becdab
tcg: Support MMU protection regions smaller than TARGET_PAGE_SIZE
Add support for MMU protection regions that are smaller than
TARGET_PAGE_SIZE. We do this by marking the TLB entry for those
pages with a flag TLB_RECHECK. This flag causes us to always
take the slow-path for accesses. In the slow path we can then
special case them to always call tlb_fill() again, so we have
the correct information for the exact address being accessed.

This change allows us to handle reading and writing from small
regions; we cannot deal with execution from the small region.

Backports commit 55df6fcf5476b44bc1b95554e686ab3e91d725c5 from qemu
2018-11-16 21:35:54 -05:00
Lioncash
3a0ab1a64a
Partial backport of: exec.c: Handle IOMMUs in address_space_translate_for_iotlb()
We just want the parameter changes here.

Partial backport of commit 1f871c5e6b0f30644a60a81a6a7aadb3afb030ac from
qemu
2018-11-16 21:24:55 -05:00
Emilio G. Cota
1677898a09
cputlb: read CPUTLBEntry.addr_write atomically
Updates can come from other threads, so readers that do not
take tlb_lock must use atomic_read to avoid undefined
behaviour (UB).

This completes the conversion to tlb_lock. This conversion results
on average in no performance loss, as the following experiments
(run on an Intel i7-6700K CPU @ 4.00GHz) show.

1. aarch64 bootup+shutdown test:

- Before:
Performance counter stats for 'taskset -c 0 ../img/aarch64/die.sh' (10 runs):

7487.087786 task-clock (msec) # 0.998 CPUs utilized ( +- 0.12% )
31,574,905,303 cycles # 4.217 GHz ( +- 0.12% )
57,097,908,812 instructions # 1.81 insns per cycle ( +- 0.08% )
10,255,415,367 branches # 1369.747 M/sec ( +- 0.08% )
173,278,962 branch-misses # 1.69% of all branches ( +- 0.18% )

7.504481349 seconds time elapsed ( +- 0.14% )

- After:
Performance counter stats for 'taskset -c 0 ../img/aarch64/die.sh' (10 runs):

7462.441328 task-clock (msec) # 0.998 CPUs utilized ( +- 0.07% )
31,478,476,520 cycles # 4.218 GHz ( +- 0.07% )
57,017,330,084 instructions # 1.81 insns per cycle ( +- 0.05% )
10,251,929,667 branches # 1373.804 M/sec ( +- 0.05% )
173,023,787 branch-misses # 1.69% of all branches ( +- 0.11% )

7.474970463 seconds time elapsed ( +- 0.07% )

2. SPEC06int:
SPEC06int (test set)
[Y axis: Speedup over master]
1.15 +-+----+------+------+------+------+------+-------+------+------+------+------+------+------+----+-+
| |
1.1 +-+.................................+++.............................+ tlb-lock-v2 (m+++x) +-+
| +++ | +++ tlb-lock-v3 (spinl|ck) |
| +++ | | +++ +++ | | |
1.05 +-+....+++...........####.........|####.+++.|......|.....###....+++...........+++....###.........+-+
| ### ++#| # |# |# ***### +++### +++#+# | +++ | #|# ### |
1 +-+++***+#++++####+++#++#++++++++++#++#+*+*++#++++#+#+****+#++++###++++###++++###++++#+#++++#+#+++-+
| *+* # #++# *** # #### *** # * *++# ****+# *| * # ****|# |# # #|# #+# # # |
0.95 +-+..*.*.#....#..#.*|*..#...#..#.*|*..#.*.*..#.*|.*.#.*++*.#.*++*+#.****.#....#+#....#.#..++#.#..+-+
| * * # # # *|* # # # *|* # * * # *++* # * * # * * # * |* # ++# # # # *** # |
| * * # ++# # *+* # # # *|* # * * # * * # * * # * * # *++* # **** # ++# # * * # |
0.9 +-+..*.*.#...|#..#.*.*..#.++#..#.*|*..#.*.*..#.*..*.#.*..*.#.*..*.#.*..*.#.*.|*.#...|#.#..*.*.#..+-+
| * * # *** # * * # |# # *+* # * * # * * # * * # * * # * * # *++* # |# # * * # |
0.85 +-+..*.*.#..*|*..#.*.*..#.***..#.*.*..#.*.*..#.*..*.#.*..*.#.*..*.#.*..*.#.*..*.#.****.#..*.*.#..+-+
| * * # *+* # * * # *|* # * * # * * # * * # * * # * * # * * # * * # * |* # * * # |
| * * # * * # * * # *+* # * * # * * # * * # * * # * * # * * # * * # * |* # * * # |
0.8 +-+..*.*.#..*.*..#.*.*..#.*.*..#.*.*..#.*.*..#.*..*.#.*..*.#.*..*.#.*..*.#.*..*.#.*++*.#..*.*.#..+-+
| * * # * * # * * # * * # * * # * * # * * # * * # * * # * * # * * # * * # * * # |
0.75 +-+--***##--***###-***###-***###-***###-***###-****##-****##-****##-****##-****##-****##--***##--+-+
400.perlben401.bzip2403.gcc429.m445.gob456.hmme45462.libqua464.h26471.omnet473483.xalancbmkgeomean

png: https://imgur.com/a/BHzpPTW

Notes:
- tlb-lock-v2 corresponds to an implementation with a mutex.
- tlb-lock-v3 corresponds to the current implementation, i.e.
a spinlock and a single lock acquisition in tlb_set_page_with_attrs.

Backports commit 403f290c0603f35f2d09c982bf5549b6d0803ec1 from qemu
2018-10-23 15:37:43 -04:00
Richard Henderson
d74e00a30a
tcg: Split CONFIG_ATOMIC128
GCC7+ will no longer advertise support for 16-byte __atomic operations
if only cmpxchg is supported, as for x86_64. Fortunately, x86_64 still
has support for __sync_compare_and_swap_16 and we can make use of that.
AArch64 does not have, nor ever has had such support, so open-code it.

Backports commit e6cd4bb59b8154fa00da611200beef7eb4e8ec56 from qemu
2018-10-23 15:17:39 -04:00
Richard Henderson
c911ea7128
tcg: Add tlb_index and tlb_entry helpers
Isolate the computation of an index from an address into a
helper before we change that function.

Backports commit 383beda9cf32f795616c3b93f7d6154d70372d4b from qemu
2018-10-23 15:04:27 -04:00
Emilio G. Cota
dfb3954571
exec: introduce tlb_init
Paves the way for the addition of a per-TLB lock.

Backports commit 5005e2537d090bee87aca3b924dcd17920fd146a from qemu
2018-10-23 14:41:29 -04:00
Peter Maydell
0c6311f8cc
accel/tcg: Correct "is this a TLB miss" check in get_page_addr_code()
In commit 71b9a45330fe220d1 we changed the condition we use
to determine whether we need to refill the TLB in
get_page_addr_code() to
if (unlikely(env->tlb_table[mmu_idx][index].addr_code !=
(addr & (TARGET_PAGE_MASK | TLB_INVALID_MASK)))) {

This isn't the right check (it will falsely fail if the
input addr happens to have the low bit corresponding to
TLB_INVALID_MASK set, for instance). Replace it with a
use of the new tlb_hit() function, which is the correct test.

Backports commit e4c967a7201400d7f76e5847d5b4c4ac9e2566e0 from qemu
2018-07-03 19:23:25 -04:00
Peter Maydell
6543f9ea26
tcg: Define and use new tlb_hit() and tlb_hit_page() functions
The condition to check whether an address has hit against a particular
TLB entry is not completely trivial. We do this in various places, and
in fact in one place (get_page_addr_code()) we have got the condition
wrong. Abstract it out into new tlb_hit() and tlb_hit_page() inline
functions (one for a known-page-aligned address and one for an
arbitrary address), and use them in all the places where we had the
condition correct.

This is a no-behaviour-change patch; we leave fixing the buggy
code in get_page_addr_code() to a subsequent patch

Backports commit 334692bce7f0653a93b8d84ecde8c847b08dec38 from qemu
2018-07-03 19:21:36 -04:00
Peter Maydell
61a7ac6948
cpu-defs.h: Document CPUIOTLBEntry 'addr' field
The 'addr' field in the CPUIOTLBEntry struct has a rather non-obvious
use; add a comment documenting it (reverse-engineered from what
the code that sets it is doing).

Backports commit ace4109011b4912b24e76f152e2cf010e78819c5 from qemu
2018-06-15 12:07:39 -04:00
Peter Maydell
7a6ae26346
cputlb: Pass cpu_transaction_failed() the correct physaddr
The API for cpu_transaction_failed() says that it takes the physical
address for the failed transaction. However we were actually passing
it the offset within the target MemoryRegion. We don't currently
have any target CPU implementations of this hook that require the
physical address; fix this bug so we don't get confused if we ever
do add one.

Backports commit 2d54f19401bc54b3b56d1cc44c96e4087b604b97 from qemu
2018-06-15 12:03:23 -04:00
Lioncash
035f1afa7d
tcg: move tcg backend files into accel/tcg/
move tcg-runtime.c, translate-all.(ch) and translate-common.c into
accel/tcg/ subdirectory and updated related trace-events file.

Backports commit 244f144134d0dd182f1af8654e7f9a79fe770368 and applies
relevant changes made in db432672dc50ed86dda17ac821b7eb07411a90af and
d9bb58e51068dfc48746c6af0179926c8dc05bce from qemu
2018-03-13 11:48:15 -04:00