This will make easier to add future counter-measures in a single place.
In practice this change means that:
- compute_public_key() now uses projective coordinate randomisation, which it
should as this is a protection against Template Attacks for example.
- mult_safer() now checks that the result is not the point at infinity, which
it can as the result is indeed never expected to be that
In the previous version, it was enough for the attacker to glitch the
top-level 'if' to skip the entire block. We want two independent blocks here,
so that an attacker can only succeed with two successive glitches.
Before this commit, if a certificate only had one issue (for example, if the
"untrusted" bit was the only set in flags), an attacker that could flip this
single bit between the moment it's set and the moment flags are checked before
returning from mbedtls_x509_crt_verify() could make the entire verification
routine appear to succeed (return 0 with no bit set in flags).
Avoid that by making sure that flags always has either 0 or at least 9 bits
set during the execution of the function. However, to preserve the API, clear
the 8 extra bits before returning. This doesn't open the door to other
attacks, as fortunately the API already had redundancy: either both flags and
the return value are 0, or flags has bits set and the return value is non-zero
with at least 16 bits set (assuming 32-bit 2-complement ints).
If signature_is_good is 0 (invalid) of 1 (valid), then it's all too easy for
an active physical attacker to turn invalid into valid by flipping a single
bit in RAM, on the bus or in a CPU register.
Use a special value to represent "valid" that can't easily be reached by
flipping a few bits.
x509_crt_check_signature() directly returns the return value of
pk_verify_xxx() without looking at it, so nothing to do here. But its caller
compares the value to 0, which ought to be double-checked.
Inspection of the generated assembly showed that before this commit, armcc 5
was optimizing away the successive reads to the volatile local variable that's
used for double-checks. Inspection also reveals that inserting a call to an
external function is enough to prevent it from doing that.
The tested versions of ARM-GCC, Clang and Armcc 6 (aka armclang) all keep the
double read, with our without a call to an external function in the middle.
The inserted function can also be changed to insert a random delay if
desired in the future, as it is appropriately places between the reads.
This can be used by Mbed TLS functions in any module to signal that a fault
attack is likely happening, so this can be appropriately handled by the
application (report, fall back to safer mode or even halt, etc.)
This hardens against attacks that glitch the conditional branch by making it
necessary for the attacker to inject two consecutive faults instead of one. If
desired, we could insert a random delay in order to further protect against
double-glitch attacks.
Also, when a single glitch is detected we report it.
Previously it was returning 0 or 1, so flipping a single bit in the return
value reversed its meaning. Now it's returning the diff itself.
This is safe because in the two places it's used (signature verification and
point validation), invalid values will have a large number of bits differing
from the expected value, so diff will have a large Hamming weight.
An alternative would be to return for example -!(diff == 0), but the
comparison itself is prone to attacks (glitching the appropriate flag in the
CPU flags register, or the conditional branch if the comparison uses one). So
we'd need to protect the comparison, and it's simpler to just skip it and
return diff itself.
This is a first step in protecting against fault injection attacks: the
attacker can no longer change failure into success by flipping a single bit.
Additional steps are needed to prevent other attacks (instruction skip etc)
and will be the object of future commits.
The return value of uECC_vli_equal() should be protected as well, which will
be done in a future commit as well.
Currently functions that may return success or failure tend to do so by
returning 0 or 1. If an active physical attacker can flip a bit in memory or
registers at the right time, they may easily change a failure value into a
success value, with potentially catastrophic security consequences.
As typical attackers can only flip a few bits, an element of protection
against such attacks is to ensure a sufficient Hamming distance between
failure values and the success value. This commit introduces such values,
which will put to use in critical functions in future commits.
In addition to SUCCESS and FAILURE, a third value ATTACK_DETECTED is
introduced, which can be used later when suspicious-looking events are noticed
(static data changed when it shouldn't, double condition checking returning
inconsistent results, etc.).
Values are chosen so that Hamming distances are large, and that no value is
the complement of another, in order to avoid unwanted compiler optimisations.
Note: the error values used by Mbed TLS are already safe (assuming 32-bit
integers) as they are of the form -x with x in the range [1, 2^15) so their
Hamming distance with the success value (0) is at least 17, so it's hard for
an attacker to turn an error value into the success value (or vice-versa).
This is a temporary work-around for an integration issue.
A future task will re-integrate randomness into these functions are their
entire point is to be randomized; this is really just temporary.
This avoids the need for each calling site to manually regularize the scalar
and randomize coordinates, which makes for simpler safe use and saves 50 bytes
of code size in the library.
Even though this is type name is purely internal to a single C file, let's
reduce the potential for clashes with other wait state types which might be
added elsewhere in the library and become visible here (for example through
platform_util.h).
Previous size was 3584 bytes which is not acceptable on constrained systems
(especially on the stack). This was a misguided attempt at minimizing the
number of calls to the RNG function in order to minimize impact on
performance, but clearly this does not justify using that much RAM and a
compromise had to be found.
While at it, loose the 'curve' argument in internal randomized functions, for
the same reasons we lost 'num_words' in uECC_vli_mult_rnd(): we only have one
curve so we don't need this, and hardcoding it saves a bit of code size and
speed, which is welcome to slightly reduce the impact of the counter-measure
on both of them.
This is a counter-measure to make horizontal attacks harder. Horizontal
attacks work with a single trace by noticing when intermediate computations
within that trace happen on the same operands.
We'll try to make that harder for an attacker to achieve that by introducing
random delays based on extra computation and extra random accesses to input in
the multi-precision multiplication (which is the dominant operation and the target of
horizontal attacks known so far). This should make it hard for the attacker to
compare two multiplications.
This first commit introduces the new function for multiplication with random
delay - future commits will ensure it is used all the way up to the top-level
scalar multiplication routine.
Why: this protects against potential side-channels attacks. This
counter-measure is for example effective against Template SPA. Also, the
bignum arithmetic as implemented in TinyCrypt isn't entirely regular, which
could in principle be exploited by an attacker; randomizing the coordinates
makes this less likely to happen.
Randomizing projective coordinates is also a well-known countermeasure to DPA.
In the context of the scalar multiplication in ECDSA, DPA isn't a concern
since it requires multiple measurements with various base points and the same
scalar, and the scalar mult in ECDSA is the opposite: the base point's always
the same and the scalar is always unique. But we want protection against the
other attacks as well.
How: we use the same code fragment as in uECC_shared_secret in ecc_dh.c,
adapted as follows: (1) replace p2 with k2 as that's how it's called in this
function; (2) adjust how errors are handled.
The code might not be immediately clear so here are a few more details:
regularize_k() takes two arrays as outputs, and the return value says which one
should be passed to ECCPoint_mult(). The other one is free for us to re-use to
generate a random number to be used as the initial Z value for randomizing
coordinates (otherwise the initial Z value is 1), thus avoiding the use of an
extra stack buffer.