From Hanno:
When a server replies to a cookieless ClientHello with a HelloVerifyRequest,
it is supposed to reset the connection and wait for a subsequent ClientHello
which includes the cookie from the HelloVerifyRequest.
In testing environments, it can happen that resetting the server takes longer
than it takes the client to reply to the HelloVerifyRequest with the
ClientHello+Cookie. In that case, the ClientHello gets lost and the client
needs to retransmit it. This can happen even if the underlying datagram
transport is reliable.
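For reference, here is a minimal sketch of the server-side cookie handling the
quoted report describes, following mbed TLS' documented DTLS server API
(MBEDTLS_ERR_SSL_HELLO_VERIFY_REQUIRED and mbedtls_ssl_session_reset()). The
helper name is made up, and setup and error handling are omitted; this is an
illustration, not code from the patch.

    #include "mbedtls/ssl.h"

    /* Illustrative only: one handshake attempt on an already set up context. */
    static int serve_one_handshake( mbedtls_ssl_context *ssl,
                                    const unsigned char *client_ip, size_t ip_len )
    {
        int ret;

        /* The HelloVerifyRequest cookie is bound to the client's transport address. */
        mbedtls_ssl_set_client_transport_id( ssl, client_ip, ip_len );

        ret = mbedtls_ssl_handshake( ssl );
        if( ret == MBEDTLS_ERR_SSL_HELLO_VERIFY_REQUIRED )
        {
            /* A HelloVerifyRequest was sent in reply to a cookieless ClientHello:
             * reset and wait for the ClientHello that carries the cookie. Until
             * this reset is done, a ClientHello+Cookie arriving from the client
             * is not seen, which is the situation described above. */
            mbedtls_ssl_session_reset( ssl );
            mbedtls_ssl_set_client_transport_id( ssl, client_ip, ip_len );
            ret = mbedtls_ssl_handshake( ssl );
        }

        return ret;
    }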
We previously observed random-looking failures from this test. I think they
were caused by a race condition where the client tries to reconnect while the
server is still closing the connection and has not yet returned to an
accepting state. In that case, the server would fail to see and reply to the
ClientHello, and the client would have to resend it.
I believe the logs of failing runs are compatible with this interpretation:
- the proxy logs show the new ClientHello and the server's closing Alert being
sent within the same millisecond.
- the client logs show the server's closing Alert being received after the new
handshake has started (it is discarded as coming from the wrong epoch).
The attempted fix is for the client to wait a bit before reconnecting, which
should greatly increase the likelihood that the server reaches its accepting
state before the client tries to reconnect. The value of 1 second is arbitrary
but should be more than enough even on loaded machines.
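Purely as an illustration of the idea (the actual change lives in the test
script, not in library or application code), a client-side reconnect with the
proposed delay could look like the hypothetical sketch below; the helper name
and the use of sleep() are assumptions.

    #include <unistd.h>
    #include "mbedtls/ssl.h"

    /* Hypothetical sketch: give the server time to get back to an accepting
     * state before starting a new handshake. The 1-second delay is the
     * arbitrary value discussed above. */
    static int reconnect_with_delay( mbedtls_ssl_context *ssl )
    {
        mbedtls_ssl_close_notify( ssl );   /* close the current connection */
        mbedtls_ssl_session_reset( ssl );  /* prepare the context for reuse */

        sleep( 1 );                        /* let the server finish closing */

        return mbedtls_ssl_handshake( ssl );
    }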
After this fix, the test was run locally 100 times in a row on a slightly
loaded machine (with an instance of all.sh running in parallel) without any
failure.
Depends on the current transform, which might change when retransmitting a
flight containing a Finished message, so compute it only after the transform
is swapped.
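Assuming the quantity being computed here is the payload space left once
record expansion is accounted for (an assumption, since the subject is
implicit above), the sketch below only illustrates the ordering constraint; it
is not the library's internal code, and the transform-swapping step is a
hypothetical placeholder.

    #include "mbedtls/ssl.h"

    /* Illustrative only: compute the space available for payload in one
     * datagram of size 'mtu'. The record expansion depends on the currently
     * active transform, so this must run only after any transform swap done
     * for retransmitting a flight that contains a Finished message. */
    static size_t payload_space_after_swap( mbedtls_ssl_context *ssl, size_t mtu )
    {
        /* hypothetical: swap_to_transform_for_next_message( ssl ); */

        int expansion = mbedtls_ssl_get_record_expansion( ssl );
        if( expansion < 0 || (size_t) expansion >= mtu )
            return 0;

        return mtu - (size_t) expansion;
    }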
Use the same values as the other 3d tests: this should make the test a bit
faster than with the default values, without increasing the failure rate.
While at it:
- adjust the "needs_more_time" setting for 3d interop tests (we can't set the
timeout values for other implementations, so the test might be slow)
- fix some supposedly DTLS 1.0 tests that were using dtls1_2 on the command
line
This setting belongs to the individual connection, not to a configuration
shared by many connections. (If a default value is desired, that can be handled
by the application code that calls mbedtls_ssl_set_mtu().)
There are at least two ways in which this matters:
- per-connection settings can be adjusted if MTU estimates become available
during the lifetime of the connection
- it is at least conceivable that a server might recognize constrained clients
based on their IP range and immediately set a lower MTU for them (see the
sketch after this list). This is much easier to do with a per-connection
setting than by maintaining multiple near-duplicate ssl_config objects that
differ only in their MTU setting.
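A sketch of the per-connection usage this enables; mbedtls_ssl_set_mtu() is
the setter discussed here, while the address-based check is a made-up
placeholder for whatever policy an application might use.

    #include "mbedtls/ssl.h"

    /* Hypothetical helper: decide whether the peer is on a constrained link,
     * e.g. based on its IP range. Trivial stub here for illustration. */
    static int peer_is_constrained( const unsigned char *client_ip, size_t ip_len )
    {
        (void) client_ip;
        (void) ip_len;
        return 0;
    }

    static void tune_mtu_for_connection( mbedtls_ssl_context *ssl,
                                         const unsigned char *client_ip,
                                         size_t ip_len )
    {
        /* One shared mbedtls_ssl_config, but a different MTU per connection. */
        if( peer_is_constrained( client_ip, ip_len ) )
            mbedtls_ssl_set_mtu( ssl, 512 );   /* conservative value for this peer */
        else
            mbedtls_ssl_set_mtu( ssl, 1400 );  /* typical path-MTU estimate */
    }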
Adds a requirement for GNUTLS_NEXT (3.5.3 or above, in practice we should
install 3.6.3) on the CI.
See internal ref IOTSSL-2401 for analysis of the bugs and their impact on the
tests.
For now, just check that it causes us to fragment. More tests are coming in
follow-up commits to ensure we respect the exact value set, including when
renegotiating.
Note: no interop tests in ssl-opt.sh for now, as some of them run into bugs
in (the CI's default versions of) OpenSSL and GnuTLS; interop tests will be
added later once the situation is clarified. <- TODO
This will allow fragmentation to always happen in the same place, always from
a buffer distinct from ssl->out_msg, and always with the same way of resuming
after WANT_WRITE is returned (a simplified sketch follows the list below).
- take advantage of the fact that we're only called for the first send
- put all sanity checks at the top
- rename and constify shortcut variables
- improve comments
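As a rough illustration of the write-from-a-separate-buffer-and-resume pattern
mentioned above, the simplified sketch below is deliberately generic: it is
not the library's internal code, and the error value and send callback are
stand-ins.

    #include <stddef.h>

    #define EXAMPLE_WANT_WRITE  ( -1 )  /* stand-in for MBEDTLS_ERR_SSL_WANT_WRITE */

    /* Stand-in for whatever actually pushes one record to the transport. */
    typedef int (*send_record_fn)( void *ctx, const unsigned char *buf, size_t len );

    typedef struct
    {
        const unsigned char *msg;  /* whole handshake message, kept apart from out_msg */
        size_t len;                /* total message length */
        size_t offset;             /* bytes already handed to the record layer */
    } frag_state;

    static int write_fragmented( void *ctx, send_record_fn send_record,
                                 frag_state *st, size_t max_frag )
    {
        while( st->offset < st->len )
        {
            size_t frag_len = st->len - st->offset;
            if( frag_len > max_frag )
                frag_len = max_frag;

            int ret = send_record( ctx, st->msg + st->offset, frag_len );
            if( ret == EXAMPLE_WANT_WRITE )
                return ret;            /* offset unchanged: just call again later */
            if( ret < 0 )
                return ret;            /* fatal error */

            st->offset += frag_len;    /* this fragment is done, move on */
        }

        st->offset = 0;                /* whole message sent, ready for the next one */
        return 0;
    }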