Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge BoringSSL through d605df5b6f8462c1f3005da82d718ec067f46b70 #1659

Merged
merged 74 commits into from
Sep 29, 2023

Conversation

briansmith
Copy link
Owner

No description provided.

davidben and others added 30 commits May 3, 2023 16:49
This isn't captured by the comments, the ABI tests have technically been
going out of bounds, and is entirely unnecessary. It can just take
Htable as a parameter.

Change-Id: Iad748d5b649333985ebaa1f84031fbe9a2339a85
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/59505
Auto-Submit: David Benjamin <[email protected]>
Reviewed-by: Adam Langley <[email protected]>
Commit-Queue: Adam Langley <[email protected]>
…4.pl

This is a little trickier because Intel architectures are so
inconveniently register-starved. This code was already using every
register. However, since Xi is only needed at the start and end of the
function, I just swapped the Xi and Htable parameters. Xi is passed on
the stack, so we don't need to explicitly spill it.

Change-Id: I2ef4552fc181a5350c9b1c733cf2319377a06b74
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/59525
Reviewed-by: Adam Langley <[email protected]>
Commit-Queue: David Benjamin <[email protected]>
This was originally removed in
https://boringssl-review.googlesource.com/12477, but restored in
https://boringssl-review.googlesource.com/c/boringssl/+/13122, which
also restored a comment the assembly code secretly relies on the struct
layout.

Our comment references the MOVBE-based assembly, which could mean either
the stitched AES+GCM code, or the GHASH-only code. I don't see an
obvious place where the GHASH-only code does this. The stitched ones
(both x86_64 and aarch64, counter to the comment) did this, but the
preceding CLs fix this. I think we can now remove this comment.

In OpenSSL, this comment dates to
d8d958323bb116bf9f88137ba46948dcb1691a77 in upstream. That commit also
removed a field, so we can assume no assembly prior to that change
relied on that layout.

That commit immediately preceded
1e86318091817459351a19e72f7d12959e9ec570, which was likely the
motivation. At the time, gcm_gmult_neon had some code with a comment
"point at H in GCM128_CTX". At the time, Htable wasn't even filled in,
and the function relied on H being 16 bytes before Htable.
gcm_ghash_neon also relies on them being reachable from Xi.

This code changed in f8cee9d08181f9e966ef01d3b69ba78b6cb7c8a8 and, at a
glance, the file doesn't seem to be making this assumption anymore. I
think, prior to that change, Htable wasn't filled in for the NEON code
at all.

With all this now resolved, remove the comment and unused copy of H in
GCM128_KEY.

Change-Id: I4eb16cfff5dd41a59a0231a998d5502057336215
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/59526
Reviewed-by: Adam Langley <[email protected]>
Commit-Queue: David Benjamin <[email protected]>
This can be done with just memcpy, which tempts the compiler slightly
less.

Bug: 574
Change-Id: I7b46c2f2578abc85621834426de20d5eaf492a74
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/59527
Reviewed-by: Adam Langley <[email protected]>
Commit-Queue: David Benjamin <[email protected]>
The initial codepoint is called X25519Kyber786Draft00 in the draft, so
align with that name for this version. Also remove the placeholder bits
for the other combinations, which haven't gotten that far yet.

Update-Note: Update references to NID_X25519Kyber768 to
NID_X25519Kyber768Draft00. For now, the old name is available as an
alias.

Change-Id: I2e531947f41e589cec61607944dca844722f0947
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/59605
Auto-Submit: David Benjamin <[email protected]>
Reviewed-by: Adam Langley <[email protected]>
Commit-Queue: David Benjamin <[email protected]>
Fixed: 605
Change-Id: Id2b7d0221c3e43c102ce77c800f7ab65c74e0582
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/59625
Commit-Queue: Adam Langley <[email protected]>
Reviewed-by: Adam Langley <[email protected]>
Auto-Submit: David Benjamin <[email protected]>
EC_RAW_POINT is a confusing name. It's mostly about whether this is
stack-allocated EC_POINT without the EC_GROUP pointer. Now that we have
EC_AFFINE, EC_JACOBIAN captures what it's doing a bit better.

Fixed: 326
Change-Id: I5b71a387e899a94c79be8cd5e0b54b8432f7d5da
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/59565
Reviewed-by: Adam Langley <[email protected]>
Auto-Submit: David Benjamin <[email protected]>
Commit-Queue: Adam Langley <[email protected]>
This was a bit of a mess. There are three assembly functions to juggle
here. Their current type signatures are:

 void gcm_init_v8(u128 Htable[16], const uint64_t H[2]);
 void gcm_gmult_v8(uint64_t Xi[2], const u128 Htable[16]);
 void gcm_ghash_v8(uint64_t Xi[2], const u128 Htable[16], const uint8_t *inp,
                   size_t len);

Except for gcm_nohw.c, this is all assembly, so they don't follow the C
abstract machine's theory of typed memory. That means types are mostly
arbitrary and we have room to rearrange them. They do carry an implicit
alignment requirement, but none of these assembly files care about
this[*].

Values passed to gcm_gmult and gcm_ghash get XORed byte-by-byte in
places, which is inconvenient to do as uint64_t. They also get passed to
AES functions, which want bytes. Thus I think uint8_t[16] is the most
natural and convenient type to use.

H in gcm_init is interesting. gcm_init already doesn't take a GHASH key
in the natural byte representation. The two 8-byte halves are
byte-swapped, but the halves are not swapped, so it's not quite a byte
reversal. I opted to leave that as uint64_t[2], mostly to capture that
something odd is happening here.

[*] We only have GHASH assembly for x86, x86_64, armv7, and aarch64. We
used to have armv4 GHASH assembly, but that's been removed from
gcm_nohw.c. Thus we can assume none of these files care about alignment
for plain scalar loads. Alignment does matter for vmovdqa vs vmovdqu,
but that requires 16-byte alignment and uint64_t only implies 4- or
8-byte alignment on these architectures.

Bug: 574
Change-Id: If7dba9b41ff62204f4cf8fcd54eb4a4c54214c6e
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/59528
Commit-Queue: David Benjamin <[email protected]>
Reviewed-by: Adam Langley <[email protected]>
We cap e in RSA for DoS reasons. draft-amjad-cfrg-partially-blind-rsa
needs to create RSA keys with very large e. To support this, add an API
which disables this check.

Also add some missing checks for negative n and negative e. (Already
rejected by the parser, just not at this layer.)

Change-Id: Ia996bb1b46fc8b73db704f492b3df72b254a15a4
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/59645
Auto-Submit: David Benjamin <[email protected]>
Reviewed-by: Adam Langley <[email protected]>
Commit-Queue: Adam Langley <[email protected]>
This generalizes the scheme we previously had with
TLS_RSA_WITH_NULL_SHA, in preparation for TLS_RSA_WITH_3DES_EDE_CBC_SHA
(to be deprecated in a later CL) and
TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256 (to regretably be added back in,
but in a deprecated state).

The story is that deprecated ciphers can always be listed by name, but
by default, selectors will not match them when adding ciphers. They will
still match them when removing ciphers. This is so instructions like
"@strength" or "!RSA" will still sort or disable the deprecated ciphers,
rather than accidentally leaving them at the front or enabled.

Additionally, a selector can mark itself as including deprecated
ciphers. This is specifically for TLS_RSA_WITH_3DES_EDE_CBC_SHA, because
many callers reference it with just "3DES". As an added quirk,
"RSA+3DES" will also match it. (The rule is that, if any selector matches
deprecated ciphers, we'll allow the overall expression to match it. This
is slightly weird, but keeps "RSA+3DES" working.)

Note, in this CL, 3DES is not actually deprecated. This just sets up the
machinery and doesn't do anything with it. But the blockers for
deprecating that should hopefully be resolved soon.

Bug: 599
Change-Id: I7212bdc879b0e49c6742025644f3100026f24228
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/59646
Auto-Submit: David Benjamin <[email protected]>
Reviewed-by: Adam Langley <[email protected]>
Commit-Queue: Adam Langley <[email protected]>
rsa_test.cc and a few of the tests in evp_tests.txt still do, but
refresh the easy one. I switched it to a P-256 key to keep the input
small.

Bug: 607
Change-Id: Ie0614280d12012bbd928f095eb4531e4d7550ae1
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/59666
Auto-Submit: David Benjamin <[email protected]>
Reviewed-by: Adam Langley <[email protected]>
Commit-Queue: Adam Langley <[email protected]>
This TLS 1.2 algorithm is substantially inferior to AES-GCM and should
never be used. It will not be available unless configured by name.
However, in can be used to provide backwards-compatibility with devices
that cannot be updated if so needed.

Change-Id: I1fd78efeb33aceca76ec2e7cb76b70f761ed1af8
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/59585
Reviewed-by: David Benjamin <[email protected]>
Commit-Queue: Adam Langley <[email protected]>
Auto-Submit: Adam Langley <[email protected]>
Envoy needs to have the possible cipher, etc., strings predeclared to
reduce synchronization needs in the steady state. It currently does this
by (1) iterating over SSL_CTX_get_ciphers at SSL_CTX creation time and
(2) hard-coding a lists of known TLS 1.3 ciphers, TLS versions,
NamedGroups, etc.

(1) would work for some applications, but it breaks any applications
that configure ciphers on the SSL on a certificate callback, etc. If the
callback configures a cipher that wasn't configured on the SSL_CTX (e.g.
if the SSL_CTX were left at defaults), Envoy's logging breaks and we hit
an ENVOY_BUG assertion.

(2) breaks whenever BoringSSL adds a new feature. In principle, we could
update Envoy when updating BoringSSL, but this is an unresasonable
development overhead for just one of many BoringSSL consumers to impose.
Such costs are particularly high when considering needing to coordinate
updates to Envoy and BoringSSL across different repositories.

Add APIs to enumerate the possible strings these functions can return.
These string lists are a superset of those that any one application may
care about (e.g. we may have a deprecated cipher that Envoy no longer
needs, or an experimental cipher that's not yet ready for Envoy's
stability goals), but this is fine provided this is just used to
initialize the table. In particular, they are *not* intended to
enumerate supported features.

Bump BORINGSSL_API_VERSION to aid in patching these into Envoy.

Bug: b:280350955
Change-Id: I4d11db980eebed5620d3657778c09dbec004653c
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/59667
Commit-Queue: Adam Langley <[email protected]>
Auto-Submit: David Benjamin <[email protected]>
Reviewed-by: Adam Langley <[email protected]>
This aligns with google/oss-policies-info#8 and
grpc/grpc#32614. VS2019 adds a C11 mode, which
is useful for us, because it means stdalign.h works correctly.

Also bump the minimum Windows SDK to
https://devblogs.microsoft.com/cppblog/c11-and-c17-standard-support-arriving-in-msvc/.
If you have a new MSVC, CMake will enable C11 mode by default. But if
C11 mode is enabled but your Windows SDK is too old, things break.

After this change, the CI will include some redundant configurations.
All the VS2017 configurations will start testing on VS2019, so the
VS2019-specific configurations won't do anything. I'll follow this up
with a change to bump those to VS2022, where we're currently missing
coverage.

Update-Note: BoringSSL now requires VS2019 or later and no longer
supports VS2017. VS2017 has been past its "mainstream end date" for over
a year now, per
https://learn.microsoft.com/en-us/lifecycle/products/visual-studio-2017

Change-Id: I3f359e8ea7c9428ddaa9fcc4ffead2ef903398be
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/59665
Reviewed-by: Adam Langley <[email protected]>
Commit-Queue: David Benjamin <[email protected]>
Bug: 574
Change-Id: Ief8030a4c9257a05defdc2dc02d496c63f06626a
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/59545
Commit-Queue: David Benjamin <[email protected]>
Reviewed-by: Adam Langley <[email protected]>
Includes bits by me and Robert Nagy <[email protected]> who has a
google CLA.

Update-Note: Additionally, BoringSSL now requires macOS 10.12 or later
for getentropy support. This is consistent with
https://github.com/google/oss-policies-info/blob/main/foundational-cxx-support-matrix.md

WANT_LGTM=all

Change-Id: I5ab74fa8a6677fac29c316aa29a954df401ba647
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/59225
Commit-Queue: Bob Beck <[email protected]>
Reviewed-by: David Benjamin <[email protected]>
Reviewed-by: Adam Langley <[email protected]>
Change-Id: I1a4914cc4859ed1e0797a046a1abec8921c4a9cf
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/59746
Commit-Queue: Bob Beck <[email protected]>
Reviewed-by: Bob Beck <[email protected]>
Auto-Submit: David Benjamin <[email protected]>
It's been a couple years since we did that.

I had to upstream a fix to CreateArgvFromArgs, but that landed quickly,
so we're actually carrying no googletest patches with this.

Change-Id: Ifaf3476f2079d444512a235756cfe54f492b9c92
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/59745
Reviewed-by: Bob Beck <[email protected]>
Commit-Queue: Bob Beck <[email protected]>
This causes avcptool to send requests without blocking on responses. See
the diff in ACVP.md for details of how to use this feature.

Change-Id: I922b3bd2383cb7d22a5d12ead49d2fa733ee1b97
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/55345
Reviewed-by: David Benjamin <[email protected]>
Commit-Queue: Adam Langley <[email protected]>
Trying to fix all the places where these formats go quadratic isn't a
good use of time. We've already documented that they're not safe for use
with untrusted inputs. Even without such DoS issues, they cannot be
safely used anyway. (E.g. RUSTSEC-2023-0023.)

Just cap the fuzzer input. It'd be nice if we could avoid this more
systematically in the function, but they're not structured to make this
easy to do, and anyone concerned about DoS in this function has worse
problems.

Bug: chromium:1444420, oss-fuzz:56048, 611
Change-Id: I53eeb346f59278ec2db3aac4a92573b927ed8003
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/59785
Reviewed-by: Bob Beck <[email protected]>
Commit-Queue: David Benjamin <[email protected]>
Auto-Submit: David Benjamin <[email protected]>
Saves some duplicated logic.

Change-Id: I202fa92a88101f9ad735648bc414ab05752641da
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/59685
Commit-Queue: David Benjamin <[email protected]>
Reviewed-by: David Benjamin <[email protected]>
Reviewed-by: Adam Langley <[email protected]>
This is inspired somewhat from how https://github.com/google/benchmark's
threaded benchmark support works. (It seems to spawn a bunch of threads,
latch them all together, and then run.)

This adds a TimeFunctionParallel which runs multiple copies of the
benchmark in parallel, after waiting for all the threads to synchronize.
Some functions had to be tweaked so they don't write to a single, shared
output buffer.

This probably could use some improvement. In playing with it, the
numbers are pretty unstable. We also don't currently benchmark anything
that captures EVP's internal refcounts. But hopefully it's enough to get
a start. I am able to measure impacts from the PRNG locks at least.

Bug: 570
Change-Id: I92c29a05ba082fc45701afd6f0effe23f7b148bd
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/59845
Reviewed-by: Adam Langley <[email protected]>
Commit-Queue: David Benjamin <[email protected]>
Auto-Submit: David Benjamin <[email protected]>
Change-Id: Ibcc3faa4c3e03713a0c8f6dc24119e4579a5b4e4
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/59025
Commit-Queue: Bob Beck <[email protected]>
Reviewed-by: David Benjamin <[email protected]>
RSATest.MissingParameters tests this case a bit more extensively.

Change-Id: I61e38bd139c6334ff9d5c9636a159cb20bcd7f7b
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/59825
Commit-Queue: Adam Langley <[email protected]>
Reviewed-by: Adam Langley <[email protected]>
Auto-Submit: David Benjamin <[email protected]>
We currently don't enforce rsa_check_public_key invariants on private
key operations, only public key operations and RSA_check_key. This
means it was actually possible, in some corner cases, to perform
operations with oversized e or n. Fix this.

This gets us a bit closer to aligning the mess of RSA invariants.
(RSA_check_key still performs some extra checks, but most of those
should be redundant with the CRT self-check.)

Update-Note: Manually constructed RSA private keys with invalid n or e
will now fail private key operations. Such keys would always fail at
public key operations (so the signatures would never verify). They also
already failed RSA_check_key and parsing.

The one incompatibility of note is keys with only n and d, constructed
by reaching into the internal RSA struct, no longer work. Instead, use
RSA_new_private_key_no_e. Conscrypt used to do this but has since been
migrated to the new API.

Bug: 316
Change-Id: I062fdad924b8698e257dab9760687e4b381c970d
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/59826
Reviewed-by: Adam Langley <[email protected]>
Commit-Queue: David Benjamin <[email protected]>
512-bit RSA was factored in 1999, so this limit barely means anything.
But establish some limit now to ratchet in what we can. We'll raise this
limit as we clear through further rounds of bad keys in tests.

As part of this, I've touched up rsa_test.cc a bit. All the functions
that made assumptions on key size now use std::vector with RSA_size.
kKey1 and kKey2 were also 512- and 400-bit RSA, respectively. In
principle, we could keep kKey1 for now, but the next stage will break it
anyway. I've replaced them with kFIPSKey (which was "FIPS-compliant" but
actually 1024-bit) and kTwoPrime (remnant of multi-prime RSA, 2048-bit).
As neither name makes sense, they're just the new kKey1 and kKey2.

I've also switched from string literals to arrays, which avoids the
pesky trailing NUL. Sadly, it is a bit more verbose. Maybe we should
switch to writing something like:

  const std::vector<uint8_t> kKey1 = MustDecodeHex("abcdef1234...");

Static initializers don't matter in tests, after all.

Update-Note: We no longer accept 511-bit RSA and below. If you run into
this, update test keys to more modern sizes as we plan to raise the
limit beyond 512-bit RSA in the future. 512-bit RSA was factored in
1999, so keys at or near this limit have been obsolete for a very, very
long time.

Bug: 607
Change-Id: I13c3366d7e5f326710f1d1b298f4150a4e8e4d78
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/59827
Commit-Queue: David Benjamin <[email protected]>
Reviewed-by: Adam Langley <[email protected]>
Right now, MSVC has to fallback to refcount_lock.c, which uses a single,
global lock for all refcount operations. Instead, use the Interlocked*
APIs to implement them.

The motivation is two-fold. First, this removes a performance cliff when
building for Windows on a non-Clang compiler. (Although I've not been
able to measure it in an end-to-end EVP benchmark, only a synthetic
refcount-only benchmark.)

More importantly, it gets us closer to assuming atomics support on all
non-NO_THREADS configurations. (The next CL will clear through that.)
That, in turn, will make it easier to add an atomics-like abstractions
to some of our hotter synchronization points. (Even in newer glibc, with
its better rwlock, read locks fundamentally need to write to memory, so
we have some cacheline contention on shared locks.)

Annoyingly, the Windows atomic_load replacement is not quite right. I've
used a "no-op" InterlockedCompareExchange(p, 0, 0) which, empirically,
still results in a write. But a write to the refcount cacheline is
surely better than taking a global exclusive lock. See comments in file
for details. OpenSSL uses InterlockedOr(p, 0), but that actually results
in even worse code. (InterlockedOr needs a retry loop when the
underlying cmpxchg fails, whereas InterlockedCompareExchange is a single
cmpxchg.)

Hopefully, in the future (perhaps when we require VS 2022's successor,
based on [1]), this can be removed in favor of C11 atomics everywhere.

[1] https://devblogs.microsoft.com/cppblog/c11-atomics-in-visual-studio-2022-version-17-5-preview-2/

Bug: 570
Cq-Include-Trybots: luci.boringssl.try:linux_clang_rel_tsan
Change-Id: I125da139e2fd3ae51e54309309fda16ba97ccf20
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/59846
Commit-Queue: David Benjamin <[email protected]>
Reviewed-by: Adam Langley <[email protected]>
On Windows, we can rely on Interlocked APIs. On non-Windows builds, we
currently require C11 but permit C11 atomics to be missing, via
__STDC_NO_ATOMICS__. This CL tightens this so C11 atomics are required
on non-MSVC builds.

My hope is that, now that we require C11 on non-Windows, this is a
fairly safe requirement. We already require pthreads on any platform
where this might apply, and it's hard to imagine someone has C11,
pthreads, but not C11 atomics.

This change means that, in later work, we can refactor the refcount
logic to instead be a <stdatomic.h> compatibility layer, and then an
atomics-targetting CRYPTO_refcount_t implementation. With a
<stdatomic.h> compatibility layer, we can use atomics in more places,
notably where our uses of read locks are causing cacheline contention.

The platform restriction isn't *strictly* necessary. We could, like with
refcounts, emulate <stdatomic.h> with a single, global lock. Indeed any
platforms in this situation have already been living with that lock for
refcounts without noticing. But then later work to add "atomics" to read
locks would regress contention for those platforms. So I'm starting by
rejecting this, so if any such platform exists, we can understand their
performance needs before doing that.

Update-Note: On non-Windows platforms, we now require C11 atomics
support. Note we already require C11 itself. If this affects your build,
get in touch with BoringSSL maintainers.

Bug: 570
Cq-Include-Trybots: luci.boringssl.try:linux_clang_rel_tsan
Change-Id: I868fa4ba87ed73dfc9d52e80d46853ef56715a5f
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/59847
Commit-Queue: David Benjamin <[email protected]>
Reviewed-by: Adam Langley <[email protected]>
refcount.c is now a single, generic file that calls into C11-atomic-like
APIs. Behind the scenes, this selects one of C11 atomics, Windows
interlocked APIs, or unsynchronized reads/writes (in the no-threads
build).

This frees us up to use atomics elsewhere in the library. For now, this
only binds sequentially consistent atomics, but we can add other memory
orders if needed. In particular, I believe up_ref only needs relaxed
atomics. Some of the later change I think only need acquire and release,
but I'm not positive.

Bug: 570
Cq-Include-Trybots: luci.boringssl.try:linux_clang_rel_tsan
Change-Id: Ifcd7357611bb7a8cd14b82c23ad080d1a2df1386
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/59848
Commit-Queue: David Benjamin <[email protected]>
Reviewed-by: Adam Langley <[email protected]>
We don't take write locks in the PRNG, steady state, but we do take some
read locks: computing fork generation, reading the fork-unsafe buffering
flag, and a FIPS-only artifact of some global state clearing mess. That
last one is completely useless, but it's a consequence of FIPS's
understanding of process exit being comically inconsistent with reality.

Taking read locks is, in principle, parallel, but the cacheline write
causes some contention, even in newer glibcs with faster read locks. Fix
these:

- Use atomic reads to check the fork generation. We only need to lock
  when we observe a fork.

- Replace the fork-unsafe buffering flag with an atomic altogether.

- Split state_clear_all_lock into a per-rand_thread_state lock. We still
  need a read lock, but a completely uncontended one until process exit.

With many threads, this gives a significant perf boost.

x86_64, non-FIPS, Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz, 30 threads:
Before:
Did 45131875 RNG (16 bytes) operations in 300039649us (150419.7 ops/sec): 2.4 MB/s
Did 44089000 RNG (32 bytes) operations in 300053237us (146937.3 ops/sec): 4.7 MB/s
Did 43328000 RNG (256 bytes) operations in 300058423us (144398.5 ops/sec): 37.0 MB/s
Did 45857000 RNG (1350 bytes) operations in 300095943us (152807.8 ops/sec): 206.3 MB/s
Did 43249000 RNG (8192 bytes) operations in 300102698us (144114.0 ops/sec): 1180.6 MB/s
After:
Did 296204000 RNG (16 bytes) operations in 300009524us (987315.3 ops/sec): 15.8 MB/s
Did 311347000 RNG (32 bytes) operations in 300014396us (1037773.5 ops/sec): 33.2 MB/s
Did 295104000 RNG (256 bytes) operations in 300012657us (983638.5 ops/sec): 251.8 MB/s
Did 255721000 RNG (1350 bytes) operations in 300016481us (852356.5 ops/sec): 1150.7 MB/s
Did 103339000 RNG (8192 bytes) operations in 300040059us (344417.3 ops/sec): 2821.5 MB/s

(Smaller PRNG draws are more impacted because they spend less time in the
DRBG. But they're also more likely because you rarely need to pull 8K of
data out at once.)

x86_64, FIPS, Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz, 30 threads:
Before:
Did 29060000 RNG (16 bytes) operations in 300081190us (96840.5 ops/sec): 1.5 MB/s
Did 31882000 RNG (32 bytes) operations in 300118031us (106231.5 ops/sec): 3.4 MB/s
Did 30925000 RNG (256 bytes) operations in 300113646us (103044.3 ops/sec): 26.4 MB/s
Did 31969000 RNG (1350 bytes) operations in 300096688us (106529.0 ops/sec): 143.8 MB/s
Did 33434000 RNG (8192 bytes) operations in 300093240us (111412.0 ops/sec): 912.7 MB/s
After:
Did 299013000 RNG (16 bytes) operations in 300012167us (996669.6 ops/sec): 15.9 MB/s
Did 289788000 RNG (32 bytes) operations in 300014611us (965913.0 ops/sec): 30.9 MB/s
Did 298699000 RNG (256 bytes) operations in 300013443us (995618.7 ops/sec): 254.9 MB/s
Did 247061000 RNG (1350 bytes) operations in 300018215us (823486.7 ops/sec): 1111.7 MB/s
Did 100479000 RNG (8192 bytes) operations in 300037708us (334887.9 ops/sec): 2743.4 MB/s

On an M1 Pro, it's mostly a wash by default (fewer threads because this chip has fewer cores)

aarch64, M1 Pro, 8 threads:
Before:
Did 23218000 RNG (16 bytes) operations in 80009131us (290191.9 ops/sec): 4.6 MB/s
Did 23021000 RNG (256 bytes) operations in 80007544us (287735.4 ops/sec): 73.7 MB/s
Did 22853000 RNG (1350 bytes) operations in 80013184us (285615.4 ops/sec): 385.6 MB/s
Did 25407000 RNG (8192 bytes) operations in 80008371us (317554.3 ops/sec): 2601.4 MB/s
Did 22128000 RNG (16384 bytes) operations in 80013269us (276554.1 ops/sec): 4531.1 MB/s
After:
Did 23303000 RNG (16 bytes) operations in 80011433us (291245.9 ops/sec): 4.7 MB/s
Did 23072000 RNG (256 bytes) operations in 80008755us (288368.4 ops/sec): 73.8 MB/s
Did 22807000 RNG (1350 bytes) operations in 80013355us (285039.9 ops/sec): 384.8 MB/s
Did 23759000 RNG (8192 bytes) operations in 80010212us (296949.6 ops/sec): 2432.6 MB/s
Did 23193000 RNG (16384 bytes) operations in 80011537us (289870.7 ops/sec): 4749.2 MB/s

This is likely because, without RDRAND or MADV_WIPEONFORK, we draw from
the OS on every call. We're likely bottlenecked by getentropy, whether
it's some internal synchronization or syscall overherad. With
fork-unsafe buffering enabled, this change shows even more significant
wins on the M1 Pro.

aarch64, fork-unsafe buffering, M1 Pro, 8 threads:
Before:
Did 25727000 RNG (16 bytes) operations in 80010579us (321545.0 ops/sec): 5.1 MB/s
Did 25776000 RNG (32 bytes) operations in 80008587us (322165.4 ops/sec): 10.3 MB/s
Did 25780000 RNG (256 bytes) operations in 80006127us (322225.3 ops/sec): 82.5 MB/s
Did 33171250 RNG (1350 bytes) operations in 80002532us (414627.5 ops/sec): 559.7 MB/s
Did 54784000 RNG (8192 bytes) operations in 80005706us (684751.2 ops/sec): 5609.5 MB/s
After:
Did 573826000 RNG (16 bytes) operations in 80000668us (7172765.1 ops/sec): 114.8 MB/s
Did 571329000 RNG (32 bytes) operations in 80000423us (7141574.7 ops/sec): 228.5 MB/s
Did 435043750 RNG (256 bytes) operations in 80000214us (5438032.3 ops/sec): 1392.1 MB/s
Did 229536000 RNG (1350 bytes) operations in 80001888us (2869132.3 ops/sec): 3873.3 MB/s
Did 57253000 RNG (8192 bytes) operations in 80004974us (715618.0 ops/sec): 5862.3 MB/s

Note that, on hardware with RDRAND, the read lock in
rand_fork_unsafe_buffering_enabled() doesn't do much. But without
RDRAND, we hit that on every RAND_bytes call. More importantly, the
subsequent CL will fix a bug that will require us to hit it more
frequently.

I've removed the volatile on g_fork_detect_addr because I think we
didn't need it and this avoids thinking about the interaction between
volatile and atomics. The pointer is passed into madvise, so the
compiler knows the pointer escapes. For it to be invalid, the compiler
would need to go out of its way to model madvise as not remembering the
pointer, which would be incorrect of it for MADV_WIPEONFORK.

Bug: 570
Cq-Include-Trybots: luci.boringssl.try:linux_clang_rel_tsan
Change-Id: Ie6977acd1b8e7639aaa419cf6f4f5f0645bde9d1
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/59849
Commit-Queue: David Benjamin <[email protected]>
Reviewed-by: Adam Langley <[email protected]>
Nabil Wadih and others added 25 commits May 26, 2023 20:55
Change-Id: I7458b1d7aa1736d586dc80660d59c07fa2ac1c8a
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/59805
Reviewed-by: Bob Beck <[email protected]>
Reviewed-by: David Benjamin <[email protected]>
Commit-Queue: David Benjamin <[email protected]>
Update-Note: SSL_CIPHER_get_value was our original name for the
function. OpenSSL later called it SSL_CIPHER_get_protocol_id. I believe
all external callers have since been updated to use the new function.
(If I missed a few stragglers, replace with SSL_CIPHER_get_protocol_id
to fix.)

Change-Id: I956fb49bf2d13a898eed73177493d2c8d50778ad
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60205
Reviewed-by: Bob Beck <[email protected]>
Auto-Submit: David Benjamin <[email protected]>
Commit-Queue: Bob Beck <[email protected]>
We're this awkward mix of "group" and "curve" right now. On the spec
side, this is because they used to be "curves", but then RFC 7919
renamed to "group" in an attempt to generalize FFDH and ECDH. The
negotiated FFDH stuff never really went anywhere (the way it used cipher
suite values in TLS 1.2 made it unusable), but the name change stuck.

In our implementation and API, we originally called it "curve". In
preparation for TLS 1.3, we renamed the internals to "group" to match
the spec in
https://boringssl-review.googlesource.com/c/boringssl/+/7955, but the
public API was still "curve".

Then we exported a few more things in
https://boringssl-review.googlesource.com/c/boringssl/+/8565, but I left
it at "curve" to keep the public API self-consistent.

Then we added OpenSSL's new "group" APIs in
https://boringssl-review.googlesource.com/c/boringssl/+/54306, but
didn't go as far to deprecate the old ones yet.

Now I'd like to add new APIs to clear up the weird mix of TLS codepoints
and NIDs that appear in our APIs. But our naming is a mess, so either
choice of "group" or "curve" for the new API looks weird.

In hindsight, we probably should have left it at "curve". Both terms are
equally useless for the future post-quantum KEMs, but at least "curve"
is more unique of a name than "group". But at this point, I think we're
too far through the "group" rename to really backtrack:

- Chromium says "group" in its internals
- QUICHE says "group" in its internals and public API
- Our internals say "group"
- OpenSSL has switched to "group" and deprecated "curve", so new APIs
  will be based on "group"

So align with all this and say "group". This CL handles set1_curves and
set1_curves_list APIs, which already have "group" replacements from
OpenSSL. A follow-up CL will handle our APIs. This is a soft deprecation
because I don't think updating things is particularly worth the churn,
but get the old names out of the way, so new code can have a simpler API
to target.

Also rewrite the documentation for that section accordingly. I don't
think we need to talk about how it's always enabled now. That's a
reference to some very, very old OpenSSL behavior where ECDH negotiation
needed to be separately enabled.

Change-Id: I7a356793d36419fc668364c912ca7b4f5c6c23a2
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60206
Commit-Queue: David Benjamin <[email protected]>
Reviewed-by: Adam Langley <[email protected]>
Reviewed-by: Bob Beck <[email protected]>
This adds "group" versions of our codepoint-based APIs. This is mostly
because it looks weird to switch terminology randomly in the
implementation. See I7a356793d36419fc668364c912ca7b4f5c6c23a2 for
additional context.

I've not bothered renaming the bssl_shim flags. That seems a waste of
time.

Change-Id: I566ad132f5a33d99efd8cb2bb8b76d9a6038c825
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60207
Reviewed-by: Adam Langley <[email protected]>
Commit-Queue: David Benjamin <[email protected]>
Right now we use NIDs to configure the group list, but group IDs (the
TLS codepoints) to return the negotiated group. The NIDs come from
OpenSSL, while the group ID was original our API. OpenSSL has since
added SSL_get_negotiated_group, but we don't implement it.

To add Kyber to QUIC, we'll need to add an API for configuring groups to
QUICHE. Carrying over our inconsistency into QUICHE's public API would
be unfortunate, so let's use this as the time to align things.

We could either align with OpenSSL and say NIDs are now the group
representation at the public API, or we could add a parallel group ID
API. (Or we could make a whole new SSL_NAMED_GROUP object to pattern
after SSL_CIPHER, which isn't wrong, but is even more new APIs.)

Aligning with OpenSSL would be fewer APIs, but NIDs aren't a great
representation. The numbers are ad-hoc and even diverge a bit between
OpenSSL and BoringSSL. The TLS codepoints are better to export out to
callers. Also QUICHE has exported the negotiated group using the
codepoints, so the natural solution would be to use codepoints on input
too.

Thus, this CL adds SSL_CTX_set1_group_ids and SSL_set1_group_ids. It
also rearranges the API docs slightly to put the group ID ones first,
and leaves a little note about the NID representation before introducing
those.

While I'm here, I've added SSL_get_negotiated_group. NGINX seems to use
it when available, so we may as well fill in that unnecessary
compatibility hole.

Bug: chromium:1442377
Change-Id: I47ca8ae52c274133f28da9893aed7fc70f942bf8
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60208
Commit-Queue: David Benjamin <[email protected]>
Reviewed-by: Adam Langley <[email protected]>
Many of our point addition functions internally check for the doubling
case and branch because the addition formulas are incomplete. This
branch is fine because the multiplication formulas are arranged to not
hit this case. However, we don't want to leak the couple of intermedate
values that determine whether to branch. Previously, we ran into this
with https://boringssl-review.googlesource.com/c/boringssl/+/36465.

This wasn't sufficient. The compiler understands if (a & b) enough to
compile into two branches. Thanks to Moritz Schneider, Nicolas Dutly,
Daniele Lain, Ivan Puddu, and Srdjan Capkun for reporting this!

Fix the leak by adding a value barrier on the final value. As we're also
intentionally leaking the result of otherwise secret data flow, I've
used the constant_time_declassify functions, which feed into our
valgrind-based constant-time validation and double as barriers.

Accordingly, I've also added some CONSTTIME_SECRET markers around the
ECDSA nonce value, so we can check with valgrind the fix worked. The
marker really should be at a lower level, at ec_random_nonzero_scalar or
further (maybe RAND_bytes?), but for now I've just marked the nonce.
To then clear valgrind, add constant_time_declassify in a few other
places, like trying to convert infinity to affine coordinates. (ECDH
deals with secret points, but it is public that the point isn't
infinity.)

Valgrind now says this code is constant-time, at least up to compilation
differences introduced by the annotations. I've also inspected the
compiler output. This seems to be fine, though neither test is quite
satisfying. Ideally we could add annotations in ways that don't
influence compiler output.

Change-Id: Idfc413a75d92514717520404a0f5424903cb4453
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60225
Reviewed-by: Adam Langley <[email protected]>
Commit-Queue: Adam Langley <[email protected]>
Auto-Submit: David Benjamin <[email protected]>
All inputs are marked as secret. This is not to support a use case for
calling X25519 with a secret *point* as the input, but rather to ensure
that the choice of the point cannot influence whether the scalar is
leaked or not. Same for the initial contents of the output buffer.

This is a conservative choice and may be revised in the future.

Change-Id: I595d454a8e1fdc409912aee751bb0b3cf46f5430
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60186
Reviewed-by: David Benjamin <[email protected]>
Commit-Queue: David Benjamin <[email protected]>
NGINX requires this constant if SSL_get_negotiated_group is defined.
OpenSSL returns this to indicate a named group constant it doesn't
understand, which doesn't make sense out of SSL_get_negotiated_group
because the library wouldn't negotiate a group it doesn't know about.
Nonetheless, define it for compatibility.

Fixed: 615
Change-Id: I05a6d607912bb8507be9ac6ff5fe1699b4715f6b
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60326
Commit-Queue: Adam Langley <[email protected]>
Commit-Queue: David Benjamin <[email protected]>
Reviewed-by: Adam Langley <[email protected]>
Auto-Submit: David Benjamin <[email protected]>
Andres Erbsen noticed we didn't actually have tests to catch when the
format macros were wrong.

In doing so, remove BN_DEC_FMT2. It was unused and only makes sense in
the context of the bignum-to-decimal conversion, where we no longer use
it anyway. None of these macros are exported in OpenSSL at all, so it
should be safe to remove it. (We possibly can remove the others too. I
see one use outside the library, and that use would probably be better
written differently anyway.)

Change-Id: I4ffc7f9f7dfb399ac060af3caff0778000010303
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60325
Reviewed-by: Bob Beck <[email protected]>
Commit-Queue: Bob Beck <[email protected]>
Auto-Submit: David Benjamin <[email protected]>
BN_nnmod() can deal with the situation that the first and the second
arguments are the same, but it cannot deal with the first and the
second argument being equal. In that situation, if BN_mod(x, y, x, ctx)
results in a negative x, then the result of BN_nnmod() is zero. This
breaks the strange BN_mod_inverse(m, a, m, ctx).

Reported by Guido Vranken in
openssl/openssl#21110

Change-Id: I8584720660f214f172b3b33716a5e3b29e8f2fd8
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60365
Reviewed-by: Bob Beck <[email protected]>
Reviewed-by: David Benjamin <[email protected]>
Commit-Queue: David Benjamin <[email protected]>
Bug: 285222360
Change-Id: I7f35bcc734dd853e99cb691bdc681f75c9f137e4
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60265
Reviewed-by: Bob Beck <[email protected]>
Commit-Queue: Bob Beck <[email protected]>
Did 59000 Ed25519 key generation operations in 1004188us (58753.9 ops/sec) [+8.3%]
Did 57000 Ed25519 signing operations in 1005649us (56679.8 ops/sec) [+7.9%]
Did 19000 Ed25519 verify operations in 1054380us (18020.1 ops/sec) [-2.0%]
Did 61000 Curve25519 base-point multiplication operations in 1007401us (60551.9 ops/sec) [+8.3%]
Did 22000 Curve25519 arbitrary point multiplication operations in 1022882us (21507.9 ops/sec) [+0.5%]

Change-Id: I14668f658b1ae99850cb0f8938f90f988d0edd0b
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60107
Reviewed-by: David Benjamin <[email protected]>
Commit-Queue: David Benjamin <[email protected]>
…M128_CONTEXT layout in aesv8-gcm-armv8.pl.

This is modifying not-yet-used code.
Commit 784fa29 should have deleted
this comment. The analogous comment was deleted in BoringSSL in that
merged commit.
…r curve25519.

Don't add the constant-time validation tests since we need to develop the
framework for it first.

Do add the public-from-private test.
Originally I was trying to be pedantic and avoid any use of `_t`-
suffixed names. However, this hasn't really accomplished anything
except annoying me, so just do what BoringSSL does.
Bring these in as they were in 4a0393f.
The next merge will modify these.
@briansmith briansmith self-assigned this Sep 29, 2023
*ring* doesn't use the BoringSSL code that uses these constant-time
utilities.
@codecov
Copy link

codecov bot commented Sep 29, 2023

Codecov Report

Merging #1659 (2e6d759) into main (bc5d2c3) will decrease coverage by 0.03%.
Report is 4 commits behind head on main.
The diff coverage is 90.00%.

@@            Coverage Diff             @@
##             main    #1659      +/-   ##
==========================================
- Coverage   92.09%   92.07%   -0.03%     
==========================================
  Files         132      132              
  Lines       18869    18921      +52     
  Branches      196      199       +3     
==========================================
+ Hits        17378    17421      +43     
- Misses       1455     1463       +8     
- Partials       36       37       +1     
Files Coverage Δ
crypto/constant_time_test.c 58.82% <100.00%> (ø)
crypto/curve25519/curve25519.c 99.65% <100.00%> (-0.01%) ⬇️
crypto/fipsmodule/bn/internal.h 100.00% <ø> (ø)
crypto/fipsmodule/ec/ecp_nistz384.inl 100.00% <100.00%> (ø)
crypto/fipsmodule/ec/gfp_p384.c 100.00% <100.00%> (ø)
crypto/fipsmodule/ec/p256-nistz.c 95.89% <100.00%> (ø)
crypto/fipsmodule/ec/p256.c 99.60% <100.00%> (ø)
crypto/fipsmodule/ec/util.h 75.00% <100.00%> (ø)
crypto/limbs/limbs.c 95.38% <100.00%> (ø)
crypto/limbs/limbs.h 100.00% <ø> (ø)
... and 6 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@briansmith briansmith merged commit 3a77fe1 into main Sep 29, 2023
@briansmith briansmith deleted the b/merge-boringssl-16 branch September 29, 2023 18:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants