Updates to encryption #56

Open: wants to merge 1 commit into `main`.

`details/encryption.md`: 137 changes (85 additions and 52 deletions).

… required in some cases, such as between helpers.
Formally, this HTTPS requirement applies only where interoperability
requirements are specified, namely:

* Between a report collector and the helper party network.

* Between helpers in a helper party network.

… composed from:

1. The fixed string "private-attribution", encoded in ASCII, terminated with a
   single zero-valued byte.

2. The [registrable domain for the current
   site](https://url.spec.whatwg.org/#host-registrable-domain), encoded in ASCII
   as a period-separated sequence of
   [A-](https://datatracker.ietf.org/doc/html/rfc5890#section-2.3.2.1) or
   [NR-LDH](https://datatracker.ietf.org/doc/html/rfc5890#section-2.3.2.2)
   labels (that is, the ASCII form of a domain name), terminated with a
   single zero-valued byte.

3. The [ASCII serialization of the match key provider
   origin](https://datatracker.ietf.org/doc/html/rfc6454#section-6.2),
   terminated with a single zero-valued byte.

4. The [ASCII serialization of the helper party
   origin](https://datatracker.ietf.org/doc/html/rfc6454#section-6.2),
   terminated with a single zero-valued byte.

5. The single-byte key identifier from the key configuration for the helper
   party.

6. The current epoch, encoded as a two-byte integer in network byte order.

7. A single byte that indicates the event type, set to either 0 (for a source
   event) or 1 (for a trigger event).
This produces the following process in pseudocode:

```python
def ipa_info(site_registrable_domain, mkp_origin, helper_origin,
             key_id, epoch, event_type):
    # Each variable-length field is terminated by a single zero byte.
    return concat(encode_str("private-attribution"),
                  encode(0, 1),
                  encode_str(site_registrable_domain),
                  encode(0, 1),
                  ascii_origin(mkp_origin),
                  encode(0, 1),
                  ascii_origin(helper_origin),
                  encode(0, 1),
                  encode(key_id, 1),
                  encode(epoch, 2),
                  encode(event_type, 1))
```

> **Contributor:** If we are sending an `event_type` here, do we still need
> secret shares of `is_trigger_bit` to be passed, since all helpers would now
> know it in the clear?
>
> **Collaborator (author):** That is absolutely right. The helper parties can
> share the `is_trigger` bit. There are no savings to be had from it being
> sparse, though, because we only single it at first.
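The sketch below makes the `info` construction above concrete and runnable. It assumes that origins are passed in already in their RFC 6454 ASCII serialization (for example `"https://mkp.example"`) and that the registrable domain is already in ASCII (A-label) form; it replaces the pseudocode's `concat`, `encode`, `encode_str`, and `ascii_origin` helpers with local stand-ins. All example values are hypothetical.

```python
def encode(value: int, length: int) -> bytes:
    """Encode a non-negative integer as `length` bytes in network byte order.

    Raises OverflowError if the value does not fit (e.g. an epoch >= 2**16).
    """
    return value.to_bytes(length, "big")

def encode_str(s: str) -> bytes:
    """Encode a string as ASCII; non-ASCII input raises UnicodeEncodeError."""
    return s.encode("ascii")

def ipa_info(site_registrable_domain: str, mkp_origin: str, helper_origin: str,
             key_id: int, epoch: int, event_type: int) -> bytes:
    # Assumption: origins are already serialized per RFC 6454 section 6.2 and
    # the registrable domain is already in A-label (ASCII) form.
    if event_type not in (0, 1):
        raise ValueError("event_type must be 0 (source) or 1 (trigger)")
    zero = encode(0, 1)  # terminator after each variable-length field
    return b"".join([
        encode_str("private-attribution"), zero,
        encode_str(site_registrable_domain), zero,
        encode_str(mkp_origin), zero,
        encode_str(helper_origin), zero,
        encode(key_id, 1),      # single-byte key identifier
        encode(epoch, 2),       # two-byte epoch, big-endian
        encode(event_type, 1),  # 0 = source, 1 = trigger
    ])

# Hypothetical example values.
info = ipa_info("example.com", "https://mkp.example",
                "https://helper1.example", key_id=3, epoch=258, event_type=1)
```

Here the last four bytes of `info` are `03 01 02 01`: key identifier 3, epoch 258 (`0x0102`), and event type 1.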

### Encryption
… key to produce the message for that helper.
This process is applied for each helper, as follows:

```python
info = ipa_info(site_registrable_domain, mkp_origin, helper_origin,
                key_id, epoch, event_type)
enc, sctxt = SetupBaseS(pkH, info)
ct = sctxt.Seal("", concat(mk[i], mk[i+1]))
enc_mk = concat(enc, ct)
```

… the secret key, the encapsulation key, and the same `info` string.

```python
enc, ct = parse(enc_mk)
info = ipa_info(site_registrable_domain, mkp_origin, helper_origin,
                key_id, epoch, event_type)
rctxt = SetupBaseR(enc, skR, info)
request, error = rctxt.Open("", ct)
```
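Both steps above treat `enc_mk` as the HPKE encapsulated key followed by the ciphertext. This section does not fix an HPKE suite; assuming DHKEM(X25519, HKDF-SHA256), whose encapsulated key is 32 bytes, and AES-128-GCM, which appends a 16-byte tag (both values from RFC 9180), `concat` and `parse` could be sketched as follows. The HPKE operations themselves are not shown, and `encrypted_len` is a hypothetical helper.

```python
# Assumed HPKE suite (RFC 9180): DHKEM(X25519, HKDF-SHA256) gives a 32-byte
# encapsulated key (Nenc), and AES-128-GCM appends a 16-byte tag (Nt).
N_ENC = 32
N_TAG = 16

def concat(*parts: bytes) -> bytes:
    """Concatenate byte strings, as used for enc_mk = concat(enc, ct)."""
    return b"".join(parts)

def parse(enc_mk: bytes) -> tuple[bytes, bytes]:
    """Split enc_mk into the encapsulated key and the AEAD ciphertext."""
    if len(enc_mk) < N_ENC + N_TAG:
        raise ValueError("encrypted match key is too short")
    return enc_mk[:N_ENC], enc_mk[N_ENC:]

def encrypted_len(share_len: int) -> int:
    """Hypothetical helper: size of enc_mk when the plaintext is two shares."""
    return N_ENC + 2 * share_len + N_TAG
```

With hypothetical 8-byte match key shares, for example, `encrypted_len(8)` is 64 bytes.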

… typically 16-20 bytes long in practice.

# Query Encryption

A report collector submits queries to a chosen helper party. This uses HTTPS,
which protects information from network adversaries.

If data for all three helpers were submitted to the chosen helper, additional
encryption would be needed to protect the information intended for the other
helper parties in the network.

Each record contains data for each of the three helper parties. The component
destined for each helper party includes:

* one part of the encrypted match key (which does not necessarily require
  further encryption),

* information necessary for decryption:
  * the registrable domain of the site where this match key was requested,
  * the event type (source or trigger),
  * the epoch, and
  * the key identifier for the helper party public key that was used.

  Note that, other than the key identifier, the same data is sent to all three
  helper parties.

* the query information provided by the report collector:
  * the trigger value or the breakdown key,
  * the attribution constraint ID, and
  * a timestamp.

This last group of fields is secret shared.
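One way to picture a single helper's component is as a small record type. This is a hypothetical sketch: the field names are illustrative, field widths are not specified here, and the secret-shared fields are modelled as pairs of integers on the assumption of replicated sharing (each helper holding two of three shares). The check encodes the rule that all three components agree on everything in the decryption information except the key identifier.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DecryptionInfo:
    site_registrable_domain: str  # same for all three helpers
    event_type: int               # 0 = source, 1 = trigger; same for all helpers
    epoch: int                    # same for all helpers
    key_id: int                   # may differ per helper (per-helper public key)

@dataclass(frozen=True)
class HelperComponent:
    encrypted_match_key: bytes    # this helper's enc_mk; no further encryption
    decryption_info: DecryptionInfo
    # Secret-shared query fields. Hypothetical representation: the pair of
    # replicated shares of each field held by one helper.
    trigger_value_or_breakdown_key: tuple[int, int]
    attribution_constraint_id: tuple[int, int]
    timestamp: tuple[int, int]

def shared_info_matches(components: list[HelperComponent]) -> bool:
    """Check that three components agree on everything except key_id."""
    if len(components) != 3:
        return False
    common = {(c.decryption_info.site_registrable_domain,
               c.decryption_info.event_type,
               c.decryption_info.epoch) for c in components}
    return len(common) == 1
```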

The information used in decryption is also likely to be stable over time and
shared between many records, so using a form of run-length encoding for these
values should make the overall encoding more efficient.
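A minimal sketch of the run-length encoding idea, assuming the per-record decryption information is represented as a hypothetical `(site, event_type, epoch, key_id)` tuple. Runs only help when consecutive records share these values; the actual wire encoding is not specified here.

```python
from itertools import groupby

def run_length_encode(values):
    """Collapse consecutive equal values into (value, count) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]

def run_length_decode(runs):
    """Expand (value, count) pairs back into the original sequence."""
    return [value for value, count in runs for _ in range(count)]

# Hypothetical per-record decryption info: (site, event_type, epoch, key_id).
records = [("example.com", 0, 7, 1)] * 3 + [("example.com", 1, 7, 1)] * 2
runs = run_length_encode(records)
# runs == [(("example.com", 0, 7, 1), 3), (("example.com", 1, 7, 1), 2)]
```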


## Query Submission Options

It is necessary to separate the process of creating a query from the process of
uploading records for that query.

The query creation process is not particularly relevant to this discussion. How
a report collector creates a query will be detailed in another note. We assume
that there is some entity (or set of entities), likely one of the helper
parties, that is authorized to create a query and able to coordinate the
process.

At the highest level, submitting data for a query can follow two basic patterns:

1. Query creation produces a record submission endpoint at each of the involved
   helpers. The report collector submits records to each of the helper parties
   separately. This approach is relatively simple as it can rely on the
   protections offered by TLS to ensure that record data is visible only to the
   correct helper party. However, it requires that helpers all expose a public
   endpoint capable of accepting data for active queries.

   There is also a tiny amount of data that is sent to all three helper
   parties (event type, epoch, and site). These values are repeated often, so
   we assume that we can design a compression scheme that will eliminate most of
   this overhead.

2. All record submission flows through a single entity. This might allow record
   submission to be coupled to query creation, which simplifies the query
   process. The single entity might be a helper party (like the [PPM leader
   role](https://dt.ietf.org/doc/html/draft-ietf-ppm-dap#name-system-architecture)).
   However, it means that TLS protection is not sufficient. Data that is
   destined for other helper parties needs an additional layer of encryption so
   that the receiving entity is unable to see it.

We originally intended to adopt the latter model. Having a single point of
contact allows for an asymmetric deployment of helper parties, where some helper
… could be delivered sequentially or interleaved.

Note that any architectural decisions might be distinct from business
arrangements. A single point of contact might be desirable for things like
simplifying billing interactions. The query creation process should provide
report collectors with a single interface for creating queries. This can also
provide a single interface for retrieving results.


### Interleaving Challenges

Fully sequential delivery is likely to produce some difficulties for helper
processing, because no multi-party computation can occur until all helper
parties have their shares. A sequential upload delays processing until more
than two thirds of the records are uploaded. For a large dataset, that is
undesirable, especially when large amounts of input must be retained at the
first helper until the other helpers start to receive data.

The question then becomes how best to interleave records. Interleaving at the
transport layer, by submitting requests as separate flows, provides strong
… records thus far are shared with all helpers, which can validate that both
their own inputs and their peers' inputs hash correctly.

Replicated secret sharing offers a simpler option: helpers can also calculate a
running hash of shares and periodically compare that with their peers. Our MPC
protocols use a MAC for validating certain actions (like the multiplication or
reveal subprotocols), so we might even reuse those protocols for input
validation.
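The running-hash comparison can be sketched for three-party replicated sharing, where helper `i` holds shares `x[i]` and `x[i+1]`, so each share is held by exactly two neighbouring helpers. This is a hypothetical sketch: the ring (integers modulo 2^32) and the hash (SHA-256) are assumptions, not choices made in this document. Each helper keeps one running hash per share position, and neighbours compare the hashes covering the share they have in common.

```python
import hashlib
import secrets

MODULUS = 2**32  # hypothetical ring for additive shares

def replicated_shares(value: int) -> list[tuple[int, int]]:
    """Split value into three additive shares; helper i gets (x[i], x[i+1])."""
    x0 = secrets.randbelow(MODULUS)
    x1 = secrets.randbelow(MODULUS)
    x = [x0, x1, (value - x0 - x1) % MODULUS]
    return [(x[i], x[(i + 1) % 3]) for i in range(3)]

class Helper:
    def __init__(self) -> None:
        self.first = hashlib.sha256()   # over x[i], also held by helper i-1
        self.second = hashlib.sha256()  # over x[i+1], also held by helper i+1

    def receive(self, pair: tuple[int, int]) -> None:
        a, b = pair
        self.first.update(a.to_bytes(4, "big"))
        self.second.update(b.to_bytes(4, "big"))

def peers_agree(helpers: list[Helper]) -> bool:
    """Helper i's second hash must equal helper (i+1)'s first hash."""
    return all(helpers[i].second.digest() == helpers[(i + 1) % 3].first.digest()
               for i in range(3))
```

Because `digest()` does not finalize a `hashlib` object, helpers can compare at any point and keep hashing as more records arrive.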

Interleaving of records for different helper parties in the same stream ensures
that data is more tightly synchronized and might offer less risk of corruption,
but it complicates the data format considerably. There are also only limited
opportunities for size savings through sharing data, and those savings are lost
when the data is forwarded to other helper parties.


## Proposal

This proposes that the protocol use a separate flow of data for each helper
party.

The simplest design has data sent directly from the report collector to each
helper party. This requires a three-stage query process:

1. In the preparation stage, a query is created. This first stage establishes
   parameters for the query and provides the report collector with endpoints
   where data can be submitted to each helper party.

2. In the upload stage, the report collector concurrently submits data to all of
   the helper parties. The helper parties then execute the MPC protocol, parts
   of which can commence as soon as the first data becomes available.

3. In the final stage, shares of the results are published by each helper and
   retrieved by the report collector. The report collector combines these to
   obtain the results.
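This section does not say how result shares are combined. Assuming each helper publishes one additive share of every output value (for example, one per breakdown key) in a ring of integers modulo 2^32, the report collector's final step could look like this hypothetical sketch; the ring and the function name are illustrative assumptions.

```python
MODULUS = 2**32  # hypothetical ring in which results are additively shared

def combine_results(helper_shares: list[list[int]]) -> list[int]:
    """Reconstruct each result element by summing one share from each helper."""
    if len(helper_shares) != 3:
        raise ValueError("expected shares from exactly three helpers")
    if len({len(s) for s in helper_shares}) != 1:
        raise ValueError("helpers returned result vectors of different lengths")
    return [sum(column) % MODULUS for column in zip(*helper_shares)]
```

No single helper's published vector reveals the results on its own; only the sum across all three does.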

This system depends only on the protections afforded by TLS.

… though [RFC 8291](https://datatracker.ietf.org/doc/html/rfc8291) does, a more
complete design is still needed.

A more modern design based on HPKE might be preferable. Details for this
additional encryption can be arranged later if this is found to be the better
architectural choice.