diff --git a/rfcs/uniform-array.md b/rfcs/uniform-array.md new file mode 100644 index 0000000..4b3758f --- /dev/null +++ b/rfcs/uniform-array.md @@ -0,0 +1,143 @@ +# `(Camlinternal)Uniform_array`: runtime- and compiler-supported uniform arrays + +This RFC proposes to add minimal support in the OCaml implementation for *uniform arrays*, which always use the standard `Array_tag` representation and never `Double_array_tag`. + +The compiler would offer specialized built-in and C primitives to work with uniform arrays, and a minimal `CamlinternalUniform_array` module in the standard library exposing these primitives. + +This would make it easier to write some unsafe idioms that enforce uniform arrays in more complex ways today, simplifying the code of `Domain.DLS`, the unboxed implementation of `Dynarray`, and the `Uniform_array` module of Jane Street's Base library and making it easier to reason about their correctness. + + +## Current status + +The OCaml representations of arrays is currently as follows: + +1. There are two forms of arrays supported by the runtime: + - blocks with tag `Array_tag` followed by `value` arguments + - blocks with tag `Double_array_tag` followed by `double` arguments + +2. Elements of type `floatarray` are always represented as `Double_array_tag` blocks. + +3. If "flat float arrays" is disabled by the `-no-flat-float-array` flag, elements of some types `foo array` are always represented as `Array_tag` blocks. + +4. If "flat float arrays" is enabled (this is the default setting), there is a dynamic check on array creation, and `Array_tag` or `Double_array_tag` is used depending on whether the initializing value is a double. + + Then all array operations perform a dynamic check on the tag to perform the right computation depending on the operation. + + +Note: the runtime supports storing `float` values inside `Array_tag` blocks, although there is no safe-OCaml way to do it. This is used implicitly by our current implementation of `Domain.DLS`, which uses an `Obj.t array` as an heterogeneous array, where some elements may have type `float`. This is okay as long as the array was not initialized with a `float` element. + +## Proposal + +We propose to add a new OCaml type of *uniform* arrays, that are always represented as blocks with the tag `Array_tag`, never `Double_array_tag`. + +Runtime support functions (in `runtime/array.c`) and compiler primitives are extended to offer specialized versions for uniform arrays. + +## Rationale + +Having uniform arrays available make it easier to reason about the safety of some unsafe use-cases, similar to the `Obj.t array` of `Domain.DLS`: using `Obj.t CamlinternalUniform_array.t` makes it easier to reason about the code. + +We know of the following potential users: + +- Jane Street's Base library exposes a [Uniform_array module](https://github.com/janestreet/base/blob/master/src/uniform_array.mli) precisely for this reason -- they use it to implement "unboxed option arrays" for example. Upstream support would not remove the need for this module, but it would simplify the implementation and may provide some speed benefits in some case -- for example they could reuse the runtime-provided `fill`, `blit`, `sub`, `append` instead of reimplementing them in OCaml. + +- `Domain.DLS` could be defined using an `Obj.t CamlinternalUniform_array.t`, making its code easier to reason about. In particular, #[12889](https://github.com/ocaml/ocaml/pull/12889) would benefit as it would make it easier to write thread-safe code manipulating DLS arrays -- currently the code generated by `a.(i)` contains an allocation in the `float` code path, preventing the use of `[@poll error]` on the relevant function. + +- `CamlinternalUniform_array.t` could also be used in the unboxed implementation of `Dynarray`, #[12885](https://github.com/ocaml/ocaml/pull/12885). For example, the implementation currently contains the following functions in a `Dummy.Array` module: + + ```ocaml + let make n x ~dummy = + if Obj.(tag (repr x) <> double_tag) then + Array.make n (of_val x) + else begin + let arr = Array.make n (of_dummy dummy) in + Array.fill arr 0 n (of_val x); + arr + end + + let copy a ~dummy = + if Obj.(tag (repr a) <> double_array_tag) then + Array.copy a + else begin + let n = Array.length a in + let arr = Array.make n (of_dummy dummy) in + Array.blit a 0 arr 0 n; + arr + end + ``` + + None of this would be necessary if uniform arrays were available. + +## Design questions/answers + +### Where to define this + +This would be implemented as an internal module of the standard library, `CamlinternalUniform_array`. "Proper" support as a `Uniform_array` in the standard library is not planned for now. + + +### Scope of the module + +We propose that `CamlinternalUniform_array` remain a minimal module that exposes only the operations that benefit from runtime support. This is not a user-facing module so it needs not offer all the niceties of `Array` and `Float.Array`. + + +### Type definition + +We are considering three different approaches to define the uniform array type: + +- abstract type: `type 'a t` +- built-in type: `type 'a t = 'a uniform_array` +- private type: `type 'a t = private 'a array` + +Having a built-in type requires more work and adds more baggage to the compiler, but it makes it easy for optimizers to reason specifically on the `uniform_array` type. + +Having a private type exposes to users the fact that the representation is compatible with `'a array`, which could be convenient for some use-cases -- `Array` functions that consume arrays could be called on uniform arrays; but this does not work for functions which return arrays, such as `Array.map`. + +Having an abstract type leaves the door open to use the built-in or private approach later. This is the default proposal. + + +## Alternative approaches + +### Enable `-no-flat-float-array` by default + +This would not be necessary if `-no-flat-float-array` became the default, and over time the only supported mode. This would have various benefits (it simplifies the type theory) but also the usual, obvious downsides that it makes a lot of simple, natural code written using `float array` much slower overnight. + +There is no consensus towards this invasive change today, and good arguments against it. + +Note that the current proposal would not be particularly costly if `-no-flat-float-array` later became the default. At that point, "uniform arrays" and "arrays" would be the exact same type, and we end up with a trivial and useless `camlinternalUniform_array` module. This is no big deal. + + +### Implement this sort of code in C + +The rationale for OCaml uniform arrays is to make it easier to write certain kinds of unsafe code. The alternative is to write this code in C: the core of `Domain.DLS` and `Dynarray` could be implemented in C rather than OCaml. + +Currently my impression is that writing FFI code from C is even more unsafe and difficult than writing unsafe code in OCaml, in particular due to the very weak typing of OCaml values on the C side that makes it very easy for simple mistakes to slip in. For both `Domain.DLS` and `Dynarray` I personally preferred "OCaml code with weird hacks and some informal static reasoning about array creation" over "C code". + +(Note: C code arguably remains the better choice when reasoning about asynchronous callbacks is necessary, as the runtime functions currently provide much better control over asynchronous callbacks than plain OCaml code.) + + +### Trick the OCaml compiler using array types + +It may be possible to trick the compiler into selecting the non-float representation by lying about the array type: + +```ocaml +module Lying_array : sig + type 'a t + val get : 'a t -> int -> 'a + val set : 'a t -> int -> 'a -> unit +end = struct + type 'a lie = ('a * unit) array (* some type the compiler knows is non-float *) + external of_lie : 'a lie -> 'a = "%identity" + external to_lie : 'a -> 'a lie = "%identity" + + type 'a t = 'a lie array + let get a i = of_lie a.(i) (* [@poll error] works here *) + let set a i v = a.(i) <- to_lie v +end +``` + +I see two downsides: + +- I do not know whether this choice would work well with optimizers making assumptions based on type information. (In general lying about types in this way is not a good idea because of Flambda and other potential optimizers.) On the other hand, once types are erased this approach only lies in that it uses the `Paddrarray` kind for array operations, and: + 1. This is what the `-no-flat-float-array` mode uses for generic array primitives today, so it must be safe (assuming Flambda people also test that mode), and + 2. This is also what our proposal would use for the `uniform_array` primitives, so we want optimizers to not assume to much from `Paddrarray` anyway -- it may in particular contain `int` or even `double` values. + +- For array operations that are implemented in the runtime (`blit`, `fill`, etc.), we still need to carefully think about whether to reimplement them in OCaml for that type, or somehow cast our lying arrays into normal arrays to call the runtime. The end result is more complexity rather than less complexity, something that we hope to avoid by adding first-class support for uniform arrays in the runtime.