This is an initial attempt at mapping all the requirements that the interoperability layer will have in Verona to use external code written in other languages (like C++, Rust, etc.) in a safe manner (sandbox).
The aim is to have external code in Verona with a native interface that looks and feels like Verona code.
We want to abstract away the entirety of the sandbox functionality, the foreign function dispatch and marshalling, the execution models, etc.
The syntax is still largely undefined, but this is an example of using the C++ std::vector class inside Verona as pseudo-Verona code:
// import a module in ./extern/StdVec with a config file exposing the includes and libraries.
// Declares the "StdVec" module with specific language and sandbox declarations
using "extern/StdVec";
// Create a new region (foreign, so parametrised on the sandbox type)
// This syntax is not defined yet, here we assume this is how we get a particular sandbox
// "int" here is relative to "StdVec", which defined all C++ types (from config) as an alias to a Verona type
vec = new StdVec[ProcessSandbox]::std::vector[int]();
// Use the object as if it was native
// "42" is "int", since it comes from the template parameter above
vec.push_back(int(42));
// The result will be used by a native Verona function directly, as it's an alias to a known Verona type
// "0" is "size_t" as defined by std::vector, which aliases a known Verona type
print(vec.at(size_t(0)));
From the point of view of the Verona program, StdVec is a module just like any other in Verona. Language, sandbox and platform declarations (types, behaviour) are done in Verona declarations.
The "extern/StdVec" directory has a configuration file, that will tell us all we need to know about the language, headers, libraries, etc. That's what drives the correct declarations in the "StdVec" module.
Not all foreign language functionality will be available to the Verona code, especially in the beginning.
Programmers are expected to create thin layers of foreign code to wrap functionality that can't be used in Verona (like what "extern C" does in C++).
Different sandbox technologies and platforms have different binary representations for the same types. We can use type aliases when constructing the modules so that int aliases the right Verona type (ex. I32 or I64).
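The aliasing idea can be sketched in C++ as a set of per-sandbox alias tables. This is only an illustration under stated assumptions: the names ProcessSandboxLP64, Wasm32Sandbox, c_int, c_long and c_size_t are hypothetical, and I32/I64/U32/U64 are stand-ins for Verona's fixed-width types, not the real definitions.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-ins for Verona's fixed-width types.
using I32 = std::int32_t;
using I64 = std::int64_t;
using U32 = std::uint32_t;
using U64 = std::uint64_t;

// Each sandbox/platform pair publishes its own aliases for the C types.
// Assumption: a child process on an LP64 platform (64-bit long and size_t).
struct ProcessSandboxLP64 {
  using c_int = I32;
  using c_long = I64;
  using c_size_t = U64;
};

// Assumption: a 32-bit WebAssembly target (32-bit long and size_t).
struct Wasm32Sandbox {
  using c_int = I32;
  using c_long = I32;
  using c_size_t = U32;
};

// Code written against the alias picks up the right width per sandbox.
template <typename Sandbox>
constexpr std::size_t long_width_bits() {
  return sizeof(typename Sandbox::c_long) * 8;
}

static_assert(std::is_same<ProcessSandboxLP64::c_int, Wasm32Sandbox::c_int>::value,
              "int has the same representation in both sandboxes");
```

Code that only uses the aliases stays portable across sandboxes, while code that needs a fixed layout has to name the fixed-width type and cast explicitly.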
Programmers are also expected to wrap existing Verona classes (using fixed-width Verona types) with explicit types (and casts) that can work with any sandbox technology.
Each external language needs explicit support in the Verona compiler to be parsed at compile time and generate the correct stubs for the interoperability layer to correctly implement the remote procedure call (RPC).
Each execution model (child process, web assembly, etc) will need explicit support for deploying the functionality at run time.
But the overall architecture will be the same.
- The code will be in pure Verona language. There will be a compiler generated translation layer between Verona syntax and any supported language.
- The parser will generate Verona AST, creating a module for each foreign sandbox and adding all functions and types called from Verona to it as Verona nodes.
- Verona AST will be lowered to 1IR as is.
- The type checker will query the foreign layer for the existence and validity of the corresponding Verona types.
- 1IR is lowered to 2IR (MLIR) as is.
- The codegen layer will take concrete declarations and generate LLVM IR for each foreign function and wrappers for each module in a new object file, as well as MLIR code for each Verona function and wrapper.
- MLIR will be lowered to LLVM IR as is.
- LLVM will generate machine code for the main code (Verona), for each foreign module in a separate object, and will statically link all auto-generated sandbox objects to the main object.
- Any dynamic libraries specified in the module configuration will be linked at run time.
Each step is described below in more detail.
The Verona parser doesn't know about any foreign language. The source code is in Verona, so Verona AST nodes are created.
The Verona AST allows for unknown types, so we can't yet query concrete foreign types. It also doesn't have all concrete functions (from templates), so we also can't query about specific functions.
Verona modules are directories in the file system, with source files to be parsed and added as a type. Foreign modules can't be parsed by the Verona parser, so we keep a configuration file with the details, in addition to any foreign source files.
For each foreign module (language/sandbox/region), the parser creates a Verona module (with associated configuration file).
For each class and function call made to a foreign module, the parser creates a declaration on the respective module, keeping the source location of the call to use on error messages.
The end result is an AST that has foreign nodes as Verona nodes on Verona modules without the actual implementation.
For all purposes, this is a plain Verona AST with calls to functions that have been declared, but not defined.
Definition (and implementation in object code) will be done later by the compiler once we know all functions that have survived the initial round of cleanups (ex. reachability analysis).
During type inference and reification, the type checker will have to make sure the foreign types are valid, just like Verona types. However, the Verona compiler doesn't know anything about foreign types.
On sandbox modules, language types are defined as aliases to known Verona types. This is constructed by the compiler with knowledge about the language, the sandbox technology and the target platform.
At this stage, a foreign translation layer will have to be created, with the main functionality to:
- Parse the foreign source files in the module directory and create a foreign AST representation.
- Verify that types exist and are valid (including template types such as std::vector<int>).
- Verify that specific function signatures (name, arity, types) exist in the foreign module (ex. void std::vector<int>::push_back(int)).
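The shape of such a query interface can be sketched as follows. This is a mock under stated assumptions: Signature and ForeignLayer are hypothetical names, and the lookups are answered from in-memory sets, whereas a real layer would answer them from the parsed foreign AST.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// A foreign function signature as the type checker would ask for it.
struct Signature {
  std::string context;            // e.g. "std::vector<int>"
  std::string name;               // e.g. "push_back"
  std::vector<std::string> args;  // e.g. {"int"}

  // Canonical text form, used as the lookup key (name, arity and types).
  std::string key() const {
    std::string k = context + "::" + name + "(";
    for (std::size_t i = 0; i < args.size(); ++i)
      k += (i ? "," : "") + args[i];
    return k + ")";
  }
};

// Mock foreign layer: sets of known types and functions stand in for the AST.
class ForeignLayer {
  std::set<std::string> types_;
  std::set<std::string> functions_;

public:
  void add_type(const std::string& t) { types_.insert(t); }
  void add_function(const Signature& s) { functions_.insert(s.key()); }

  bool has_type(const std::string& t) const { return types_.count(t) != 0; }

  // A function is valid only if its enclosing type is valid too.
  bool has_function(const Signature& s) const {
    return has_type(s.context) && functions_.count(s.key()) != 0;
  }
};
```

With this shape, the type checker only needs yes/no answers per type and per signature, which keeps knowledge of the foreign language out of the Verona compiler proper.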
With the foreign layer in hand, the type checker can make sure all inferred types are valid in the foreign language (correct and available).
As a first implementation, we won't allow Verona template arguments in foreign code template arguments, to avoid combinatorial explosion of type checks.
But with all types concrete, the compiler can now check each function declaration for their foreign code implementations.
For example, from the Verona perspective, std::vector is a generic class that takes a T constrained by a union type (int, float, FooBar, ...).
Built-in types are added to the union by default, user-declared types are added for each case.
You can check if a Verona union type is a subtype of this union type by checking whether the template can be instantiated with each Verona type (fast fail if the Verona type is not a C/C++ type).
Any Verona type that is less constrained than a union type over a set of concrete C/C++ types is not a subtype of the constrained type for T.
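The subtype check described above amounts to a subset test over concrete type names. A minimal sketch, assuming types are identified by their spelling and that the constraint set has already been computed (TypeSet and is_subtype_of are hypothetical names):

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>

using TypeSet = std::set<std::string>;

// True when every member of `candidate` can instantiate the template,
// i.e. the candidate union is a subtype of the constraint union.
// std::all_of stops at the first member that is not allowed (fast fail).
bool is_subtype_of(const TypeSet& candidate, const TypeSet& constraint) {
  assert(!constraint.empty() && "the constraint lists at least one type");
  return std::all_of(candidate.begin(), candidate.end(),
                     [&](const std::string& t) { return constraint.count(t) != 0; });
}
```

For a constraint of (int, float, FooBar), the union (int | float) passes, while (int | String) fails because String is not in the set.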
With the source location propagated from the module, which came from the original Verona call line, we can tell programmers the actual Verona and foreign types involved on that specific line in case of errors.
With all concrete types and calls validated by both Verona and foreign checks, we now know that we can implement all foreign calls and type conversions in the compiler.
The compiler will generate code in two ways:
- All Verona code referring to a concrete call and argument marshalling will be lowered by the Verona compiler in the main object file.
- All foreign code referring to specialisation and argument marshalling will be lowered by the foreign layer compiler in a separate object.
So, for each sandbox region, the compiler:
- Generates calls to the runtime to setup the region and sandbox.
- Creates a dispatcher for the RPC mechanism, with index and a buffer for arguments and return values.
- Creates a foreign compile unit (via foreign interface) with required boilerplate for the language.
The foreign sources and headers have already been compiled into an AST at type checking, so we still have all the information we need to lower the concrete implementations.
Within a particular sandbox, for each function called into the foreign module:
- If the call is a foreign template specialisation, the language driver implements the actual specialised function on the foreign object.
- The foreign module generates a marshalling function from buffer to foreign types and calls the final function, returning an auto-increment index.
- The foreign module adds that function to a dispatch table and returns the index to the compiler.
- The compiler implements the Verona module function version with Verona arguments and return values, marshalling them into a memory buffer and calling the dispatcher with that index.
- The appropriate object files are created and later linked with the main executable.
- Dynamic libraries will be linked at run time.
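The foreign-side half of these steps (a generated marshalling function per foreign function, registered in a dispatch table under an auto-incremented index) can be sketched with C++ templates. This is an illustration only: Buffer, make_entry and Dispatcher are hypothetical names, the sketch handles only trivially copyable arguments and non-void return values, and add_ints and half are made-up foreign functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <tuple>
#include <type_traits>
#include <utility>
#include <vector>

using Buffer = std::vector<unsigned char>;
using Entry = std::function<void(Buffer&)>;

// Read one value from the buffer and advance the offset.
template <typename T>
T read_arg(const Buffer& buf, std::size_t& off) {
  static_assert(std::is_trivially_copyable<T>::value, "sketch handles plain data only");
  T v;
  std::memcpy(&v, buf.data() + off, sizeof(T));
  off += sizeof(T);
  return v;
}

// Call fn with the elements of the tuple as its arguments.
template <typename R, typename... Args, std::size_t... I>
R call_with(R (*fn)(Args...), const std::tuple<Args...>& args, std::index_sequence<I...>) {
  return fn(std::get<I>(args)...);
}

// Generated marshalling: unpack the arguments, call fn, write the result back.
template <typename R, typename... Args>
Entry make_entry(R (*fn)(Args...)) {
  return [fn](Buffer& buf) {
    std::size_t off = 0;
    // Braced initialisation evaluates the reads left to right.
    std::tuple<Args...> args{read_arg<Args>(buf, off)...};
    R result = call_with(fn, args, std::index_sequence_for<Args...>{});
    buf.resize(sizeof(R));
    std::memcpy(buf.data(), &result, sizeof(R));
  };
}

class Dispatcher {
  std::vector<Entry> table_;

public:
  // Append an entry and return its auto-incremented index.
  template <typename R, typename... Args>
  std::size_t add(R (*fn)(Args...)) {
    table_.push_back(make_entry(fn));
    return table_.size() - 1;
  }

  // Generic call: only an index and a buffer cross the boundary.
  void call(std::size_t index, Buffer& buf) const {
    assert(index < table_.size());
    table_[index](buf);
  }
};

// Hypothetical foreign functions to expose.
int add_ints(int a, int b) { return a + b; }
double half(int a) { return a / 2.0; }
```

The index returned by add is what the compiler would bake into the Verona-side wrapper as a constant, so the dispatcher itself never needs to know any signature.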
The compiler is responsible for emitting errors that inform the programmer what type/function cannot be represented and potentially why.
The programmer is responsible for implementing a wrapper function that abstracts away the complexity of the language in a public function that Verona can call.
The language design is responsible for providing a way to allow portable code to be written in Verona that can run in different sandbox technologies, from different languages and on different target platforms.
Language support will be always evolving. This is expected to be a balance between "best effort" and "appropriate support".
When the main process starts, it will dynamically link all foreign libraries that were mentioned in the foreign modules configurations.
All auto-generated foreign code has already been statically linked to the main object at compile time.
When the sandbox is initialised, its allocator sets up a new isolated heap that is used for all allocations in the sandbox (including Verona foreign objects that belong to that region).
Verona passes the heap to snmalloc on the sandbox side as its memory to manage. All calls to allocate and free are local but done on that region only.
Other sandbox code calls may be done directly (if safe, like reading from an already open file descriptor) or redirected to the sandbox driver (upcall) to decide what to do.
For each sandbox function call:
- The first call is made to the Verona module's wrapper, which will marshall the arguments and call the dispatcher with a compile-time constant index.
- The dispatcher will call the foreign wrapper to marshall back into foreign types and call the actual function.
- The return value, if any, is marshalled back into the buffer.
- The dispatcher returns the buffer to the Verona module's wrapper.
- The wrapper extracts the return value and returns it to the caller.
This sequence is entirely defined at compile time: how to marshall arguments and return values on either end, what the constant index is and which actual function to call.
For both C and C++, we use clang to parse the header file to know which types are declared, which functions are exported, etc.
This allows us to know which of the Verona code lines calling foreign code are correct, to generate the correct marshalling routines and, for C++, to instantiate the correct template implementations based on the types used in Verona.
The C++ interoperability layer parses the file and generates the clang AST.
We then use the Verona code function names and types to create a function declaration (context, name, argument types) and, if valid, we add that function to the dispatcher, giving it its own (auto-incremented) index.
Additional code needs to be generated to marshall the Verona arguments and return values into a buffer, so that the dispatcher can be generic, only taking an index and a memory buffer.
That code is part of the sandbox library, as a template function on the arguments and return values, that is generated by the compiler upon encountering each specific foreign function.
For C++, specialisation code will be created (via AST construction and LLVM code generation) if the types involved are parametrised.
Note that, for the first implementation, we won't allow Verona generic parameters to be passed to template instantiations.
This means that all possible template instantiations that a Verona generic may use must exist before reification, which means we have the same guarantees for Verona generics that use C++ templates as we do for other Verona generics (i.e. they either type check and work for all instantiations or they don't type check).
Our initial implementation is to separate the sandbox execution by creating a child process. This is not strictly speaking a sandbox, but gives us an easy way to test the rest of the framework without much additional work.
Calls will be forwarded across the parent/child barrier via the dispatcher, by means of a function pointer table, where the position is the index in the code generation stage above, and the buffer is the contents of the arguments and a place for the return value.
Argument decomposition will be done by the sandbox code that will call the actual function with the actual arguments and, if there is a return value, set it on the right place of the buffer and return.
The main driver in the child process will be a loop taking calls into the sandbox, calling them and returning the buffer to the dispatcher.
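A minimal POSIX sketch of that driver loop is shown below, assuming a socket pair as the transport and a fixed-size message holding the index and the buffer. The Message layout, ChildSandbox, the two handlers and the exit marker are all hypothetical; a real implementation would also handle partial reads and writes, which this sketch ignores.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Fixed-size message: function index plus a small argument/return buffer.
struct Message {
  std::uint32_t index;
  std::int32_t buffer[2];
};

using Handler = void (*)(std::int32_t*);

// Hypothetical foreign functions, already wrapped to work on the buffer.
void add_handler(std::int32_t* b) { b[0] = b[0] + b[1]; }
void neg_handler(std::int32_t* b) { b[0] = -b[0]; }

// Function pointer table: the position is the index used by the caller.
const Handler kTable[] = {add_handler, neg_handler};
const std::uint32_t kTableSize = sizeof(kTable) / sizeof(kTable[0]);
const std::uint32_t kExit = 0xFFFFFFFFu;  // assumed "shut down" marker

bool send_msg(int fd, const Message& m) {
  return write(fd, &m, sizeof m) == static_cast<ssize_t>(sizeof m);
}
bool recv_msg(int fd, Message& m) {
  return read(fd, &m, sizeof m) == static_cast<ssize_t>(sizeof m);
}

// Child side: take calls, run them, return the buffer, until told to stop.
void child_loop(int fd) {
  Message m;
  while (recv_msg(fd, m) && m.index != kExit) {
    if (m.index < kTableSize) kTable[m.index](m.buffer);
    if (!send_msg(fd, m)) break;
  }
}

// Parent side: owns the child process and forwards calls to it.
class ChildSandbox {
  int fd_;
  pid_t pid_;

public:
  ChildSandbox() {
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) std::abort();
    pid_ = fork();
    if (pid_ < 0) std::abort();
    if (pid_ == 0) {  // child: serve calls, then exit without returning
      close(fds[0]);
      child_loop(fds[1]);
      _exit(0);
    }
    close(fds[1]);
    fd_ = fds[0];
  }
  ChildSandbox(const ChildSandbox&) = delete;
  ChildSandbox& operator=(const ChildSandbox&) = delete;

  std::int32_t call(std::uint32_t index, std::int32_t a, std::int32_t b = 0) {
    Message m{index, {a, b}};
    if (!send_msg(fd_, m) || !recv_msg(fd_, m)) std::abort();
    return m.buffer[0];  // return value is placed at the start of the buffer
  }

  ~ChildSandbox() {
    Message stop{kExit, {0, 0}};
    send_msg(fd_, stop);
    close(fd_);
    waitpid(pid_, nullptr, 0);
  }
};
```

As a usage example, constructing a ChildSandbox and calling call(0, 40, 2) runs add_handler in the child and hands the result back to the parent through the same buffer.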
To prevent the sandbox from requesting memory in the wrong places (accidentally or maliciously), we use snmalloc's ability to allocate memory in slabs that it doesn't manage itself.
The allocator on the Verona side manages the slabs and passes their ownership to the sandbox allocator, which then allocates directly on its reserved heap.
This is part of the sandbox library's functionality, which is passed along on the creation of the sandbox.
Because that heap is in a Verona sandbox region, it is safe to assume there are no race conditions introduced by the compiler onto the external library, due to the use of cowns to execute the foreign code.
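The ownership split can be illustrated with a toy allocator pair. This is not snmalloc's API: ParentHeap, Slab and SandboxAllocator are hypothetical names, and a simple bump allocator stands in for the real sandbox allocator. The point is only that the sandbox side can carve memory solely out of what the parent has granted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A chunk of memory owned by the parent and lent to one sandbox.
struct Slab {
  unsigned char* base;
  std::size_t size;
};

// Parent-side owner of the backing memory (stand-in for the Verona-side allocator).
class ParentHeap {
  std::vector<unsigned char> storage_;
  std::size_t next_ = 0;

public:
  explicit ParentHeap(std::size_t bytes) : storage_(bytes) {}

  // Hand out the next slab, or an empty one when the heap limit is reached.
  Slab grant(std::size_t bytes) {
    if (bytes > storage_.size() - next_) return Slab{nullptr, 0};
    Slab s{storage_.data() + next_, bytes};
    next_ += bytes;
    return s;
  }
};

// Sandbox-side bump allocator that can only carve memory out of its slab.
class SandboxAllocator {
  Slab slab_;
  std::size_t used_ = 0;

public:
  explicit SandboxAllocator(Slab slab) : slab_(slab) {}

  void* alloc(std::size_t bytes, std::size_t align = alignof(std::max_align_t)) {
    assert(align != 0);
    // Align the real address, not just the offset into the slab.
    auto base = reinterpret_cast<std::uintptr_t>(slab_.base);
    std::uintptr_t aligned = (base + used_ + align - 1) / align * align;
    std::size_t start = static_cast<std::size_t>(aligned - base);
    if (start > slab_.size || bytes > slab_.size - start) return nullptr;
    used_ = start + bytes;
    return slab_.base + start;
  }

  // True when p points into this sandbox's slab.
  bool owns(const void* p) const {
    auto a = reinterpret_cast<std::uintptr_t>(p);
    auto base = reinterpret_cast<std::uintptr_t>(slab_.base);
    return a >= base && a < base + slab_.size;
  }
};
```

Two sandboxes granted different slabs can never hand out overlapping memory, and a request larger than the slab simply fails instead of reaching outside it.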
This is the description of a mock implementation of the child process sandbox running the C++ snippet at the beginning of this document.
In this example, we'll go line by line and describe what the compiler and the generated code will do for each line. This view is orthogonal to the one described above (per line, not per compiler stage).
using "extern/StdVec";
The compiler will:
- Create a Verona module called StdVec for the foreign module.
- Read the module description file and discover that it's a C++ module.
- Call the C++ driver (clang wrapper) to parse the include file (and all its includes) and keep as a C++ AST.
- Expose an interface to query types and functions.
No discernible differences at run time.
vec = new StdVec[ProcessSandbox]::std::vector[int]();
The compiler will:
- Create a new region using ProcessSandbox:
  - Recognise the type as coming from a foreign module of a specific sandbox type (syntax pending).
  - Lower code to create a region with a sandbox allocator (RT calls).
  - Lower code to create the sandbox memory area (RT allocator calls), taking the maximum heap size from the sandbox type.
  - Create a wrapper object for function dispatch and general sandbox utilities.
- Setup the constructor calls:
  - Recognise that this is a class constructor and declare the class with the right number of template parameters.
  - Assemble the foreign call signature: std::vector<int> std::vector<int>::vector<int>()
  - Instantiate the signature as a template specialisation in the AST, using the query interface to validate the code and get an actual implementation.
  - Create a constructor in Verona called create(): vector[int] with int being the specific sandbox's own alias.
  - Generate the wrapper function from memory buffer to return value.
  - Add that function to the sandbox dispatcher's table and associate the index with this particular specialisation, increment the index.
- Bind the result of the Verona call to the variable vec in the new region.
At run time:
- Initialise the region:
  - A Cown is created with a specific sandbox allocator.
  - Memory allocated by the parent allocator, used by the sandbox allocator.
  - Any pending libraries (shared objects?) are loaded.
  - The Cown is pushed to the queue.
- Call the constructor:
  - StdVec::std::vector[int]::create() will be called, which calls the dispatcher for the region with index 0 and no args.
  - The sandbox dispatcher calls the function pointer, which is a template object that has the actual function pointer plus the templated argument/return handling.
  - The dispatcher returns, with the return value in the memory buffer.
  - The Verona code reconstructs a Verona type from the return value in the buffer and returns from StdVec::std::vector[int] as an object.
  - Stores the return value in the memory pointed by the variable vec.
vec.push_back(int(42));
The compiler will:
- Validate the type of 42 (something like C::int, alias to something like I32).
- Assemble the foreign call signature: void std::vector<int>::push_back(int)
- Same as above, creates a new Verona function in the StdVec class that will marshall arguments and return value and call the dispatcher.
- The Verona signature would be something like StdVec::std::vector[int]::push_back(int).
- Same as above, lowers C++ implementations of the functions on the foreign object file and appends the function pointer to the dispatcher's table.
- Associate the Verona call with the newly created function in the StdVec class.
At run time:
- StdVec::std::vector[int]::push_back(int) will be called, which calls the dispatcher for the region with index 1 and marshalled int(42), for example, a 32-bit signed integer.
- Sandbox dispatcher calls the function, which is from a template object that has the actual function pointer plus the templated argument/return handling.
- There is no return value, so vec.push_back(int) just returns.
print(vec.at(size_t(0)));
The compiler will:
- Validate the type of 0 (something like C::size_t) and the return type (C::int).
- Try to call a function on the foreign module.
- Assemble the foreign call signature: int& std::vector<int>::at(size_t)
- Same as above, creates a new Verona function in the StdVec class that will marshall arguments and return value and call the dispatcher.
- The Verona signature would be something like StdVec::std::vector[int]::at(size_t) : int.
- Same as above, lowers C++ implementations of the functions on the foreign object file and appends the function pointer to the dispatcher's table.
- Associate the Verona call with the newly created function in the StdVec class.
- Uses the returned value to pass as an argument to the S32::cast() function.
- Implement the cast function (via the specific sandbox library, from the specific sandbox type).
- Passes the cast value to the print function.
At run time:
- Calls vec.at(size_t), which calls the dispatcher for the region with index 2 and marshalled size_t(0), for example, an unsigned 64-bit integer.
- Sandbox dispatcher calls the function, which is from a template object that has the actual function pointer plus the templated argument/return handling.
- Dispatcher returns, with the return value in the memory buffer.
- The Verona code reconstructs a Verona type from the return value in the buffer and returns from vec.at(size_t) as int(42).
- Calls the function print with the argument from the return value after a potential cast.
Note that the functions are almost identical, mainly:
- They all have a Verona definition (types, functions), which is what other Verona code will see and is what converts types and calls the dispatcher.
- They all find the dispatcher based on the region (which knows function indices and has the right buffer handling wrappers).
- They all call the dispatcher with a (run-time) constant index, defined at compile time, and the memory buffer created on the fly.
- They all compute the signature and instantiate the code. However, if the function has been called already, it already has a Verona implementation, so we just call it.
- They all end up in the dispatcher as an index and a buffer, and it's up to marshalling code to make sure the shape of that memory region is compatible from both sides, Verona and the external language.
- All foreign calls go through the dispatcher.
We generate marshalling code on the fly to allow for better compiler optimisations.
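To make the common shape of the three calls concrete, here is an in-process mock of the walkthrough, with a plain function call standing in for the process boundary. Everything in it is an assumption made for illustration: the buffer layout, the use of an integer handle to name the foreign object, and the names store, load, kDispatch and VectorInt. Only the indices (0 for the constructor, 1 for push_back, 2 for at) follow the text above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <vector>

using Buffer = std::vector<unsigned char>;

// Write a value at a byte offset, growing the buffer if needed.
template <typename T>
void store(Buffer& b, std::size_t off, T v) {
  if (b.size() < off + sizeof(T)) b.resize(off + sizeof(T));
  std::memcpy(b.data() + off, &v, sizeof(T));
}

// Read a value back from a byte offset.
template <typename T>
T load(const Buffer& b, std::size_t off) {
  T v;
  std::memcpy(&v, b.data() + off, sizeof(T));
  return v;
}

// ---- "Foreign" side: objects live here, the caller only sees a handle. ----
static std::vector<std::unique_ptr<std::vector<int>>> g_objects;

void f_create(Buffer& b) {  // index 0: std::vector<int> constructor
  g_objects.emplace_back(new std::vector<int>());
  store<std::uint64_t>(b, 0, g_objects.size() - 1);
}
void f_push_back(Buffer& b) {  // index 1: push_back(int), no return value
  auto handle = load<std::uint64_t>(b, 0);
  g_objects[handle]->push_back(load<std::int32_t>(b, 8));
}
void f_at(Buffer& b) {  // index 2: at(size_t), result written at offset 0
  auto handle = load<std::uint64_t>(b, 0);
  auto index = load<std::uint64_t>(b, 8);
  store<std::int32_t>(b, 0, g_objects[handle]->at(index));
}

using Entry = void (*)(Buffer&);
const Entry kDispatch[] = {f_create, f_push_back, f_at};

// ---- "Verona" side: wrappers that marshal and call a constant index. ----
class VectorInt {
  std::uint64_t handle_;

public:
  VectorInt() {
    Buffer b;
    kDispatch[0](b);
    handle_ = load<std::uint64_t>(b, 0);
  }
  void push_back(std::int32_t v) {
    Buffer b;
    store(b, 0, handle_);
    store(b, 8, v);
    kDispatch[1](b);
  }
  std::int32_t at(std::uint64_t i) {
    Buffer b;
    store(b, 0, handle_);
    store(b, 8, i);
    kDispatch[2](b);
    return load<std::int32_t>(b, 0);
  }
};
```

From the caller's side this reads like ordinary code: create a VectorInt, call push_back(42), then at(0). Each of those lines turns into an index and a buffer, which is the only thing that crosses into the foreign side.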