Model Builder API #23223
Conversation
Supports creating a model programmatically using the ORT C or C++ API. Supports augmenting an existing model to add nodes.
// FUTURE: This will also allow CopyTensors to utilize the IDataTransfer objects
// "0": Disabled. [DEFAULT]
// "1": Enable Model Builder Session
static const char* const kOrtSessionOptionsEnableModelBuilder = "session.model_builder_session";

Code scanning (CodeQL) notice: Unused static variable
#include "core/framework/error_code_helper.h" | ||
#include "core/framework/execution_provider.h" | ||
#include "core/session/abi_session_options_impl.h" | ||
// #include "core/session/environment.h" |
Check notice
Code scanning / CodeQL
Commented-out code Note
// FUTURE: This will also allow CopyTensors to utilize the IDataTransfer objects
// "0": Disabled. [DEFAULT]
// "1": Enable Model Builder Session
static const char* const kOrtSessionOptionsEnableModelBuilder = "session.model_builder_session";
This isn't currently used. Initially I was thinking it would enable copying inputs/initializers to the correct device AOT, but:
a) that requires knowing where the value would be used, which depends on partitioning that happens later and on which EPs are enabled, so it's easy to get wrong; and
b) it would be counter-productive if an optimizer wanted to update the initializer, as we'd have to copy it back to CPU to do that.
TBD if needed.
Is it possible to save out the model from the builder via the C API? It would be a nice alternative to building ONNX models with raw protobufs in languages which don't have a native ONNX library.
You can use the SessionOption that's typically used to save the optimized ONNX model (onnxruntime/include/onnxruntime/core/session/onnxruntime_c_api.h, lines 909 to 910 in 6e76179).
The caveat is that it does not currently support saving tensors created with CreateTensorWithDataAsOrtValue or CreateTensorWithDataAndDeleterAsOrtValue, but it could be updated to do so if required.
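A minimal C++ sketch of that approach (the file path and logger id are illustrative; the option is the standard one used to dump the optimized model when a session is created):

```cpp
#include <onnxruntime_cxx_api.h>

int main() {
  Ort::Env env{ORT_LOGGING_LEVEL_WARNING, "builder"};
  Ort::SessionOptions session_options;
  // The same option used to dump the optimized graph also serializes the model
  // a session was created with; the file is written during session initialization.
  session_options.SetOptimizedModelFilePath(ORT_TSTR("saved_model.onnx"));
  // ... create the session from the programmatically built model here.
  return 0;
}
```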
api.ReleaseTensorTypeAndShapeInfo(tensor_type_info);  // input_type_info took a copy

// create ValueInfo and release the type info as CreateValueInfo takes a copy.
OrtValueInfo* input_value_info = nullptr;
Where to release OrtValueInfo?
It seems SetGraphInputs() (and SetGraphOutputs()) would take ownership of the OrtValueInfos. Resolving.
The ORT_CLASS_RELEASE macro defines ReleaseValueInfo if needed.
Is the optimized model one which has had op fusion and other passes done so it's no longer using ONNX standard ops everywhere, or is that a different process?
You can specify the optimization level. If you keep it to level 1 (GraphOptimizationLevel.ORT_ENABLE_BASIC) it will only use standard ONNX ops.
onnxruntime/include/onnxruntime/core/session/onnxruntime_c_api.h, lines 343 to 345 in a3833a5
onnxruntime/include/onnxruntime/core/session/onnxruntime_c_api.h, lines 1020 to 1029 in a3833a5
https://onnxruntime.ai/docs/performance/model-optimizations/graph-optimizations.html
* Pass ORT_API_VERSION to `OrtApiBase::GetApi()`. Also removes the inclusion of the onnx.pb.h header.
* Add third_party/onnxruntime_headers. Import https://github.com/microsoft/onnxruntime/tree/main/include. Commit is based on microsoft/onnxruntime#23223.
* Use ORT Model Builder API.
* Refactor scoped ORT type ptr:
  1. Rename to ScopedOrtTypePtr
  2. Use macros
  3. Introduce `operator T*()`
  4. Introduce `Release()` method
  5. Rename `get_ptr()` to `Get()`
  6. Rename `get_pptr()` to `GetAddressOf()`
* Remove ONNX Runtime headers from third_party/microsoft_dxheaders
if (attributes != nullptr) {
  n->attributes.reserve(attribs_len);
  for (size_t i = 0; i < attribs_len; ++i) {
    n->attributes.push_back(*reinterpret_cast<const ONNX_NAMESPACE::AttributeProto*>(attributes[i]));
Should we call ReleaseOpAttr after it's copied into the node so the user doesn't have to? Would be more consistent with the rest of the API to 'take ownership' of them.
The same question applies to the CreateXXXTypeInfo/CreateValueInfo calls.
Those felt a little more re-usable (e.g. if you were constructing a model with a KV cache you're going to be using the same TypeInfo for multiple inputs/outputs), but maybe it's better overall to have a consistent pattern of ownership transferring when you add to a containing class instead of taking a copy in some places.
> Those felt a little more re-usable (e.g. if you were constructing a model with KV cache you're going to be using the same TypeInfo for multiple inputs/outputs.)

OpAttr might also be re-used?

> but maybe it's better overall to have a consistent pattern of ownership transferring when you add to a containing class instead of taking a copy in some places.

I feel the consistency is for those AddXxxToXxx() methods that do the ownership transferring? SetGraphInputs()/SetGraphOutputs() also take ownership (I was not aware of it when I initially used the API); should they be renamed to AddInputsToGraph()/AddOutputsToGraph()?
* Pre-existing memory:
*   Use CreateTensorWithDataAsOrtValue or CreateTensorWithDataAndDeleterAsOrtValue to create an OrtValue
*   with a tensor that contains a pointer to the existing data.
*   User must keep pointer valid for lifetime of the inference session.
Is it true if using CreateTensorWithDataAndDeleterAsOrtValue()?
Ownership transfers to ORT so the pointer must remain valid, but in that case the user shouldn't be freeing the memory at any point. Will update the comment to clarify.
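A short C++ sketch of the pre-existing-memory path being discussed (the data values and shape are illustrative):

```cpp
#include <array>
#include <onnxruntime_cxx_api.h>

int main() {
  // Pre-existing memory owned by the caller. With CreateTensorWithDataAsOrtValue
  // (CreateTensor over a MemoryInfo in the C++ API) ORT does NOT copy or free this
  // buffer, so it must stay valid for the lifetime of any session that uses the OrtValue.
  static std::array<float, 4> data{1.f, 2.f, 3.f, 4.f};
  std::array<int64_t, 2> shape{2, 2};

  Ort::MemoryInfo mem_info = Ort::MemoryInfo::CreateCpu(OrtDeviceAllocator, OrtMemTypeDefault);
  Ort::Value value = Ort::Value::CreateTensor<float>(mem_info, data.data(), data.size(),
                                                     shape.data(), shape.size());

  // With the ...AndDeleter variant, ownership of the buffer transfers to ORT instead,
  // so the caller must not free it.
  return 0;
}
```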
ORT_API(const OrtModelBuilderApi*, GetModelBuilderApi);

ORT_API_STATUS_IMPL(CreateTensorWithDataAndDeleterAsOrtValue, _In_ OrtAllocator* deleter,
                    _In_ void* p_data, size_t p_data_len,
Would p_data be written by ORT? Or should it take const void* if it is only read by ORT? A similar question applies to CreateTensorWithDataAsOrtValue(), which takes _Inout_ void* p_data.
ORT wouldn't write to p_data itself, but a user could update the data in the OrtValue returned using the ORT API (e.g. call GetTensorMutableData and make changes).
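A small C++ sketch of that pattern, i.e. mutating tensor data through the returned OrtValue (the shape and values written are illustrative):

```cpp
#include <array>
#include <onnxruntime_cxx_api.h>

int main() {
  Ort::AllocatorWithDefaultOptions allocator;
  std::array<int64_t, 1> shape{3};

  // Tensor backed by ORT-allocated memory.
  Ort::Value value = Ort::Value::CreateTensor<float>(allocator, shape.data(), shape.size());

  // ORT itself doesn't write to the buffer, but the caller can mutate it
  // through the OrtValue at any time.
  float* p = value.GetTensorMutableData<float>();
  p[0] = 1.f;
  p[1] = 2.f;
  p[2] = 3.f;
  return 0;
}
```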
@@ -27,6 +27,7 @@
#include "core/common/span_utils.h"
#include "core/common/status.h"
#include "core/common/logging/logging.h"
#include "core/framework/ort_value.h"
It introduces a circular dependency between onnxruntime_graph and onnxruntime_framework. ort_value is a core concept in onnxruntime_framework, which also depends on MemoryInfo, Allocators, etc. It means that the lifetime of a graph will be bound to an allocator. Furthermore, people may ask if the OrtValue can be put on GPU devices, etc.
That dependency already existed with InjectExternalInitializedTensors in this file. And there are lots of places in the graph code where we use types from the framework library. If we want to fix that we might need to limit the graph library to fairly pure ONNX related types, and have ORT things built on top of those in the framework library. e.g. you'd have an ONNX Graph class as well as an onnxruntime Graph class, and things like OrtValue usage would be in the latter.
Long term I think it would be better to convert initializers to OrtValue when loading from the ONNX model so we detach from the protobuf types asap. There are many reasons for doing so. Having to add things like InjectExternalInitializedTensors to efficiently manage memory is a good sign the current setup isn't working well.
Can you elaborate on how the lifetime of the Graph is bound to an allocator? The OrtValue instances internally have a Tensor instance where the deleter is in a shared_ptr, so I would have thought the Graph instance can go away at any time, and the shared_ptr for the allocator in the Tensor deleter would also keep the allocator alive for as long as needed.
The problem I'm trying to address is that there's pre-existing memory where we want to transfer ownership to ORT. e.g. to free CPU based memory if we copy it to GPU. Because we have protobuf based initializers there's no way to attach the deleter to them, and the ORT API deals in OrtValue. So this was the best option I could find to essentially pass through that OrtValue to session state finalization.
The OrtValue could theoretically be on GPU. If you did that you could avoid a copy (if you knew for sure the value would be used on GPU) but you'd break the current setup with optimizers as they expect initializer data to be on CPU. Not clear we want to allow that.
A tensor cannot live longer than the allocator that allocated its buffer.
An allocator cannot live longer than the corresponding EP (e.g. the CUDA EP), because the EP needs to manage a lot of handles, and the allocator may need to use a device handle to do malloc/free. All such handles get destroyed when the EP is destroyed.
That could make things complicated. For example, in the InferenceSession class, we have:
std::shared_ptr<onnxruntime::Model> model_;
// The file path of where the model was loaded. e.g. /tmp/test_squeezenet/model.onnx
PathString model_location_;
// The list of execution providers.
ExecutionProviders execution_providers_;
//...
std::unique_ptr<SessionState> session_state_;
The model_ variable contains a graph, which contains OrtValues, which should be deleted before execution_providers_. But they are not ordered in that way. We had similar issues with execution_providers_ and session_state_. So this is very subtle.
Ah ok. So whilst the Tensor has a shared_ptr for the allocator, if the allocator depends on internals of the EP, and the EP goes away, it breaks due to that?
And if we add OrtValue to Graph, which is in InferenceSession::model_, which will be released after execution_providers_ it may break?
Should execution_providers_ therefore be declared prior to model_ in InferenceSession?
Yes, I think so.
However, the code is OK for now if the graph's OrtValues only use CPU allocators, which are relatively simple.
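For illustration only, a standalone C++ sketch of the language rule being relied on here (the class names are hypothetical stand-ins, not the real ORT types): members are destroyed in reverse declaration order, so declaring execution_providers_ before model_ means the EPs, and their allocators, outlive the graph's OrtValues.

```cpp
#include <iostream>

// Hypothetical stand-ins just to show the rule: members declared later are destroyed first.
struct ExecutionProviders {
  ~ExecutionProviders() { std::cout << "EPs (and their allocators) destroyed\n"; }
};
struct Model {
  ~Model() { std::cout << "Model/Graph (and any OrtValues) destroyed\n"; }
};

struct InferenceSessionLike {
  // Declared first, destroyed last: allocators stay alive while the model is torn down.
  ExecutionProviders execution_providers_;
  Model model_;
};

int main() {
  InferenceSessionLike s;
  // Prints on destruction:
  //   Model/Graph (and any OrtValues) destroyed
  //   EPs (and their allocators) destroyed
  return 0;
}
```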
* If using CreateTensorWithDataAsOrtValue you must keep the pointer valid for lifetime of the inference session.
* Set `data_is_external` to true.
*
* Allocated memory:
According to our testing, some ops seem to require shape-inference-related initializers to be in allocated memory, including:
- Reshape's shape
- Reduce's axes
- Expand's shape
- Slice's starts, ends and steps

If using pre-existing memory, there will be a shape inference error, e.g.
[ShapeInferenceError] Cannot parse data from external tensors. Please load external data into raw data for tensor: x
If that is the case, it would be helpful to document it.
As a general approach, there's about 60 bytes or so of overhead to use the external memory structure for pre-allocated memory, so if the value is less than, say, 128 bytes you're probably better off using allocated memory.
I think doing so almost guarantees that shape inferencing isn't going to break, as I can't think of an input that shape inferencing would read that would have more than 128 bytes of data (16 int64_t dimension or index values).
If it seems reasonable to do, we could enforce that pre-allocated data is a minimum size of 128 bytes to reduce the chance of a user hitting a shape inferencing error, and document any edge cases we find in the ONNX ops where shape inferencing fails, as that would be a much smaller set of operators, if any.
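A rough C++ sketch of that rule of thumb (the helper name and the 128-byte cutoff are illustrative, not part of the proposed API): small initializers are copied into ORT-allocated memory so shape inference can always read them, while larger buffers are wrapped as pre-existing memory.

```cpp
#include <cstring>
#include <vector>
#include <onnxruntime_cxx_api.h>

// Illustrative helper: choose allocated vs. pre-existing memory by size.
Ort::Value MakeInitializer(const float* data, const std::vector<int64_t>& shape,
                           size_t element_count, Ort::AllocatorWithDefaultOptions& allocator,
                           const Ort::MemoryInfo& cpu_mem_info) {
  const size_t byte_size = element_count * sizeof(float);
  constexpr size_t kExternalDataThreshold = 128;  // illustrative cutoff from the discussion above

  if (byte_size < kExternalDataThreshold) {
    // Small value: copy into ORT-allocated memory.
    Ort::Value value = Ort::Value::CreateTensor<float>(allocator, shape.data(), shape.size());
    std::memcpy(value.GetTensorMutableData<float>(), data, byte_size);
    return value;
  }

  // Larger value: wrap the caller's buffer; it must stay valid for as long as it is used.
  return Ort::Value::CreateTensor<float>(cpu_mem_info, const_cast<float*>(data), element_count,
                                         shape.data(), shape.size());
}
```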
Description
Supports creating a model programmatically using the ORT C or C++ API.
Supports augmenting an existing model to add nodes.
TODO: Validate that the API is feature complete, and add additional tests.
Motivation and Context