.Net: Promote the new Oobabooga repo/package (and remove from SK) #3153
Comments
Agreed.
Per @dmytrostruk, the Oobabooga connector now has its own repo. We should mark all of the existing Oobabooga functionality as deprecated and redirect users to the new repo. Will change the name/description to reflect the change.
#3176

### Motivation and Context

Related: #3153

This PR contains changes to deprecate Oobabooga functionality in favor of the new separate NuGet package: https://www.nuget.org/packages/MyIA.SemanticKernel.Connectors.AI.Oobabooga/

### Description

Marked types and tests as `Obsolete`.

### Contribution Checklist

- [x] The code builds clean without any errors or warnings
- [x] The PR follows the [SK Contribution Guidelines](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md) and the [pre-submission formatting script](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md#development-scripts) raises no violations
- [x] All unit tests pass, and I have added new tests where possible
- [x] I didn't break anyone 😄
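(Editor's note: the PR's actual `Obsolete` markings are not shown in this thread. As a rough, hypothetical sketch of the deprecation pattern it describes, with an illustrative type name and attribute message that may differ from the real code in #3176:)

```csharp
using System;

// Hypothetical sketch of the deprecation pattern described by the PR; the type
// name and attribute message here are illustrative, not the exact text of #3176.
[Obsolete("The Oobabooga connector now lives in the MyIA.SemanticKernel.Connectors.AI.Oobabooga NuGet package; this type will be removed from Semantic Kernel.")]
public sealed class OobaboogaTextCompletionExample
{
    // Existing members would be left unchanged; [Obsolete] only raises a
    // compiler warning (or an error, if the attribute's 'error' argument is true).
}
```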
Adding my 2 cents here, since I'm currently trying to update the new Oobabooga connector; the latest NuGet package seems to have issues that didn't appear during initial tests and were recently reported. I went ahead and updated the reference to the new beta NuGet package, and now I'm scratching my head trying to adapt to the new AIRequestSettings model. I'm not sure I understand the way forward, especially for the MultiConnector. I've read about the three evolution options that were considered and the final choice, but the current implementation has me puzzled by the way the new OpenAI settings were implemented. I understand the new ExtensionData dictionary is supposed to be a way to share properties like temperature or max tokens between prompt settings and the various connectors, in a more flexible way than the former rigid structure, but the OpenAI implementation only seems to use it one way, through a JSON serialization/deserialization round trip with a custom converter. That seems to mean there won't be any simple way to serialize/deserialize AIRequestSettings in a general manner, nor to recover the actual settings except when the ExtensionData property is used, which does not seem to be the case when OpenAI request settings are created. This is going to make updating the MultiConnector pretty hard, and I'm a bit lost on what to do with Oobabooga. Shouldn't all concrete implementations of AIRequestSettings have ensured that the dictionary property was used in the first place?
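(Editor's note: the serialization round trip complained about above can be illustrated with a small generic helper. This is a hedged sketch of the general pattern only, with a made-up helper name; it is not the actual Semantic Kernel or OpenAI connector code.)

```csharp
using System.Text.Json;

// Hypothetical helper illustrating the round-trip conversion pattern discussed
// above: serialize whatever settings object was passed in, then deserialize the
// resulting JSON into the connector's concrete settings type.
public static class RequestSettingsRoundTrip
{
    public static TSettings ToConcrete<TSettings>(object? requestSettings)
        where TSettings : class, new()
    {
        if (requestSettings is null) { return new TSettings(); }
        if (requestSettings is TSettings concrete) { return concrete; }

        // Fall back to the JSON round trip; unknown properties are simply dropped.
        string json = JsonSerializer.Serialize(requestSettings);
        return JsonSerializer.Deserialize<TSettings>(json) ?? new TSettings();
    }
}
```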
@jsboige Thanks for reporting. I would like to understand your use-case better, if possible. In general, you should be able to define your own request settings type for your connector.
The custom converter was implemented to support different naming cases for the settings in the JSON configuration (e.g. `max_tokens` vs `MaxTokens`).
It would be really helpful if you could share some code examples where the current settings model doesn't work for your scenario.
So, the dictionary property was introduced as an additional feature for flexibility. You can use only this property and access your settings in the connector, like:

```csharp
var function = kernel.CreateSemanticFunction(prompt, requestSettings: new OobaboogaRequestSettings { MyProperty = "value" });
```

Please let me know if this information is helpful! Thank you!
@dmytrostruk thanks for your answer. I find it a bit cumbersome that we don't really know what's in this dictionary: depending on the original AIRequestSettings, it may hold actual "soft" properties manually inserted at some point, JsonElements resulting from the JsonExtensionData attribute upon deserialization of prompt settings, or nothing at all if the settings are of a concrete type that doesn't make use of it. As for the MultiConnector, it is supposed to work seamlessly with request settings regardless of their kind, with several stages where they can be passed on unchanged or updated depending on the model-specific configurations, and there are several places where custom settings are serialized or deserialized. I will have to think about how to upgrade that... Anyway, I believe I have the global picture and will try my best with the current state.
@jsboige As a Connector developer, you define how you want to work with settings, and at the point when you receive an AIRequestSettings instance you decide how to interpret it. For example, if you add a new Connector, you may say that it should work with a concrete settings type. But you may also say that there is no concrete type of settings and that users should set them in the ExtensionData dictionary. If users pass some settings which are not compatible with the Oobabooga settings pattern, then you should probably throw an exception in the Connector, or try to use default settings, if that's an acceptable scenario.
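(Editor's note: a minimal sketch of the "concrete type first, then ExtensionData, then defaults" fallback described above. All names are hypothetical; in a real connector the dictionary would come from the settings object's ExtensionData property.)

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Illustrative only: not the actual Oobabooga connector code.
public sealed class OobaboogaSettingsSketch
{
    public int MaxNewTokens { get; set; } = 256;
    public double Temperature { get; set; } = 0.7;

    public static OobaboogaSettingsSketch From(object? requestSettings, IDictionary<string, object?>? extensionData)
    {
        // 1. Concrete type passed programmatically: use it as-is.
        if (requestSettings is OobaboogaSettingsSketch concrete)
        {
            return concrete;
        }

        // 2. Otherwise, look for shared values in the extension-data dictionary
        //    (values may be JsonElement when loaded from config.json, or primitives).
        var settings = new OobaboogaSettingsSketch();
        if (extensionData is null)
        {
            return settings; // 3. Nothing usable: fall back to defaults (or throw, if preferred).
        }

        if (extensionData.TryGetValue("temperature", out var t) && t is not null)
        {
            settings.Temperature = t is JsonElement te ? te.GetDouble() : Convert.ToDouble(t);
        }
        if (extensionData.TryGetValue("max_new_tokens", out var m) && m is not null)
        {
            settings.MaxNewTokens = m is JsonElement me ? me.GetInt32() : Convert.ToInt32(m);
        }
        return settings;
    }
}
```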
Please let us know if any help is needed. Your use-case is very helpful, because it will allow us to understand if the current settings model is scalable enough.
Sure, dealing with the concrete Oobabooga settings should be straightforward and the documented regular case. It's the other cases that I find difficult. Again, having to serialize and deserialize using a custom converter, because you don't really know what's in the object but expect it might have shared parameters like temperature or max tokens, looks like a strange evolution from the base class, where those properties were well defined and you didn't have to guess them from the JSON result. For Oobabooga, I might try to use the dictionary directly to save on performance, since we're not expecting anything other than the JsonElements deserialized from config.json. But for the MultiConnector, I guess I'm left with the same OpenAI approach of serializing/deserializing and hoping for the best.
Parameters like temperature or max tokens are not common to every AI model, which is why they are no longer defined on the base class; you can define them on your concrete Oobabooga settings type instead.
Sure, that's the plan, thanks.
@dmytrostruk I just published a new NuGet package with an attempt at following the new request settings format. The Oobabooga RequestSettings (this class and that one) are relatively similar to OpenAI's, in that they will attempt to read the extended properties only if the passed-in settings are not of the concrete type or OpenAI's. They also account for the fact that MaxTokens has a different name in Oobabooga (MaxNewTokens), and that RepetitionPenalty has a different scale (that was already implemented in the previous version). Also, they don't rely on serialization and will directly read JsonElement values, or use IConvertible if the dictionary contains primitive types instead. As for the MultiConnector, I introduced a specific settings class that tries to keep everything in the extended properties dictionary, while upper-casing keys. That was tested on the existing simple examples, but in order to support all cases I will need to rework those settings to account for and map all existing connectors' specific settings properties. Anyway, I guess that provides some feedback for your team, and I'll be happy to know what you think is the best way forward from there.
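(Editor's note: the key-normalization idea mentioned above, keeping everything in one dictionary while canonicalizing the keys, could look roughly like the following. This is an illustrative sketch with hypothetical names, not the actual MultiConnector settings class.)

```csharp
using System;
using System.Collections.Generic;

// Illustrative sketch: store every value in a single dictionary, but normalize
// keys to one canonical casing so "MaxTokens", "max_tokens" and "MAX_TOKENS"
// intentionally land on the same entry.
public static class SettingsKeyNormalization
{
    public static Dictionary<string, object?> Normalize(IReadOnlyDictionary<string, object?> extensionData)
    {
        var normalized = new Dictionary<string, object?>(StringComparer.Ordinal);
        foreach (var pair in extensionData)
        {
            // Upper-case the key; dropping underscores as well would also merge
            // snake_case and PascalCase variants of the same setting name.
            var canonical = pair.Key.ToUpperInvariant();
            normalized[canonical] = pair.Value;
        }
        return normalized;
    }
}
```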
@jsboige I would suggest removing the OpenAI dependency from the Oobabooga Connector. Even if the settings for both Connectors are similar or exactly the same, they are still separate Connectors, and having different types for them (even with the same properties) is completely fine. Here is an example of how the settings could be defined for a semantic function:

```json
{
  "schema": 1,
  "description": "My semantic function",
  "models": [
    {
      "service_id": "open-ai-gpt-4",
      "max_tokens": 150,
      "temperature": 0.9,
      "top_p": 0.0
    },
    {
      "service_id": "oobabooga-ai-model",
      "max_new_tokens": 200,
      "repetition_penalty": 1.18,
      "do_sample": false
    }
  ]
}
```

At the point of semantic function execution, it should recognize the available AI Connectors in the Kernel and map the settings to the specific Connector. Also, it should be possible to specify settings not only on the Connector level, but on the AI model level (e.g. for OpenAI, there will be separate settings for different models).
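(Editor's note: as a rough illustration of that mapping step, one could imagine selecting the settings block whose `service_id` matches the connector that will execute the function. This is a hypothetical helper; the real Semantic Kernel resolution logic is different.)

```csharp
using System.Text.Json;

// Hypothetical sketch: pick the per-model settings object from the "models"
// array above by matching its "service_id" against the connector's service id.
public static class ModelSettingsSelector
{
    public static JsonElement? ForService(string configJson, string serviceId)
    {
        using var doc = JsonDocument.Parse(configJson);
        if (!doc.RootElement.TryGetProperty("models", out var models))
        {
            return null;
        }

        foreach (var model in models.EnumerateArray())
        {
            if (model.TryGetProperty("service_id", out var id) && id.GetString() == serviceId)
            {
                return model.Clone(); // Clone so the element outlives the JsonDocument.
            }
        }
        return null;
    }
}
```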
That's exactly the reason why we shouldn't try to align settings to the same type/format. They are just different by nature and should be implemented separately.
From the Semantic Kernel perspective, we should be compatible with different AI providers, AI models, and the related settings for them. On Hugging Face, there are around 30,000 text-completion models available at the moment, and most probably their settings differ as well.
In your scenario, I would suggest avoiding the dictionary property altogether (at least in a first implementation). Try to use just the concrete type (`OobaboogaRequestSettings`). I hope that makes sense, and thank you for your feedback! I would be happy to continue this conversation and see if the current settings model is scalable enough to cover your and other cases. Thanks again!
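(Editor's note: for illustration only, a concrete settings type matching the `oobabooga-ai-model` block in the JSON above might look roughly like this; the property names are guessed from that JSON, and the real class shipped in the Oobabooga package may differ.)

```csharp
using System.Text.Json.Serialization;

// Hypothetical concrete settings type mirroring the "oobabooga-ai-model"
// entry of the config above; not the actual class shipped in the package.
public sealed class OobaboogaRequestSettingsExample
{
    [JsonPropertyName("max_new_tokens")]
    public int MaxNewTokens { get; set; } = 200;

    [JsonPropertyName("repetition_penalty")]
    public double RepetitionPenalty { get; set; } = 1.18;

    [JsonPropertyName("do_sample")]
    public bool DoSample { get; set; } = true;

    [JsonPropertyName("temperature")]
    public double Temperature { get; set; } = 0.7;
}
```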
@dmytrostruk I didn't plan to support OpenAI in the Oobabooga connector to start with, and you're probably right that it is overkill. I just realized the OpenAI dependency already exists in the regular Semantic Kernel NuGet package, so I thought: why not. But this is a very edge case indeed. Maybe I should reference a more restricted package?
As for the dictionary property, it is necessary to support it, as it is where the prompt template config parameters reside, in the form of deserialized JsonElement properties. The OpenAI connector handles it through a round trip of JSON serialization/deserialization and a custom JSON converter. I chose to tap into the dictionary property directly to avoid the overhead, but I believe it is pretty much equivalent. Now the MultiConnector is a different story. It has to harmonize across models and prompt settings by definition, on the way in and on the way out, thus the use of the same serialization mechanism to flatten both concrete and dictionary properties. As for Hugging Face, this is pretty much what Oobabooga is about. Pretty much all the models in the Open LLM Leaderboard are supported in Oobabooga, and I can tell you that, apart from specific properties like mirostat for llama.cpp models, pretty much all of them use the set of common properties that the previous ChatGPT-inspired shared completion settings exhibited. Only the scale for repetition penalty, and beam search instead of alternate responses, were significantly different. Anyway, I'll remove the OpenAI dependency in the next release, but for now it seems to work fine. Let's see what users say.
@jsboige So, if we look from an SK user perspective: if I want to use SK in my applications, I install SK and I have the OpenAI Connector out of the box, in order to be able to quickly jump in and integrate with AI. But if I'm an SK Plugin or Connector developer, in the ideal scenario I need to reference just the SK abstractions rather than the full SK package.
I got the point now. Yes, so in the current implementation you can work with a strong settings type if you inject it programmatically, but when you load the settings from `config.json` you access them through `ExtensionData`:

```csharp
// Pass settings instance directly - access MySettings in Connector
var semanticFunction1 = kernel.CreateSemanticFunction("template", requestSettings: new MySettings { Text = "text", Number = 2 });

// Initialize settings instance with config.json - access ExtensionData in Connector
var semanticFunction2 = kernel.CreateSemanticFunction("template", PromptTemplateConfig.FromJson(GetConfigPayload()));
```

But even with this case, see how the OpenAI connector handles it in `dotnet/src/Connectors/Connectors.AI.OpenAI/OpenAIRequestSettings.cs` (lines 131 to 155 at commit `18ffc4b`).
Please let me know if this scenario will work for you. Thank you!
Important
Labeled Urgent because it may require a breaking change if we decide to remove `TextCompletionRequest`.
There are two types that derive from `AIRequestSettings`: `OpenAIRequestSettings` and `TextCompletionRequest`. The latter is specific to the Oobabooga connector but doesn't have Oobabooga in the name nor the `RequestSettings` suffix. It's not clear to me how someone is supposed to know they should be creating a `TextCompletionRequest` in order to fulfill an `AIRequestSettings` parameter.

To solve this, we will remove it from our current repo and point users to the new Oobabooga repo/package.