Exception when initializing keyword spotting from a string in .NET #1639

Open
errore opened this issue Dec 21, 2024 · 3 comments

Comments

errore commented Dec 21, 2024

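I use the sherpa-onnx library for keyword spotting in Unity. The dll throws an exception, which crashes Unity.

It often happens in the following situations:

  1. When the input parameters are incorrect (including but not limited to: an accidental \n at the end of the last line of the keywords, or a parameter left unspecified)
  2. Additionally, in testing, I ran into crashes when the Windows system language is not UTF-8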
at <unknown> <0xffffffff>
at SherpaOnnx.KeywordSpotter:SherpaOnnxCreateKeywordStream <0x00069>
at SherpaOnnx.KeywordSpotter:CreateStream <0x00032>
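
For the first case, one option is to clean the keywords string on the C# side before handing it to the library. The following is only a sketch: `KeywordText.Normalize` is a hypothetical helper, not part of sherpa-onnx, and it assumes that one keyword per line with no blank or trailing lines is what the library expects.

```csharp
using System;
using System.Linq;

static class KeywordText
{
    // Hypothetical helper (not a sherpa-onnx API).
    // Splits on newlines, trims each line, drops blank lines, and rejoins
    // with '\n', so the result has no trailing newline and no empty entries.
    public static string Normalize(string raw)
    {
        if (raw == null) throw new ArgumentNullException(nameof(raw));

        var lines = raw
            .Split(new[] { "\r\n", "\n" }, StringSplitOptions.None)
            .Select(line => line.Trim())
            .Where(line => line.Length > 0)
            .ToArray();

        if (lines.Length == 0)
            throw new ArgumentException("No keywords were provided.", nameof(raw));

        return string.Join("\n", lines);
    }
}
```

Calling `KeywordText.Normalize(keywords)` before assigning the keywords buffer removes a stray trailing `\n` and fails early, in managed code, when the string is empty.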

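This makes debugging very troublesome.

Could there be a solution, for example:
in the C API, use an isSuccess or similar field so that the C# side can detect the error and throw an exception

If this cannot be fixed, are there any temporary workarounds or suggestions?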
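
As a rough illustration of the request above, a managed wrapper could check what the native side returns and raise a normal .NET exception. This is a sketch under an unconfirmed assumption: that the native create function returns a null pointer on failure instead of aborting the process. `NativeHandle.ThrowIfNull` is a hypothetical helper, not an existing sherpa-onnx API.

```csharp
using System;

static class NativeHandle
{
    // Hypothetical helper: converts a null native pointer into a managed
    // exception so the caller can catch it instead of crashing later.
    public static IntPtr ThrowIfNull(IntPtr handle, string operation)
    {
        if (handle == IntPtr.Zero)
        {
            throw new InvalidOperationException(
                $"{operation} failed: the native library returned a null handle.");
        }
        return handle;
    }
}
```

A wrapper would pass the pointer returned by the native call through `NativeHandle.ThrowIfNull(ptr, "create keyword spotter")` before using it. This only helps if the native code actually reports failure through its return value.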

@errore errore changed the title "C dll crashes the .NET application" to "C dll throws an exception that crashes the .NET application" Dec 21, 2024
@csukuangfj
Collaborator

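> When the input parameters are incorrect (including but not limited to: an accidental \n at the end of the last line of the keywords, or a parameter left unspecified)

Please make sure the input is correct yourself.


> Additionally, in testing, I ran into crashes when the Windows system language is not UTF-8

Please provide a detailed log and the commands to reproduce it.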

@errore
Author

errore commented Dec 27, 2024

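> Please provide a detailed log and the commands to reproduce it.

Sure. I created a standalone .NET console program and reproduced the problem on my machine (in Unity, this did not occur on my machine):
OS: Windows 11 Professional 23H2, Chinese edition
Environment: .NET 8.0 console program
Model: sherpa-onnx-kws-zipformer-wenetspeech

Key code: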

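using System.Text; // Encoding
using SherpaOnnx;  // KeywordSpotter, KeywordSpotterConfig

// basePath is the model directory; its definition is not shown in this excerpt.
var decoderPath = basePath + "decoder-epoch-12-avg-2-chunk-16-left-64.int8.onnx";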
var encoderPath = basePath + "encoder-epoch-12-avg-2-chunk-16-left-64.int8.onnx";
var joinerPath = basePath + "joiner-epoch-12-avg-2-chunk-16-left-64.int8.onnx";
var tokensPath = basePath + "tokens.txt";

var kwsConfig = new KeywordSpotterConfig();
kwsConfig.FeatConfig.FeatureDim = 80;
kwsConfig.FeatConfig.SampleRate = 16000;
kwsConfig.ModelConfig.Transducer.Decoder = decoderPath;
kwsConfig.ModelConfig.Transducer.Encoder = encoderPath;
kwsConfig.ModelConfig.Transducer.Joiner = joinerPath;
kwsConfig.ModelConfig.Tokens = tokensPath;

var keywords = "x iǎo ài t óng x ué @小爱同学\nn ǐ h ǎo w èn w èn @你好问问\nx iǎo y ì x iǎo y ì @小艺小艺";
kwsConfig.KeywordsBuf = keywords;
kwsConfig.KeywordsBufSize = Encoding.UTF8.GetBytes(keywords).Length;

kwsConfig.ModelConfig.Provider = "cpu";
kwsConfig.ModelConfig.NumThreads = Environment.ProcessorCount;

var recognizer = new KeywordSpotter(kwsConfig);
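
The snippet above sets `KeywordsBufSize` from the UTF-8 byte length rather than from `keywords.Length`. The small sketch below shows why the two differ for pinyin and Chinese text. `Utf8Size.ByteCount` is a hypothetical helper name used only for illustration.

```csharp
using System;
using System.Text;

static class Utf8Size
{
    // Number of bytes the string occupies once encoded as UTF-8.
    public static int ByteCount(string s) => Encoding.UTF8.GetByteCount(s);
}

static class Utf8SizeDemo
{
    public static void Run()
    {
        // "iǎo" is 3 chars, but 'ǎ' takes 2 bytes in UTF-8, so 4 bytes in total.
        Console.WriteLine($"{"iǎo".Length} chars, {Utf8Size.ByteCount("iǎo")} bytes");

        // Each of the four Chinese characters takes 3 bytes in UTF-8: 12 bytes.
        Console.WriteLine($"{"小爱同学".Length} chars, {Utf8Size.ByteCount("小爱同学")} bytes");
    }
}
```

`Encoding.UTF8.GetByteCount(keywords)` gives the same number as `Encoding.UTF8.GetBytes(keywords).Length` without allocating the byte array.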

When the system language does not use UTF-8, the following errors are reported:

D:\a\sherpa-onnx\sherpa-onnx\sherpa-onnx\csrc\utils.cc:EncodeBase:67 Cannot find ID for token iǎo at line: x iǎo ài t óng x ué @小爱同学. (Hint: Check the tokens.txt see if iǎo in it)
D:\a\sherpa-onnx\sherpa-onnx\sherpa-onnx\csrc\utils.cc:EncodeBase:67 Cannot find ID for token ài at line: x iǎo ài t óng x ué @小爱同学. (Hint: Check the tokens.txt see if ài in it)
D:\a\sherpa-onnx\sherpa-onnx\sherpa-onnx\csrc\utils.cc:EncodeBase:67 Cannot find ID for token óng at line: x iǎo ài t óng x ué @小爱同学. (Hint: Check the tokens.txt see if óng in it)
... (omitted) ...
D:\a\sherpa-onnx\sherpa-onnx\sherpa-onnx/csrc/keyword-spotter-transducer-impl.h:InitKeywords:274 Encode keywords failed.
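
One plausible reading of this log is an encoding mismatch: if a UTF-8 token such as iǎo is interpreted under a different code page on one side, the two strings no longer compare equal and the lookup fails. The sketch below only illustrates that mechanism in managed code. It does not show how sherpa-onnx reads its files, and `Mojibake.Utf8BytesReadAsGbk` is a hypothetical helper.

```csharp
using System;
using System.Text;

static class Mojibake
{
    // Encodes text as UTF-8, then decodes those bytes with code page 936 (GBK),
    // imitating a reader that assumes the wrong encoding.
    public static string Utf8BytesReadAsGbk(string text)
    {
        // Code page encodings are not enabled by default on .NET (Core);
        // registering the provider makes code page 936 available.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding gbk = Encoding.GetEncoding(936);
        return gbk.GetString(Encoding.UTF8.GetBytes(text));
    }
}
```

ASCII tokens such as `x` survive the round trip unchanged, while `Mojibake.Utf8BytesReadAsGbk("iǎo")` no longer equals `"iǎo"`. That matches the log, where only the tokens containing accented characters are reported as missing.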

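The workaround is to enable the experimental UTF-8 option in Windows. Note that this requires restarting the computer.

It is also possible that I am using KeywordsBuf and KeywordsBufSize incorrectly.

> Please make sure the input is correct yourself.

Perhaps I should not debug the dll directly inside a game engine. Debugging in a standalone .NET console program is much simpler and produces much richer logs.
Still, the underlying dll really should avoid throwing exceptions. Right now I can either pray that no error happens at runtime, or consider isolating it in another thread.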

@errore errore changed the title "C dll throws an exception that crashes the .NET application" to "Exception when initializing keyword spotting from a string in .NET" Jan 2, 2025
@errore
Author

errore commented Jan 9, 2025

Follow-up: on Windows, sherpa-onnx loads the tokens.txt file as GBK. If I load it manually into a string as UTF-8 and pass it via tokensBuf, no exception is raised.
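
A minimal sketch of that workaround, assuming tokens.txt is stored as UTF-8. `TokensLoader.LoadUtf8` is a hypothetical helper, and the exact names of the config fields that receive the text and its size are not confirmed here (the comment above refers to them as tokensBuf).

```csharp
using System;
using System.IO;
using System.Text;

static class TokensLoader
{
    // Reads the file as UTF-8 regardless of the system code page and returns
    // the text together with its length in UTF-8 bytes.
    public static (string Text, int ByteSize) LoadUtf8(string path)
    {
        string text = File.ReadAllText(path, new UTF8Encoding(false));
        return (text, Encoding.UTF8.GetByteCount(text));
    }
}
```

The returned text and byte size would then be assigned to the tokens buffer fields of the model config, in place of the tokens file path.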
