Skip to content

Commit

Permalink
update readme
Browse files Browse the repository at this point in the history
  • Loading branch information
kevwan committed Oct 11, 2021
1 parent ec32a14 commit c58ff15
Show file tree
Hide file tree
Showing 4 changed files with 128 additions and 65 deletions.
16 changes: 16 additions & 0 deletions bot/nlp/sentencedetect.go
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@ package nlp

import (
"strings"
"unicode"

"github.com/tal-tech/go-zero/core/lang"
)
Expand All @@ -20,6 +21,11 @@ var (
)

func IsQuestion(sentence string) bool {
// we don't check whether question or not in English
if isAscii(sentence) {
return false
}

chars := []rune(strings.TrimSpace(sentence))
if len(chars) == 0 {
return false
Expand All @@ -37,3 +43,13 @@ func IsQuestion(sentence string) bool {

return false
}

func isAscii(s string) bool {
for i := 0; i < len(s); i++ {
if s[i] > unicode.MaxASCII {
return false
}
}

return true
}
73 changes: 73 additions & 0 deletions readme-cn.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,73 @@
# chatbot

[English](readme.md) | 简体中文

## 项目说明

`chatbot` 是一个通过已知对话数据集快速生成回答的 Go 问答引擎。比 [ChatterBot](https://github.com/gunthercox/ChatterBot) 快非常多,我们在1.2亿对话上的对比结果是:[ChatterBot](https://github.com/gunthercox/ChatterBot) 回答需要21秒,chatbot 只需要18毫秒。

* bot

问答引擎

* cli

* train

训练给定的问答数据并生成 `.gob` 文件

* `-d` 读取指定目录下所有 `json``yaml` 语料文件
* `-i` 读取指定的 `json``yaml` 语料文件,多个文件用逗号分割
* `-o` 指定输出的 `.gob` 文件
* `-m` 定时打印内存使用情况

* ask

一个示例的问答命令行工具

* `-v` verbose
* `-c` 训练好的 `.gob` 文件
* `-t` 数据几个可能的答案

## 数据格式

数据格式可以通过 `yaml` 或者 `json` 文件提供,参考 `https://github.com/kevwan/chatterbot-corpus` 里的格式。大致如下:

```yaml
categories:
- AI
conversations:
- - 什么是ai
- 人工智能是工程和科学的分支,致力于构建具有思维的机器。
- - 你是什么语言编写的
- Python
- - 你听起来像机器
- 是的,我受到造物者的启发
- - 你是一个人工智能
- 那是我的名字。
```
## 问答示例
```text
user: 在吗?
bot:
user: 在干嘛呢?
bot: 看电视
user: 看啥电视呀
bot: 活色生香
user: 很好看吗?
bot: 特搞笑
user: 你在哪里呀?
bot: 家里
user: 家里就你一个人嘛?
bot: 我喜欢一个人玩
user: 那我过来找你?
bot: 不可以,乖乖上班去
```
## 致谢
ChatterBot - https://github.com/gunthercox/ChatterBot
最早我是使用 [ChatterBot](https://github.com/gunthercox/ChatterBot).的,但由于回答太慢,所有后来只能自己实现了,感谢 [ChatterBot](https://github.com/gunthercox/ChatterBot),非常棒的项目!
20 changes: 0 additions & 20 deletions readme-en.md

This file was deleted.

84 changes: 39 additions & 45 deletions readme.md
Original file line number Diff line number Diff line change
@@ -1,73 +1,67 @@
# chatbot

[English](readme-en.md) | 简体中文
English | [简体中文](readme-cn.md)

## 项目说明
`chatbot` is a conversational dialog engine build in Go which makes it possible to generate responses based on collections of known conversations. The language independent design of `chatbot` allows it to be trained to speak any language. It’s much faster than `ChatterBot`, our benchmark on 120 million conversations:

`chatbot` 是一个通过已知对话数据集快速生成回答的 Go 问答引擎。比 [ChatterBot](https://github.com/gunthercox/ChatterBot) 快非常多,我们在1.2亿对话上的对比结果是:[ChatterBot](https://github.com/gunthercox/ChatterBot) 回答需要21秒,chatbot 只需要18毫秒。
- ChatterBot takes 21s to answer
- chatbot takes 18ms to answer

## Project layout and command line tools

* bot

问答引擎
Conversational dialog engine

* cli

* train
训练给定的问答数据并生成 `.gob` 文件
* `-d` 读取指定目录下所有 `json` `yaml` 语料文件
* `-i` 读取指定的 `json` `yaml` 语料文件,多个文件用逗号分割
* `-o` 指定输出的 `.gob` 文件
* `-m` 定时打印内存使用情况

Train the given conversation data and generate corpus format file `.gob`

* `-d` reads all `json` and `yaml` corpus files in the specified directory
* `-i` read the specified `json` or `yaml` corpus files, splitting multiple files by commas
* `-o` specify the output `.gob` file
* `-m` print memory usage at regular intervals

* ask
一个示例的问答命令行工具

An example question and answer command line tool

* `-v` verbose
* `-c` 训练好的 `.gob` 文件
* `-t` 数据几个可能的答案
* `-c` trained `.gob` file
* `-t` data for several possible answers

## 数据格式
## Data format

数据格式可以通过 `yaml` 或者 `json` 文件提供,参考 `https://github.com/kevwan/chatterbot-corpus` 里的格式。大致如下:
The data format can be provided via `yaml` or `json` files, refer to the format in `https://github.com/kevwan/chatterbot-corpus`. Roughly, it is as follows.

```yaml
categories:
- AI
- artificial intelligence
conversations:
- - 什么是ai
- 人工智能是工程和科学的分支,致力于构建具有思维的机器。
- - 你是什么语言编写的
- Python
- - 你听起来像机器
- 是的,我受到造物者的启发
- - 你是一个人工智能
- 那是我的名字。
- - What is AI?
- Artificial Intelligence is the branch of engineering and science devoted to constructing machines that think.
- - Are you sentient?
- Sort of.
```
## 问答示例
## Example of a question and answer
```text
user: 在吗?
bot:
user: 在干嘛呢?
bot: 看电视
user: 看啥电视呀
bot: 活色生香
user: 很好看吗?
bot: 特搞笑
user: 你在哪里呀?
bot: 家里
user: 家里就你一个人嘛?
bot: 我喜欢一个人玩
user: 那我过来找你?
bot: 不可以,乖乖上班去
user: how are you?
bot: I am doing well, how about you?
user: I'm doing well, thanks!
bot: That is good to hear
```
## 致谢
## Acknowledgements
go-zero - https://github.com/zeromicro/go-zero
The `MapReduce` implementation of `go-zero in `core/mr` package gives `chatbot` a huge performance boost!

ChatterBot - https://github.com/gunthercox/ChatterBot

最早我是使用 [ChatterBot](https://github.com/gunthercox/ChatterBot).的,但由于回答太慢,所有后来只能自己实现了,感谢 [ChatterBot](https://github.com/gunthercox/ChatterBot),非常棒的项目!
I was using `ChatterBot` at first, but it responds too slow, so I decided to implement it myself. Thanks to ChatterBot, great project!

0 comments on commit c58ff15

Please sign in to comment.