Executable for training with a persistent data store #99

Open
Ch4s3 opened this issue Jan 7, 2017 · 3 comments

Comments

@Ch4s3
Member

Ch4s3 commented Jan 7, 2017

Per @parkr's idea, it might be useful to have an executable that can train on and classify inputs for systems using persistent data stores.

@ibnesayeed
Contributor

ibnesayeed commented Jan 8, 2017

I was thinking about it, but I thought it would be beyond the scope of this gem. Instead, a separate repo can be created that uses this gem to provide a full-blown CLI. Here is how I envision it (assuming that the executable is named classifier):

# Default store: redis://127.0.0.1:6380/0, but customizable using CLI flag such as:
#     --store=redis://user:[email protected]:6380/2
#     --store=postgresql://user:[email protected]:5433/classifierdb

$ classifier train {class} {file_path|url|string|STDIN}
# If a file path is given as the last argument then read the content of the file
# If the input is a URL then fetch the content from the URL

# Or automatic batch training based on the sub-folder names
$ classifier train /path/to/training/folder
# Classes can be inferred from the names of the sub-folders of /path/to/training/folder
# Files from each sub-folder can be used as individual training instances
# Some built-in cleaners can be applied (by default or with a flag) such as removing markup if the files are HTML

$ classifier untrain {class} {file_path|url|string|STDIN}

# Or automatic batch untraining based on the sub-folder names
$ classifier untrain /path/to/untraining/folder

$ classifier classify {file_path|url|string|STDIN}

# Or automatic batch classification of files from a directory
$ classifier classify /path/to/data/folder
# => Two columns of output on STDOUT; class name and file path for each file
# Alternatively, the files can be copied/moved in class-named sub-folders of the output directory
$ classifier classify /path/to/data/folder /path/output/base/folder
# Copy /path/to/data/folder/record.txt to /path/output/base/folder/{class}/record.txt
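
To make the input handling above concrete, here is a minimal Ruby sketch of how a single argument might be resolved to text, and how classes might be inferred from sub-folder names. The helper names resolve_input and each_training_instance are hypothetical, not part of this gem, and the order of the checks (STDIN, then file, then URL, then literal string) is an assumption.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: turn one CLI argument into the text to train on or classify.
# nil or "-" means "read STDIN"; an existing file is read; an http(s) URL is fetched;
# anything else is treated as the literal string.
def resolve_input(arg, stdin: $stdin)
  return stdin.read if arg.nil? || arg == "-"
  return File.read(arg) if File.file?(arg)
  return Net::HTTP.get(URI(arg)) if arg.match?(%r{\Ahttps?://}i)

  arg
end

# Hypothetical helper: yields [class_name, file_path] for every regular file
# one level below base_dir, using the sub-folder name as the class name.
def each_training_instance(base_dir)
  return enum_for(:each_training_instance, base_dir) unless block_given?

  Dir.children(base_dir).sort.each do |class_name|
    class_dir = File.join(base_dir, class_name)
    next unless File.directory?(class_dir) # skip stray files at the top level

    Dir.children(class_dir).sort.each do |name|
      path = File.join(class_dir, name)
      yield class_name, path if File.file?(path)
    end
  end
end
```

Batch training would then amount to calling train with each class name and File.read(path); batch untraining and classification could walk the same structure.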

Further to this, a server sub-command can be added to expose this functionality over HTTP. We can use something like Sinatra for routing.

$ classifier server --namespace=/foo --store=redis://user:[email protected]:6380/2 --port=2017
# Listening on http://localhost:2017
# GET /foo/train/{class}/{string|url}
# POST /foo/train/{class} [upload_file]
# GET /foo/untrain/{class}/{string|url}
# POST /foo/untrain/{class} [upload_file]
# GET /foo/classify/{string|url}
# POST /foo/classify [upload_file]

Ideally, training should be done only with the POST method, untraining with PUT/PATCH, and classification with GET. However, supplying a big text file in the GET path could be tricky. The default namespace could be empty, but having one would allow serving multiple classifiers from the same server. Perhaps it could also be configured to use a specific store for each namespace.
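
As a rough illustration of that verb-to-action mapping, the following sketch matches a request against a small route table in plain Ruby, without Sinatra. ROUTES and match_route are hypothetical names, and keeping POST /classify for uploads is an assumption carried over from the route list above.

```ruby
# Hypothetical route table: HTTP verb + path pattern => classifier action.
ROUTES = [
  ["POST",  %r{\A/train/(?<klass>[^/]+)\z},   :train],
  ["PUT",   %r{\A/untrain/(?<klass>[^/]+)\z}, :untrain],
  ["PATCH", %r{\A/untrain/(?<klass>[^/]+)\z}, :untrain],
  ["GET",   %r{\A/classify/(?<input>.+)\z},   :classify],
  ["POST",  %r{\A/classify\z},                :classify] # body carries the uploaded text
].freeze

# Returns { action:, params: } for a matching route, or nil if nothing matches.
# The namespace (empty by default) is stripped from the front of the path first.
def match_route(verb, path, namespace: "")
  return nil unless path.start_with?(namespace)

  rest = path[namespace.length..-1]
  ROUTES.each do |route_verb, pattern, action|
    next unless route_verb == verb

    m = pattern.match(rest)
    return { action: action, params: m.named_captures } if m
  end
  nil
end
```

Serving several classifiers from one server would then be a matter of keeping one store per namespace and choosing it before dispatching the matched action.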

Additionally, the various command-line flags could be stored in a config file and read from there, with any value supplied on the terminal overriding the file.
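
A minimal sketch of that precedence, using only Ruby's standard library: built-in defaults are overridden by the config file, which is overridden by flags. The YAML format, the load_options name, and the config_path parameter are assumptions; the flag names and the default store URL come from the examples above.

```ruby
require "optparse"
require "yaml"

# Default store taken from the proposal above.
DEFAULTS = { "store" => "redis://127.0.0.1:6380/0" }.freeze

# Hypothetical helper: returns [options, remaining_args].
# Precedence, lowest to highest: DEFAULTS, config file, command-line flags.
def load_options(argv, config_path: nil)
  from_file = {}
  if config_path && File.file?(config_path)
    from_file = YAML.safe_load(File.read(config_path)) || {}
  end

  from_cli = {}
  parser = OptionParser.new do |opts|
    opts.on("--store=URL")           { |v| from_cli["store"] = v }
    opts.on("--namespace=PATH")      { |v| from_cli["namespace"] = v }
    opts.on("--port=PORT", Integer)  { |v| from_cli["port"] = v }
  end
  rest = parser.parse(argv) # non-destructive; returns the non-flag arguments

  [DEFAULTS.merge(from_file).merge(from_cli), rest]
end
```

For example, with a file that sets port to 3000, running with --port=2017 would yield port 2017 while still picking up the store from the file.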

@parkr
Member

parkr commented Jan 8, 2017

Start small: a simple CLI that can accept arguments and train/untrain/classify. If you find there is a compelling reason to add a web server, then that can be added later. For now, I'd start small and I'd keep the executable in this repo as it provides no added functionality beyond the library's core functions. Branch out once that PoC is done and it has users.

@ibnesayeed
Contributor

Note: I missed some important aspects initially, so now I have updated the proposed CLI/server API.

@parkr, I agree that we can start small and branch off later. However, I was worried that unless we make a really toy utility, we will have to use a sophisticated CLI library such as Thor, which will add unnecessary clutter to this gem. As far as the server is concerned, I was only trying to lay out a possible API that can be packaged into a binary. This will provide food for thought and help us architect the application so that it can accommodate these use cases as it evolves.
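
For a sense of how small the starting point could be without Thor, here is a sketch of a sub-command dispatcher in plain Ruby. run_command is a hypothetical name, and the classifier argument is assumed to be any object that responds to train(category, text), untrain(category, text), and classify(text); how it gets wired to a persistent store is left out.

```ruby
# Hypothetical dispatcher for the three core sub-commands.
# `classifier` is assumed to respond to train(category, text),
# untrain(category, text) and classify(text).
def run_command(classifier, argv, out: $stdout)
  command, *args = argv
  case command
  when "train", "untrain"
    category, text = args
    if category.nil? || text.nil?
      raise ArgumentError, "usage: classifier #{command} CLASS TEXT"
    end
    classifier.public_send(command, category, text)
  when "classify"
    text = args.first
    raise ArgumentError, "usage: classifier classify TEXT" if text.nil?
    out.puts classifier.classify(text) # print the predicted class
  else
    raise ArgumentError, "unknown command: #{command.inspect}"
  end
end
```

An executable would call run_command(classifier, ARGV) after building the classifier; file, URL, and STDIN inputs, as well as batch mode, could be layered on top later.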
