Executable for training with a persistent data store #99
I was thinking about it, but I thought it would be beyond the scope of this gem. Instead, a separate repo can be created that uses this gem to provide a full-blown CLI. Here is how I envision it (assuming that the executable is named classifier):
# Default store: redis://127.0.0.1:6380/0, but customizable using a CLI flag such as:
# --store=redis://user:[email protected]:6380/2
# --store=postgresql://user:[email protected]:6380/5433/classifierdb
$ classifier train {class} {file_path|url|string|STDIN}
# If a file path is given as the last argument then read the content of the file
# If the input is a URL then fetch the content from the URL
# Or automatic batch training based on the sub-folder names
$ classifier train /path/to/training/folder
# Classes can be inferred from the names of the sub-folders of /path/to/training/folder
# Files from each sub-folder can be used as individual training instances
# Some built-in cleaners can be applied (by default or with a flag) such as removing markup if the files are HTML
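The folder-based batch training described above could be sketched roughly as follows. This is only an illustration under stated assumptions: training_examples and strip_markup are hypothetical helper names, not part of the gem, and the cleaner is a crude regex rather than a real HTML parser.

```ruby
# Hypothetical helper (not part of the gem): walk a training folder and
# return [class_name, text] pairs, using each immediate sub-folder name
# as the class and each file inside it as one training instance.
def training_examples(root)
  Dir.children(root).sort.flat_map do |class_name|
    class_dir = File.join(root, class_name)
    next [] unless File.directory?(class_dir)

    Dir.children(class_dir).sort.filter_map do |file_name|
      path = File.join(class_dir, file_name)
      [class_name, File.read(path)] if File.file?(path)
    end
  end
end

# Hypothetical cleaner: replaces anything that looks like a tag with a
# space and collapses repeated spaces. A real cleaner would use an HTML parser.
def strip_markup(text)
  text.gsub(/<[^>]*>/, " ").squeeze(" ").strip
end
```

Each pair could then be passed to whatever classifier object the CLI holds, for example classifier.train(class_name, strip_markup(text)), assuming the classifier exposes a train(category, text) method.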
$ classifier untrain {class} {file_path|url|string|STDIN}
# Or automatic batch untraining based on the sub-folder names
$ classifier untrain /path/to/untraining/folder
$ classifier classify {file_path|url|string|STDIN}
# Or automatic batch classification of files from a directory
$ classifier classify /path/to/data/folder
# => Two columns of output on STDOUT; class name and file path for each file
# Alternatively, the files can be copied/moved in class-named sub-folders of the output directory
$ classifier classify /path/to/data/folder /path/output/base/folder
# Copy /path/to/data/folder/record.txt to /path/output/base/folder/{class}/record.txt
Further to this, a server sub-command:
$ classifier server --namespace=/foo --store=redis://user:[email protected]:6380/2 --port=2017
# Listening on http://localhost:2017
# GET /foo/train/{class}/{string|url}
# POST /foo/train/{class} [upload_file]
# GET /foo/untrain/{class}/{string|url}
# POST /foo/untrain/{class} [upload_file]
# GET /foo/classify/{string|url}
# POST /foo/classify [upload_file]
Ideally, the training should be done only using
Additionally, various command-line flags can be stored in a config file to read from, but overridden if supplied from the terminal.
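The precedence mentioned above (built-in default, then config file, then terminal flags) could look something like the sketch below. It uses only Ruby's standard library. The YAML config format and the names DEFAULTS, resolve_options and store_settings are assumptions made for illustration; only the --store and --port flags and the default store URL come from the proposal.

```ruby
require "optparse"
require "uri"
require "yaml"

# Default store taken from the proposal above.
DEFAULTS = { "store" => "redis://127.0.0.1:6380/0" }.freeze

# Merge order: built-in defaults < config file < command-line flags.
# The config file is assumed (hypothetically) to be a flat YAML mapping.
def resolve_options(argv, config_path = nil)
  from_file = {}
  if config_path && File.exist?(config_path)
    from_file = YAML.safe_load(File.read(config_path)) || {}
  end

  from_cli = {}
  parser = OptionParser.new do |opts|
    opts.on("--store=URL", "Data store URL") { |url| from_cli["store"] = url }
    opts.on("--port=PORT", Integer, "Server port") { |port| from_cli["port"] = port }
  end
  parser.parse(argv) # non-destructive; argv itself is left untouched

  DEFAULTS.merge(from_file).merge(from_cli)
end

# Break a store URL into the pieces a backend would need to connect.
def store_settings(url)
  uri = URI.parse(url)
  { scheme: uri.scheme, host: uri.host, port: uri.port, path: uri.path }
end
```

With this ordering, a value in the config file replaces the default, and a flag typed on the terminal replaces both.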
Start small: a simple CLI that can accept arguments and train/untrain/classify. If you find there is a compelling reason to add a web server, then that can be added later. For now, I'd keep the executable in this repo, as it provides no added functionality beyond the library's core functions. Branch out once that PoC is done and it has users.
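As a rough idea of how small such a starting point could be, here is a sketch of a dispatcher for the three sub-commands using plain Ruby. run_command and read_source are hypothetical names, URL fetching is left out, and the classifier argument is assumed to be any object that responds to train(category, text), untrain(category, text) and classify(text).

```ruby
# Hypothetical dispatcher for the proposed train/untrain/classify commands.
# `classifier` is any object with train, untrain and classify methods.
def run_command(classifier, argv, input: $stdin)
  command, *args = argv
  case command
  when "train", "untrain"
    category, source = args
    raise ArgumentError, "usage: classifier #{command} {class} [file_path|string]" if category.nil?

    classifier.public_send(command, category, read_source(source, input))
    nil
  when "classify"
    classifier.classify(read_source(args.first, input))
  else
    raise ArgumentError, "unknown command: #{command.inspect}"
  end
end

# Read the file if the argument names an existing file, treat the argument
# as a literal string otherwise, and fall back to the input stream (STDIN
# by default) when no argument is given.
def read_source(source, input)
  return input.read if source.nil?

  File.file?(source) ? File.read(source) : source
end
```

An executable could then be little more than a call to run_command(classifier, ARGV), with the classifier built from whichever store is configured.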
Note: I missed some important aspects initially, so I have now updated the proposed CLI/server API. @parkr, I agree that we can start small and branch off later. However, I was worried that unless we make a really toy utility, we will have to use a sophisticated CLI library such as Thor, which will add unnecessary clutter to this gem. As far as the server is concerned, I was only trying to lay out a possible API that can be packaged into a binary. This will provide food for thought and help us architect the application in a way that can accommodate these use cases as it evolves.
Per @parkr's idea it might be useful to have an executable that could be used to train and classify inputs for systems using persistent datastores.