doc updates
brainsqueeze committed Jul 7, 2022
1 parent 0770fda commit 8c42a37
Showing 3 changed files with 20 additions and 306 deletions.
31 changes: 19 additions & 12 deletions README.md
@@ -5,20 +5,16 @@ Models for contextual embedding of arbitrary texts.
## Setup
---

To get started, one should have a flavor of TensorFlow installed, with version `>=2.4.1`. One can run
```bash
pip install "tensorflow>=2.4.1"
```
If one wishes to run the examples, some additional dependencies from HuggingFace will need to be installed. The full installation looks like
```bash
pip install "tensorflow>=2.4.1" tokenizers datasets
```

To install the core components as an import-able Python library, simply run

```bash
pip install git+https://github.com/brainsqueeze/text2vec.git
@@ -51,7 +47,21 @@ This is an adapted bi-directional LSTM encoder-decoder model with a self-attention

Both models are trained using Adam SGD with the learning-rate decay program in [[2](https://arxiv.org/abs/1706.03762)].
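The decay program in [2] warms the learning rate up linearly for a fixed number of steps, then decays it with the inverse square root of the step count. A minimal sketch of that schedule (the `d_model` and `warmup_steps` defaults below are the paper's illustrative values, not necessarily the ones text2vec uses):

```python
def noam_lr(step: int, d_model: int = 512, warmup_steps: int = 4000) -> float:
    """Learning rate at a given training step, per Vaswani et al. (2017):
    linear warmup for `warmup_steps` steps, then inverse-sqrt decay."""
    step = max(step, 1)  # guard against step 0
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)
```

The schedule peaks at `step == warmup_steps`; in practice it would be wrapped in a `tf.keras.optimizers.schedules.LearningRateSchedule` subclass and passed to the Adam optimizer.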

The pre-built auto-encoder models inherit from [tf.keras.Model](https://www.tensorflow.org/api_docs/python/tf/keras/Model), and as such they can be trained using the [fit method](https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit). Training examples are available in the [examples folder](./examples/trainers). These use HuggingFace [tokenizers](https://huggingface.co/docs/tokenizers/python/latest/) and [datasets](https://huggingface.co/docs/datasets/master/).
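As a sketch of that training pattern (the toy model below is a stand-in for illustration, not one of text2vec's pre-built auto-encoders), any `tf.keras.Model` subclass trains the same way:

```python
import numpy as np
import tensorflow as tf

class TinyAutoEncoder(tf.keras.Model):
    """Stand-in model: encodes 8-dim inputs down to 2 dims and reconstructs them."""

    def __init__(self, dim: int = 8, latent: int = 2):
        super().__init__()
        self.encoder = tf.keras.layers.Dense(latent, activation="relu")
        self.decoder = tf.keras.layers.Dense(dim)

    def call(self, inputs):
        return self.decoder(self.encoder(inputs))

x = np.random.rand(64, 8).astype("float32")
model = TinyAutoEncoder()
model.compile(optimizer="adam", loss="mse")
model.fit(x, x, epochs=1, batch_size=16, verbose=0)  # auto-encoding: target == input
```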

If you wish to run the example training scripts, you will need to clone the repository
```bash
git clone https://github.com/brainsqueeze/text2vec.git
```
and then run either
```bash
python -m examples.trainers.news_transformer
```
for the attention-based transformer, or
```bash
python -m examples.trainers.news_lstm
```
for the LSTM-based encoder. These examples use the [Multi-News](https://github.com/Alex-Fabbri/Multi-News) dataset via [HuggingFace](https://huggingface.co/datasets/multi_news).


## Python API
@@ -120,10 +130,7 @@ Text2vec includes a Python API with convenient classes for handling attention
## Inference Demo
---

Once a model is fully trained, a demo API can be run, along with a small UI to interact with the REST API. This demo attempts to use the trained model to condense long bodies of text into the most important sentences, using the inferred embedded context vectors.
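One illustrative way to pick "most important" sentences from context vectors (an assumed scoring scheme for the sketch, not necessarily the demo's exact method) is cosine similarity between each sentence's vector and the mean vector of the whole document:

```python
import numpy as np

def rank_sentences(sentence_vecs: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k sentences whose context vectors are most
    cosine-similar to the mean vector of the document."""
    doc_vec = sentence_vecs.mean(axis=0)
    norms = np.linalg.norm(sentence_vecs, axis=1) * np.linalg.norm(doc_vec)
    scores = sentence_vecs @ doc_vec / np.maximum(norms, 1e-12)  # cosine scores
    return np.argsort(scores)[::-1][:k]  # best-first indices
```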

To start the model server simply run
4 changes: 1 addition & 3 deletions text2vec/__init__.py
@@ -2,12 +2,10 @@
  from . import autoencoders
  from . import preprocessing
  from . import optimizer_tools
- from . import training_tools

  __all__ = [
      'models',
      'autoencoders',
      'preprocessing',
-     'optimizer_tools',
-     'training_tools'
+     'optimizer_tools'
  ]
291 changes: 0 additions & 291 deletions text2vec/training_tools.py

This file was deleted.
