Commit

Update README.md
dayyass authored Aug 7, 2021
1 parent cb8226b commit b083cf1
Showing 1 changed file with 45 additions and 15 deletions.
60 changes: 45 additions & 15 deletions README.md
@@ -3,34 +3,64 @@ Pipeline for building text classification **TF-IDF + LogReg** baselines using **

### Usage
Instead of writing custom code for a specific text classification task, you just need to:
1) install pipeline:
1. install pipeline:
```shell script
pip install text-classification-baseline
```
2a) either run pipeline in **terminal**:
```shell script
text-clf --config config.yaml
```
2b) or run pipeline in **python**:
```python3
import text_clf
text_clf.train(path_to_config="config.yaml")
```
2. run pipeline:

- either in **terminal**:
```shell script
text-clf --config config.yaml
```

- or in **python**:
```python3
import text_clf
text_clf.train(path_to_config="config.yaml")
```

No data preparation is needed, only a **csv** file with two raw columns (with arbitrary names):
- text
- target
- `text`
- `target`

**NOTE**: the target can be presented in any format, including text - not necessarily integers from *0* to *n_classes-1*.
**NOTE**: the **target** can be presented in any format, including text - not necessarily integers from *0* to *n_classes-1*.
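
As a rough illustration of the expected input, the sketch below builds a two-column CSV with text targets and maps the labels to integer ids. The column names `review` and `sentiment` and the sample rows are hypothetical, and the label mapping only shows why text targets are workable; it is not taken from the pipeline's own code.

```python
import csv
import io

# Hypothetical sample data: column names are arbitrary, targets are plain text.
rows = [
    {"review": "great movie", "sentiment": "positive"},
    {"review": "boring plot", "sentiment": "negative"},
    {"review": "loved the cast", "sentiment": "positive"},
]

# Write the two-column CSV into an in-memory buffer.
# For a real file, use open("train.csv", "w", newline="") instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["review", "sentiment"])
writer.writeheader()
writer.writerows(rows)

# Read the CSV back and map each distinct text label to an integer id.
buffer.seek(0)
records = list(csv.DictReader(buffer))
labels = sorted({record["sentiment"] for record in records})
label_to_id = {label: idx for idx, label in enumerate(labels)}
encoded = [label_to_id[record["sentiment"]] for record in records]
```

With a file like this, the `text_column` and `target_column` keys in **config.yaml** would presumably be set to `review` and `sentiment`.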

#### Config
The user interface consists of only one file [**config.yaml**](https://github.com/dayyass/text-classification-baseline/blob/main/config.yaml).

Change **config.yaml** to create the desired configuration and train a text classification model.

Default **config.yaml**:
```{r engine='bash', comment=''}
cat config.yaml
```yaml
seed: 42
verbose: true
path_to_save_folder: models

# data
data:
  train_data_path: data/train.csv
  valid_data_path: data/valid.csv
  sep: ','
  text_column: text
  target_column: target_name_short

# tf-idf
tf-idf:
  lowercase: true
  ngram_range: (1, 1)
  max_df: 1.0
  min_df: 0.0

# logreg
logreg:
  penalty: l2
  C: 1.0
  class_weight: balanced
  solver: saga
  multi_class: auto
  n_jobs: -1
```
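
One detail worth noting: YAML has no tuple syntax, so a value such as `ngram_range: (1, 1)` is typically loaded as the string `"(1, 1)"` and has to be converted before it can be used as a pair of integers. The helper below is a minimal sketch of such a conversion; `parse_ngram_range` is a hypothetical name, and how the pipeline itself handles this value is not shown here.

```python
import ast


def parse_ngram_range(value):
    """Convert a value such as the string "(1, 1)" into a tuple of two ints."""
    # Strings are parsed safely as Python literals; lists/tuples pass through.
    parsed = ast.literal_eval(value) if isinstance(value, str) else value
    if not isinstance(parsed, (tuple, list)) or len(parsed) != 2:
        raise ValueError(f"expected a pair like (1, 2), got {value!r}")
    low, high = parsed
    if not isinstance(low, int) or not isinstance(high, int):
        raise ValueError(f"n-gram bounds must be integers, got {value!r}")
    if low < 1 or low > high:
        raise ValueError(f"invalid n-gram bounds: {value!r}")
    return (low, high)


unigrams = parse_ngram_range("(1, 1)")
uni_and_bigrams = parse_ngram_range("(1, 2)")
```

Using `ast.literal_eval` rather than `eval` keeps the conversion limited to plain literals, which is a reasonable precaution for values read from a user-edited config file.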

#### Output