FED is a closed-domain event detection system for sentences in Portuguese. It detects events in sentences, i.e., it identifies event triggers and classifies them by event type. The event types are based on the typology of the FrameNet project (BAKER; FILLMORE; LOWE, 1998). The models were trained on an enriched version of the TimeBankPT corpus (COSTA; BRANCO, 2012).
This Colab currently provides 5 trained models for execution: 0, 5, 25, 50, and 100, which correspond to 214, 137, 31, 13, and 5 event types, respectively.
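The correspondence between model identifiers and event-type counts can be written as a small lookup table. This is only an illustrative sketch: the names `MODEL_EVENT_TYPES` and `event_type_count` are hypothetical and are not part of FED.

```python
# Illustrative lookup table; the names here are hypothetical, not part of FED.
# Keys are the model identifiers, values the number of event types each covers.
MODEL_EVENT_TYPES = {0: 214, 5: 137, 25: 31, 50: 13, 100: 5}


def event_type_count(model_id: int) -> int:
    """Return the number of event types covered by the given model identifier."""
    try:
        return MODEL_EVENT_TYPES[model_id]
    except KeyError:
        valid = ", ".join(str(m) for m in MODEL_EVENT_TYPES)
        raise ValueError(
            f"unknown model {model_id}; expected one of: {valid}"
        ) from None
```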
The system outputs the detected events in the following JSON format:
[
{
"text": "aumentou",
"start": 12,
"end": 20,
"event_type": "Cause_change_of_position_on_a_scale"
},
{
"text": "disse",
"start": 58,
"end": 63,
"event_type": "Statement"
}
]
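A consumer of this output might read it as sketched here. The sketch assumes that `start` and `end` are character offsets into the input sentence with `end` exclusive, which is consistent with the sample ("aumentou" has 8 characters and spans 12 to 20) but is not stated explicitly. Pairing the sample with the sentence used in the code is also an assumption, made because the offsets line up with it; `check_offsets` is a hypothetical helper.

```python
import json

# Sample output copied from the format description.
raw_output = """
[
  {"text": "aumentou", "start": 12, "end": 20,
   "event_type": "Cause_change_of_position_on_a_scale"},
  {"text": "disse", "start": 58, "end": 63, "event_type": "Statement"}
]
"""

# Assumption: this is the sentence the sample output was produced from.
sentence = ("A Petrobras aumentou o preço da gasolina para 2,30 reais, "
            "disse o presidente.")

events = json.loads(raw_output)


def check_offsets(sentence, events):
    """Return the events whose [start, end) slice differs from their text."""
    return [e for e in events if sentence[e["start"]:e["end"]] != e["text"]]


# An empty list means every trigger matches its span (assuming end-exclusive offsets).
mismatches = check_offsets(sentence, events)

for event in events:
    print(f'{event["text"]!r} ({event["start"]}-{event["end"]}): {event["event_type"]}')
```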
-
Download the BERTimbau Base model (SOUZA; NOGUEIRA; LOTUFO, 2020) and its vocabulary file:
$ wget https://neuralmind-ai.s3.us-east-2.amazonaws.com/nlp/bert-base-portuguese-cased/bert-base-portuguese-cased_tensorflow_checkpoint.zip
$ wget https://neuralmind-ai.s3.us-east-2.amazonaws.com/nlp/bert-base-portuguese-cased/vocab.txt
Then unzip the checkpoint and place the files in the models directory as follows:
├──models
|   └── BERTimbau
|       └── bert_config.json
|       └── bert_model.ckpt.data-00000-of-00001
|       └── bert_model.ckpt.index
|       └── bert_model.ckpt.meta
|       └── vocab.txt
|...
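Before running the detector, it may help to confirm that the files ended up where this layout expects them. The following is a minimal sketch, not part of FED: `missing_bertimbau_files` is a hypothetical helper, and the file names are simply those listed in the layout.

```python
from pathlib import Path

# File names taken from the directory layout in this README.
EXPECTED_FILES = (
    "bert_config.json",
    "bert_model.ckpt.data-00000-of-00001",
    "bert_model.ckpt.index",
    "bert_model.ckpt.meta",
    "vocab.txt",
)


def missing_bertimbau_files(models_dir="models"):
    """Return the expected files that are absent from <models_dir>/BERTimbau."""
    target = Path(models_dir) / "BERTimbau"
    return [name for name in EXPECTED_FILES if not (target / name).is_file()]


if __name__ == "__main__":
    missing = missing_bertimbau_files()
    print("Missing files:", ", ".join(missing) if missing else "none")
```

Run from the repository root, the script prints the files that still need to be placed, or "none" when the layout is complete.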
-
Install the packages.
$ pip install -r requirements.txt
The script src/fed.py accepts the following options:
  -h, --help                  Print this help text and exit
  --sentence SENTENCE         Sentence string to detect events from
  --dir INPUT-DIR OUTPUT-DIR  Detect events from files of input directory
                              (one sentence per line) and write output json
                              files on output directory.
  --model ID                  Identifier of models available: 0, 5, 25, 50 or
                              100. The default model is 0
The text files in the input directory are expected to have the format:
* all text files end with the extension .txt
* sentences are separated by newlines
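An input file in this format could be prepared with a few lines of Python. This is a sketch under assumptions: `write_input_file` is a hypothetical helper, and UTF-8 is assumed as the file encoding, which the format description does not specify.

```python
from pathlib import Path


def write_input_file(sentences, input_dir, name="sentences.txt"):
    """Write one sentence per line to <input_dir>/<name> and return the path."""
    if not name.endswith(".txt"):
        raise ValueError("input files must use the .txt extension")
    path = Path(input_dir) / name
    path.parent.mkdir(parents=True, exist_ok=True)
    # Collapse internal whitespace (including newlines) so that each
    # sentence occupies exactly one line; skip empty entries.
    lines = [" ".join(s.split()) for s in sentences if s.strip()]
    # Assumption: UTF-8 encoding, which is not stated in the README.
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path
```

For example, `write_input_file(["A Petrobras aumentou o preço da gasolina.", "O presidente disse que sim."], "/tmp/input-dir")` creates /tmp/input-dir/sentences.txt with two lines.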
$ python3 src/fed.py --dir /tmp/input-dir /tmp/output-dir
$ python3 src/fed.py --sentence 'A Petrobras aumentou o preço da gasolina para 2,30 reais, disse o presidente.'
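The same invocations can be assembled from Python, for instance to loop over several models. The sketch uses only the options documented in this README; `build_fed_command` is a hypothetical helper, and treating `--sentence` and `--dir` as mutually exclusive, as well as always passing `--model` explicitly, are assumptions rather than documented behavior.

```python
VALID_MODELS = (0, 5, 25, 50, 100)


def build_fed_command(sentence=None, input_dir=None, output_dir=None, model=0):
    """Build the argument list for src/fed.py from the documented options."""
    if model not in VALID_MODELS:
        raise ValueError(f"model must be one of {VALID_MODELS}")
    dir_mode = input_dir is not None or output_dir is not None
    # Assumption: exactly one of the two modes is used per invocation.
    if (sentence is not None) == dir_mode:
        raise ValueError("pass either sentence, or input_dir and output_dir")
    if dir_mode and (input_dir is None or output_dir is None):
        raise ValueError("--dir needs both an input and an output directory")
    cmd = ["python3", "src/fed.py", "--model", str(model)]
    if sentence is not None:
        cmd += ["--sentence", sentence]
    else:
        cmd += ["--dir", str(input_dir), str(output_dir)]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root. Passing the arguments as a list avoids shell quoting problems with sentences that contain apostrophes or commas.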
Peer-reviewed accepted paper:
- Sacramento, A., Souza, M.: Joint Event Extraction with Contextualized Word Embeddings for the Portuguese Language. In: 10th Brazilian Conference on Intelligent Systems (BRACIS), São Paulo, Brazil, November 29 to December 3, 2021.