Text for Discussion Ipea (Institute for Applied Economic Research)

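TD 2612 - Línguas Naturais E Máquinas Artificiais: aplicação de técnicas de mineração de texto para a classificação de sentenças judiciais brasileiras (Natural Languages and Artificial Machines: applying text mining techniques to the classification of Brazilian court rulings)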

This repository uses machine learning (ML) to classify the texts of court rulings (sentenças), training on a sample that was pre-classified with regular expressions.
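The regular-expression pre-sorting can be sketched as below. This is a minimal illustration, not the repository's code: the category names (`procedente`, `improcedente`, `parcialmente_procedente`, `sem_merito`) and the patterns are hypothetical stand-ins for whatever labels the project actually uses. Each ruling gets the label of the first pattern that matches, and texts matching no pattern come back as `None`, so they could be left out of the labelled sample.

```python
import re

# Hypothetical categories and patterns, for illustration only: the
# repository's actual labels and regular expressions are not shown here.
# The first matching pattern wins, so higher-priority phrasings go first.
LABEL_PATTERNS = [
    ("parcialmente_procedente", re.compile(r"julgo\s+parcialmente\s+procedente", re.IGNORECASE)),
    ("improcedente", re.compile(r"julgo\s+improcedente", re.IGNORECASE)),
    ("procedente", re.compile(r"julgo\s+procedente", re.IGNORECASE)),
    ("sem_merito", re.compile(r"sem\s+resolu[çc][ãa]o\s+do\s+m[ée]rito", re.IGNORECASE)),
]


def pre_classify(text):
    """Return the label of the first matching pattern, or None if none matches."""
    for label, pattern in LABEL_PATTERNS:
        if pattern.search(text):
            return label
    return None
```

For example, `pre_classify("Ante o exposto, JULGO PROCEDENTE o pedido.")` returns `"procedente"`, while a text with none of these phrasings returns `None`.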

The scripts are meant to be run in the following order:

01_steam reads the ruling texts, together with their case numbers, from the file trf2_td_1st.csv, cleans and stems them, and writes the result to the file dados_stemizados.csv, ready for analysis.
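A cleaning and stemming step of this kind might look like the sketch below. It is an assumption-laden illustration, not the contents of 01_steam.py: the column names `processo` and `texto`, the short stopword list, and the suffix list are all hypothetical, and the stemmer is a crude longest-suffix stripper rather than a real Portuguese stemmer such as NLTK's `RSLPStemmer`. Accents are removed before tokenizing so that, for example, "indenização" and "indenizacao" end up with the same stem.

```python
import csv
import re
import unicodedata

# Hypothetical stopword list; a real pipeline would use a fuller one
# (e.g. NLTK's Portuguese stopwords). Entries are accent-free.
STOPWORDS = {"a", "o", "as", "os", "ao", "de", "da", "do", "das", "dos",
             "e", "em", "na", "no", "que", "para", "por", "com", "um", "uma"}

# Crude, accent-free suffix list, tried longest first. This is NOT RSLP.
SUFFIXES = sorted(
    ["amentos", "imentos", "amento", "imento", "acoes", "mente", "idade",
     "acao", "ados", "idos", "ado", "ido", "ar", "er", "ir",
     "es", "os", "as", "s", "a", "o", "e"],
    key=len, reverse=True,
)


def strip_accents(text):
    """Decompose accented characters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def stem(word, min_stem=3):
    """Remove the longest listed suffix that leaves at least min_stem letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def clean_and_stem(text):
    """Lowercase, strip accents, keep letters only, drop stopwords, then stem."""
    tokens = re.findall(r"[a-z]+", strip_accents(text.lower()))
    return " ".join(stem(t) for t in tokens if t not in STOPWORDS)


def stem_csv(src, dst, id_col="processo", text_col="texto"):
    """Read rows from file object src; write (case id, stemmed text) rows to dst."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=[id_col, text_col])
    writer.writeheader()
    for row in reader:
        writer.writerow({id_col: row[id_col], text_col: clean_and_stem(row[text_col])})
```

With these lists, `clean_and_stem("Julgo procedente o pedido de indenização.")` gives `"julg procedent ped indeniz"`, and "julgamentos" reduces to the same stem `julg` as "julgo".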

02_reverse_onehotencoding reads the texts cleaned in step 1 (file dados_stemizados.csv) and the cases with their ruling classification (file tmp-sentencas-classificadas.csv). The two are merged into a single DataFrame, and the one-hot columns holding the categorical classification are collapsed into a single column. The result is exported to the file categorical.csv.
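The merge and reverse one-hot encoding can be sketched without any third-party library, as below. The key column `processo`, the output column `categoria`, and the category column names are hypothetical. As a design choice of this sketch, cases without a classification are dropped (as in an inner join), and rows where zero or several one-hot columns are set are treated as unlabelled and dropped too. In pandas the same idea would typically be `DataFrame.merge` followed by `pandas.from_dummies` (pandas 1.5+) or `idxmax(axis=1)` over the category columns.

```python
def collapse_one_hot(row, category_cols):
    """Return the name of the single category column set to 1, else None.

    Rows with no active column, or with more than one, are treated as
    unlabelled (a design choice of this sketch).
    """
    active = [col for col in category_cols if str(row.get(col, "0")).strip() == "1"]
    return active[0] if len(active) == 1 else None


def merge_and_collapse(texts, classes, key, category_cols, label_col="categoria"):
    """Inner-join texts and classes on key, replacing one-hot columns by label_col."""
    class_by_key = {row[key]: row for row in classes}
    merged = []
    for row in texts:
        cls = class_by_key.get(row[key])
        if cls is None:
            continue  # case has no classification: dropped, as in an inner join
        label = collapse_one_hot(cls, category_cols)
        if label is None:
            continue  # no label or ambiguous label: dropped
        merged.append({**row, label_col: label})
    return merged
```

Each merged row keeps the case number and stemmed text from the first input and gains one `categoria` value, which is the shape a single-label classifier expects.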

03_NLP_NB applies a Naive Bayes (NB) classifier to the data and reports the results with a confusion matrix.
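The README does not say which Naive Bayes variant or library the script uses. With scikit-learn this would commonly be `MultinomialNB` plus `sklearn.metrics.confusion_matrix`; the dependency-free sketch below assumes a multinomial model over whitespace-separated stems with Laplace smoothing (`alpha`), and builds the confusion matrix with true classes as rows and predicted classes as columns. Words never seen in training are ignored at prediction time.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model on whitespace-tokenised documents.

    alpha is the additive (Laplace/Lidstone) smoothing constant.
    """
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    n_docs, n_vocab = len(labels), len(vocab)
    log_prior, log_lik = {}, {}
    for c, count in class_counts.items():
        log_prior[c] = math.log(count / n_docs)
        denom = sum(word_counts[c].values()) + alpha * n_vocab
        log_lik[c] = {w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab}
    return {"log_prior": log_prior, "log_lik": log_lik, "vocab": vocab}


def predict_nb(model, doc):
    """Return the class with the highest log-posterior score; unseen words are ignored."""
    scores = {}
    for c, prior in model["log_prior"].items():
        scores[c] = prior + sum(
            model["log_lik"][c][w] for w in doc.split() if w in model["vocab"]
        )
    return max(scores, key=scores.get)


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes, both ordered by labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix
```

Diagonal entries count correct predictions; an off-diagonal entry at row `i`, column `j` counts rulings of class `i` that the model labelled as class `j`.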

Files in the repository:

.idea
01_steam.py
02_reverse_onehotencoding.py
03_NLP_NB.py
Archive.zip
README.md
limpa.py

Repository: lucasmoreira/td_ipea