support for skipping sentence split
fginter committed Jul 11, 2015
1 parent 7151e79 commit 2345f6d
Showing 1 changed file with 6 additions and 1 deletion.
7 changes: 6 additions & 1 deletion split_text_with_comments.sh
@@ -13,4 +13,9 @@ fi

cat | $PYTHON check_encoding.py | $PYTHON hash_comments.py -d $TMPDIR/comment_hashes.json > $TMPDIR/hashed_text.txt

-cat $TMPDIR/hashed_text.txt | opennlp SentenceDetector model/fi-sent.bin | opennlp TokenizerME model/fi-token.bin | $PYTHON txt_to_09.py -d $TMPDIR/comment_hashes.json
+if [[ "$1" == "--no-sent-split" ]]
+then
+    cat $TMPDIR/hashed_text.txt | grep -Pv '^\s*$' | opennlp TokenizerME model/fi-token.bin | $PYTHON txt_to_09.py -d $TMPDIR/comment_hashes.json
+else
+    cat $TMPDIR/hashed_text.txt | opennlp SentenceDetector model/fi-sent.bin | opennlp TokenizerME model/fi-token.bin | $PYTHON txt_to_09.py -d $TMPDIR/comment_hashes.json
+fi
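
The new branch does two things: it tests the first argument for `--no-sent-split`, and in that mode it drops blank lines before tokenization instead of running the sentence detector. A minimal sketch of both steps follows, runnable without OpenNLP. The function names `choose_mode` and `drop_blank_lines` are hypothetical and do not appear in the script. The sketch also swaps in a POSIX bracket expression for the commit's `grep -Pv '^\s*$'`, since `-P` is a GNU grep extension; on ordinary spaces and tabs the two filters should behave the same.

```shell
#!/bin/sh
# Hypothetical helper (not in the commit): mirrors the branch on the
# first argument, printing which pipeline would be selected.
choose_mode() {
    if [ "${1:-}" = "--no-sent-split" ]; then
        echo "tokenize-only"
    else
        echo "split-and-tokenize"
    fi
}

# Hypothetical helper (not in the commit): removes empty and
# whitespace-only lines, like grep -Pv '^\s*$' but without needing -P.
drop_blank_lines() {
    grep -v '^[[:space:]]*$'
}

# Demo: only the two non-blank lines survive the filter.
printf 'First line.\n\n   \nSecond line.\n' | drop_blank_lines
```

Judging from the leading `cat |` in the script, input is read from standard input, so an invocation would presumably look like `cat input.txt | ./split_text_with_comments.sh --no-sent-split`. The file name `input.txt` is a placeholder.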
