Proposal for treebank-parser tree structure #45
Rafik - I think that representation is a good idea. (I’m new to both Clojure and OpenNLP, but I’m interested in the project and learning as I go.)
@turbopape I could see that being a pretty good representation, but I didn't want to include that out of the box since people using |
Yes I agree, I didn't want to go for the "load string" solution, I've put
2016-11-17 23:10 GMT+01:00 Lee Hinman:
Rafik Naccache
@turbopape certainly, I'm definitely down for adding more representations. I figure we can keep the map representation and have things that will output it in different formats depending on the user's taste. |
The

(let [text-lines ["Hello, world!"]]
  (->> text-lines
       (map tokenize)
       (map (partial string/join " "))
       parse
       (map #(str "(quote " % ")"))
       (map load-string)))
;; => ((TOP (FRAG (INTJ (UH Hello)) () (NP (NN world)) (. !))))

With a slight modification to

(def ^:private s-parser
  (insta/parser
   "E = <'('> T <WS> (T | (E <WS?>)+) <')'> <WS?> ; T = #'[^)\\s]+' ; WS = #'\\s+'"))

;; Only this function modified. Including above and below for reference.
(defn- tr
  "Transforms treebank string into series of s-like expressions."
  [ptree & [tag-fn]]
  (let [t (or tag-fn symbol)]
    (if (= :E (first ptree))
      (concat
       (list (t (second (second ptree))))
       (map #(tr % tag-fn) (drop 2 ptree)))
      (second ptree))))

(defn make-tree
  "Make a tree from the string output of a treebank-parser."
  [tree-text & [tag-fn]]
  (tr (s-parser tree-text) tag-fn))

One kind of nice thing you can do with a tree like this is use the default zipper for iterating and manipulating the parse tree.

(-> parsed-s-expression
    (zip/seq-zip)
    zip/down
    zip/down
    zip/rightmost
    (zip/append-child '(. "!"))
    zip/root)
;; => ((TOP (FRAG (INTJ (UH "Hello")) (, ",") (NP (NN "world")) (. "!") (. "!"))))
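To illustrate the "iterating" half of the zipper remark, here is a minimal sketch that walks a tag-first tree with `clojure.zip` and collects the word leaves. The names `sample-tree` and `leaf-words` are hypothetical and only for illustration; the tree is hand-built to match the shape shown above rather than produced by the parser.

```clojure
(require '[clojure.zip :as zip])

;; Hypothetical sample tree in the tag-first list form discussed above.
;; The comma tag is built with `symbol` because the Clojure reader treats
;; a bare comma as whitespace.
(def sample-tree
  (list 'TOP
        (list 'FRAG
              '(INTJ (UH "Hello"))
              (list (symbol ",") ",")
              '(NP (NN "world"))
              '(. "!"))))

(defn leaf-words
  "Walks the tree depth-first with a seq-zip and returns every string leaf."
  [tree]
  (->> (zip/seq-zip tree)
       (iterate zip/next)                  ; visit every location in order
       (take-while (complement zip/end?))  ; stop once the walk is finished
       (map zip/node)
       (filter string?)))                  ; tags are symbols, words are strings

(leaf-words sample-tree)
;; => ("Hello" "," "world" "!")
```

This relies on leaves being strings and tags being symbols, which is what the modified `tr` above produces by default; with a different `tag-fn` the filter would need adjusting.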
Hey @dakrone,
I am particularly interested in the treebank-parser.
One cool representation would be a one-to-one translation from the string representation of the tree into a Clojure list, with the first element being the tag and the rest of it the chunk!
This would be visually more understandable, and would stick with Lisp's common representation of data in general!
This could be done using some reader-tricks:
But it would be better to have it generated when the parse is being done...
Whadda ya think ?
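For reference, the one-to-one translation described above could be sketched without going through the Clojure reader at all. This is not the library's implementation, just a small hand-written recursive reader; `read-node` and `treebank->list` are hypothetical names, and the sketch assumes a well-formed, parenthesized treebank string whose tokens contain no whitespace or parentheses.

```clojure
(defn- read-node
  "Reads one node from a token seq; returns [node remaining-tokens]."
  [tokens]
  (when (empty? tokens)
    (throw (ex-info "Unbalanced treebank string" {})))
  (let [[tok & more] tokens]
    (if (= "(" tok)
      ;; The first token after "(" is the tag; everything up to the
      ;; matching ")" is a child (either a word or a nested node).
      (loop [children [] remaining (rest more)]
        (if (= ")" (first remaining))
          [(apply list (symbol (first more)) children) (rest remaining)]
          (let [[child remaining] (read-node remaining)]
            (recur (conj children child) remaining))))
      ;; Anything else is a word and is kept as a string.
      [tok more])))

(defn treebank->list
  "Turns a treebank string into a nested list with the tag first."
  [s]
  (first (read-node (re-seq #"\(|\)|[^\s()]+" s))))

(treebank->list "(TOP (FRAG (INTJ (UH Hello)) (, ,) (NP (NN world)) (. !)))")
;; => (TOP (FRAG (INTJ (UH "Hello")) (, ",") (NP (NN "world")) (. "!")))
```

Because tokens never pass through the reader, the comma tag survives as a symbol instead of being swallowed as whitespace.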