
HTML SmaCC Parser #25

seandenigris opened this issue Oct 13, 2020 · 0 comments

@seandenigris (Owner)

Initial import from ANTLR 4 is already done. I used John Brant's script to convert both the lexer and parser grammars from https://raw.githubusercontent.com/antlr/grammars-v4/master/html. I pasted the result into the source view of https://github.com/seandenigris/Resources-Live/blob/master/src/ResourcesLive/RlHTMLParser.class.st, which also generated https://github.com/seandenigris/Resources-Live/blob/master/src/ResourcesLive/RlHTMLScanner.class.st, but the resulting parser does not work. To fix it, per John Brant on the GToolkit Discord help channel (10/12/2020):

Looking at your grammar, I think the next step would be to fix the TODO parts of the grammar that the conversion tool couldn't handle. There appear to be two main issues the conversion didn't handle: SmaCC's scanner has no non-greedy matching (`.*?`), and the grammar uses pushMode/popMode.

For the non-greedy matching, the regexes need to be rewritten. Some are easy: SCRIPT_OPEN can be changed to `<script [^\>]* >`, since it ends with a single `>`, so we can accept any character except `>`. Items like SCRIPTLET, which end with either `?>` or `%>`, need a more complex regex, similar to the one for a C-style `/* */` comment (e.g., `\/ \* [^\*]* \*+ ([^\/\*] [^\*]* \*+)* \/` handles C comments).

For the push/popMode stuff, you'll need to add a production before the token is used in the grammar. For example, in the script production, you would write `PushScript <SCRIPT_OPEN> ...`. Then you'll need to create `PushScript : [self scope: #SCRIPT];`. Similarly, for popMode, create a production like `Pop` to add before that token. For now, you could define it as `Pop : [self scope: #default];`. If a stack is really needed, the push and pop rules will need to be modified a little.
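A minimal sketch for sanity-checking this advice outside SmaCC, written in Python rather than SmaCC grammar notation. It assumes the ANTLR rules are `'<script' .*? '>'` for SCRIPT_OPEN and `'<?' .*? '?>' | '<%' .*? '%>'` for SCRIPTLET, and uses Python's lazy `.*?` as the reference behavior. The `SCRIPTLET` rewrite is an assumed adaptation of the C-comment pattern to two-character terminators, not something taken from the converted grammar. Python's `re` treats spaces literally, so the patterns omit the spacing of the quoted SmaCC versions. `ModeScanner` is a hypothetical toy that only mimics the `scope:` calls quoted above; it is not SmaCC's API, and the mode names `TAG` and `ATTVALUE` are illustrative.

```python
import re

# --- Non-greedy ANTLR rules vs. greedy rewrites --------------------------
# Reference behaviour: Python's lazy `.*?`. DOTALL because ANTLR's `.`
# also matches newlines.
LAZY_SCRIPT_OPEN = re.compile(r"<script.*?>", re.DOTALL)
LAZY_SCRIPTLET = re.compile(r"<\?.*?\?>|<%.*?%>", re.DOTALL)

# SCRIPT_OPEN: the tag ends at the first '>', so a run of non-'>' is enough.
SCRIPT_OPEN = re.compile(r"<script[^>]*>")

# Classic C-comment pattern: '/*', then text in which every run of '*'
# except the last is followed by a character other than '/', then '/'.
C_COMMENT = re.compile(r"/\*[^*]*\*+(?:[^/*][^*]*\*+)*/")

# SCRIPTLET (assumed adaptation): the same shape with '?' or '%' in place
# of '*' and '>' in place of '/'.
SCRIPTLET = re.compile(
    r"<\?[^?]*\?+(?:[^>?][^?]*\?+)*>"  # '<?' ... '?>'
    r"|<%[^%]*%+(?:[^>%][^%]*%+)*>"    # '<%' ... '%>'
)


def same_prefix_match(lazy, greedy, text):
    """True if both patterns match the same prefix of `text`, or both fail."""
    a, b = lazy.match(text), greedy.match(text)
    return (a.group() if a else None) == (b.group() if b else None)


# --- Hypothetical mode handling ------------------------------------------
class ModeScanner:
    """Toy stand-in for a scanner whose active token set is a 'scope'."""

    def __init__(self):
        self.scope = "default"
        self.stack = []

    # Single-scope version, mirroring `PushScript : [self scope: #SCRIPT];`
    # and `Pop : [self scope: #default];`.
    def push_script(self):
        self.scope = "SCRIPT"

    def pop(self):
        self.scope = "default"

    # Stack version: pop restores the previous mode instead of 'default'.
    def push_mode(self, mode):
        self.stack.append(self.scope)
        self.scope = mode

    def pop_mode(self):
        self.scope = self.stack.pop() if self.stack else "default"


if __name__ == "__main__":
    for s in ["<? x ?>", "<?x??>", "<?>?>", "<% a > b %>", "<? unterminated"]:
        print(repr(s), same_prefix_match(LAZY_SCRIPTLET, SCRIPTLET, s))
```

The greedy patterns cannot run past the terminator because no repetition can consume a complete `?>` (or `*/`): after a run of `?`, the pattern only continues with a character that is neither `?` nor `>`, otherwise it must end on `>`. With a single `scope:` assignment, a pop always lands in `default`; the stack variant restores whichever mode was active before the matching push, as ANTLR's `popMode` does, which matters when one mode is entered from inside another.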
