Decide on the way forward #26
You are asking questions that people unfamiliar with modules cannot easily answer. And no, there aren't 5 or more module combinations that make sense (per your Q1). The maintainers of this project hoped that we could reach some agreement, but it is obviously difficult if not impossible.
No, not everyone is in a hurry about that. It has been explained that there won't be any answer before August (and note that "not before August" is not the same as "in August"), and I am fine if for example my question in issue #25 is answered in August or September. And that's because in the meantime, we are (probably) going to have 1.5 with the automatic module name.
It is not reasonable to block the solution to a serious problem (the lack of a declared module name) because of hypothetical backward-compatibility issues in 2.0 (which is going to be called 2.0 precisely because some backward incompatibilities are possible). My proposed XOM module split (#25) for 2.0 would affect the subset of developers that use Finally, I try to keep a constructive dialogue, but your rejection of dependency version upgrades looks like opposition for its own sake. Could we spare this modularization drama for now, release 1.5, and revisit the matter sometime later?
Replying to your edit of the original post:
This is a library and not the final application, so using carefully chosen version ranges is a good practice (albeit not mandatory). I gave more details in my response to your comment in the PR.
First of all, I acknowledge the annoyance of having a main branch owner who isn't up-to-speed with Java Modules (it's taken me way more time in terms of calendar distance than I hoped to get some idea of Java Modules) and also is as slow to respond as I have been. Sorry.
Currently, technically the compile-time-hard and run-time-optional dependencies on both encoding detectors and on XOM are handled the same way. I see an issue open about XOM, but XOM seems to have more degrees of freedom for the solution, since if you use XOM, you already choose a XOM-specific API entry point anyway. However, the encoding detectors are something you can optionally enable regardless of entry point, so in that sense solving those should solve XOM, too, unless it's a problem for XOM types to appear in the outward API.
Considering the way Java worked up to and including 8, it would have been a backward compatibility bug to change the fully-qualified name of a class that remains otherwise compatible. That is, if you wrote an app with So far, I've understood that there are now restrictions on what packages #25 says that while While I can guess that "formally wrong" is bad, is there some articulation of how the badness would manifest in practice? OTOH, https://github.com/carlosame/htmlparser/tree/xom-removed/src/nu/validator/htmlparser and https://github.com/carlosame/htmlparser-xom/tree/master/src/nu/validator/htmlparser/xom suggest that the module system does allow a module called Considering that the validator project has over the span of its existence gone from me spreading stuff over multiple repos to @sideshowbarker merging some of the repos, I'm a bit uneasy about additional repos and am leaning towards keeping the XOM stuff in this repo if permitted by the rules of Modules and Maven.
Regarding a Maven dependency, I'd hope that after all changes, it would still be possible to point Eclipse to the source directories and the dependency jars and have stuff build without Maven.
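In module terms, a "compile-time-hard and run-time-optional" dependency is what `requires static` expresses. The sketch below (JDK 11+) resolves an in-memory descriptor for a module hypothetically named `nu.validator.htmlparser` that depends on a module assumed to be called `com.ibm.icu`, which is deliberately absent. With `requires static` resolution succeeds; with a plain `requires` it fails with a `FindException`. The module names and the descriptor are illustrations only, not the project's actual `module-info`.

```java
import java.lang.module.Configuration;
import java.lang.module.FindException;
import java.lang.module.ModuleDescriptor;
import java.lang.module.ModuleFinder;
import java.lang.module.ModuleReader;
import java.lang.module.ModuleReference;
import java.util.List;
import java.util.Optional;
import java.util.Set;

public class OptionalDependencyDemo {

    // Serves one in-memory descriptor; resolution only reads descriptors,
    // so open() is never called in this demo.
    static ModuleFinder finderOf(ModuleDescriptor descriptor) {
        ModuleReference ref = new ModuleReference(descriptor, null) {
            @Override
            public ModuleReader open() {
                throw new UnsupportedOperationException("descriptor-only module");
            }
        };
        return new ModuleFinder() {
            @Override
            public Optional<ModuleReference> find(String name) {
                return name.equals(descriptor.name()) ? Optional.of(ref) : Optional.empty();
            }

            @Override
            public Set<ModuleReference> findAll() {
                return Set.of(ref);
            }
        };
    }

    // Resolves a hypothetical parser module whose dependency on the (assumed)
    // module "com.ibm.icu" is either "requires static" or a plain "requires".
    // The ICU module is not available anywhere in the module graph.
    static boolean resolves(boolean optional) {
        ModuleDescriptor.Builder builder = ModuleDescriptor.newModule("nu.validator.htmlparser");
        if (optional) {
            builder.requires(Set.of(ModuleDescriptor.Requires.Modifier.STATIC), "com.ibm.icu");
        } else {
            builder.requires("com.ibm.icu");
        }
        try {
            Configuration.resolve(finderOf(builder.build()),
                    List.of(ModuleLayer.boot().configuration()),
                    ModuleFinder.of(),
                    Set.of("nu.validator.htmlparser"));
            return true;
        } catch (FindException e) {
            // A plain "requires" on a missing module cannot be resolved.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("requires static: " + resolves(true));  // true
        System.out.println("plain requires:  " + resolves(false)); // false
    }
}
```

At run time, code touching the optional classes still has to check for their presence, for example via `ModuleLayer.boot().findModule("com.ibm.icu").isPresent()` when running on the module path.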
As for the JDK target, I believe the source code is still Java 5-compatible. Not using new language constructs matters for the Java-to-C++ translator, but actually running the code on a Java 5 JVM is unlikely to be a use case worth supporting. Running on OpenJDK 8 and recent-ish Android does seem relevant, though. Do I understand correctly that modularized jars can be released with the byte code compatibility level set to 8, and then a pre-Modules user just dumps all the jars in the classpath, the older JVM ignores the Module manifests, and that's it?
Going back to the encoding detectors: Ideally, the parser would depend on a Java port of chardetng, but one does not exist. The ICU detector isn't very good. In the absence of a Java port of chardetng, jchardet can make some sense. Do I understand correctly that the blocker for depending on it even via I've never used the RPM and OSGi stuff myself. Someone contributed them at some point. I have no idea if someone still cares. Probably prudent to leave that stuff in for 1.5 if it's easy to do so, drop that stuff for 2.0 and see if anyone complains.
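Regarding jars dumped on the classpath: on JDK 9+, everything loaded from the class path ends up in the unnamed module, and a `module-info.class` inside a JAR on the class path is not used; JDK 8 has no module system at all. A minimal sketch (needs JDK 9+, since `Class.getModule()` does not exist on 8) that reports which module a class belongs to:

```java
public class ClasspathModuleDemo {

    // Reports whether a class lives in a named module or in the unnamed module.
    static String describe(Class<?> type) {
        Module module = type.getModule();
        return module.isNamed()
                ? "named module " + module.getName()
                : "unnamed module (class path)";
    }

    public static void main(String[] args) {
        // JDK classes always come from named modules.
        System.out.println(describe(String.class));
        // This class, launched from the class path, is in the unnamed module;
        // classes from a modular JAR placed on the class path end up there too.
        System.out.println(describe(ClasspathModuleDemo.class));
    }
}
```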
Well, none of the changes being discussed involves changing the fully-qualified name of any class (except for the conversion that I did of the unit test in the proposed new XOM repo where I put it in its parent package, but I do not think that you are talking about this).
No, you do not need to make source changes if you use Java 7 or 8 (at least if we are discussing my modularization patches; are we?). If you use later JDKs, you need to
There is no such restriction.
The point of modularization is to know which modules you require to do certain work. If you start saying "well, this may require this and that, or it may not", then modularizing is moot. My point is: if you want to keep the Jar file as it is (containing all the current packages), there is no advantage in doing a full modularization (which brings its own problem with it: filename-based dependencies), and the project should be happy with the automatic module name.
That's fine, but then my above comment applies. (PS: it would be great if somebody volunteered to write at least one unit test for XOM 🙂)
Yes, Maven is just a deployment-stage thing (you can use a special "Maven" kind of Eclipse project, but I personally do not like them). Just to clarify: you do not need to put Maven as a dependency in Eclipse.
If we are discussing my modular patches (the only ones that have been presented) they are compatible with Java 7 and 8, and at the same time modular JDKs (JDK 11+) can get their
Thanks, I appreciate it.
Actually, I would like to move some internal classes in
No, the only technical restriction is that 2 Java modules cannot contain the same package (no matter whether they're exported). However, a guideline to avoid running into that restriction is to use the package name of the top-most package as the module name (and to avoid module names that are a prefix of another module name, except for aggregate modules). So if we want to follow that guideline, there ought to be at least 2 modules, W.r.t. modularization, I believe the library should be geared towards the users that "simply" want to parse HTML5 via one of the 3 APIs. So another reason why I'd like 2 separate Java modules is that it allows us to effectively hide the
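The split-package restriction can be observed directly with the `java.lang.module` API. This sketch (JDK 11+) resolves in-memory descriptors for two hypothetical modules, `example.a` and `example.b`, plus a consumer `app` that reads both. When both export the same package, `Configuration.resolve` throws a `ResolutionException`; with disjoint packages it succeeds. It only covers the exported case at resolution time; for non-exported packages the conflict surfaces later, when both modules are defined to the same class loader (as on the module path), typically as a `LayerInstantiationException`.

```java
import java.lang.module.Configuration;
import java.lang.module.ModuleDescriptor;
import java.lang.module.ModuleFinder;
import java.lang.module.ModuleReader;
import java.lang.module.ModuleReference;
import java.lang.module.ResolutionException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

public class SplitPackageDemo {

    // In-memory ModuleFinder over descriptors only (resolution never opens modules).
    static ModuleFinder finderOf(ModuleDescriptor... descriptors) {
        Map<String, ModuleReference> refs = new HashMap<>();
        for (ModuleDescriptor d : descriptors) {
            refs.put(d.name(), new ModuleReference(d, null) {
                @Override
                public ModuleReader open() {
                    throw new UnsupportedOperationException("descriptor-only module");
                }
            });
        }
        return new ModuleFinder() {
            @Override
            public Optional<ModuleReference> find(String name) {
                return Optional.ofNullable(refs.get(name));
            }

            @Override
            public Set<ModuleReference> findAll() {
                return Set.copyOf(refs.values());
            }
        };
    }

    // Resolves "app", which reads both hypothetical modules; returns null on
    // success or the ResolutionException message on failure.
    static String resolveWith(String pkgInA, String pkgInB) {
        ModuleDescriptor a = ModuleDescriptor.newModule("example.a").exports(pkgInA).build();
        ModuleDescriptor b = ModuleDescriptor.newModule("example.b").exports(pkgInB).build();
        ModuleDescriptor app = ModuleDescriptor.newModule("app")
                .requires("example.a")
                .requires("example.b")
                .build();
        try {
            Configuration.resolve(finderOf(a, b, app),
                    List.of(ModuleLayer.boot().configuration()),
                    ModuleFinder.of(),
                    Set.of("app"));
            return null;
        } catch (ResolutionException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Same package exported by both modules: rejected.
        System.out.println(resolveWith("nu.validator.htmlparser.common", "nu.validator.htmlparser.common"));
        // Disjoint packages: resolves, so this prints null.
        System.out.println(resolveWith("nu.validator.htmlparser.common", "nu.validator.htmlparser.dom"));
    }
}
```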
What makes you say it's disappointing?
Yes, that'll still be possible, no matter how the modularization is done. (I'm curious why you want Eclipse to build stuff though, rather than relying on Maven or Gradle. If you open a Maven project with Eclipse (i.e. no
Exactly.
Yes. Either Finally, I'd like to reiterate that, in my opinion, Q1 in my OP is the essential question here. I.e. we shouldn't be concerned about project organisation in terms of Maven modules and/or Git repos at this point: no matter what the answer to Q1 is, it's possible to organise the project however you want (though some combinations wouldn't make sense, of course. For example, if we went with a single Java module, it wouldn't make sense to use multiple Maven modules or Git repos).
What problem would this solve?
Mainly having to replace one jar with many when upgrading, and the resulting proliferation of jars. Maybe that's not a real problem.
It's disappointing if the existing jar can't be used as-is. Considering that I'd like to get rid of (That is, porting Is there really no backward compatibility mechanism that would allow the existing
I'm leaning towards three Java Modules:
However, this is based on a pre-Modules view of the world where DOM and SAX are assumed to be always present, so perhaps it would make sense to treat
Considering that Since you've used Modules and I haven't, do you see any benefit from decoupling the core of the parser from Regardless of the module division, I'd prefer all these to stay in this git repo. (As noted, we used to have more repos for the validator as a whole and moved towards having fewer.) What that means for Maven, I don't know.
My point of departure is: only export each of the By only exporting a minimal API:
While I understand
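For readers new to Modules, "exporting a minimal API" means the module descriptor lists only some packages in `exports`, and every other package in the module stays inaccessible to other modules. The sketch below builds a descriptor with a hypothetical split (which packages end up exported is exactly what is being debated here, so the list is an assumption) and computes which packages stay hidden.

```java
import java.lang.module.ModuleDescriptor;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

public class MinimalExportsDemo {

    // Hypothetical descriptor: exports only API entry-point packages and
    // keeps the remaining packages module-private.
    static ModuleDescriptor descriptor() {
        return ModuleDescriptor.newModule("nu.validator.htmlparser")
                .exports("nu.validator.htmlparser.dom")
                .exports("nu.validator.htmlparser.sax")
                .exports("nu.validator.htmlparser.common")
                .packages(Set.of("nu.validator.htmlparser.impl", "nu.validator.htmlparser.io"))
                .build();
    }

    // Packages contained in the module but not exported to anyone.
    static Set<String> hiddenPackages(ModuleDescriptor d) {
        Set<String> exported = d.exports().stream()
                .map(ModuleDescriptor.Exports::source)
                .collect(Collectors.toSet());
        Set<String> hidden = new TreeSet<>(d.packages());
        hidden.removeAll(exported);
        return hidden;
    }

    public static void main(String[] args) {
        // [nu.validator.htmlparser.impl, nu.validator.htmlparser.io]
        System.out.println(hiddenPackages(descriptor()));
    }
}
```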
W.r.t. project organization, I propose the following: a single Git repo, containing a single multi-module Maven project, containing a number of Maven modules, each defining a single Java module. For the Java modules, I'd start with your first division +
Some notes:
I don't see the problem: when I update a dependency declaration in the POM, I hardly ever notice that the new version has introduced additional transitive dependencies.
Yes, one can use any JAR as-is, and the JVM will deduce a module name from the JAR's filename. However, this is rather fragile and precludes some usages of the modules that require it (e.g. I'm pretty sure that it's impossible to use such modules with
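How the JDK names a plain JAR can be checked with `ModuleFinder`. This sketch (JDK 11+) writes two otherwise empty JARs to a temp directory under the hypothetical file names `htmlparser-1.5.jar` and `htmlparser-1.4.jar`: the first carries an `Automatic-Module-Name: nu.validator.htmlparser` manifest entry, the second does not. The first gets the declared name; the second's name is derived from the file name (the version suffix is dropped), which is why renaming the JAR silently changes the module name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.module.ModuleFinder;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.Attributes;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public class AutomaticModuleNameDemo {

    // Writes an otherwise empty JAR; the manifest optionally carries Automatic-Module-Name.
    static Path writeJar(Path dir, String fileName, String automaticModuleName) throws IOException {
        Manifest manifest = new Manifest();
        Attributes main = manifest.getMainAttributes();
        // Manifest-Version is required: without it, no main attributes get written.
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        if (automaticModuleName != null) {
            main.putValue("Automatic-Module-Name", automaticModuleName);
        }
        Path jar = dir.resolve(fileName);
        try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(jar), manifest)) {
            // No entries needed: only the manifest and the file name matter here.
        }
        return jar;
    }

    // The module name the JDK assigns when the JAR is placed on the module path.
    static String moduleNameOf(Path jar) {
        return ModuleFinder.of(jar).findAll().iterator().next().descriptor().name();
    }

    // Returns {declared name, filename-derived name}.
    static String[] demo() {
        try {
            Path dir = Files.createTempDirectory("amn-demo");
            Path declared = writeJar(dir, "htmlparser-1.5.jar", "nu.validator.htmlparser");
            Path derived = writeJar(dir, "htmlparser-1.4.jar", null);
            return new String[] { moduleNameOf(declared), moduleNameOf(derived) };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] names = demo();
        System.out.println(names[0]); // nu.validator.htmlparser
        System.out.println(names[1]); // htmlparser
    }
}
```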
Maybe I'm misunderstanding your question, but I doubt it's possible to decouple any piece of the parser from
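On the DOM/SAX coupling: in module terms, the parser would depend on the JDK's `java.xml` module (home of `org.w3c.dom` and `org.xml.sax`), and `requires transitive java.xml` makes every consumer of the parser read `java.xml` as well (implied readability). This sketch (JDK 11+) uses hypothetical in-memory descriptors to resolve `app -> nu.validator.htmlparser -> java.xml`, with and without `transitive`, and reports which modules `app` reads.

```java
import java.lang.module.Configuration;
import java.lang.module.ModuleDescriptor;
import java.lang.module.ModuleDescriptor.Requires;
import java.lang.module.ModuleFinder;
import java.lang.module.ModuleReader;
import java.lang.module.ModuleReference;
import java.lang.module.ResolvedModule;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.stream.Collectors;

public class TransitiveXmlDemo {

    // In-memory ModuleFinder over descriptors only (resolution never opens modules).
    static ModuleFinder finderOf(ModuleDescriptor... descriptors) {
        Map<String, ModuleReference> refs = new HashMap<>();
        for (ModuleDescriptor d : descriptors) {
            refs.put(d.name(), new ModuleReference(d, null) {
                @Override
                public ModuleReader open() {
                    throw new UnsupportedOperationException("descriptor-only module");
                }
            });
        }
        return new ModuleFinder() {
            @Override
            public Optional<ModuleReference> find(String name) {
                return Optional.ofNullable(refs.get(name));
            }

            @Override
            public Set<ModuleReference> findAll() {
                return Set.copyOf(refs.values());
            }
        };
    }

    // Resolves app -> nu.validator.htmlparser -> java.xml and returns the
    // names of the modules that "app" reads.
    static Set<String> appReads(boolean transitive) {
        Set<Requires.Modifier> modifiers = transitive
                ? Set.of(Requires.Modifier.TRANSITIVE)
                : Set.<Requires.Modifier>of();
        ModuleDescriptor parser = ModuleDescriptor.newModule("nu.validator.htmlparser")
                .requires(modifiers, "java.xml")
                .exports("nu.validator.htmlparser.sax")
                .build();
        ModuleDescriptor app = ModuleDescriptor.newModule("app")
                .requires("nu.validator.htmlparser")
                .build();
        Configuration cf = Configuration.resolve(finderOf(parser, app),
                List.of(ModuleLayer.boot().configuration()),
                ModuleFinder.ofSystem(),
                Set.of("app"));
        return cf.findModule("app").orElseThrow().reads().stream()
                .map(ResolvedModule::name)
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        System.out.println(appReads(true).contains("java.xml"));  // true
        System.out.println(appReads(false).contains("java.xml")); // false
    }
}
```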
The clutter in Javadoc doesn't look too bad. Not sure about how much the internals show up in IDE autocomplete in practice for people who don't work on the internals.
AFAICT, in the 13-year lifespan of this project, the one API-breaking change (as opposed to parser behavior correctness change) is the removal of support for the HTML4 mode, which would not have become non-breaking had the change you propose been made ahead of time.
import nu.validator.htmlparser.common.CharacterHandler;
import nu.validator.htmlparser.common.DocumentModeHandler;
import nu.validator.htmlparser.common.Heuristics;
import nu.validator.htmlparser.common.TokenHandler;
import nu.validator.htmlparser.common.TransitionHandler;
import nu.validator.htmlparser.common.XmlViolationPolicy;
import nu.validator.htmlparser.impl.ErrorReportingTokenizer;
import nu.validator.htmlparser.impl.Tokenizer;
import nu.validator.htmlparser.io.Driver;
import nu.validator.htmlparser.common.DocumentMode;
import nu.validator.htmlparser.impl.CoalescingTreeBuilder;
import nu.validator.htmlparser.impl.HtmlAttributes;
It is the design intent of the parser that a third party is allowed to write the kind of wrapper that
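The practical effect of not exporting a package is that code outside the module, such as a third-party wrapper importing `nu.validator.htmlparser.impl` classes, can no longer compile against it, and reflective access is rejected at run time. Since the parser is not modularized yet, this sketch (JDK 9+) uses `java.base` as a stand-in: `java.lang` is exported to everyone, while `jdk.internal.misc` exists in `java.base` but is only exported to selected JDK modules.

```java
public class ExportCheckDemo {

    // True if 'pkg' belongs to the module that defines 'anchor' AND is exported
    // unconditionally, i.e. usable by arbitrary third-party code.
    static boolean visibleToEveryone(Class<?> anchor, String pkg) {
        Module module = anchor.getModule();
        return module.getPackages().contains(pkg) && module.isExported(pkg);
    }

    public static void main(String[] args) {
        System.out.println(visibleToEveryone(Object.class, "java.lang"));         // true
        System.out.println(visibleToEveryone(Object.class, "jdk.internal.misc")); // false
    }
}
```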
Good point. I had forgotten about that. Additional observation: Enabling normalization checking depends on ICU4J. @carlosame, @sideshowbarker, do you see value in the
And BTW exporting
I don’t personally see value in it. But I’m not a domain expert around the fine points of packaging, and I don’t feel strongly enough about it that I’d object to it. That said, I have a general preference for keeping things as simple as possible.
Fair enough, so I think we can agree on the following:
And have one open question: what with
OK.
There's also another question whose answer might inform the answer to that question: What to do with the ICU4J normalization checking dependency. https://github.com/unicode-org/icu/blob/master/icu4j/manifest.stub says In any case, I think @carlosame, can you corroborate the incompatibility of the existing
Both If you want to make this project more friendly to But in any case, if you ship
I'd rather have And needless to say, I see no point in having a separate
Correct.
Ok, let's continue using @carlosame are you planning to provide a PR for this? (just asking to avoid duplicate efforts)
I already provided a modularization patch (without Maven modules) on July 5 (PR #23) and it hasn't received a lot of attention. Switching that PR to a modular Maven project (with Maven modules involve a different directory layout (with a directory for each Maven module), so the files should be moved in a way that Git history is not completely lost ( Responding specifically to your question, I have no plans to provide a PR with a
@hsivonen: as the de facto owner of this repository, I'd like to urge you to decide on the way forward, because I feel things are getting out of hand: @carlosame and I have different views on how to proceed, and we're both investing quite a bit of our spare time in implementing/defending our point of view. I believe that it's in everyone's interest for a final decision to be taken sooner rather than later. (I understand if modularization is not a priority for either you or your employer, but I believe it's not unreasonable to ask for some of your time to settle this debate.)
1. Java modularization
There are plenty of options, starting from the most fine-grained modularization:
- htmlparser.dom -> htmlparser.common; htmlparser.sax -> htmlparser.common, saxtree; htmlparser.xom -> htmlparser.common
- htmlparser + saxtree
- htmlparser
- htmlparser + htmlparser.xom
- htmlparser + htmlparser.xom + saxtree
- htmlparser.jaxp + htmlparser.xom + htmlparser.common + saxtree
Q1: which combination makes most sense to you?
(edit: please note that this question is about modules as a generic software architectural concept, not about Java modules specifically. So any Java technicalities can be disregarded when answering this question. I've merely named it "Java modularization" in the title to clarify that it's about the conceptual modularization of the code, not about how the code is organized in Maven modules or Git submodules or anything like that)
2. Project organization
Assuming there are at least 2 modules in the chosen combination, there are multiple options for how to organize them:
(since @carlosame claimed that the first option would "complicate the workflow [...] and expose the project to a new category of Maven bugs": I claim the exact opposite, so please disregard any Maven-related concerns)
Q2: which option works best for you & @sideshowbarker?
3. upcoming releases
Edit: given that we agree that backward compatibility will be broken as needed to implement the decisions for 1. and 2., I'm fine with a 1.5 release as it is (except for the use of version ranges, but I'll argue that in the PR).
I'd like to propose that, no matter the answer to Q1 and Q2, version 1.5 is released with minimal changes, i.e. only changes that are required to resolve #17. In particular: no automatic module name, no changes to the dependency versions, ... Once 1.5 is published, we can then implement 1. and 2. as decided for a 2.0 release.
Q3: do you agree with this proposal?
Thanks in advance for reading & answering the 3 questions above, and thereby bringing back peace and quiet to this repository.
Edit: the comment below demonstrates exactly why I'm saying it's in everyone's interest for a final decision to be taken sooner rather than later: there's a lot of frustration on both sides, and both sides feel the other is working against them. That's why I'm asking you to take an authoritative decision on this matter, so we can both accept your decision and move forward on implementing it.