Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assembly - Run the keep rule on Uber jar... #488

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

shanielh
Copy link

@shanielh shanielh commented Dec 7, 2022

Instead of running them jar by jar, this makes the keep rule much more usable.

I'm willing to make changes to this PR if needed. This should fix issue #265 and specifically this comment.

@eed3si9n
Copy link
Member

eed3si9n commented Dec 8, 2022

@shanielh Thanks for the contribution. Do you think it would be possible to demonstrate the corner case as a test?
Shading tests tends to be expressed as scripted test in this directory - https://github.com/sbt/sbt-assembly/tree/develop/src/sbt-test/shading

@shanielh
Copy link
Author

shanielh commented Dec 9, 2022

Yes, I will try to do it

@shanielh shanielh force-pushed the feature/keep-rule-on-whole-jar branch from 3ebd08e to 216b65a Compare December 11, 2022 06:29
@shanielh
Copy link
Author

Hi @eed3si9n , I've added test and modified the PR a bit because I didn't want to break compatibility with older tests and users of this plugin.

Note that the PR isn't working, the main code is in https://github.com/sbt/sbt-assembly/pull/488/files#diff-dcd028f99421b60751076b6597301090dcc7656497954ee32bf3e60b029258b0R575 and I'm not sure why, but I'm not getting items to exclude, I've noticed you're the maintainer of jarjar-abrams too, so you might have some helpful insights :)

@eed3si9n
Copy link
Member

I've noticed you're the maintainer of jarjar-abrams too, so you might have some helpful insights :)

In theory I am, but note that my contribution to the project is coming up with the name and making the repo, and I don't really know much about jarjar or shading. Having said that, if there's an issue with keep rule, I wonder if the layer that we should fix is at jarjar-abrams level instead of adding a new setting.

Also sbt-assembly has been using jarjar (Jar Jar Links), which was originally designed for Java to handle shading, but call graph walking in Scala is not completely trivial, so I'm not surprised that "keep" in general never correctly functioned in real-life applications.

@shanielh
Copy link
Author

I've noticed you're the maintainer of jarjar-abrams too, so you might have some helpful insights :)

In theory I am, but note that my contribution to the project is coming up with the name and making the repo, and I don't really know much about jarjar or shading. Having said that, if there's an issue with keep rule, I wonder if the layer that we should fix is at jarjar-abrams level instead of adding a new setting.

Also sbt-assembly has been using jarjar (Jar Jar Links), which was originally designed for Java to handle shading, but call graph walking in Scala is not completely trivial, so I'm not surprised that "keep" in general never correctly functioned in real-life applications.

So I tried to see what my class compiles into in the test, seems like the scala-compiler / java-compiler inlined the reference to a static final field:

public class keep.Keeped {
  public void main(java.lang.String[]);
    Code:
       0: ldc           #13                 // String GMT
       2: astore_2
       3: return

  public keep.Keeped();
    Code:
       0: aload_0
       1: invokespecial #22                 // Method java/lang/Object."<init>":()V
       4: return
}

My bad, I'll try to create an instance of a class 😄

@shanielh
Copy link
Author

shanielh commented Dec 11, 2022

After debugging a bit, I got to an impression that the KeepProcessor doesn't do what I would like it to do, for instance, with the Above code, this is the roots and the depends:

[info] [info] kp.roots:Buffer(keep/Keeped)
[info] [info] kp.depends:Map(keep/Keeped -> Set(Bar, org/apache/commons/lang3/tuple/ImmutablePair, Foo, scala/Predef$, scala/reflect/ScalaSignature))

Let's leave the "Bar" and "Foo" strings out of the equation because I'm not sure how the class visitor works and how it figures references to other classes.

The code for getExcludes():

    public Set<String> getExcludes() {
        Set<String> closure = new HashSet<String>();
        closureHelper(closure, roots);
        Set<String> removable = new HashSet<String>(depend.keySet());
        removable.removeAll(closure);
        return removable;
    }

First, it would be much easier to change it to getIncludes, because what we would like to include is the roots and their (recursive) dependencies, the process method should be also changed to visit all classes, because you might walk over a dependency before walking over a root - that way we would have a map of all classes and their dependencies, and getting the recursive dependencies should be easy once you've reached a root - after doing that, we can also keep the method signature for getExcludes.

Would you like me to open a PR for jarjar-abrams? If you do, please let me know what changes you want me to do

UPDATE Opened a PR for jarjar-abrams eed3si9n/jarjar-abrams#28

@shanielh
Copy link
Author

shanielh commented Dec 18, 2022

Hi @eed3si9n , can you review this PR too? Thanks! 😄

Note that I had to publishLocal the jarjar-abrams library in order to test it, so you might want to publish jarjar-abrams with latest changes before merging this.

This allows users of this lib/plugin to keep only files from libraries
they use in their project without specifying them in zap
@shanielh shanielh force-pushed the feature/keep-rule-on-whole-jar branch from 049380a to 6692a5e Compare December 18, 2022 07:03
@eed3si9n
Copy link
Member

eed3si9n commented Jan 3, 2023

Sorry about the late reply, and thanks for the test.
In general, if we need to do anything at sbt-assembly level, doesn't that mean that there's something wrong with jarjar-abrams?

@shanielh
Copy link
Author

shanielh commented Jan 4, 2023

Sorry about the late reply, and thanks for the test.
In general, if we need to do anything at sbt-assembly level, doesn't that mean that there's something wrong with jarjar-abrams?

sbt-assembly uses jarjar on each dependency-jar separately, The only reason I would think that I would like to add a Keep rule to a specific class/es in a dependency jar, is that I would like to enforce usage of a specific package/sub-system inside that dependency. (e.g. add ShadeRule.keep(..).inLibrary()), I don't think I see a good reason for people to use ShadeRule.keep(..).inAll because that would run the same keep-rule on all dependencies and the src-code, so maybe I would remove that in future version (I don't have a way to know if people uses that and for what reason).

I think that using a keep rule on the final jar is much easier and less error prone, and that's the new option I've added to sbt-assembly.

@shanielh
Copy link
Author

shanielh commented Feb 7, 2023

@eed3si9n Any chance you're going to review this?

@eed3si9n
Copy link
Member

eed3si9n commented Feb 7, 2023

Yea. it's on my todo.

Copy link
Member

@eed3si9n eed3si9n left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally speaking, I think shading with sbt-assembly is used for several use cases.

  1. closed-world app use case where the final assembly JAR contains main() and everything needed to run the app.
  2. closed-world job use cases where the final assembly JAR contains a data processing job, which eventually runs as part of some other main() framework like Hadoop or Spark, but otherwise contains most dependencies to run the job, and sbt-assembly is used to shade Guava, protobuf etc.
  3. open-world library use case where a library X depends on Y, and sbt-assembly is used to create X' which hides Y as an internal dependency with shading.

Ultimately, if we reasonably support the above scenarios we can make whatever changes that kind of make sense, assuming that we do not break the existing build.sbt too much. Based on your analysis, to me it makes more sense to just not run any keep rules at the individual JAR level, and run once one the aggregated JAR if that's possible.

I don't think I see a good reason for people to use ShadeRule.keep(..).inAll because that would run the same keep-rule on all dependencies and the src-code, so maybe I would remove that in future version (I don't have a way to know if people uses that and for what reason).

Given that the keep rule is a dependency walker (I am not confident that this actually works correctly for Scala), I would say that the only use case that makes sense for keep is to run it inAll.

Per https://code.google.com/archive/p/jarjar/wikis/CommandLineDocs.wiki

The keep rule marks all matched classes as "roots". If any keep rules are defined all classes which are not reachable from the roots via dependency analysis are discarded when writing the output jar. This is the last step in the process, after renaming and zapping.

So I think you're right that running rules at individual JAR level is not appropriate for this, but I think we can probably filter all keep rules out of the existing assemblyShadeRules. For example, if user had:

ThisBuild / assemblyShadeRules := Seq(
  ShadeRule.keep("com.example.**").inAll
)

I think the intent is not just keep com.example package, but any code reachable from com.example.

@shanielh
Copy link
Author

@eed3si9n - So, what you would like me to do in order to merge this? To allow only "inAll" in Keep rule and remove the new definition? This will break compilation compatibility (ShadeRule.keep("...").inLibrary / .inProject will not compile), but it's much more logical.

@eed3si9n
Copy link
Member

@eed3si9n - So, what you would like me to do in order to merge this? To allow only "inAll" in Keep rule and remove the new definition? This will break compilation compatibility (ShadeRule.keep("...").inLibrary / .inProject will not compile), but it's much more logical.

Yea. We can throw exception when we find bad shade rules with friendly error message explaining what's going on, and run the actual keep rule for the assembled JAR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants