Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow fetching all references for a file, or all files, using public API #397

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

exterm
Copy link
Contributor

@exterm exterm commented Mar 19, 2024

What are you trying to accomplish?

For introspection (e.g. network graphs & graphs over time) as well as advanced enforcement regimes that may be difficult to implement in a packwerk checker extension it would be great to be able to get a list of all static constant references from a file or between all files.

Extracting constants references is where packwerk's core complexity lies and we should allow its reuse directly, without having to go through the more opinionated and specific logic built on top of it.

What approach did you choose and why?

I listed a naive approach in the first commit that doesn't require any modifications to existing packwerk code. This is the approach I currently use to generate detailed statistics about a monolith that I am working on. However, it uses a slew of private APIs.

In the second commit I propose a new module that creates public API for this functionality.

What should reviewers focus on?

  • Are we exposing more surface area than we need to? Anything specific we wouldn't want to expose?
  • Can you think of a cleaner way to implement this?

Caveats

  • I decided to raise on parser errors so we don't have to expose the Offense class and generally have a simpler interface. I'm happy to discuss other options.
  • The design of the RunContext class, which stems from a time when packwerk was a rubocop cop and we had to work around some of rubocop's limitations, is complex and entangled. Thus
    • it is difficult to pry apart, as you can see in the slightly awkward implementation of this feature
    • it is notoriously hard to test; indeed it doesn't have unit tests
    • Anything relying on RunContext is also hard to test, and so the tests included with this PR are not optimal

Type of Change

  • Bugfix
  • New feature
  • Non-breaking change (a change that doesn't alter functionality - i.e., code refactor, configs, etc.)

Checklist

  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • It is safe to rollback this change.

Additional Notes

This work is sponsored by https://www.onemedical.com/

@exterm exterm requested a review from a team as a code owner March 19, 2024 15:27
@exterm exterm force-pushed the references-from-file branch 3 times, most recently from 57448d9 to 99f7576 Compare March 19, 2024 16:06
@exterm exterm force-pushed the references-from-file branch 3 times, most recently from 24e6991 to 5f15a4b Compare March 19, 2024 18:27
@exterm exterm force-pushed the references-from-file branch from 5f15a4b to 028bd25 Compare March 19, 2024 18:37
private

def make_fake_reference
package_name = Array("ilikeletters".chars.sample(5)).join
Copy link
Contributor Author

@exterm exterm Mar 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Array call put in to satisfy sorbet.

@exterm
Copy link
Contributor Author

exterm commented Apr 2, 2024

@gmcgibbon any interest in merging this?

@exterm
Copy link
Contributor Author

exterm commented Jun 24, 2024

@rafaelfranca any interest in merging this? It would make analysis of the (actual) dependency graph considerably easier.

references_result.references.flat_map { |reference| reference_checker.call(reference) }
end

class FileReferencesResult < T::Struct
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe T::Struct is slower than a normal struct. Please provide a benchmark, or switch to a plain old Struct (I think we only use Struct in other files).

Copy link
Contributor Author

@exterm exterm Jul 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are 4 other uses of T::Struct around the codebase and 4 uses of Struct.

I did not think about performance differences and chose T::Struct because it's typed, and I appreciate the added documentation and type safety, especially within this class that is a little messy and doesn't have great tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A quick benchmark shows that access is about the same, while instantiation is ~5x slower for T::Struct. I was still able to instantiate 380k T::Struct within a second on my laptop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I guess you were looking for benchmarks of the whole thing, running packwerk on a real code base. Will do. I'm curious myself.

Copy link
Contributor Author

@exterm exterm Jul 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, on this codebase I'm currently looking at: 800k lines of Ruby according to cloc, 164 packages. Tested on a github codespace.

There is no measurable difference between uncached packwerk check runs on this branch vs on Shopify/packwerk main. Packwerk reports ~18s, real is ~20s.

I also tested with cache, and there is no measurable difference either. Packwerk reports ~3.6s, real is ~6.5s.

end

sig { params(relative_file_paths: T::Array[String]).returns(T::Array[Packwerk::Reference]) }
def list_all(relative_file_paths: [])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So is this the main method you use? Do we need to make the others public?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think list is still useful to get references from a specific file. list_all goes through FilesForProcessing and thus respects includes and excludes, which could be confusing if you want to analyze a specific file that may be excluded.

files is public because it makes testing easier... I'm going to see how much more ugly it'd be to directly mock FilesForProcessing.

Copy link
Contributor Author

@exterm exterm Jul 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed a commit that gets rid of the public files method in favour of some more elaborate stubbing in the test. It also renames the two remaining public methods so that hopefully their purpose is clearer.

  • list -> list_for_file
  • list_all -> list_for_all

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mmh, I wonder now whether it makes sense to inject a FilesForProcessing instance in the test instead.

@exterm
Copy link
Contributor Author

exterm commented Jul 15, 2024

Thank you for your review Gannon!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants