Allow fetching all references for a file, or all files, using public API #397

exterm · 2024-03-19T15:27:19Z

What are you trying to accomplish?

For introspection (e.g. network graphs & graphs over time) as well as advanced enforcement regimes that may be difficult to implement in a packwerk checker extension it would be great to be able to get a list of all static constant references from a file or between all files.

Extracting constants references is where packwerk's core complexity lies and we should allow its reuse directly, without having to go through the more opinionated and specific logic built on top of it.

What approach did you choose and why?

I listed a naive approach in the first commit that doesn't require any modifications to existing packwerk code. This is the approach I currently use to generate detailed statistics about a monolith that I am working on. However, it uses a slew of private APIs.

In the second commit I propose a new module that creates public API for this functionality.

What should reviewers focus on?

Are we exposing more surface area than we need to? Anything specific we wouldn't want to expose?
Can you think of a cleaner way to implement this?

Caveats

I decided to raise on parser errors so we don't have to expose the Offense class and generally have a simpler interface. I'm happy to discuss other options.
The design of the RunContext class, which stems from a time when packwerk was a rubocop cop and we had to work around some of rubocop's limitations, is complex and entangled. Thus
- it is difficult to pry apart, as you can see in the slightly awkward implementation of this feature
- it is notoriously hard to test; indeed it doesn't have unit tests
- Anything relying on RunContext is also hard to test, and so the tests included with this PR are not optimal

Type of Change

Bugfix
New feature
Non-breaking change (a change that doesn't alter functionality - i.e., code refactor, configs, etc.)

Checklist

I have updated the documentation accordingly.
I have added tests to cover my changes.
It is safe to rollback this change.

Additional Notes

This work is sponsored by https://www.onemedical.com/

exterm · 2024-03-22T15:12:55Z

test/unit/packwerk/references_from_file_test.rb

+    private
+
+    def make_fake_reference
+      package_name = Array("ilikeletters".chars.sample(5)).join


Array call put in to satisfy sorbet.

exterm · 2024-04-02T20:03:08Z

@gmcgibbon any interest in merging this?

exterm · 2024-06-24T15:22:13Z

@rafaelfranca any interest in merging this? It would make analysis of the (actual) dependency graph considerably easier.

gmcgibbon · 2024-07-12T04:36:05Z

lib/packwerk/run_context.rb

+        references_result.references.flat_map { |reference| reference_checker.call(reference) }
+    end
+
+    class FileReferencesResult < T::Struct


I believe T::Struct is slower than a normal struct. Please provide a benchmark, or switch to a plain old Struct (I think we only use Struct in other files).

There are 4 other uses of T::Struct around the codebase and 4 uses of Struct.

I did not think about performance differences and chose T::Struct because it's typed, and I appreciate the added documentation and type safety, especially within this class that is a little messy and doesn't have great tests.

A quick benchmark shows that access is about the same, while instantiation is ~5x slower for T::Struct. I was still able to instantiate 380k T::Struct within a second on my laptop.

Ah, I guess you were looking for benchmarks of the whole thing, running packwerk on a real code base. Will do. I'm curious myself.

So, on this codebase I'm currently looking at: 800k lines of Ruby according to cloc, 164 packages. Tested on a github codespace.

There is no measurable difference between uncached packwerk check runs on this branch vs on Shopify/packwerk main. Packwerk reports ~18s, real is ~20s.

I also tested with cache, and there is no measurable difference either. Packwerk reports ~3.6s, real is ~6.5s.

gmcgibbon · 2024-07-12T04:47:18Z

lib/packwerk/references_from_file.rb

+    end
+
+    sig { params(relative_file_paths: T::Array[String]).returns(T::Array[Packwerk::Reference]) }
+    def list_all(relative_file_paths: [])


So is this the main method you use? Do we need to make the others public?

I think list is still useful to get references from a specific file. list_all goes through FilesForProcessing and thus respects includes and excludes, which could be confusing if you want to analyze a specific file that may be excluded.

files is public because it makes testing easier... I'm going to see how much more ugly it'd be to directly mock FilesForProcessing.

I pushed a commit that gets rid of the public files method in favour of some more elaborate stubbing in the test. It also renames the two remaining public methods so that hopefully their purpose is clearer.

list -> list_for_file

list_all -> list_for_all

mmh, I wonder now whether it makes sense to inject a FilesForProcessing instance in the test instead.

exterm · 2024-07-15T18:24:02Z

Thank you for your review Gannon!

hacky MVP

49f57d1

exterm requested a review from a team as a code owner March 19, 2024 15:27

exterm force-pushed the references-from-file branch 3 times, most recently from 57448d9 to 99f7576 Compare March 19, 2024 16:06

proposed cleaner solution

f5fdbd3

exterm force-pushed the references-from-file branch 3 times, most recently from 24e6991 to 5f15a4b Compare March 19, 2024 18:27

add tests

028bd25

exterm force-pushed the references-from-file branch from 5f15a4b to 028bd25 Compare March 19, 2024 18:37

exterm commented Mar 22, 2024

View reviewed changes

gmcgibbon reviewed Jul 12, 2024

View reviewed changes

refactor and clarify API

e7013a6

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow fetching all references for a file, or all files, using public API #397

Allow fetching all references for a file, or all files, using public API #397

exterm commented Mar 19, 2024 •

edited

Loading

exterm Mar 22, 2024 •

edited

Loading

exterm commented Apr 2, 2024

exterm commented Jun 24, 2024

gmcgibbon Jul 12, 2024

exterm Jul 15, 2024 •

edited

Loading

exterm Jul 15, 2024

exterm Jul 15, 2024

exterm Jul 15, 2024 •

edited

Loading

gmcgibbon Jul 12, 2024

exterm Jul 15, 2024

exterm Jul 15, 2024 •

edited

Loading

exterm Jul 19, 2024

exterm commented Jul 15, 2024

Allow fetching all references for a file, or all files, using public API #397

Are you sure you want to change the base?

Allow fetching all references for a file, or all files, using public API #397

Conversation

exterm commented Mar 19, 2024 • edited Loading

What are you trying to accomplish?

What approach did you choose and why?

What should reviewers focus on?

Caveats

Type of Change

Checklist

Additional Notes

exterm Mar 22, 2024 • edited Loading

Choose a reason for hiding this comment

exterm commented Apr 2, 2024

exterm commented Jun 24, 2024

gmcgibbon Jul 12, 2024

Choose a reason for hiding this comment

exterm Jul 15, 2024 • edited Loading

Choose a reason for hiding this comment

exterm Jul 15, 2024

Choose a reason for hiding this comment

exterm Jul 15, 2024

Choose a reason for hiding this comment

exterm Jul 15, 2024 • edited Loading

Choose a reason for hiding this comment

gmcgibbon Jul 12, 2024

Choose a reason for hiding this comment

exterm Jul 15, 2024

Choose a reason for hiding this comment

exterm Jul 15, 2024 • edited Loading

Choose a reason for hiding this comment

exterm Jul 19, 2024

Choose a reason for hiding this comment

exterm commented Jul 15, 2024

exterm commented Mar 19, 2024 •

edited

Loading

exterm Mar 22, 2024 •

edited

Loading

exterm Jul 15, 2024 •

edited

Loading

exterm Jul 15, 2024 •

edited

Loading

exterm Jul 15, 2024 •

edited

Loading