-
Notifications
You must be signed in to change notification settings - Fork 149
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(resolution): Fix resolution of querystringed imports in ESM files #322
base: main
Are you sure you want to change the base?
fix(resolution): Fix resolution of querystringed imports in ESM files #322
Conversation
02170de
to
8a039c1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See a previous PR for how to add a test: #289
- input.js is the input file that nft traces
- output.js is the expected files to be traced
- any other file can be required/imported by input.js or others
Codecov Report
@@ Coverage Diff @@
## main #322 +/- ##
==========================================
+ Coverage 81.23% 81.25% +0.01%
==========================================
Files 13 13
Lines 1503 1520 +17
Branches 553 558 +5
==========================================
+ Hits 1221 1235 +14
- Misses 115 118 +3
Partials 167 167
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
Good catch, @lforst! I've made the check more robust. |
8a039c1
to
37c2a03
Compare
That's an edge case I hadn't thought of. I did a bunch of testing (details below) and here's what I found:
Or, put visually:
So... I'm not sure where that leaves me in terms of what should be done here. Which behavior should I be trying to emulate? Details of my testingTo test out the possibilities, I ran a few tests using the following files, along with a corresponding set of file1.js const file2 = require("./file2?aardvark");
console.log(file2.bear); file2.js const file3 = require("./file3");
console.log(file3.cheetah);
module.exports = { bear: "dinosaur" }; file3.js const file4 = require("./file?4");
console.log(file4.emu);
module.exports = { cheetah: "fox" }; file?4.js module.exports = { emu: "giraffe" };
|
Thanks. And yeah, I'd gotten the basic idea of
I'm also now thinking that this should be tested outside of a webpack context as well. I'll get started on that in the meantime. |
A few more questions on tests:
I'm not at all sure how to translate that into either a unit test or an integration test which will fit with your system, though. As I mentioned above, I do see the webpack wrapper tests in the unit test folder, but they seem to be testing how Thoughts? |
The purpose of nft is to simulate Node.js module resolution. The existing webpack tests are trying to reverse the wrappers that webpack adds back to the original source code the author wrote. In particular, I think this PR should ignore anything that webpack does and focus solely on Node.js module resolution behavior. |
37c2a03
to
ad88220
Compare
Right, okay. I think I've now got it so that it does. If we're dealing with ESM, we strip the querystring (if any) in
Yeah, that's fair. I think I can probably (?) mock the relevant thing it does, which is distinguish between modules if the querystring is different. The other tests should hopefully be more straightforward. UPDATE: Done. |
10490b4
to
cc7e76d
Compare
UPDATE: Tests are now pushed and I've updated the PR description to reflect the results of our conversations above, so I will mark this ready for review. Please let me know if there's anything I missed! |
ab66e07
to
c83ab86
Compare
Hey, @styfle - I fixed the windows CI issues. Mind taking another look? |
|
||
// Splits an ESM specifier into path and querystring (including the leading `?`). (If the specifier is CJS, | ||
// it is passed through untouched.) | ||
export function parseSpecifier(specifier: string, cjsResolve: boolean = true): ParseSpecifierResult { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
export function parseSpecifier(specifier: string, cjsResolve: boolean = true): ParseSpecifierResult { | |
export function parseSpecifier(specifier: string, cjsResolve: boolean): ParseSpecifierResult { |
Lets not use a default param or else the caller might forget
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, we need the default to be there, because the first time this is called (at the top of analyzeAndEmitDependency
when it's called by emitDependency
), we don't have a cjsResolve
value. Having it default to true
works in that case, though, because that only happens at the root of the tree, when nodeFileTrace
calls emitDependency
on an actual file (not an import specifier), which we know won't have a querystring.
if (!cjsResolve) { | ||
// Regex which splits a specifier into path and querystring, inspired by that in `enhanced-resolve` | ||
// https://github.com/webpack/enhanced-resolve/blob/157ed9bcc381857d979e56d2f20a5a17c6362bff/lib/util/identifier.js#L8 | ||
const match = /^(#?(?:\0.|[^?#\0])*)(\?(?:\0.|[^\0])*)?$/.exec(specifier); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks slightly different from the webpack regex.
Can we instead use url.parse()
or new URL()
to safely parse?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks slightly different from the webpack regex.
Can we instead use url.parse() or new URL() to safely parse?
In my testing, node itself seemed to treat import * as something from './someFile.js?things#stuff
, import * as something from './someFile.js?things
, and import * as something from './someFile.js
the same way, namely that it stripped the ?
and everything after it off before trying to resolve the file. I therefore wanted to do the same thing. Before landing on this regex (which is the original regex, minus treating the fragment as a separate entity - see the screenshot below), I considered:
-
Using
indexOf
on'?'
and then usingslice
to cut the string off at that point. Straightforward, but brute-force-y, and I've had brute force bite me enough times when handling edge cases I hadn't thought of that I wanted to out-source it to something purpose-built and presumably battle-tested. -
Using
url.parse
. This splits the querystring and fragment off into separate parts I'd just have to put back together again. Also, my linter got mad because it's deprecated. -
Using
new URL()
. This similarly splits the querystring and fragment off separately. It also errors out when given a path withouthttp://
(or similar) at the beginning.
None of them seemed ideal, and so I decided to see how webpack does it (since I was in the nextjs nft webpack plugin with the debugger already), figuring that the most likely context for someone having the querystring problem in the first place was them using a webpack loader with options. After a lot of stepping through code, eventually I landed in the enhanced-resolve
file I linked to in the comment. As I mentioned above, you're right that it's been changed, but only to combine the querystring and fragment into a single group.
(I gave the regex I'm using above a name the same length as the original regex's name in enhanced-resolve
so it's easier to compare.)
I can certainly make url.parse
or new URL()
work if you'd like, but given the above IMHO the regex is still the best option. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
url.parse()
is not really deprecated because its the only safe way to parse a relative url since new URL()
only works for absolute urls. The docs mention the deprecation was revoked
Also, I trust node's own url parsing more than a modified regex which might suffer from ReDoS.
Now that there are tests, we should be able to swap this implementation between regex, url.parse() or new URL() easily.
// If querystring was stripped during resolution, restore it | ||
if (queryString && !item.endsWith(queryString)) { | ||
item += queryString; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can the query string exist on node built-ins?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, as it turns out. I did some testing:
(Note that this is all for ESM imports, since we know that CJS just treats ?
like any other character, and there are no built-ins with ?
in their name.)
-
For node on its own:
- Querystrings on relative imports (
import * as file1 from './file1.js?things'
) work (they get stripped). - Querystrings on regular node modules (
import * as semver from 'semver?stuff'
) error out withError [ERR_MODULE_NOT_FOUND]: Cannot find package 'semver?stuff'
. - Querystrings on un-prefixed built-ins (
import * as fs from 'fs?morestuff'
) error out withError [ERR_MODULE_NOT_FOUND]: Cannot find package 'fs?morestuff'
. (In other words, it doesn't recognize it as a built-in.) - Querystrings on prefixed built-ins (
import * as fs from 'node:fs?morestuff'
) error out withError [ERR_UNKNOWN_BUILTIN_MODULE]: No such built-in module: node:fs?morestuff
.
- Querystrings on relative imports (
-
For a webpack build:
- Querystrings on relative imports (
import * as file1 from './file1.js?things'
) work (they get stripped). - Querystrings on regular node modules (
import * as semver from 'semver?stuff'
) work (they get stripped). - Querystrings on un-prefixed built-ins (
import * as fs from 'fs?morestuff'
) error out withModule not found: Can't resolve 'fs?morestuff'
. (In other words, it doesn't recognize it as a built-in, strips it, and then errors when it doesn't find it innode_modules
.) - Querystrings on prefixed built-ins (
import * as fs from 'node:fs?morestuff'
) error out withError [ERR_UNKNOWN_BUILTIN_MODULE]: No such built-in module: node:fs?morestuff
.
- Querystrings on relative imports (
So not something we have to worry about. Might be worth saying so in a comment, though. I'll do that.
@@ -0,0 +1,5 @@ | |||
// Test that querystrings of various forms get stripped from esm imports |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets add package.json
to this dir with "type": "module"
, otherwise its wont be valid for Node.js
test/unit/esm-querystring/input.js
Outdated
@@ -0,0 +1,5 @@ | |||
// Test that querystrings of various forms get stripped from esm imports | |||
|
|||
import * as aardvark from "./animalFacts/aardvark?anteater"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets add one more test similar to this one but with .mjs
extensions for the file names as well as import specifiers
const errorMessage = path === rawPath | ||
? 'File ' + path + ' does not exist.' | ||
: 'Neither ' + path + ' nor ' + rawPath + ' exists.' | ||
throw new Error(errorMessage); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like there is a test missing for this error message
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test/unit.test.js
Outdated
const { nodeFileTrace } = require('../out/node-file-trace'); | ||
|
||
global._unit = true; | ||
|
||
const skipOnWindows = ['yarn-workspaces', 'yarn-workspaces-base-root', 'yarn-workspace-esm', 'asset-symlink', 'require-symlink']; | ||
const skipOnWindows = ['yarn-workspaces', 'yarn-workspaces-base-root', 'yarn-workspace-esm', 'asset-symlink', 'require-symlink', "cjs-querystring"]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const skipOnWindows = ['yarn-workspaces', 'yarn-workspaces-base-root', 'yarn-workspace-esm', 'asset-symlink', 'require-symlink', "cjs-querystring"]; | |
const skipOnWindows = ['yarn-workspaces', 'yarn-workspaces-base-root', 'yarn-workspace-esm', 'asset-symlink', 'require-symlink', 'cjs-querystring']; |
Oops, looks like we don't have prettier in this repo yet 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LOL - yes. Every time I saved VSCode would reformat the entire file. 🤦🏻♀️
There are still a bunch of places where double-quotes exist, but I've gone through and fixed everything in this PR to use single quotes, except where double quotes were necessary to avoid having to escape a '
contained in the string. LMK if you spot anywhere that I've missed.
77be27e
to
b8239c7
Compare
…call in `maybeEmitDep`
b8239c7
to
a7c4822
Compare
Hey, @styfle - I know it's a crazy time of year, would love to get this in before the year's out, if possible, as it's blocking some of our customers. I'll be AFK most of this week (early family celebrations) and then back the week after. Cheers! |
Is this still needed? #336 (comment) |
Not from our side (Sentry SDK). I'll of course leave it up to you if you still want to implement this for the sake of consistency. |
Currently, there are two problems with the way
@vercel/nft
handles ESM imports with querystrings (import * as something from './someFile?someQuerystring';
):?
as it would any other character. As a result, it fails to resolve such imports to the non-querystringed files to which they correspond. (In the above example, it looks for a file namedsomeFile?someQuerystring.js
and, as one would expect, doesn't find it.) While this is the correct behavior forrequire
calls in CJS files, it doesn't match the way that Node handles ESM imports (which is to strip the querystring off before attempting to resolve the file).resolve
method is provided, and that custom method does strip querystrings from ESM imports, the querystrings are then not put back on before the imported module is processed. This matters, for example, in cases wherereadFile
is also overridden, with a method whose output is affected by the querystring's value. (Webpack is such a system, so one place this shows up is in Next.js'sNextTraceEntryPointsPlugin
.)This fixes both of those problems by selectively stripping and re-adding the querystring in ESM imports at various points in the
nodeFileTrace
lifecycle. More specifically, inresolveDependecy
and inemitDependency
and its helpers, we strip the querystring (and/or don't add it back), with the exception of:readFile
, we keep the querystring, since some implementations ofreadFile
, including the one in the nft webpack plugin, read different modules out of memory with different querystrings).resolve
, we add the querystring back in, since this is also something which can be supplied by the user, and and the user-supplied version might care about query strings.emitDependency
, we add the querystring back in, since we need it there to be able to start the whole cycle over.Notes:
cjsResolve
to the recursive call ofemitDependency
. Rather than changeemitDependency
's signature, I opted to makeemitDependency
a wrapper for an inner function (analyzeAndEmitDependency
) with the updated signature. Recursive calls now go to the inner function.@sentry/nextjs
team are interested in this because we use a webpack loader to insert in the dependency tree a proxy module for each page, which allows us to automatically wrap users' exported functions (getServerSideProps
and friends, API route handlers, etc). To do this, we replace the code from/path/to/pages/somePage.js
with our proxy code, which includes an import from'./somePage?__sentry_wrapped__'
. When webpack then crawls our proxy module looking for dependencies, the querystring tricks it into thinking it hasn't yet dealt with that page, so it forwards it to our loader a second time rather than skipping it. When our loader sees the querystring, it knows the code replacement has already been done, and returns the original page code untouched, thereby preserving the continuity of the dependency tree. This fix ensures that@vercel/nft
successfully traces the modified tree.cjs-querystring
) required a little gymnastics in order to not break CI on Windows. It turns out that skipping the test on Windows wasn't enough - the mere fact that a file exists with a?
in its name is enough to cause Windows not to even be able to check out the commit. Therefore, the file is stored without any punctuation in its name, and before the tests run on non-Windows platforms, it is copied to a version which does have the punctuation, and which is git-ignored to enforce it not ending up in the repo in the future by accident. The dummy, no-punctuation version is stored in a folder in order to differentiate its path from that of the punctuated version by more than just the punctuation. That way, if the punctuated version is found, it's clear it's not just because the punctuation is somehow being ignored.Fixes getsentry/sentry-javascript#5998.
Fixes #319.