Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Write path normalization without array allocations #60812

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
111 changes: 106 additions & 5 deletions src/compiler/path.ts
Original file line number Diff line number Diff line change
Expand Up @@ -624,12 +624,115 @@ export function getNormalizedPathComponents(path: string, currentDirectory: stri
}

/** @internal */
export function getNormalizedAbsolutePath(fileName: string, currentDirectory: string | undefined): string {
return getPathFromPathComponents(getNormalizedPathComponents(fileName, currentDirectory));
export function getNormalizedAbsolutePath(path: string, currentDirectory: string | undefined): string {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if currentDirectory is almost always normalized.

let rootLength = getRootLength(path);
if (rootLength === 0 && currentDirectory) {
path = combinePaths(currentDirectory, path);
rootLength = getRootLength(path);
}
const simple = simpleNormalizePath(path);
if (simple !== undefined) {
return simple;
}
const root = path.substring(0, rootLength);
const normalizedRoot = root && normalizeSlashes(root);
// `normalized` is only initialized once `path` is determined to be non-normalized
let normalized = normalizedRoot === root ? undefined : normalizedRoot;
Comment on lines +637 to +640
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should just use lastIndexOf to see if altDirectorySeparator occurs. That way you can avoid creating a substring for root at all.

let index = rootLength;
let segmentStart = index;
let normalizedUpTo = index;
let seenNonDotDotSegment = rootLength !== 0;
while (index < path.length) {
Copy link
Member

@rbuckton rbuckton Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since path.length doesn't change, it may be more efficient to store it in a local and reuse that rather than look it up each time (both here and elsewhere in the function).

// At beginning of segment
segmentStart = index;
let ch = path.charCodeAt(index);
while (isAnyDirectorySeparator(ch) && index + 1 < path.length) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took me a bit to see why this works, but it's because index starts immediately after the root, or after the first separator following the prior segment. Figuring out if one of those is a backslash is handled below.

index++;
ch = path.charCodeAt(index);
}
if (index > segmentStart) {
if (normalized === undefined) {
// Seen superfluous separator
normalized = path.substring(0, segmentStart - 1);
}
Comment on lines +654 to +657
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (normalized === undefined) {
// Seen superfluous separator
normalized = path.substring(0, segmentStart - 1);
}
// Seen superfluous separator
normalized ??= path.substring(0, segmentStart - 1);

segmentStart = index;
}
// Past any superfluous separators
const sepIndex = path.indexOf(directorySeparator, index + 1);
const altSepIndex = path.indexOf(altDirectorySeparator, index + 1);
Comment on lines +661 to +662
Copy link
Member

@DanielRosenwasser DanielRosenwasser Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quadratic because you have to find at least one of the other kinds of slashes over and over. You should just tight-loop on !isAnyDirectorySeparator.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or use findIndex(path, isAnyDirectorySeparator, index + 1)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only concern with findIndex is that it might not optimize well it passed in both arrays and strings. Maybe I'm superstitious.

let segmentEnd = sepIndex === -1 ? altSepIndex : altSepIndex === -1 ? sepIndex : Math.min(sepIndex, altSepIndex);
if (segmentEnd === -1) {
segmentEnd = path.length;
}
if (segmentEnd === altSepIndex && normalized === undefined) {
// Seen backslash
normalized = path.substring(0, segmentStart);
}
Comment on lines +667 to +670
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (segmentEnd === altSepIndex && normalized === undefined) {
// Seen backslash
normalized = path.substring(0, segmentStart);
}
if (segmentEnd === altSepIndex) {
// Seen backslash
normalized ??= path.substring(0, segmentStart);
}

const segmentLength = segmentEnd - segmentStart;
if (segmentLength === 1 && path.charCodeAt(index) === CharacterCodes.dot) {
// "." segment (skip)
if (normalized === undefined) {
normalized = path.substring(0, normalizedUpTo);
}
Comment on lines +674 to +676
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (normalized === undefined) {
normalized = path.substring(0, normalizedUpTo);
}
normalized ??= path.substring(0, normalizedUpTo);

}
else if (segmentLength === 2 && path.charCodeAt(index) === CharacterCodes.dot && path.charCodeAt(index + 1) === CharacterCodes.dot) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit: If the previous segmentLength was 1, but the segment was not ., this performs an unnecessary recheck of segmentLength === 2.

// ".." segment
if (!seenNonDotDotSegment) {
if (normalized !== undefined) {
normalized += normalized.length === rootLength ? ".." : "/..";
}
else {
normalizedUpTo = index + 2;
}
}
else if (normalized === undefined) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that in this branch it is technically possible for you to avoid extra substrings by adjusting normalizedUpTo instead of initializing normalized.

if (normalizedUpTo - 2 >= 0) {
normalized = path.substring(0, Math.max(rootLength, path.lastIndexOf(directorySeparator, normalizedUpTo - 2)));
}
else {
normalized = path.substring(0, normalizedUpTo);
}
}
else {
const lastSlash = normalized.lastIndexOf(directorySeparator);
if (lastSlash !== -1) {
normalized = normalized.substring(0, Math.max(rootLength, lastSlash));
}
else {
normalized = normalizedRoot;
}
if (normalized.length === rootLength) {
seenNonDotDotSegment = rootLength !== 0;
}
}
}
else if (normalized !== undefined) {
if (normalized.length !== rootLength) {
normalized += directorySeparator;
}
seenNonDotDotSegment = true;
normalized += path.substring(segmentStart, segmentEnd);
}
else {
seenNonDotDotSegment = true;
normalizedUpTo = segmentEnd;
}
index = segmentEnd + 1;
}
return normalized ?? (path.length > rootLength ? removeTrailingDirectorySeparator(path) : path);
}

/** @internal */
export function normalizePath(path: string): string {
let normalized = simpleNormalizePath(path);
if (normalized !== undefined) {
return normalized;
}
normalized = getNormalizedAbsolutePath(path, "");
return normalized && hasTrailingDirectorySeparator(path) ? ensureTrailingDirectorySeparator(normalized) : normalized;
Comment on lines +727 to +732
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The call to hasTrailingDirectorySeparator is redundant as that check is also performed in ensureTrailingDirectorySeparator. This could be simplified to:

Suggested change
let normalized = simpleNormalizePath(path);
if (normalized !== undefined) {
return normalized;
}
normalized = getNormalizedAbsolutePath(path, "");
return normalized && hasTrailingDirectorySeparator(path) ? ensureTrailingDirectorySeparator(normalized) : normalized;
return simpleNormalizePath(path) ??
ensureTrailingDirectorySeparator(getNormalizedAbsolutePath(path, ""));

}

function simpleNormalizePath(path: string): string | undefined {
path = normalizeSlashes(path);
// Most paths don't require normalization
if (!relativePathSegmentRegExp.test(path)) {
Expand All @@ -643,9 +746,7 @@ export function normalizePath(path: string): string {
return path;
}
}
// Other paths require full normalization
const normalized = getPathFromPathComponents(reducePathComponents(getPathComponents(path)));
return normalized && hasTrailingDirectorySeparator(path) ? ensureTrailingDirectorySeparator(normalized) : normalized;
return undefined;
}

function getPathWithoutRoot(pathComponents: readonly string[]) {
Expand Down
15 changes: 15 additions & 0 deletions src/testRunner/unittests/paths.ts
Original file line number Diff line number Diff line change
Expand Up @@ -317,9 +317,24 @@ describe("unittests:: core paths", () => {
assert.strictEqual(ts.getNormalizedAbsolutePath("", ""), "");
assert.strictEqual(ts.getNormalizedAbsolutePath(".", ""), "");
assert.strictEqual(ts.getNormalizedAbsolutePath("./", ""), "");
assert.strictEqual(ts.getNormalizedAbsolutePath("./a", ""), "a");
// Strangely, these do not normalize to the empty string.
assert.strictEqual(ts.getNormalizedAbsolutePath("..", ""), "..");
assert.strictEqual(ts.getNormalizedAbsolutePath("../", ""), "..");
assert.strictEqual(ts.getNormalizedAbsolutePath("../..", ""), "../..");
assert.strictEqual(ts.getNormalizedAbsolutePath("../../", ""), "../..");
assert.strictEqual(ts.getNormalizedAbsolutePath("./..", ""), "..");
assert.strictEqual(ts.getNormalizedAbsolutePath("../../a/..", ""), "../..");

// More .. segments
assert.strictEqual(ts.getNormalizedAbsolutePath("src/ts/foo/../../../bar/bar.ts", ""), "bar/bar.ts");
assert.strictEqual(ts.getNormalizedAbsolutePath("src/ts/foo/../../..", ""), "");
// not a real URL root!
assert.strictEqual(ts.getNormalizedAbsolutePath("file:/Users/matb/projects/san/../../../../../../typings/@epic/Core.d.ts", ""), "../typings/@epic/Core.d.ts");
// the root is `file://Users/`
assert.strictEqual(ts.getNormalizedAbsolutePath("file://Users/matb/projects/san/../../../../../../typings/@epic/Core.d.ts", ""), "file://Users/typings/@epic/Core.d.ts");
// this is real
assert.strictEqual(ts.getNormalizedAbsolutePath("file:///Users/matb/projects/san/../../../../../../typings/@epic/Core.d.ts", ""), "file:///typings/@epic/Core.d.ts");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idk if we want to preserve specific examples in these, but that might be fine.

Maybe we should have tests on untitled:?


// Interaction between relative paths and currentDirectory.
assert.strictEqual(ts.getNormalizedAbsolutePath("", "/home"), "/home");
Expand Down
Loading