Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

split on preserved comments before running minification, fixes #222 #320

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

davidscotson
Copy link

Okay, I'll admit up front that I'm not actually sure exactly what the problem was that caused the issue described in #222, but this first commit seems to fix it for me an an effected server and I have a reasonable intuition of why that makes sense.

I've included @timhunts regex fix for #333 in this, since that regex is pretty key to this and seems to be more reliably correct with his update. So this also has the better output (no longer accidentally skipping some minification opportunities) of that PR.

I think this change should help in some rare, extreme cases where people are minifying giant files, but also shouldn't have any bad effect on more normal usage.

In the test file for #222, and I'd guess many other times when you end up with >1MB javascript files, which is the size where it seems to start mattering, it will be because you have pre-concatenated the seperate files, and many of those will have comments they wish preserved.

Previously, preserved comments where extracted from the file and replaced with a counter, and then injected back into the same places later on in the process. To avoid the memory paging/swapping behaviour that I believe contributes to #222 I wanted to split the large file up and process each one individually. Since the minifications shouldn't cross the boundary of these extracted comments, they seemed like a good place to split. And since they're just a special kind of CSS comment, they could also be added manually to break up large files. (I did look at more brutally chopping up the input into byte chunks to see if there was a sweet spot in size but wasn't as confident that wouldn't cause issues and as long as the chunks remain below about 850K you seem to get the benefit)

Side note: the library itself already supports passing individual files in one at a time and minifying them in one go, which is as fast as this approach and a good workaround if you control the input going into the library, in some cases it might be easier or make more sense just to pass in the individual files one at a time in the first place rather than concatenate them just for the minification process to split them up again and then recombine them. Might be worth noting that in the documentation somewhere so people can avoid hitting this issue.

As far as I can tell, when you pass the large file as a single piece then the repeated substr (and possibly the regex, though I think the C implementation of those may be so optimised that it doesn't matter) on these long strings creates a lot of memory churn and this can slow things down as files get bigger. There's no memory leak as such, it just burns through a lot of quickly disposible strings. This causes slowness on most servers as the file size increases but can be a showstopper on weak/overloaded servers (in the latter case adding a degree of unpredictably too).

This patch tries to mimick that behaviour when the input is out of your control. Rather than extracting the preserved comments at the same time as the other search and replaces, it does it first with a preg_split and then deals with the file as the resulting chunks of code / comment / code / comment / code / comment, minifying the code ones before stitching them back together.

I'm hoping that keeping the output of the minification process as arrays of non-modified strings as much as possible will help further with the memory usage / speed, though I've kept that to a seperate commit and it's hard to see/measure much benefit of that seperate from the other change.

While running this with some diagnostic output (from Tim Hunt's patch for Moodle) I noticed that two very similar regexes were getting run the same number of times after each other. I believe the first one is redundant if you're also running the second one, that's the 3rd commit.

David Scotson and others added 3 commits March 22, 2020 14:18
uses Tim Hunt's regex fix for matthiasmullie#333, but moves it so it's done
in a different place to split on, rather than remove, special comments
It's not clear if both of these are necessary, the second one should
just match everything the first one matches plus things within internal
lines, so it seems redundant.

The unit test commited with it, plus all the others still pass, and
there's no diff in the output on a long test file.
@davidscotson
Copy link
Author

More full disclosure on this, my investigation leads me to think that the workaround (mentioned above) of passing the files to the library individually will work for my use cases, so I'm not going to be trying to put this into production, though I think it's generally a helpful improvement if anyone else wants to take it on.

@matthiasmullie matthiasmullie force-pushed the master branch 5 times, most recently from 85ee174 to 227f190 Compare December 27, 2020 21:43
@matthiasmullie matthiasmullie force-pushed the master branch 3 times, most recently from 949e01d to 51d1aa4 Compare November 18, 2022 12:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant