CompressionStream supported in all browsers #1
Oh, yeah I did see that, super awesome! The only thing is that I do wish there were async functions to go along with the stream-based parts of the API too; then it would be simpler to use. Sorry for the rant, I guess I'm just not sure how I should manage complete feature parity while also using modern APIs. After thinking about it, I guess it's no different than having to make your own file picker function wrapper around the built-in APIs.

Oh yeah, and TypeScript doesn't have support for it yet either. Have you heard any news as to when this API will become part of the standard library type definitions?

Sorry this is so long, I didn't mean for that. I guess I've been unsure how to properly set this all up here, at least what the 'best' way to do it should be. I'm curious if there are any performance differences between the two methods I've tried out here?

```ts
// My first version
export type CompressionFormat = "gzip" | "deflate" | "deflate-raw";
export interface CompressionOptions {
format?: CompressionFormat;
}
/**
* Transforms a Uint8Array through a specific compression format. If a format is not provided, the gzip format will be used.
*/
export async function compress(data: Uint8Array, { format = "gzip" }: CompressionOptions = {}){
const stream = new CompressionStream(format);
const readable = new Blob([data]).stream().pipeThrough(stream);
const buffer = await new Response(readable).arrayBuffer();
return new Uint8Array(buffer);
}
/**
* Transforms a Uint8Array through a specific decompression format. If a format is not provided, the gzip format will be used.
*/
export async function decompress(data: Uint8Array, { format = "gzip" }: CompressionOptions = {}){
const stream = new DecompressionStream(format);
const readable = new Blob([data]).stream().pipeThrough(stream);
const buffer = await new Response(readable).arrayBuffer();
return new Uint8Array(buffer);
}
```

```ts
// My second version
// I think I changed my mind on 'gzip' being the default compression format.
export type CompressionFormat = "gzip" | "deflate" | "deflate-raw";
export interface CompressionOptions {
format: CompressionFormat;
}
/**
* Compresses a Uint8Array using a specific compression format.
*/
export async function compress(data: Uint8Array | ArrayBufferLike, { format }: CompressionOptions): Promise<Uint8Array> {
const { body } = new Response(data instanceof Uint8Array ? data : new Uint8Array(data));
const readable = body!.pipeThrough(new CompressionStream(format));
const buffer = await new Response(readable).arrayBuffer();
return new Uint8Array(buffer);
}
/**
* Decompresses a Uint8Array using a specific decompression format.
*/
export async function decompress(data: Uint8Array | ArrayBufferLike, { format }: CompressionOptions): Promise<Uint8Array> {
const { body } = new Response(data instanceof Uint8Array ? data : new Uint8Array(data));
const readable = body!.pipeThrough(new DecompressionStream(format));
const buffer = await new Response(readable).arrayBuffer();
return new Uint8Array(buffer);
}
```
Yeah, this is my main concern with it. While it is supported in the most recent versions, not quite everyone has moved up to the latest browser releases yet. CompressionStream API - Caniuse
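For context, a minimal feature-detection sketch of the kind of fallback being discussed here — the `./compression-polyfill.js` path is hypothetical, standing in for whichever polyfill module ends up being used:

```ts
// Minimal sketch: detect the Compression Streams API at runtime and load a
// polyfill when it's missing. "./compression-polyfill.js" is a hypothetical
// module that defines CompressionStream/DecompressionStream on globalThis.
if (typeof globalThis.CompressionStream === "undefined"){
  await import("./compression-polyfill.js");
}

// From here on, native and polyfilled implementations are used the same way.
const gzipStream = new CompressionStream("gzip");
```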
There is currently no straightforward method to compress raw binary data without all the stream machinery, which I guess does make sense since networking is where compression is usually used and needed, but I would still wish for one myself, especially to be able to read and write .zip, .docx, ... files for a possible future rich-text editor project of mine (see also @ndesmic's article about that). Such an async function has been suggested in e.g. whatwg/compression#8; it might become available in the future?

As for ways to implement your own wrapper function, using Responses and .pipeThrough() seems to be inefficient; the spec includes an example of how to do it more performantly using writers and readers. Regarding the TypeScript definitions, Compression Streams will be in v5.1 (you can already try it out with the beta version); in the meantime you can use my new-javascript type definitions, where they're also included.
Ok, I decided to make a comparison of the two, and it came up with some interesting results! You can save this locally as `Compression-Streams-Speed.mts` and run it with the command in the first comment:

```ts
// clear; npx tsx ./Compression-Streams-Speed.mts
const DATA = new Uint8Array(Array.from({ length: 0x1000 },() => Math.floor(Math.random() * 10)));
console.log(DATA,"\n");
const BLOB = new Blob([DATA]);
const TEST_REPEATS = 0x1000;
for (let i = 0; i < TEST_REPEATS; i++){
const COMPRESSED_DATA =
await timer(`#${i} ArrayBuffer Compress `,async () => compressArrayBuffer(DATA,"deflate"));
await timer(`#${i} ArrayBuffer Decompress`,async () => decompressArrayBuffer(COMPRESSED_DATA,"deflate"));
const COMPRESSED_BLOB =
await timer(`#${i} Blob Compress `,async () => compressBlob(BLOB,"deflate"));
await timer(`#${i} Blob Decompress `,async () => decompressBlob(COMPRESSED_BLOB,"deflate"));
console.log();
}
async function timer<T>(label: string, callback: () => Promise<T>){
console.time(label);
const result: T = await callback();
console.timeEnd(label);
return result;
}
// 8.2. Deflate-compress an ArrayBuffer to a Uint8Array
async function compressArrayBuffer(input: BufferSource, format: CompressionFormat){
const cs = new CompressionStream(format);
const writer = cs.writable.getWriter();
writer.write(input);
writer.close();
const output: Uint8Array[] = [];
const reader = cs.readable.getReader();
let totalSize = 0;
while (true){
const { done, value } = await reader.read();
if (done) break;
output.push(value);
totalSize += value.byteLength;
}
const concatenated = new Uint8Array(totalSize);
let offset = 0;
for (const array of output){
concatenated.set(array,offset);
offset += array.byteLength;
}
return concatenated;
}
// Demo: decompress ArrayBuffer
async function decompressArrayBuffer(input: BufferSource, format: CompressionFormat){
const ds = new DecompressionStream(format);
const writer = ds.writable.getWriter();
writer.write(input);
writer.close();
const output: Uint8Array[] = [];
const reader = ds.readable.getReader();
let totalSize = 0;
while (true){
const { done, value } = await reader.read();
if (done) break;
output.push(value);
totalSize += value.byteLength;
}
const concatenated = new Uint8Array(totalSize);
let offset = 0;
for (const array of output){
concatenated.set(array,offset);
offset += array.byteLength;
}
return concatenated;
}
// 8.3. Gzip-decompress a Blob to Blob
async function decompressBlob(blob: Blob, format: CompressionFormat){
const ds = new DecompressionStream(format);
const decompressionStream = blob.stream().pipeThrough(ds);
return new Response(decompressionStream).blob();
}
// Demo: compress Blob
async function compressBlob(blob: Blob, format: CompressionFormat){
const cs = new CompressionStream(format);
const compressionStream = blob.stream().pipeThrough(cs);
return new Response(compressionStream).blob();
}
declare global {
type CompressionFormat = "deflate" | "deflate-raw" | "gzip";
class CompressionStream extends TransformStream<BufferSource,Uint8Array> {
constructor(format: CompressionFormat);
}
class DecompressionStream extends TransformStream<BufferSource,Uint8Array> {
constructor(format: CompressionFormat);
}
}
export {};
```
Thanks for letting me know about the performance differences, I had no idea that was the case! I will look into using the writer/reader approach from the spec instead.
Did more investigating into multiple ways of using the Compression Streams API, and I realized that this is a really nice setup to have, possibly as a standalone module outside of NBTify. It provides a more cross-platform zlib API, similar to using `node:zlib`, but available both in the browser and in Node! It's built on top of the Compression Streams API too, so it's only an API wrapper, plus possibly a polyfill where necessary.

I'm going to try setting things up here first, then move them to a dedicated project that both handles the compression polyfills and provides these more functional-style APIs. They offer more performance than my previous stream-specific `Blob` handling, and they're simpler to use directly with `ArrayBuffer` and the various `TypedArray` objects, rather than having to go back and forth between `Blob` containers, which appears to be a little less performant in the simple test I set up. Oh my, that last sentence was horrible, I'm sorry. I guess I'm more of a dev than I am a writer.

Offroaders123/Dovetail#1
whatwg/compression#8
https://wicg.github.io/compression/#example-deflate-compress
https://jakearchibald.com/2017/async-iterators-and-generators/#making-streams-iterate
Brought these ideas over from my NBTify demos. I wasn't sure what the callback inside of the polyfill was for, but it turns out it was necessary for this to run at all! Re-added it to fix that. Offroaders123/Dovetail#1
TypeScript added type definitions for the Compression Streams API, yay!!! As of this commit, VSCode's bundled TypeScript version hasn't been updated to support this yet, so you may have to point the editor at the workspace's TypeScript version. You can do that from the Command Palette by switching the TypeScript version to the one from the workspace, or by placing this in `./.vscode/settings.json`:

```json
{
  "typescript.tsdk": "node_modules/typescript/lib"
}
```

Offroaders123/Dovetail#1
Fully added support for determining the compression format used by the file, which will now work for all of the formats: `gzip`, `deflate`, and `deflate-raw`! This might not sound new, because the library has supported explicitly opening all three formats for the last month or so. Now it will be able to automatically open files that are compressed with `deflate-raw`; you don't have to explicitly mention whether it is `deflate-raw` or not to open it successfully.

I've been working on adding support for this for the last 4 days or so, and I hit multiple roadblocks along the way. Firstly, I realized that there isn't as easy a way to detect `deflate-raw`-encoded data using the same header technique that I use for the other two formats, because the raw data simply doesn't have a header that you can check against. There might be a way to write some fancy logic to detect it, but I realized that it might be more explicit/reliable to simply try parsing the data itself and seeing what happens. Reading across multiple articles online about other people trying to do the same thing, many of the main responses said that just trying to parse the data will be the most efficient approach, because it will fail fast once something in the data isn't correct. After thinking about that for a bit, coming back the last few days to try this was the next step.

Then I started running into errors once everything was set up. It was seemingly random, in the sense that wrapping the new code in try-catch blocks wouldn't catch where any of the errors were. After debugging it for about a full day or two, my head was starting to spin and I was getting a bit frustrated. I tried breaking the code out into its own file to track down where the error was, and it magically started working. Then I realized it must be something with the Node `deflate-raw` polyfill I added recently, so I added that in to see if it would break the working code (it didn't). I was really happy that it was working, but I was even more concerned that the original code wasn't working, having learned that everything else *was* indeed working. Then I had a last-ditch idea: maybe it was the `pipeThroughCompressionStream()` function? That wouldn't make any sense. But, I'll try adding that to my standalone code and see what happens.
```ts
const data = Uint8Array.from({ length: 25 },() => 25);
console.log(data);

const compressed = await compress(data,"deflate-raw");
console.log(compressed);

const unzipped = await unzip(compressed);
console.log(unzipped);

async function unzip(data: Uint8Array): Promise<Uint8Array> {
  try {
    return await decompress(data,"gzip");
  } catch {
    try {
      return await decompress(data,"deflate");
    } catch {
      try {
        return await decompress(data,"deflate-raw");
      } catch {
        throw new Error("Could not unzip the buffer data");
      }
    }
  }
}

async function compress(data: Uint8Array, format: CompressionFormat): Promise<Uint8Array> {
  try {
    const compressionStream = new CompressionStream(format);
    return pipeThroughCompressionStream(data,compressionStream);
  } catch (error){
    if (format !== "deflate-raw") throw error;
    // @ts-expect-error
    const { deflateRawSync } = await import("node:zlib");
    return new Uint8Array(deflateRawSync(data));
  }
}

async function decompress(data: Uint8Array, format: CompressionFormat): Promise<Uint8Array> {
  try {
    const decompressionStream = new DecompressionStream(format);
    return pipeThroughCompressionStream(data,decompressionStream);
  } catch (error){
    if (format !== "deflate-raw") throw error;
    // @ts-expect-error
    const { inflateRawSync } = await import("node:zlib");
    return new Uint8Array(inflateRawSync(data));
  }
}

async function pipeThroughCompressionStream(data: Uint8Array, { readable, writable }: CompressionStream | DecompressionStream): Promise<Uint8Array> {
  const writer = writable.getWriter();
  writer.write(data);
  writer.close();

  const chunks: Uint8Array[] = [];
  let byteLength = 0;

  const generator = (Symbol.asyncIterator in readable) ? readable : readableStreamToAsyncGenerator(readable as ReadableStream<Uint8Array>);

  for await (const chunk of generator){
    chunks.push(chunk);
    byteLength += chunk.byteLength;
  }

  const result = new Uint8Array(byteLength);
  let byteOffset = 0;

  for (const chunk of chunks){
    result.set(chunk,byteOffset);
    byteOffset += chunk.byteLength;
  }

  return result;
}

async function* readableStreamToAsyncGenerator<T>(readable: ReadableStream<T>): AsyncGenerator<T,void,void> {
  const reader = readable.getReader();
  try {
    while (true){
      const { done, value } = await reader.read();
      if (done) return;
      yield value;
    }
  } finally {
    reader.releaseLock();
  }
}
```

Bingo! The same error started happening. So for some reason, at least with Chrome/V8 (Node)'s Zlib implementations, you can't decompress a `deflate-raw` stream using my `pipeThroughCompressionStream()` chunker function. That was unexpected! My demo had worked fine just before this, when I had originally set it up with my old `pipeThroughCompressionStream()` implementation, just because I was too lazy to add all of the fancy logic for the chunker function version. Here's that one:

```ts
async function pipeThroughCompressionStream(data: Uint8Array, stream: CompressionStream | DecompressionStream): Promise<Uint8Array> {
  const { body } = new Response(data);
  const readable = body!.pipeThrough(stream);
  const buffer = await new Response(readable).arrayBuffer();
  return new Uint8Array(buffer);
}
```

So yeah, that was an interesting find. And for reference, this is the error that you get from running this in the console:

```log
Error: incorrect header check
    at __node_internal_genericNodeError (node:internal/errors:865:15)
    at Zlib.zlibOnError [as onerror] (node:zlib:189:17) {
  errno: -3,
  code: 'Z_DATA_ERROR'
}
```

I hopefully would like to delve into this a bit more, and figure out why this is an issue.
Maybe I'm missing something inside of the `pipeThroughCompressionStream()` function, and that's all that needs help; it might not be Zlib's issue after all (or rather, my use of it). The chunker version of the function has proven to be faster in the test we did here (Offroaders123/Dovetail#1), so ideally I'd like to use that version to decompress the data, but the old one works every time, so I'm going with the old version since it is reliably successful, albeit a little slower (not really, just in the grand scheme of things). (#11)

I'm also debating, as of writing this, whether it makes sense to provide both ways of doing this auto-detection. It is faster and less recursive in terms of format parsing if you check the headers of the files before trying to parse them with `x` format (you already know the format before trying to open it, so you can just open it with the right one to start with), and it could also use this old non-chunked version for `deflate-raw` specifically, since the chunked one works great for the other two formats. I'm deciding not to worry about that with this initial find/addition/fix/discovery though, as having it working for all features is more important to me than having it work the absolute greatest too. I don't want to overcomplicate things just because it might be faster; I want to demo things out more to instead see if I can fix that chunking issue and add `deflate-raw` support to it, so there would only have to be one `pipeThroughCompressionStream()` function, not two. I also like the simplification that this headerless auto-decompression handling provides, so I'm going with that one only for the time being, because it is easier for me to understand at the moment. Eventually, when I can find benefits to having both, if it makes the others faster, then I will add both, because it could simplify the reading process and reduce the format checking for the parts that are explicit in how they are formatted (the compression headers); a sketch of that header-check idea is below. I may also go this way with the Bedrock Level header, but that mostly depends on whether I go all-in on the auto-reading try-catch behavior for compression too, since they are similar in terms of the header-checking behavior.

So here's a mental summary (for you and myself): I'm going with absolute-support first, leaving performance to be picked up later. I want to ensure that everything is working up to par, then worry about possible performance gains after that. Ideally, I eventually want to have both. But in the meantime, I want to rely on features that fully work, without any inconsistencies or things that could cause spaghettic code (typo, was neat. keeping that; spaghettic - to be spaghetti-like; use: that code is very spaghettic.).
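For reference, a rough sketch of that header-check idea — gzip data begins with the magic bytes `0x1F 0x8B`, and zlib-wrapped `deflate` data typically begins with `0x78`, while `deflate-raw` has no header at all, so it would remain the try-parse fallback. This is illustrative only, not NBTify's actual implementation:

```ts
export type CompressionFormat = "gzip" | "deflate" | "deflate-raw";

// Illustrative header sniffing: gzip and zlib-wrapped deflate have recognizable
// leading bytes, while raw deflate data has no header to check against.
export function detectCompressionFormat(data: Uint8Array): CompressionFormat | null {
  if (data.length >= 2 && data[0] === 0x1F && data[1] === 0x8B) return "gzip";
  if (data.length >= 2 && data[0] === 0x78) return "deflate"; // zlib header, e.g. 0x78 0x9C
  return null; // possibly "deflate-raw", or not compressed at all; fall back to try-parsing
}
```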
Figured out the issue! Yay, epic. Turns out that the `WritableStreamDefaultWriter` methods `write()` and `close()` return `Promise`s, so you have to `await` them to make sure that error handling gets bubbled back up to the containing `async` function, hence why the demons were let free, and why I couldn't catch the error with a `try`/`catch` handler. I'm not quite sure yet why it was working for all the other cases up until now; maybe the `for await` loop helped make it work or something? Eh, it's still kind of confusing.

Offroaders123/Dovetail#1
https://developer.mozilla.org/en-US/docs/Web/API/WritableStreamDefaultWriter/write
https://developer.mozilla.org/en-US/docs/Web/API/WritableStreamDefaultWriter/close

This commit only brings back the original chunking function code and adds the missing `await` calls to `writer.write()` and `writer.close()` to make it work properly now.
Hmm, maybe it's not fixed so soon. Having both of the `await` calls in place still doesn't seem to resolve everything.
I'm curious if this has to do with a possible difference in the underlying stream implementations between Node and the browser? Something I read elsewhere about this caught my eye.
Does this also mean it's possible that it won't always resolve, either?
Ok, this is getting more interesting. It's looking like those might not have been the issue all along? It's starting to look like it's instead the changes that I made before I brought back their original implementations in the 'Compression Headers Revert' commit for NBTify. This seems to be what my tests are showing: if I only update the Read module (that's where the auto-detect compression logic is), the error comes back, and this blocking behavior isn't present on either platform if I remove those changes. This message is starting to feel less clear the more I write it, so I'm just going to keep testing, then come back lol.
Ok, looks like this is the offending code:

```js
export async function read(data,{ name, endian, compression, bedrockLevel, strict } = {}){
// Snippet of the function body; incomplete
if (compression === undefined){
try {
return await read(data,{ name, endian, compression: null, bedrockLevel, strict });
} catch (error){
try {
return await read(data,{ name, endian, compression: "gzip", bedrockLevel, strict });
} catch {
try {
return await read(data,{ name, endian, compression: "deflate", bedrockLevel, strict });
} catch {
try {
return await read(data,{ name, endian, compression: "deflate-raw", bedrockLevel, strict });
} catch {
throw error;
}
}
}
}
}
}
```
It would've been ideal to track this in NBTify instead, I guess, but that's ok. I originally thought it had to do with the chunking code.
Wow, this was an interesting one! Turns out my current compression code doesn't work in the browser; I discovered this while trying to update to this latest version in Dovetail. Offroaders123/Dovetail#1

I also realized that I made the Bedrock Level header version for `ridiculous.nbt` a signed int, oops. I really want to find out whether the header is signed or not; my implementation currently accepts signed *and* unsigned values, so it isn't very structurally sound, come to think of it.
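For illustration only (and assuming a Bedrock `level.dat`-style header of a 4-byte little-endian version followed by a 4-byte little-endian byte length — an assumption made just for this sketch), the signed/unsigned question comes down to which `DataView` accessor reads the value back:

```ts
// Illustrative only; assumes a 4-byte little-endian version at offset 0 and a
// 4-byte little-endian byte length at offset 4 (hypothetical layout for this sketch).
const header = new ArrayBuffer(8);
const view = new DataView(header);

view.setUint32(0, 10, true);   // header version
view.setUint32(4, 1024, true); // byte length of the payload that follows

// Reading the version back as unsigned vs signed; the two only diverge once the
// value exceeds 0x7FFFFFFF, which is why real-world files can't easily settle it.
const versionUnsigned = view.getUint32(0, true);
const versionSigned = view.getInt32(0, true);
console.log(versionUnsigned, versionSigned); // 10 10
```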
Wow, this was an interesting one! I think it may have to do with how the backing implementation of the Compression Streams API differs between the browser(s) and Node. Turns out, after all, the error is still partially around the `writer.write()` and `writer.close()` calls. In Node, awaiting both calls works, because they have a `.then()`-able return value, while in Chrome, Safari, and Firefox, they don't ever return when chained with `.then()` or used with `await`, hence why it would hang with what seemed like an errorless result. In Node, however, these do eventually resolve, with just a `void` return value.

And interestingly enough, I'm not sure if it has to do with the compression backing implementation, or with the handling of Promise rejections and the task ending (I don't remember what that's called, but it has to do with when the task ends, hiding any errors which haven't rejected by the time the task is completed; I think it is that second one, I'll explain), but those errors don't occur in Node, while they do in the browser. I think this is because the console closes before the Promises have rejected, meaning they simply get hidden from failing in the function calls. I think this relates to Jake Archibald's article about `for await`, and how it can't properly catch unhandled Promises, as they aren't caught by the current thread pool (not sure if that's the proper term either): https://jakearchibald.com/2023/unhandled-rejections/

While searching for possible fixes to help figure this out, seeing the extra `.catch()` handlers made it click a little more for me, and it helped further uncover where these ghost errors were bubbling up from! https://stackoverflow.com/questions/37624322/uncaught-in-promise-undefined-error-when-using-with-location-in-facebook-gra

Overall, I think the main issue is that `writer.write()` and `writer.close()` don't properly have `.then()`-able resolutions in the browser(s), while they do in Node. If I could use `await` on these calls, then I could ensure that these errors properly get bubbled back up to the main function, which could pass them as feedback up to the caller, which can decide how to handle them as needed. In the meantime, I think I simply have to rely on the errors that fire after these, because they can't be successfully caught by the main containing function. So I think using a ghost `.catch()` handler on them will be the only fix for it, unfortunately. Maybe I'm missing something about how you should use `write()` and `close()`, and maybe they shouldn't be awaited. I'd think that would be key to catching errors properly in situations like these though, no?

Offroaders123/Dovetail#1

For a while, I thought this had to do with my other changes to the Read module (Root ListTags, `return await` try-catch calls, `deflate-raw` auto-detect handling), but it turns out it was just the error handling within the Compression module itself: when piping through a compression stream had an error, it simply wasn't being propagated to the parent scope correctly, because the call wasn't being made with `await` (because that doesn't work in the browser). Ooh, and the part I said I'd go back to! I think this was happening in Node actually, but only when the process was running long enough to allow the errors to bubble up and become uncaught.
When I implemented the auto-detect-without-header-checks change, which uses try-catch for each of the compression format types, errors would magically cause the NEXT compression format to error out, and I couldn't figure out why. Turns out it *was* because Node was exiting in every other case before those un-awaited Promises had errored out (because the *header*-based checks were faster than the header-less checks), meaning the header-less checks ran long enough for the uncaught errors from the previous auto-detect compression test to bubble up to the main scope as uncaught. I associated this with the `deflate-raw` format only because it would only happen with that one; the other checks ran fast enough to finish before the errors could bubble up in time.
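Piecing the above together, a condensed sketch (not NBTify's exact code) of the workaround being described: attach no-op `.catch()` handlers to `writer.write()` and `writer.close()` so their rejections don't surface as unhandled, and let any compression error propagate through the readable side instead, where the surrounding try/catch can see it:

```ts
// Sketch of the described workaround, not NBTify's exact implementation:
// swallow the writer promise rejections (they also reject when the stream
// errors) and surface errors through reader.read(), which this async function
// awaits, so callers can catch them normally.
async function pipeThroughCompressionStream(data: Uint8Array, { readable, writable }: CompressionStream | DecompressionStream): Promise<Uint8Array> {
  const writer = writable.getWriter();
  writer.write(data).catch(() => {});
  writer.close().catch(() => {});

  const chunks: Uint8Array[] = [];
  let byteLength = 0;

  const reader = readable.getReader();
  while (true){
    const { done, value } = await reader.read(); // rejects here if the data is invalid
    if (done) break;
    chunks.push(value);
    byteLength += value.byteLength;
  }

  const result = new Uint8Array(byteLength);
  let byteOffset = 0;
  for (const chunk of chunks){
    result.set(chunk,byteOffset);
    byteOffset += chunk.byteLength;
  }
  return result;
}
```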
Ok, turns out it does have to do with the
For a better reference on the error handling in NBTify here, see the demo code in this commit, referenced from the same commit as above.
I think this could be the final resolution to the unhandled Promise streaming issue. This will fix the issues with bringing the latest version of NBTify into Dovetail. whatwg/compression#55 Offroaders123/Dovetail#1 https://jakearchibald.com/2023/unhandled-rejections/ (part of the resolution)
Bringing Dovetail up to the latest version of NBTify! That means there is now support for Deflate-Raw compressed files! This update was waiting on fixes for some additional compression errors in NBTify that went unnoticed in the tests I wrote for Node, because Node has a different streams implementation which behaved differently and passed the tests unexpectedly, meaning the extra unhandled errors were only happening in the browser. #1
Hi again,
I just saw you forking the CompressionStream polyfill, so I thought you might have missed the news that CompressionStream is now already supported by all three browser engines, including Gecko (Firefox) and WebKit (Safari) (see also https://caniuse.com/mdn-api_compressionstream).