Option to avoid parsing entire Matroska file? #2135
Update: Does music-metadata v9.0.0 solve your issue? The implementation of reading from Blobs (`parseBlob()`) has been changed from buffering to streaming.
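To illustrate the difference between buffering and streaming a `Blob`, here is a minimal sketch using only standard Web APIs. It is not music-metadata's actual implementation, and the helper names `readBuffered` and `countBytesStreaming` are hypothetical. `blob.arrayBuffer()` materializes the whole file in memory, while `blob.stream()` yields it one chunk at a time.

```ts
// Buffering: the entire Blob is loaded into memory at once.
// For a 15 GB file this is what exhausts memory.
async function readBuffered(blob: Blob): Promise<Uint8Array> {
  return new Uint8Array(await blob.arrayBuffer());
}

// Streaming: only one chunk is held at a time, so memory use stays
// bounded even for multi-GB files.
async function countBytesStreaming(blob: Blob): Promise<number> {
  const reader = blob.stream().getReader();
  let total = 0;
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    // A parser would inspect the chunk here and then let it go.
    total += value.byteLength;
  }
  return total;
}
```

Streaming bounds memory, but a consumer that reads the stream to the end still touches every byte. Reading fewer bytes requires the consumer to stop early.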
I'm not sure yet. music-metadata 9.0.0 gives me this error when trying to parse mkv and webm files: … Also, do I still need a …? I'm testing with the following code: …
Thanks.
@Borewit Unless I'm missing something, it looks like `parseWebStream` is not being exported and thus cannot be used: https://github.com/Borewit/music-metadata/blob/v9.0.0/lib/index.ts#L11. Furthermore, when using this code:

```ts
const response = await fetch(`https://my/mp3/file`);
const metadata = await parseWebStream(response.body!, response.headers.get('content-type')!, {
  skipPostHeaders: true,
  includeChapters: true,
  skipCovers: true
});
```

I get this error: …

I wish I knew more about it, or else I would have debugged further! I'm leaving this here instead of opening a new issue, since I think fixing this would solve "avoid parsing entire file".
Moved #2135 (comment) to issue #2143.
It works fine for flac and mp3, with no more Buffer-related errors. I'm still getting errors for webm and mkv, though, using … and ….
`parseBlob()` is calling the code at lines 23 to 29 in d6c2755.
Do you experience the same issues here? https://audio-tag-analyzer.netlify.app/
Yes, same error. I tried with a few video formats (webm, mkv, mp4). Fileinfo of one of them: …
I managed to get an end-of-stream exception as well, parsing an MP4 file. The issue may be caused by https://github.com/Borewit/peek-readable/blob/master/lib/WebStreamReader.ts. It is not something I can resolve quickly.
No problem, thanks for investigating this. In the meantime, I'll keep testing it with more audio files. I love the fact that my bundle size has decreased by around 100 kB with the new music-metadata, compared to the latest music-metadata-browser. Awesome job!
I did some testing with music-metadata v9.0.3 and this is what I got: …
It still reads the entire file, even with …. I'm not sure if this can be avoided at all, since I don't think you can skip to a random position in the stream without reading all the data up to that point sequentially.
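The point about streams can be made concrete with a small sketch: a plain `ReadableStream` has no seek operation, so "skipping" ahead means reading the bytes and throwing them away. The helper `skipBytes` is hypothetical and not part of music-metadata or peek-readable.

```ts
// Skip `count` bytes of a web stream by reading and discarding them.
// Returns the bytes of the last chunk that lie past the skipped range,
// so the caller does not lose data that was read along the way.
async function skipBytes(
  reader: ReadableStreamDefaultReader<Uint8Array>,
  count: number,
): Promise<Uint8Array> {
  let remaining = count;
  while (remaining > 0) {
    const { done, value } = await reader.read();
    if (done) throw new Error("End of stream reached while skipping");
    if (value.byteLength > remaining) {
      // This chunk straddles the end of the skipped range: keep the tail.
      return value.subarray(remaining);
    }
    remaining -= value.byteLength; // whole chunk discarded
  }
  return new Uint8Array(0);
}
```

Every skipped byte still passes through `reader.read()`, which is why skipping over a large payload in a stream costs as much I/O as parsing it.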
Regarding the atom-based format parser: changing the file size will impact how the container format is read. Whether it does depends on the structure of the file; the length of the nested atoms will usually override the parent atom / container size.

There are a few possible approaches to get your metadata result faster:

1. Read only a portion of the stream.

   As for skipping to a random position in the stream: no, that is not directly possible. But the underlying tokenizer architecture (see dependencies) is designed so that, if the underlying file access does support skipping to a random position, that can be utilized, which brings us to option 2.

2. Utilize the tokenizer.
3. Get early access to the metadata.
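Options 1 and 2 can be sketched with standard Web APIs. This is an illustration under assumptions, not music-metadata's API: `limitStream` and `readAt` are hypothetical helpers. `limitStream` exposes only the first `maxBytes` of a stream and then cancels the source, so the rest is never read or downloaded. `readAt` relies on `Blob.slice()`, which gives random access to a `Blob` or `File` so that typically only the requested range is read.

```ts
// Option 1: pass on at most `maxBytes` of `source`, then cancel the source.
function limitStream(source: ReadableStream<Uint8Array>, maxBytes: number): ReadableStream<Uint8Array> {
  const reader = source.getReader();
  let delivered = 0;
  return new ReadableStream<Uint8Array>({
    async pull(controller) {
      const { done, value } = await reader.read();
      if (done) {
        controller.close();
        return;
      }
      // Trim the chunk that crosses the limit.
      const chunk = value.subarray(0, maxBytes - delivered);
      delivered += chunk.byteLength;
      controller.enqueue(chunk);
      if (delivered >= maxBytes) {
        controller.close();
        await reader.cancel(); // stop reading the remainder of the source
      }
    },
    cancel(reason) {
      return reader.cancel(reason);
    },
  });
}

// Option 2: random access into a Blob without reading the bytes before `position`.
async function readAt(blob: Blob, position: number, length: number): Promise<Uint8Array> {
  return new Uint8Array(await blob.slice(position, position + length).arrayBuffer());
}
```

As noted above, truncating the input changes the apparent file size, so container and atom sizes in the data may point past the end of what the parser receives; a parser fed by `limitStream` has to tolerate that. Random access avoids the problem but only works when the underlying source supports it.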
In PR #2213 I am working towards asynchronous parsing of Matroska, instead of extracting metadata from the full tree. I hope to be able to parse fewer elements, to speed up the overall process.
@Borewit Thanks for the update, much appreciated!
It is very tricky; it looks like not all metadata is necessarily at the beginning of the file. For a 1 GB remote video file (on AWS S3), I could bring the parsing time down from 45 seconds to 500 ms by quitting after receiving the first …. With that hack, other Matroska files fail, as they have metadata further on in the file.

With partial read support, there are possible optimizations to be made. There are certainly elements I parsed that are not even used. I flagged a bunch of them to be ignored, but it does not work magic. The elements I am interested in are sometimes on the same level as (many) elements I am not interested in, so it is hard to seek efficiently in the file.
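Why sibling elements make seeking hard can be seen from how EBML (the format underlying Matroska and WebM, RFC 8794) lays out elements: each is `[ID as VINT][data size as VINT][payload]`, so an uninteresting element can be skipped by jumping over its size, but every sibling header still has to be visited. Below is a minimal sketch of that technique; `readVint` and `findElement` are hypothetical helpers, not music-metadata's Matroska parser, and unknown-size elements are not handled.

```ts
interface Vint { value: number; length: number; }

// Decode an EBML variable-size integer at `offset`. The number of leading
// zero bits in the first byte gives the total length (1 to 8 bytes).
// keepMarker = true for element IDs (marker bit is part of the ID),
// false for data sizes (marker bit is stripped).
// Note: values above 2^53 lose precision with plain numbers.
function readVint(buf: Uint8Array, offset: number, keepMarker: boolean): Vint {
  if (offset >= buf.length || buf[offset] === 0) throw new Error(`Invalid VINT at ${offset}`);
  const first = buf[offset];
  let length = 1;
  let mask = 0x80;
  while ((first & mask) === 0) {
    mask >>= 1;
    length++;
  }
  if (offset + length > buf.length) throw new Error("VINT runs past end of buffer");
  let value = keepMarker ? first : first & (mask - 1);
  for (let i = 1; i < length; i++) value = value * 256 + buf[offset + i];
  return { value, length };
}

// Walk sibling elements in buf[start, end) and return the payload range of the
// first element with `targetId`, skipping other payloads without decoding them.
function findElement(
  buf: Uint8Array,
  start: number,
  end: number,
  targetId: number,
): { start: number; end: number } | undefined {
  let pos = start;
  while (pos < end) {
    const id = readVint(buf, pos, true);
    const size = readVint(buf, pos + id.length, false);
    const payloadStart = pos + id.length + size.length;
    const payloadEnd = payloadStart + size.value;
    if (id.value === targetId) return { start: payloadStart, end: payloadEnd };
    pos = payloadEnd; // jump over the payload
  }
  return undefined;
}
```

With random access (a `Blob` or a seekable tokenizer), the skipped payloads need not be read at all; with a plain stream they still pass through, which matches the trade-off described above.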
Hello!
I'm having some issues when trying to retrieve the metadata of a large (15 GB) video file with `parseBlob()`: disk usage skyrockets and it takes about 1 minute and 20 seconds to resolve with the metadata, so it looks like it's parsing the entire file. Sometimes the browser just crashes or I get an out-of-memory error (having the dev tools open seems to make things worse / slower).
I tried using `skipPostHeaders: true` and `duration: false`, but it seems `parseBlob()` doesn't take an options object. I'd appreciate any advice.
Kind regards.