Improve performance #52
Comments
An obvious solution would be to use
    pub struct MediaPlaylist<'a> { /* ... */ }
    impl<'a> TryFrom<&'a str> for MediaPlaylist<'a> {
        type Error = Error;
        fn try_from(input: &'a str) -> Result<Self, Self::Error> {
            parse_media_playlist(input, &mut MediaPlaylist::builder())
        }
    }
One might also consider adding |
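To make the borrowing idea concrete, here is a minimal sketch, assuming a much simpler model than the crate's real types. `BorrowedPlaylist`, `BorrowedSegment` and the `String` error are hypothetical names made up for illustration, and the "parser" only collects URI lines. It shows how a type tied to the input lifetime `'a` can hold `Cow<'a, str>` values so that nothing is copied unless the caller asks for an owned value.

```rust
use std::borrow::Cow;
use std::convert::TryFrom;

/// Hypothetical segment type: the URI borrows from the input when possible.
#[derive(Debug, PartialEq)]
pub struct BorrowedSegment<'a> {
    pub uri: Cow<'a, str>,
}

/// Hypothetical playlist type tied to the lifetime of the input text.
#[derive(Debug, PartialEq)]
pub struct BorrowedPlaylist<'a> {
    pub segments: Vec<BorrowedSegment<'a>>,
}

impl<'a> TryFrom<&'a str> for BorrowedPlaylist<'a> {
    type Error = String;

    fn try_from(input: &'a str) -> Result<Self, Self::Error> {
        let mut lines = input.lines();
        if lines.next().map(str::trim) != Some("#EXTM3U") {
            return Err("missing #EXTM3U header".to_string());
        }
        let segments = lines
            .map(str::trim)
            // Skip blank lines, tags and comments; keep only URI lines.
            .filter(|l| !l.is_empty() && !l.starts_with('#'))
            .map(|l| BorrowedSegment { uri: Cow::Borrowed(l) })
            .collect();
        Ok(BorrowedPlaylist { segments })
    }
}

impl<'a> BorrowedSegment<'a> {
    /// Detach from the input buffer by copying the URI.
    pub fn into_owned(self) -> BorrowedSegment<'static> {
        BorrowedSegment { uri: Cow::Owned(self.uri.into_owned()) }
    }
}
```

The trade-off is that the parsed value cannot outlive the input buffer unless `into_owned` (or something like it) is called, which is where the copy cost moves to.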
All places where it could be used (I think):
MediaPlaylist
MasterPlaylist:
but I am not sure if it should be used as a drop-in replacement for Why
|
Maybe it would be possible to parse tags in parallel? I think that would be really difficult, because the m3u8 file format seems to be designed to be parsed sequentially (some tags depend on the previous or next tag). |
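A small sketch of why the format resists parallel parsing, under the assumption of a drastically reduced playlist model: `Segment` and `parse_segments` are hypothetical names, and only `#EXTINF` and URI lines are handled. The duration announced by a tag has to be carried forward as state until the URI line it belongs to appears, so a line cannot be interpreted without knowing what came before it.

```rust
/// Hypothetical, heavily simplified segment: just a duration and a URI.
#[derive(Debug, PartialEq)]
pub struct Segment<'a> {
    pub duration: f64,
    pub uri: &'a str,
}

/// Walk the lines in order, carrying the pending #EXTINF duration forward
/// until the URI line it belongs to shows up.
pub fn parse_segments(input: &str) -> Result<Vec<Segment<'_>>, String> {
    let mut pending: Option<f64> = None;
    let mut segments = Vec::new();
    for line in input.lines().map(str::trim) {
        if let Some(rest) = line.strip_prefix("#EXTINF:") {
            // The duration is everything before the first comma.
            let number = rest.split(',').next().unwrap_or("");
            let duration = number.parse::<f64>().map_err(|e| e.to_string())?;
            pending = Some(duration);
        } else if !line.is_empty() && !line.starts_with('#') {
            // A URI line consumes the state left behind by the previous tag.
            let duration = pending
                .take()
                .ok_or("URI without preceding #EXTINF")?;
            segments.push(Segment { duration, uri: line });
        }
    }
    Ok(segments)
}
```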
Most likely the culprit is the |
I made some experimental changes to try and get a speed-up. I'm afraid I gutted the code and will have broken lots of existing use cases / tests, so I can share a branch, but not provide a PR yet.
with all that, I see the benchmark go from around 80MB/s to:
Will try to push the branch to share work in progress later. |
WIP over here dholroyd@f601da1
It's not intended to be ready to commit yet. In particular lots of I'm not particularly committed to using nom; just a tool I've tried out recently. The parser only handles a small portion of the syntax that the FromString version covers, that would need loads more work. This is an opportunity to add 'Span' support though. I did not yet try your idea to handle uri / description with Cow (or beef) - could well help further. |
Do not worry, all help is appreciated 🤗
A I had quite a difficult time writing code that keeps the insertion order of the provided segments vector and that does allow for explicitly numbered segments (the implicitly numbered segments should fill in the free spaces in the vector). I chose a We could replace the Simply using a
I used it quite a while ago, when you had to use macros for everything, which was quite confusing/difficult for me and I think adding a parsing library would make it more difficult for people to contribute to this library, because they would have to first understand I would rather use a custom trait ( How much did this improve performance?
I would wait with that...This is how I ended up rewriting most of this library, because I had so many ideas that I wanted to apply 😅. |
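The numbering problem described in this comment (explicitly numbered segments keep their position, implicitly numbered ones fill the free spaces) can be sketched as follows. This is only one possible approach and not the collection used in the crate: `assign_numbers` is a hypothetical helper, and the choice of `BTreeMap` is an assumption made for brevity.

```rust
use std::collections::BTreeMap;

/// Place explicitly numbered items at their requested position and let the
/// implicitly numbered ones fill the lowest free positions, in insertion order.
pub fn assign_numbers<T>(
    items: Vec<(Option<usize>, T)>,
) -> Result<BTreeMap<usize, T>, String> {
    let mut placed = BTreeMap::new();
    let mut implicit = Vec::new();
    // First pass: explicit numbers claim their slots.
    for (number, item) in items {
        match number {
            Some(n) => {
                if placed.insert(n, item).is_some() {
                    return Err(format!("segment number {} used twice", n));
                }
            }
            None => implicit.push(item),
        }
    }
    // Second pass: implicit items take the free slots, counting up from zero.
    let mut next = 0;
    for item in implicit {
        while placed.contains_key(&next) {
            next += 1;
        }
        placed.insert(next, item);
        next += 1;
    }
    Ok(placed)
}
```

Note that this sketch does not check for gaps left behind by large explicit numbers; whether gaps should be an error is exactly the kind of policy question discussed in the comments that follow.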
My reading of the spec is that,
Is that your understanding too? So, writing out explicitly the per-segment values that are implied by the manifest + spec-rules:
Or with explicit headers,
|
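As a minimal sketch of writing out the implied per-segment values, assuming the RFC 8216 rules as I understand them: the first segment takes the EXT-X-MEDIA-SEQUENCE value (0 if absent) and each following segment is one higher, while the discontinuity sequence number is the EXT-X-DISCONTINUITY-SEQUENCE value (0 if absent) plus the number of EXT-X-DISCONTINUITY tags preceding the segment. `Item`, `Numbered` and `number_segments` are hypothetical names, not part of the crate.

```rust
/// Hypothetical input item: either a discontinuity marker or a segment URI.
pub enum Item<'a> {
    Discontinuity,
    Uri(&'a str),
}

/// Explicit per-segment numbering derived from the playlist-level start values.
#[derive(Debug, PartialEq)]
pub struct Numbered<'a> {
    pub media_sequence: u64,
    pub discontinuity_sequence: u64,
    pub uri: &'a str,
}

pub fn number_segments<'a>(
    first_media_sequence: u64,         // EXT-X-MEDIA-SEQUENCE, 0 if absent
    first_discontinuity_sequence: u64, // EXT-X-DISCONTINUITY-SEQUENCE, 0 if absent
    items: &[Item<'a>],
) -> Vec<Numbered<'a>> {
    let mut msn = first_media_sequence;
    let mut dsn = first_discontinuity_sequence;
    let mut out = Vec::new();
    for item in items {
        match item {
            // Every discontinuity tag bumps the number for all later segments.
            Item::Discontinuity => dsn += 1,
            Item::Uri(uri) => {
                out.push(Numbered {
                    media_sequence: msn,
                    discontinuity_sequence: dsn,
                    uri: *uri,
                });
                // The next segment implicitly gets the following number.
                msn += 1;
            }
        }
    }
    out
}
```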
Further to my last comment, I also recall that there are not supposed to be any gaps in the sequence of MSN values, but I can't seem to find where this is stated or implied in the spec 😨 (And on that point, the 2nd edition spec introduces the |
Yes that is my understanding too. (Note: that I somehow ignored the discontinuity number, which should be inside the
https://tools.ietf.org/html/rfc8216#section-3 The RFC seems to indicate that gaps are allowed? My quick prototype collection (a wrapper around
I will make a PR against your fork later today (or tomorrow) :) |
So I guess I had naively assumed that callers could arrange for segments to be submitted 'in order'. Sounds like you have more advanced usages in mind? For me, it seems reasonable to require that insert-order == manifest-order, but I care far more about parsing today than I do about generating 😄 |
I'm sorry, that earlier "not supposed to be any gaps" statement was just my confusion :) |
I've not done the latter part, but FWIW I pushed a change to the branch which drops nom, so it's 'just' a recursive descent parser in plain old Rust now. Overall structure is the same, but no higher-order functions etc now. NB this parser does still depend on I can take a look at moving to some kind of mutable cursor, rather than threading |
Small update from my side: I just replaced almost all instances of
The |
Nice! I've made one attempt at moving my parser over to using a custom
to
but I was unable to overcome lifetime issues 😫 |
It would make more sense to change the function signatures to something like this
    fn foo(input: &mut Cursor<'a>) -> Result<T, E>;
and the
    use std::marker::PhantomData;
    struct Cursor<'a, B = &'a [u8]>
    where
        B: Buffer,
    {
        buffer: B,
        _p: PhantomData<&'a str>,
    }
    trait Buffer {
        fn read(&mut self, buf: &mut [u8]) -> Result<usize, Error>;
        // insert methods here that are required for parsing?
        // for example:
        fn tag(&mut self, expected: &[u8]) -> Result<(), Error> {
            // zero-filled so `read` has room to write into;
            // `Vec::with_capacity` alone would leave the length at 0
            let mut buffer = vec![0; expected.len()];
            self.read(&mut buffer)?;
            if buffer == expected {
                Ok(())
            } else {
                unimplemented!("return error here")
            }
        }
    }
    impl<'a> Buffer for &'a [u8] {
        fn read(&mut self, buf: &mut [u8]) -> Result<usize, Error> {
            unimplemented!()
        }
    }
|
So to demonstrate where I got to, I've extracted out a small portion of the WIP code that shows the problem,
As you can see from all the annotations, I've been trying to play lifetime whack-a-mole, and have lost! 😉 There are probably various strategies that can work around the lifetime issues by using additional types to defer ownership checks to runtime, however the goal of my exercise is to remove runtime costs! |
As soon as I posted that last comment, I thought... hmm, those functions look kinda like they want to be instance methods.. 🤔 ...and indeed this transformation addresses the lifetime issues. I will try this in the real codebase! 😁 |
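A minimal sketch of the "instance methods" shape mentioned here, not the code from the branch: `Cursor`, `tag`, `take_until` and `remaining` are hypothetical names. The point it illustrates is that methods taking `&mut self` can still hand out slices with the input lifetime `'a`, so a returned slice stays usable while the cursor keeps advancing.

```rust
/// Hypothetical cursor over the input; all parsing helpers are instance
/// methods that advance `rest` and return slices borrowed from the input.
pub struct Cursor<'a> {
    rest: &'a str,
}

impl<'a> Cursor<'a> {
    pub fn new(input: &'a str) -> Self {
        Cursor { rest: input }
    }

    /// Consume `expected` if the remaining input starts with it.
    pub fn tag(&mut self, expected: &str) -> Result<(), String> {
        match self.rest.strip_prefix(expected) {
            Some(rest) => {
                self.rest = rest;
                Ok(())
            }
            None => Err(format!("expected {:?}", expected)),
        }
    }

    /// Consume up to (not including) `delimiter`, or everything if absent.
    /// The result borrows from the input (`'a`), not from the cursor.
    pub fn take_until(&mut self, delimiter: char) -> &'a str {
        let end = self.rest.find(delimiter).unwrap_or(self.rest.len());
        let (head, tail) = self.rest.split_at(end);
        self.rest = tail;
        head
    }

    /// The input that has not been consumed yet.
    pub fn remaining(&self) -> &'a str {
        self.rest
    }
}
```

Because the returned `&'a str` is not tied to the `&mut self` borrow, a caller can hold on to a value such as a duration slice and keep calling `tag` or `take_until` afterwards without any runtime ownership checks.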
I believe mutable I'll rebase for your latest updates. |
I've rebased my branch on master, gaining the
|
I just made some benchmarks on Current master branch:
Without
|
Yup, I think it's fair to say that validation isn't the biggest concern, I can see it's about 2-3% of this ...much more significant on master is the amount of time spent in For avoidance of doubt, I want manifests validated - this is one of my goals in using this crate - it's just that my priority in the short term is iterating on performance. My branch shouldn't be merged without validation working just as well as now, and all other tidy-ups etc being applied. |
FWIW, on my branch the main offender (according to linux
Looking at the flamegraph, this cost seems to be spread out across a number of functions, and with no obvious 'explicit' data movement that I can tie it to. I am guessing that this must be due to the remaining uses of intermediate representations that need to be copied to arrive at the final MediaSegment value in the vec. I would like to see if there's a way to arrange things that will convince rustc to optimise more of these moves away / construct the final value in-place. Not sure how, but seems worth a try! |
I've added a couple of additional commits to my branch: Originally, I had copied the Next, dholroyd@c855f48 alters Current benchmark standings:
|
Wow that is amazing 😍 |
dholroyd@21b7fdc puts some restrictions on the format of floating point numbers when parsing the duration value from every |
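To illustrate what restricting the accepted number format might look like, here is a sketch that is not the code from that commit: `parse_duration` is a hypothetical function, and the exact grammar (digits, optionally followed by `.` and more digits, at most 15 digits on each side) is my assumption. The result can differ from `str::parse::<f64>` in the last bit, and whether it is actually faster would need benchmarking.

```rust
/// Parse a non-negative decimal such as "9.009" or "10", rejecting signs,
/// exponents, "inf"/"NaN" and other forms `str::parse::<f64>` would accept.
pub fn parse_duration(text: &str) -> Option<f64> {
    // Split into integer and fractional digits; a bare trailing '.' is rejected.
    let (int_part, frac_part) = match text.split_once('.') {
        Some((_, "")) => return None,
        Some((int_part, frac_part)) => (int_part, frac_part),
        None => (text, ""),
    };
    let all_digits = |s: &str| s.bytes().all(|b| b.is_ascii_digit());
    if int_part.is_empty() || !all_digits(int_part) || !all_digits(frac_part) {
        return None;
    }
    // Keep digit counts small enough that the u64 accumulators cannot overflow.
    if int_part.len() > 15 || frac_part.len() > 15 {
        return None;
    }
    let to_u64 = |s: &str| {
        s.bytes()
            .fold(0u64, |acc, b| acc * 10 + u64::from(b - b'0'))
    };
    let scale = 10u64.pow(frac_part.len() as u32) as f64;
    Some(to_u64(int_part) as f64 + to_u64(frac_part) as f64 / scale)
}
```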
On my branch right now, the costly activities that stand out in the profile are:
I see maybe one more major win for my usage, which is to start tracking state across multiple subsequent reloads of a live manifest, and to avoid the full parsing pipeline for sections of the manifest that are unchanged. Note that all segments are expected to be the same from one reload to the next except that 1) one (or more) segments are added onto the end, 2) one (or more) old segments may be removed from the start. For a very long manifest, the vast majority of segments will be the same, so this might be a profitable strategy (less so for very small live manifests, I expect). I think this is better tackled as a follow-up ticket / PR though. |
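The reload idea could be sketched roughly like this, assuming the caller has already split the manifest into one raw text chunk per segment and knows the first media sequence number. Everything here is hypothetical (`ReloadState`, `Cached`, `expensive_parse`), and `parsed_len` merely stands in for a parsed value that is costly to build.

```rust
use std::collections::VecDeque;

/// Hypothetical cache entry: the raw text of one segment plus its parsed form.
pub struct Cached {
    pub raw: String,
    pub parsed_len: usize, // stand-in for an expensive-to-build parsed value
}

pub struct ReloadState {
    pub first_msn: u64,
    pub segments: VecDeque<Cached>,
}

fn expensive_parse(raw: &str) -> Cached {
    Cached { raw: raw.to_string(), parsed_len: raw.len() }
}

impl ReloadState {
    /// Apply a reload; returns how many segments actually had to be parsed.
    pub fn reload(&mut self, new_first_msn: u64, raw_segments: &[&str]) -> usize {
        if new_first_msn < self.first_msn {
            // The window moved backwards: nothing cached can be trusted.
            self.segments.clear();
        } else {
            // Segments that slid off the start of the live window are dropped.
            let gone = (new_first_msn - self.first_msn) as usize;
            let gone = gone.min(self.segments.len());
            self.segments.drain(..gone);
        }
        self.first_msn = new_first_msn;

        let mut parsed = 0;
        for (i, raw) in raw_segments.iter().enumerate() {
            // A cheap text comparison decides whether the cached value is reused.
            let unchanged = self.segments.get(i).map_or(false, |c| c.raw == *raw);
            if unchanged {
                continue;
            }
            let fresh = expensive_parse(raw);
            if i < self.segments.len() {
                self.segments[i] = fresh;
            } else {
                self.segments.push_back(fresh);
            }
            parsed += 1;
        }
        self.segments.truncate(raw_segments.len());
        parsed
    }
}
```

For a typical live reload where one segment is appended and one is dropped, only the new tail segment goes through `expensive_parse`; the comparison of raw text is what guards against a server silently rewriting an existing entry.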
I'd like to work towards getting master achieving the same level of performance as my experimental branch. I think at least the following problems make the work on my branch unsuitable for merging right now:
If these were addressed, would a PR using the design in https://github.com/dholroyd/hls_m3u8/tree/parser-perf be accepted? I'd like to address design concerns before writing lots more code. |
I would prefer if you could integrate your improvements into the existing types, by for example having a |
It might make sense to create a new trait for this e.g.
    trait ParseFrom {
        type Error;
        fn try_from_cursor(_: &mut Cursor) -> Result<Self, Self::Error>;
    }
    impl<T: ParseFrom> TryFrom<&[u8]> for T {
        // ...
    }
|
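A compilable variation on this trait idea, offered as a sketch under a few assumptions of mine: the trait gets a `Sized` bound and a lifetime parameter, and a provided method `parse_bytes` (a hypothetical name) stands in for the blanket `TryFrom` impl, since as far as I can tell a blanket impl of a foreign trait for every `T` is rejected by Rust's coherence rules. `Cursor` and `Version` are toy types for illustration only.

```rust
/// Hypothetical cursor: just the bytes that have not been consumed yet.
pub struct Cursor<'a> {
    rest: &'a [u8],
}

pub trait ParseFrom<'a>: Sized {
    type Error;

    fn try_from_cursor(cursor: &mut Cursor<'a>) -> Result<Self, Self::Error>;

    /// Convenience entry point, standing in for a blanket `TryFrom` impl.
    fn parse_bytes(input: &'a [u8]) -> Result<Self, Self::Error> {
        Self::try_from_cursor(&mut Cursor { rest: input })
    }
}

/// Toy implementor: the number from a `#EXT-X-VERSION:<n>` line.
pub struct Version(pub u8);

impl<'a> ParseFrom<'a> for Version {
    type Error = String;

    fn try_from_cursor(cursor: &mut Cursor<'a>) -> Result<Self, Self::Error> {
        let digits = cursor
            .rest
            .strip_prefix(b"#EXT-X-VERSION:")
            .ok_or("not a version tag")?;
        // Take the run of ASCII digits that follows the tag name.
        let end = digits
            .iter()
            .position(|b| !b.is_ascii_digit())
            .unwrap_or(digits.len());
        let text = std::str::from_utf8(&digits[..end]).map_err(|e| e.to_string())?;
        let value = text.parse::<u8>().map_err(|e| e.to_string())?;
        // Only advance the cursor once the whole tag parsed successfully.
        cursor.rest = &digits[end..];
        Ok(Version(value))
    }
}
```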
Maybe something like this would be better
    pub enum Poll<T> {
        Ready(T),
        Missing(Option<usize>),
    }
    trait Parser {
        type Error;
        fn parse<R: Read>(_: R) -> Poll<Result<Self, Self::Error>>;
    }
I think currently it would be better to not expose your parser to the user, so it can be changed in the future. |
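To show how a `Poll`-style result might behave with incomplete input, here is a small sketch. It deviates from the proposal above in that it works on a byte slice rather than a generic `Read`, and `poll_line` is a hypothetical function that only extracts one line; the `Poll` enum mirrors the shape suggested in the comment.

```rust
/// Mirrors the `Poll` shape proposed above: either a value, or a hint about
/// how many more bytes are needed (if known).
#[derive(Debug, PartialEq)]
pub enum Poll<T> {
    Ready(T),
    Missing(Option<usize>),
}

/// Hypothetical incremental step: try to pull one complete line out of `buf`.
/// On success, returns the line and the number of bytes consumed
/// (including the terminating newline).
pub fn poll_line(buf: &[u8]) -> Poll<Result<(&str, usize), std::str::Utf8Error>> {
    match buf.iter().position(|&b| b == b'\n') {
        // No newline yet: the caller should read more and try again.
        None => Poll::Missing(None),
        Some(end) => Poll::Ready(
            std::str::from_utf8(&buf[..end]).map(|line| (line, end + 1)),
        ),
    }
}
```

A caller would keep appending to its buffer while `Missing` is returned and drop the consumed prefix after each `Ready(Ok(..))`.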
As mentioned in PR #51, parsing MediaPlaylist files with MediaPlaylist::from_str is quite slow and this can definitely be improved :) @dholroyd