Unroll sscanf into strncmp and strtol #83

Open
wants to merge 5 commits into
base: master

Conversation

Member

msg7086 commented Feb 7, 2025

Most of the time in parse_index was spent in sscanf. I unrolled it into strncmp and strtol to improve performance.

(5s -> 1.3s for a 330MB lwi file.)
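
For readers unfamiliar with the technique, here is a minimal sketch of what unrolling an sscanf call can look like. The line format `Index=<n>,PTS=<n>,DTS=<n>`, the `entry_t` struct, and both function names are hypothetical and much simpler than the real index lines handled in parse_index. The sketch also uses strtoll rather than strtol, since long is only 32 bits wide on 64-bit Windows.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record; a real lwi line carries more fields. */
typedef struct {
    int       index;
    long long pts;
    long long dts;
} entry_t;

/* Match the literal key at *p, then parse the decimal integer after it.
 * On success, advance *p past the number and return 1; otherwise return 0. */
static int parse_field(const char **p, const char *key, long long *out)
{
    size_t key_len = strlen(key);
    if (strncmp(*p, key, key_len) != 0)
        return 0;                       /* key or separator mismatch */
    const char *start = *p + key_len;
    char *end;
    errno = 0;
    long long v = strtoll(start, &end, 10);
    if (end == start || errno == ERANGE)
        return 0;                       /* no digits, or out of range */
    *out = v;
    *p = end;
    return 1;
}

/* Hand-written stand-in for
 *   sscanf(line, "Index=%d,PTS=%lld,DTS=%lld", ...) == 3
 * Note: the index is not range-checked before the cast in this sketch. */
int parse_entry(const char *line, entry_t *e)
{
    const char *p = line;
    long long index;
    if (!parse_field(&p, "Index=", &index)) return 0;
    if (!parse_field(&p, ",PTS=", &e->pts)) return 0;
    if (!parse_field(&p, ",DTS=", &e->dts)) return 0;
    e->index = (int)index;
    return 1;
}
```

Calling `parse_entry("Index=7,PTS=1001,DTS=-5", &e)` returns 1 and fills `e`; a line whose keys or separators differ returns 0. The likely source of the speedup is that each key is checked with a plain strncmp instead of having a format string re-interpreted on every line.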

Contributor

Asd-g commented Feb 8, 2025

If the script contains only LWLibavVideoSource(), the index file is always recreated instead of being parsed.

Member Author

msg7086 commented Feb 9, 2025

Let me investigate

Member Author

msg7086 commented Feb 9, 2025

Fixed the corner case where PTS and DTS equal LLONG_MIN.
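
The thread does not say what exactly went wrong with LLONG_MIN, so the following is only an illustration of why that value needs care, not a description of the actual fix. FFmpeg's AV_NOPTS_VALUE equals INT64_MIN, so the value can plausibly appear wherever a timestamp is missing. With strtoll, a clamped result is indistinguishable from a genuine LLONG_MIN unless errno is checked, and a 32-bit strtol cannot hold the value at all. The helper name `parse_i64` is hypothetical.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parse a full-range signed 64-bit decimal; return 1 on success.
 * strtoll returns LLONG_MIN both for the text of LLONG_MIN itself and for
 * anything smaller, so only errno == ERANGE marks a real overflow. */
int parse_i64(const char *s, long long *out)
{
    char *end;
    errno = 0;
    long long v = strtoll(s, &end, 10);
    if (end == s)
        return 0;               /* no digits consumed */
    if (errno == ERANGE)
        return 0;               /* value did not fit in long long */
    *out = v;
    return 1;
}
```

With this check, the text "-9223372036854775808" parses successfully to LLONG_MIN on platforms with a 64-bit long long, while "-9223372036854775809" is rejected as out of range.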

Member Author

msg7086 commented Feb 9, 2025

I might as well try to abstract the lwi file I/O and pull it out of lwindex.c.

msg7086 marked this pull request as draft February 9, 2025 04:48

Contributor

Asd-g commented Feb 12, 2025

Tested with ~1 h of video+audio. Parsing time improved from ~2.66 s to ~1.61 s.

msg7086 marked this pull request as ready for review February 13, 2025 03:35

Member Author

msg7086 commented Feb 13, 2025

On my computer (Ryzen 7940H):

All tests passed!

Performance Benchmark (2000000 iterations):
----------------------------------------
Original main_index:    1932367.15 ops/sec
Unrolled main_index:    39215686.27 ops/sec (20.3x faster)
Original video_index:   2096436.06 ops/sec
Unrolled video_index:   48780487.80 ops/sec (23.3x faster)
Original audio_index:   5681818.18 ops/sec
Unrolled audio_index:   250000000.00 ops/sec (44.0x faster)
----------------------------------------

On an ARM64 VPS (Neoverse-N1):

All tests passed!

Performance Benchmark (2000000 iterations):
----------------------------------------
Original main_index:    2469129.71 ops/sec
Unrolled main_index:    23321439.40 ops/sec (9.4x faster)
Original video_index:   2811749.18 ops/sec
Unrolled video_index:   31693209.73 ops/sec (11.3x faster)
Original audio_index:   8242052.60 ops/sec
Unrolled audio_index:   140567894.29 ops/sec (17.1x faster)
----------------------------------------

Test and benchmark code: https://gist.github.com/msg7086/f0cf87f73b4e4affa5e02a4f7c2973f4
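
As a reading aid for the tables above, this is a small sketch of the arithmetic behind an ops/sec figure and a speedup ratio. It is not taken from the gist; the function names are hypothetical, and the elapsed time used in the example is inferred from the printed numbers rather than stated in the thread.

```c
/* Throughput in operations per second for a timed loop. */
double ops_per_sec(long iterations, double elapsed_seconds)
{
    return (double)iterations / elapsed_seconds;
}

/* How many times faster the new code is than the old code. */
double speedup(double new_ops_per_sec, double old_ops_per_sec)
{
    return new_ops_per_sec / old_ops_per_sec;
}
```

For example, `ops_per_sec(2000000, 0.008)` gives 250000000, which matches the unrolled audio_index line of the Ryzen run, and `speedup(39215686.27, 1932367.15)` is about 20.3, matching the reported main_index ratio. The Ryzen figures are all consistent with whole-millisecond elapsed times, so the fastest entries there may carry noticeable rounding error.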
