Flat span performance improvement by completely ignoring visplanes #35
… dist so it's not a division anymore
#else
// Find span color per-subsector instead of per-column. - mindbleach
R_LoadColorMap( frontsector->lightlevel );
// R_GetColorMapColor copies a colormap row to a fixed location. GBA heritage?
Thanks for pointing out that current_colormap has a fixed location. That's weird. I've removed current_colormap.
uint16_t l = count >> 4;

while( l-- ) {
Hey, you're right. Unrolling R_DrawColumnFlat() ourselves is faster than what gcc-ia16 produces.
Thanks.
I've removed all visplanes. The sky is rendered as a regular texture.
Drawing a solid color directly, instead of searching for, adding to, and rendering out an untextured visplane, makes FLAT_SPAN 10% faster overall. The memory used by visplanes (and the visplane functions) is no longer required. This commit only reduces the number of visplanes allocated, though.
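To illustrate the idea of resolving the flat color once per subsector and then writing it straight to the screen, here is a minimal sketch. Every name in it (`sketch_colormaps`, `sketch_flat_color`, `sketch_fill_columns`) is hypothetical, and the light-level-to-row mapping is a simplified placeholder, not the one this codebase uses.

```c
#include <stdint.h>

#define SKETCH_LIGHT_ROWS 32   /* assumed number of colormap rows */

/* Hypothetical colormap table: one 256-entry row per light step. */
static uint8_t sketch_colormaps[SKETCH_LIGHT_ROWS][256];

/* Placeholder mapping: brighter sectors use lower rows. */
static uint8_t sketch_flat_color(uint8_t lightlevel, uint8_t base_color)
{
    unsigned row = (255u - lightlevel) >> 3;   /* 0..31 */
    return sketch_colormaps[row][base_color];
}

/* Fill columns x1..x2, rows top..bottom (inclusive) with one color.
   The colormap lookup happens once, outside both loops. */
static void sketch_fill_columns(uint8_t *screen, int stride,
                                int x1, int x2, int top, int bottom,
                                uint8_t lightlevel, uint8_t base_color)
{
    const uint8_t color = sketch_flat_color(lightlevel, base_color);
    int x, y;

    for (x = x1; x <= x2; x++) {
        uint8_t *dest = screen + top * stride + x;
        for (y = top; y <= bottom; y++) {
            *dest = color;      /* no per-pixel or per-column lookup */
            dest += stride;
        }
    }
}
```

Nothing is queued for a later pass here, so no visplane has to be searched, extended, or rendered out.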
Additionally: unrolling R_DrawColumnFlat is 5% faster.
Additionally: a naive loop writing individual bytes was faster than an unrolled loop calling R_DrawColumnPixel. An unrolled loop writing individual bytes was faster still.
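The two byte-writing variants compared above might look roughly like the sketch below. It reuses the `count >> 4` pattern from the diff, but the function names and the `SKETCH_STRIDE` value are assumptions for illustration, not the code in this PR.

```c
#include <stdint.h>

/* Hypothetical distance in bytes between vertically adjacent pixels. */
#define SKETCH_STRIDE 4

/* Naive variant: one byte per loop iteration. */
static void sketch_column_naive(uint8_t *dest, uint8_t color, uint16_t count)
{
    while (count--) {
        *dest = color;
        dest += SKETCH_STRIDE;
    }
}

#define SKETCH_PUT(n) dest[(n) * SKETCH_STRIDE] = color

/* Unrolled variant: 16 byte writes per iteration, then the remainder. */
static void sketch_column_unrolled(uint8_t *dest, uint8_t color, uint16_t count)
{
    uint16_t l = count >> 4;    /* number of full 16-pixel groups */
    uint16_t r = count & 15;    /* leftover pixels */

    while (l--) {
        SKETCH_PUT(0);  SKETCH_PUT(1);  SKETCH_PUT(2);  SKETCH_PUT(3);
        SKETCH_PUT(4);  SKETCH_PUT(5);  SKETCH_PUT(6);  SKETCH_PUT(7);
        SKETCH_PUT(8);  SKETCH_PUT(9);  SKETCH_PUT(10); SKETCH_PUT(11);
        SKETCH_PUT(12); SKETCH_PUT(13); SKETCH_PUT(14); SKETCH_PUT(15);
        dest += 16 * SKETCH_STRIDE;
    }
    while (r--) {
        *dest = color;
        dest += SKETCH_STRIDE;
    }
}
```

Both functions write the same bytes; the unrolled one pays the loop counter and branch cost once per 16 pixels instead of once per pixel. Whether that is actually faster depends on the compiler and target, as the measurements in this thread show.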
These are 32-bit Open Watcom results, in DOSBox-X, acting as a 25 MHz 386DX. Nonetheless, demo3 has gone from 14 FPS to 19 FPS.