Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

internal/driver/glfw: clean the cache periodically. #5112

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

knusbaum
Copy link

@knusbaum knusbaum commented Sep 4, 2024

Description:

The cache is supposed to be cleaned periodically, but the glfw driver does not do this as the mobile driver does.

This commit adds similar logic to the glfw driver as exists in the mobile driver, to periodically clean the cache on paint events.

Fixes #4903

Notes to Reviewers:

I'm calling cache.Clean in a similar way to how it's done in the mobile driver:

canvasNeedRefresh := c.FreeDirtyTextures() > 0 || c.CheckDirtyAndClear()
if canvasNeedRefresh {
newSize := fyne.NewSize(float32(d.currentSize.WidthPx)/c.scale, float32(d.currentSize.HeightPx)/c.scale)
if c.EnsureMinSize() {
c.sizeContent(newSize) // force resize of content
} else { // if screen changed
w.Resize(newSize)
}
d.paintWindow(w, newSize)
d.app.Publish()
}
cache.Clean(canvasNeedRefresh)

I don't fully understand what the boolean value passed to clean represents, or if what I'm doing makes sense. I'm just mimicking the code from mobile.

I have not written regression tests for this, as I'm not familiar enough with the code to confidently do so.

Checklist:

  • Tests included.
  • Lint and formatter run with no errors.
  • Tests all pass.

Not all the tests pass on the current develop branch, but this PR does not cause new test failures.

The cache is supposed to be cleaned periodically, but the glfw
driver does not do this as the mobile driver does.

This commit adds similar logic to the glfw driver as exists in
the mobile driver, to periodically clean the cache on paint events.

Fixes fyne-io#4903
@coveralls
Copy link

Coverage Status

coverage: 66.059% (+0.006%) from 66.053%
when pulling 7b2a4c1 on knusbaum:fix-cache-clean
into 5fb3d75 on fyne-io:develop.

Copy link
Member

@andydotxyz andydotxyz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a good fix. Are you happy we land this before looking at the refactor we discussed @dweymouth ?

@dweymouth
Copy link
Contributor

I suppose, though I suspect we are now doing double cache cleaning work on each clean, since it is calling both cache.Clean and cache.CleanCanvases, which duplicate most of the cleanup tasks. I had thought perhaps cache.CleanCanvases was introduced because cache.Clean couldn't be used in the desktop driver for whatever reason, but if we are calling cache.Clean now, it is very likely CleanCanvases is totally redundant and we should evaluate and remove it

@dweymouth
Copy link
Contributor

From a quick glance it seems the only clean task missing from CleanCanvases is the call to destroyExpiredFontMetrics, so another fix could be just adding that call to CleanCanvases. I would really love if someone who was involved in the initial caching design could review these two funcs more deeply and figure out if we still need the separation between the two

@knusbaum
Copy link
Author

knusbaum commented Sep 9, 2024

From a quick glance it seems the only clean task missing from CleanCanvases is the call to destroyExpiredFontMetrics, so another fix could be just adding that call to CleanCanvases. I would really love if someone who was involved in the initial caching design could review these two funcs more deeply and figure out if we still need the separation between the two

CleanCanvases is already called in the frame draw code:

func (d *gLDriver) drawSingleFrame() {
for _, win := range d.windowList() {
w := win.(*window)
w.viewLock.RLock()
canvas := w.canvas
closing := w.closing
visible := w.visible
w.viewLock.RUnlock()
// CheckDirtyAndClear must be checked after visibility,
// because when a window becomes visible, it could be
// showing old content without a dirty flag set to true.
// Do the clear if and only if the window is visible.
if closing || !visible || !canvas.CheckDirtyAndClear() {
continue
}
d.repaintWindow(w)
refreshingCanvases = append(refreshingCanvases, canvas)
}
cache.CleanCanvases(refreshingCanvases)

The issue is that CleanCanvases does not run the cache clean for renderers (and maybe other things I haven't checked), but just deletes renderers for canvases that have expired or are refreshed (gathered into deletingObjs):

fyne/internal/cache/base.go

Lines 132 to 145 in 7d81356

renderersLock.Lock()
for _, dobj := range deletingObjs {
wid, ok := dobj.(fyne.Widget)
if !ok {
continue
}
rinfo, ok := renderers[wid]
if !ok || !rinfo.isExpired(now) {
continue
}
rinfo.renderer.Destroy()
overrides.Delete(wid)
delete(renderers, wid)
}

If we want CleanCanvases to clean stale renderers from the cache, it needs to call destroyExpiredRenderers as cache.Clean does:

destroyExpiredRenderers(now)

@dweymouth
Copy link
Contributor

I really hope @andydotxyz or someone else with deeper knowledge into the initial design of the cache and clean tasks can take another look at this, since I don't understand the design well enough to recommend between merging CleanCanvases and Clean into one func, or adding the missed destroyExpiredRenderers and destroyExpiredFontMetrics to CleanCanvases and continuing to not invoke Clean in the glfw driver.

@knusbaum
Copy link
Author

In my fork I've removed the renderer cache entirely since it was causing issues, and I can't see, from a technical perspective, what benefit it brings that outweighs the downsides.

Issues included:

  • Some canvas objects create sub-objects and then discard them regularly, and their renderers get stuck in the cache until they expire. This means we hang onto garbage for quite a while. This isn't usually a big issue, but can inflate the heap for windows with a large number of objects.
  • In applications with multiple windows, interacting with one window will cause the cache to periodically be cleaned (at least, once this PR goes in), but since the second window is never interacted with, it will not be redrawn, and its renderers will expire. It looks like this should not be an issue and that new renderers will be generated on demand. However, this is not happening and the window just freezes. I didn't bother to debug this.

There are two potential benefits to the render cache that I see, listed here along with counter-points:

  • Renderers for objects that are not being drawn can be discarded eventually, reducing memory cost
    • However: this is a small amount of memory, as renderers are usually not large objects. IMO, it is not worth it, especially since this doesn't appear to actually work today, and when renderers are discarded while objects are still active, things seem to break.
  • We know exactly when renderers are discarded (i.e. when they are evicted from the cache) so we can run their Destroy methods.
    • However: Destroy is really only used in a couple of places, mostly to stop and unregister animations. This can be accomplished with an object finalizer. The current implementation of the cache looks like it has some issues around this, specifically it calls Destroy on the renderers before it removes them from the cache, meaning there is a race, and code can be handed a renderer to use that has already been destroyed. These can be fixed of course, but these kinds of race conditions are easy to miss, and it is probably better to just let the runtime handle it.

In the process of these changes, I also discovered a number of bugs, mostly related to widgets failing to call ExtendBaseWidget, since removing the render cache requires widgets to call this, or else the code fails. I see this as a plus, since it can be easily tested for. Any widget that does not call ExtendBaseWidget when using the base widget will cause a crash, making it difficult to commit code with widgets broken in this way.

If the project is interested in my modifications, I will clean it up and put up another PR for review. Based on my profiling, this improves performance in a number of cases, and probably improves correctness.

@dweymouth
Copy link
Contributor

By getting rid of the render cache, do you mean in your fork the widget gets a new renderer created every time it is rendered? I imagine that would bring its own set of memory and performance issues, I think we'd be more interested in fixing the incorrectness with the current render cache than removing it entirely

@knusbaum
Copy link
Author

knusbaum commented Sep 10, 2024

No, each widget keeps a reference to its renderer. In practice, this is handled by the base widgets. I added a Renderer() method on the Widget interface, and implemented it in the base widget. The implementation is basically:

func (w *BaseWidget) Renderer() WidgetRenderer {
    if w.renderer == nil {
        w.renderer = w.super().CreateRenderer()
    }
    return w.renderer
}

And then all calls to r := cache.Renderer(widget) turn into r := widget.Renderer().

This, of course, does change the API, and so would probably require a major version bump. But there are ways around that if you want to keep the existing API.

@dweymouth
Copy link
Contributor

dweymouth commented Sep 10, 2024

Ah yes, that would be a breaking public API change, as existing widgets would now have to add this new method to continue to function, but could be an option for Fyne 3.x if we ever have the need to introduce other breaking changes (it is not planned anytime soon). In the meantime, fixing the render cache mechanism is something we will need to pursue. I haven't encountered any issues other than the memory leak you've identified with the current system though.

@knusbaum
Copy link
Author

When I have more free time to work on this, I'll try to produce a sample application that demonstrates the other issues. The race condition will probably be hard to trigger, but I should be able to trigger the freeze since that was happening very regularly.

@dweymouth
Copy link
Contributor

Yes, when you have the time please file an issue report for the freezing you're seeing along with a toy app to demonstrate it - as I don't believe any contributors have encountered anything like that and it's an unknown issue right now

Copy link
Contributor

@dweymouth dweymouth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just commenting to put a block on merging so we don't accidentally merge this and introduce the window-freezing bug @knusbaum described

@dweymouth
Copy link
Contributor

@knusbaum Is your change in your fork to remove the renderer cache on a public branch somewhere? I'd be curious to take a look at it (and maybe considering using it in my fork for my app as well if it's a simple diff)

@knusbaum
Copy link
Author

@dweymouth It's not public at the moment. It's mixed in with a bunch of other changes I've made testing out optimizations. Some have worked and others have not been valuable.

I will pull that change out and put it on a public branch. When it's done I'll let you know here.

@knusbaum
Copy link
Author

@dweymouth here's my branch with the cache removed, based on the develop branch:
https://github.com/fyne-io/fyne/compare/develop...knusbaum:fyne:remove-renderer-cache?expand=1

I'm not sure everything's fixed, and I haven't updated the tests at all.

@dweymouth
Copy link
Contributor

I wonder if we could somehow only clear the cache for the current window on redraw, if that would avoid issue #5131. Perhaps additional metadata would be needed in the renderer cache to associate them with a window. Also there is a possible complication of a widget being moved from one window to another (do we even support that?)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants