Caching of daily prices in backtest #1469
Replies: 6 comments 11 replies
-
There is likely a trade-off here between performance and memory usage. Whether that trade-off is worthwhile will probably be a matter of opinion, but I think we should at least understand the effect before making this kind of change. Would you mind re-running your test and noting the effect on memory use?
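One rough way to note the memory effect is Python's built-in tracemalloc module. This is only a sketch: measure_peak_memory and run_backtest_stand_in are hypothetical names, and the stand-in would need replacing with whatever actually runs the backtest. tracemalloc only reports allocations it can trace, so treat the figure as indicative rather than exact.

```python
import tracemalloc


def measure_peak_memory(func, *args, **kwargs):
    """Run func and return (result, peak bytes traced while it ran)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def run_backtest_stand_in(n_series=160, n_rows=2_600):
    # Hypothetical stand-in for a backtest run: builds n_series lists of floats.
    return [[float(i)] * n_rows for i in range(n_series)]


result, peak_bytes = measure_peak_memory(run_backtest_stand_in)
print(f"peak traced memory: {peak_bytes / 1e6:.1f} MB")
```

Running it once with the method cached and once without would give two peak figures to compare.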
-
Another argument for assuming the memory impact is small is to examine just how much is already held in the system object's cache (which might be an important driver of memory usage). For each instrument almost 500 items are cached (system with 40 rules). This includes get_raw_forecast twice for each rule (one from rules and one from forecastScaleCap), get_scaled_forecast for each rule, and get_capped_forecast for each rule. That is at least 160 series (4 per rule x 40 rules), each of the same length as daily_prices. One more is nothing.
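As a back-of-envelope check of that arithmetic, here is a small sketch. The row count and the 8 bytes per value are my assumptions, not figures from the thread, and index overhead is ignored.

```python
# Assumptions (not from the thread): about 10 years of business days per
# instrument, float64 values, index overhead ignored.
N_RULES = 40
SERIES_PER_RULE = 4  # raw forecast (cached twice), scaled forecast, capped forecast
N_ROWS = 2_600  # assumed: roughly 260 business days x 10 years
BYTES_PER_VALUE = 8  # float64

forecast_series = N_RULES * SERIES_PER_RULE
bytes_per_series = N_ROWS * BYTES_PER_VALUE
forecast_bytes = forecast_series * bytes_per_series

# Extra memory from caching one more series of the same length,
# relative to the forecast series already cached for that instrument.
relative_increase = bytes_per_series / forecast_bytes

print(f"forecast series per instrument: {forecast_series}")
print(f"one extra series adds {relative_increase:.3%}")
```

Because every series has the same length, the relative increase is 1/160 whatever row count is assumed.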
-
Would you be able to provide a patch for this? I run my system on a cloud server (with limited memory) and I'll be happy to test.
-
From the docs on caching
-
Before closing discussion … Adding (or removing) caching doesn't of itself interfere with stage wiring, as far as I understand it. Caching can even be turned off (backtests become glacially slow). Stage wiring seems to me to be the convention that when any stage needs (as an example) capped_forecasts, it calls [...]
As @tgibson11 has said, the thing with [...]
Separately, but linked, if memory usage is a real constraint for some, then it might be worth reviewing what gets cached or not (not everything decorated [...] 30037 items were calculated and placed in the cache.
-
Reopened as I see @robcarver17 has looked in recently - Rob, can you think of any reason why rawdata.get_daily_prices() should not be cached?
-
Should method get_daily_prices() of class RawData be cached? In the comments it is described as a 'KEY OUTPUT' but the cache decorator used is input, not output().
I ran some tests, naively thinking that reading daily prices would be the simplest of activities, and very fast. But get_daily_prices() is not fast (at least, not on Windows) because in the sim_data class that it calls there is a pandas resample to Business Days (.resample('1B')) and this is very slow (at least, it is on Windows). I don't use any other OS to compare.
In my experience (40 rules, 45 instruments, no parameters estimated i.e. all read from yaml) this simplest of changes reduced the duration of the run_systems step by around 25%, by cutting the time to calculate all 1800 scaled and capped forecasts by around 75%.
Perhaps we should not be motivated by squeezing down runtimes, but caching daily prices - which are the basis of many calculations downstream - would seem right even if it made no great difference. I saw a comment on Ideas for improvements #1420 about not having to rerun entire backtests. Simple changes that improve the speed of backtests could play into that space.
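To illustrate why caching helps when many rules ask for the same prices, here is a toy sketch. It is not the project's cache or its decorators: ToyRawData, its plain dict cache and the weekday filter are all hypothetical stand-ins, with the filter standing in for the slow resample step.

```python
from datetime import date, timedelta


class ToyRawData:
    """Hypothetical stand-in for a stage that serves daily prices."""

    def __init__(self, raw_prices):
        self._raw_prices = raw_prices  # {instrument_code: [(date, price), ...]}
        self._cache = {}
        self.resample_calls = 0

    def _resample_to_business_days(self, instrument_code):
        # Stand-in for the expensive step: keep Monday to Friday rows only.
        self.resample_calls += 1
        rows = self._raw_prices[instrument_code]
        return [(d, p) for d, p in rows if d.weekday() < 5]

    def get_daily_prices(self, instrument_code):
        # Compute once per instrument, then serve every later call from the cache.
        if instrument_code not in self._cache:
            self._cache[instrument_code] = self._resample_to_business_days(
                instrument_code
            )
        return self._cache[instrument_code]


start = date(2024, 1, 1)  # a Monday
raw = {"SP500": [(start + timedelta(days=i), 100.0 + i) for i in range(14)]}
data = ToyRawData(raw)

# Simulate 40 trading rules each asking for the same daily prices.
for _ in range(40):
    prices = data.get_daily_prices("SP500")

print(len(prices), data.resample_calls)
```

Without the cache check the expensive step would run once per rule per instrument; with it, once per instrument.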