Replies: 22 comments 24 replies
-
I've considered this before. I guess it would make sense to keep MongoDB as the basic engine, although we could write to an intermediate layer (anything that can store a dictionary, a bit like the rather ugly timed object class) and then from there to mongo.
-
Doing some @-ing to get more interest in this thread: @bug-or-feature
-
Pandas already has a .to_dict() method for data frames, so in theory one could just write this dict to mongo, and there are plenty of toy examples on the web showing this. There are probably some issues with converting dates and weird data types, but nothing insurmountable, and we don't have to address every case since the type of data is well known (all floats, for the data currently in Arctic). My biggest question is around speed: something like the adjusted price series for Gold has nearly 50,000 rows and counting. Arctic seems blindingly fast, but then it's been written to cope with tick data, which isn't required here; it probably also has to deal with other special cases that aren't relevant here.
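For the curious, a minimal sketch of that round trip, assuming pymongo and made-up database/collection names; the date index is reset to a column so pymongo can serialise the timestamps as BSON datetimes:

```python
# Hedged sketch: DataFrame <-> mongo via .to_dict(). The database and
# collection names here are hypothetical, for illustration only.
import pandas as pd
from pymongo import MongoClient

def write_frame_to_mongo(df: pd.DataFrame, client: MongoClient):
    collection = client["pysystemtrade"]["adjusted_prices"]  # hypothetical
    # reset_index() turns an unnamed DatetimeIndex into an "index" column;
    # pd.Timestamp subclasses datetime, so pymongo can encode it directly
    records = df.reset_index().to_dict(orient="records")
    collection.insert_many(records)

def read_frame_from_mongo(client: MongoClient) -> pd.DataFrame:
    collection = client["pysystemtrade"]["adjusted_prices"]
    records = list(collection.find({}, {"_id": 0}))
    return pd.DataFrame(records).set_index("index")
```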
-
What problem are we trying to resolve here? If it's just that we cannot update to the latest pandas and arctic, I think we should wait. There are a couple of live PRs (man-group/arctic#887 and man-group/arctic#908) that hint that the issue with the deprecation of pandas.Panel might go away pretty soon. More generally, in my experience it's a really bad idea to write your own code for anything! ALWAYS try to use someone else's. Writing your own should be a last resort.
-
Present, and following along. I'd ask the same as Andy: what problem are we trying to solve? My gut feeling is that Arctic and Mongo are both overkill for what we are doing, at least with only daily data. OTOH, they seem to be working well at the moment, other than the pandas version issue, which is surmountable and will hopefully be resolved soon. Unless there is some significant benefit to be gained that I'm not seeing, I say if it ain't broke, don't fix it.
-
I'm not sure that AHL are using Arctic anymore, although this is pure rumour and speculation. I like the idea of storing raw data from ib_insync in mongo (useful for debugging) and then periodically building a timeseries in parquet from the ib_insync messages or historical data (csv/parquet/structured mongo collection). The performance of reading parquet data is fantastic; I would expect improvements in our IO speed.
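A hedged sketch of that pipeline, with hypothetical collection names, field names, and file paths: raw messages land in mongo for debugging, and a periodic job rebuilds a parquet timeseries from them:

```python
# Sketch only: rebuild a parquet timeseries from raw records kept in
# mongo. Collection name, "date" field, and output path are all assumptions.
import pandas as pd
from pymongo import MongoClient

def rebuild_parquet_from_raw(client: MongoClient, path: str):
    raw = client["raw_data"]["ib_historical_bars"]  # hypothetical collection
    records = list(raw.find({}, {"_id": 0}))
    df = pd.DataFrame(records).set_index("date").sort_index()
    df.to_parquet(path)

# Reading it back is one fast columnar call:
# prices = pd.read_parquet("/data/parquet/GOLD.parquet")  # hypothetical path
```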
-
Interesting new section on the Arctic README about a next generation version: https://github.com/man-group/arctic There has been a lot of activity in the project, but they haven't got to the bits we want yet...
-
As has been said before, I don't think there is any massive hurry, but if arctic is never going to be upgraded in its open source variant to cope with later pandas libraries, at some point the bullet will have to be bitten. At the same time I'd want to take some data that I've stored in my weird funky 'timed storage' mode and put that into time series (the main exception being optimal positions, where a flexible record and class need to be stored). There is already an issue raised for this.

I've heard good things about parquet, and am certainly leaning towards that for time series data. I don't know how well it would handle the concurrent nature of pysystemtrade, and of course there would need to be a specified file structure. .csv backups would be easier though: just crawl the parquet file structure and write .csvs out (sketched below).

The question then arises as to what to do about the non time series data. The irony is that my original choice of mongo was purely to run Arctic on top of it... Of course there is no hurry; it could stay in mongo forever in reality. But I agree that it does seem to be a rather industrial sized option if we aren't also storing time series data in the same place. I'm not sure I want to go all the way back to using SQLite; I don't think it copes well with multiple reads/writes from different threads, and I'm not sure I ever want to write any SQL ever again. I think Redis would be an obvious alternative, but I don't know enough about databases to judge if that's a step in the right direction, or if the Redis/mongo decision is purely an ideological one.

I also note in passing that any non time series data where the record structure is fixed could be represented as a dataframe and thus put into parquet. That probably accounts for the overwhelming proportion of the data; the only exceptions I can think of offhand are the log records and (again) the optimal position tables. However there are ways to get round the latter, such as using strategy specific dataframes. Given I don't think I've ever actually searched the log records (I'm happier looking at diagnostic output), there is probably a better solution there, such as just appending log output to a single text file which is occasionally cleaned. Logging is also the main instance where there are concurrent writes to the same collection, which I think would be problematic with parquet. Although potentially suitable for dataframes, it may also not make sense to put the order stack state and algo information into parquet; something like Redis or mongo would make more sense there.

After this stream of consciousness, I think it might be possible to switch to 100% parquet or something close to it. Certainly that would be simpler.
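For illustration, a minimal sketch of that backup crawl, assuming a hypothetical parquet file tree mirrored out as .csvs with the same layout:

```python
# Sketch only: walk a parquet directory tree and write each file out
# as a .csv under a mirror directory. Both roots are hypothetical.
from pathlib import Path
import pandas as pd

def backup_parquet_tree_to_csv(parquet_root: str, csv_root: str):
    for parquet_file in Path(parquet_root).rglob("*.parquet"):
        relative = parquet_file.relative_to(parquet_root)
        csv_file = Path(csv_root) / relative.with_suffix(".csv")
        csv_file.parent.mkdir(parents=True, exist_ok=True)
        pd.read_parquet(parquet_file).to_csv(csv_file)
```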
-
Having gone through the exercise of replacing Mongo and Arctic with SQL Server, I can certainly say there is definitely a benefit on the NoSQL side of the argument: not having to write adapter classes for each table, figure out what is being stored, and deal with idiosyncrasies in the code (optimisedPosition in Mongo not being the Optimised Position used in the code :) ). Now that I've done that and it works, I can focus on optimising it, but I suspect I'm going to end up with less performance when creating and writing Multiple and Adjusted Prices. That's a trade-off I'm happy with in my case, to be able to query data with operators like ">" !!
-
Rob,
-
Nearly a year and a half later, but maybe finally some movement...
-
Ugh. I just set up pysystemtrade on Linux Mint 21, and let me tell you, it was a nightmare. Linux Mint 21 comes with python 3.10, but pysystemtrade requires python 3.8, because of its pandas version, because of Arctic. That means getting a second python version installed without messing up the system's python, which was not exactly trivial. (I ended up building python 3.8.16 from source... there might be simpler ways, but I don't think any of them are going to be THAT simple, and there is a real possibility of messing up the system's python instance if you're not careful.) I think this is becoming a more pressing issue. As time goes on, it will become more and more difficult to get pysystemtrade running on a new OS.
-
I'm going to start working on the simplification part of this now (https://github.com/robcarver17/pysystemtrade/issues/754) first, to make things easier for the move to parquet, which now looks inevitable. (With the caveat that I need to test the performance of parquet on large tables with concurrent reads, as sketched below; concurrent writes are less of an issue if we assume that logging is dropped from the database and goes just into text files.)
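A rough sketch of the kind of concurrent-read test meant here; the file path and reader count are arbitrary assumptions:

```python
# Sketch only: hammer one large parquet file with concurrent reads
# and time them. Path and worker count are made up for illustration.
import time
from concurrent.futures import ThreadPoolExecutor
import pandas as pd

PARQUET_FILE = "/data/parquet/adjusted_prices/GOLD.parquet"  # hypothetical

def timed_read(_):
    start = time.perf_counter()
    pd.read_parquet(PARQUET_FILE)
    return time.perf_counter() - start

with ThreadPoolExecutor(max_workers=8) as pool:
    timings = list(pool.map(timed_read, range(8)))
print(f"max read time across 8 concurrent readers: {max(timings):.3f}s")
```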
-
I strongly disagree with this. You just need the right tools and the right process. I've created an installation guide, see #1065. Using the steps there, it becomes trivial. I have done it successfully on all sorts of flavours of Linux, and on macOS, including ARM silicon. As for parquet, I'm neutral. But I'd like to be sure we're doing it for the right reasons. And I'd hope the task was "to add support for parquet" rather than "move to parquet".
-
I'm hearing lots of good things about DuckDB. Plays nicely with CSV, Parquet, Pandas. Kind of an SQLite for big data. Crazy fast.
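For illustration, assuming a recent duckdb package and a hypothetical parquet file, the whole query-parquet-into-pandas trip is a couple of lines:

```python
# Sketch only: DuckDB can query a parquet file in place and hand back
# a pandas DataFrame. File path and column name are hypothetical.
import duckdb

df = duckdb.sql(
    "SELECT * FROM '/data/parquet/GOLD.parquet' WHERE price > 1800"
).df()
```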
-
So the folks from MAN have published the ArcticDB source now under the BSL 1.1 license. Leaving aside whether their current Additional Use Right breaches the 2nd covenant under the BSL license, it does look firmly pay-to-play for any production use. It becomes Apache 2.0 in 24 months' time... As @bug-or-feature notes, DuckDB looks pretty interesting for this use case. Another option that might be interesting to consider would be Ibis. This project is looking to provide a standardized dataframe-like API over pluggable backend databases (including in-memory stores like DuckDB and Polars, big analytic stores like Druid and ClickHouse, simple SQL DBs such as SQLite and MySQL, and some other interesting stores such as HeavyDB, which is a GPU-accelerated DB). So it potentially provides a unified API for system metadata, instrument history data, and maybe even logging.
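A sketch of the unified-API idea, assuming a recent ibis with the DuckDB backend installed; the parquet path and column names are hypothetical:

```python
# Sketch only: the same ibis expression code would run against any
# supported backend; here DuckDB reads a (hypothetical) parquet file.
import ibis

con = ibis.duckdb.connect()  # in-memory DuckDB behind the ibis API
prices = con.read_parquet("/data/parquet/GOLD.parquet")
recent = prices.filter(prices.price > 1800).order_by("date").to_pandas()
```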
-
Arctic 1.82.0 just released, with support for pandas<2, numpy<2
-
I met James Monroe, ex AHL CTO, who came to hear me speak in London recently. He's now in charge of releasing Arctic as a commercial product... He offered me a free production license, but I guess that wouldn't extend to you guys.
-
"I think this is becoming a more pressing issue. As time goes on, it will become more and more difficult to get pysystemtrade running on a new OS." Well of course my laptop died, and a new battery did not revive it, apparently a known problem with the USB-C charging port which decays over time. So I'm currently in this position, and it looks like I will be trying out the pyenv solution to see if it works.... if not then expect a very quick decision on this and some fairly frantic coding. |
-
Hi All 👋, This is James @ ArcticDB. We're happy to work with you all on making ArcticDB a possible backend if that's of interest, including things like MongoDB support. As @bug-or-feature says, we're chatting next week. We super appreciate that you've all been users of the original Arctic.
-
As per the roadmap, once I have a version working with parquet and up-to-date libraries, I'm obviously very happy if someone can get some kind of arctic working again with the python/pandas/etc versions that everything has been brought up to date on. There is no reason why there can't be support for both solutions going forward. At least a lot of what I have done in the last few days has made it easier than before to plug and play different database/storage solutions.
-
As useful as Arctic is, I think that in the modern days of Parquet and Arrow its usefulness is limited, especially in the simple case of data co-located on the same compute node. Does anyone have thoughts on this? I think we could roll our own columnar data store without too much hassle: some object-oriented timeseries structure around a simple DataFrame.to_parquet() would be very performant, e.g. similar to this project https://github.com/ranaroussi/pystore The columnar store would be so simple it could be a sub-module to this project (see the sketch below).
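A hedged sketch of what such a sub-module might look like: a tiny object-oriented store keeping one parquet file per key, in the spirit of pystore. The directory layout and method names are invented for illustration:

```python
# Sketch only: a minimal columnar timeseries store over parquet files.
# One file per key under a root directory; all names are hypothetical.
from pathlib import Path
import pandas as pd

class ParquetTimeseriesStore:
    def __init__(self, root: str):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, key: str) -> Path:
        return self.root / f"{key}.parquet"

    def write(self, key: str, df: pd.DataFrame):
        df.to_parquet(self._path(key))

    def read(self, key: str) -> pd.DataFrame:
        return pd.read_parquet(self._path(key))

    def append(self, key: str, new_rows: pd.DataFrame):
        # Read-modify-write: fine for daily data, not safe for concurrent writers
        path = self._path(key)
        if path.exists():
            combined = pd.concat([pd.read_parquet(path), new_rows])
            combined = combined[~combined.index.duplicated(keep="last")].sort_index()
        else:
            combined = new_rows
        combined.to_parquet(path)
```

Usage would be something like `store = ParquetTimeseriesStore("/data/parquet"); store.append("GOLD", todays_prices)`; the read-modify-write append is the simplicity/concurrency trade-off discussed earlier in this thread.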