feat: added ape cache plugin (#680)

* feat: added cache query

* chore: reformatting following pre-commit

* fix: mypy issues

* fix: used more secure way to query the database, and removed TODO

* fix: mypy and flake8 issues

* chore: clean up of docstrings

* fix: blocks database structure, need to work on updating by column

* feat: moved init_db and purge_db functionality to CacheQueryProvider

* feat: handles missing columns, database not existing, no nullable data to database, multiple databases depending on provider

* chore: pre-commit

* fix: cannot import Column from sqlalchemy.types, fixed this in models to remove warning

* chore: linting

* fix: added sqlalchemy to setup.py

* feat: removed schemas.py, no need for pydantic here, can go to the APIs and build these models. Can add separate schemas later down the road.

* feat: allow to cache all data during query no matter what column you query

* mypy issue

* removed breakpoint

* removed TODO columns are no longer going to be handled in the update of cache

* moving DefaultQueryProvider back into managers/query.py

* type ignore for logger due to mypy

* fix: fixed query so it doesn't default to cache

* chore: type ignore issue

* fix: importing logger from ape.logging

* fix: isort issue

* feat: simplified logic in estimate_block_query

* fix: docs build issue

* fix: docs issue

* fix: importing logger mistake

* refactor: simplify logic in ape_cache query.py

* feat: caching data added

* fix: mypy issues

* fix: isort and mypy

* fix: mypy

* fix: documentation for `update_cache`

* fix: mypy

* fix: mypy

* fix: type in update cache

* fix: type in update cache

* fix: remove ignore

* feat: added transactions cache

* feat: created singledispatch for cache

* refactor: remove ContractEvents for now

* fix: cache was failing

* feat: added contract_events table to caching system

* fix: mypy

* refactor: remove print

* feat: major upgrade to CacheQueryProvider

* fix: mypy issue and linters

* fix: remove print

* feat: added raises

* fix: added missing argument to cache_query

* feat: added test and set logger to error

* feat: removed type ignore for sqlalchemy imports and added cache.md

* fix: fixed markdowns and fixed issue where db was being created without init

* fix: pytest failure because of raises in database_file method

* fix: misspelling in index.md

* fix: query manager was not using iterators

* refactor: major update to how cache provider works

* refactor: update ape cache cli to match

* feat: query is operating as expected

* fix: caching issue of integers being too large

* fix: type hint

* fix: isort

* fix: mypy

* fix: mypy

* fix: mypy

* fix: mypy

* fix: mypy for the last time please

* feat: added suggestions

* fix: formatting markdown and managers.query

* fix: suggested changes from fubu

* feat: adding errors to logging for QueryEngine

* fix: mypy and added type ignore to perform_query

* fix: num_transactions in the blocks table is now integer

* fix: CacheQuery docstring

* fix: show warnings instead of errors when db is not inited

* feat: add url for information on QueryEngineError

* feat: add url for information on QueryEngineError

* feat: add url for information on QueryEngineError

* fix: remove breakpoint

* feat: moved try for update_cache inside of the with statement

* feat: adding txn_hash and signature to Transactions

* feat: removed breakpoint

* fix: docstring

* feat: this would be a breaking change

* fix: cache markdown

* fix: remove bytes from transaction api

* refactor: remove type ignore

* refactor: raise message fix

* refactor: revert transaction api back to original form

* refactor: remove any from typing

* feat: getting all data for transactions

* fix: breakpoint removal

* feat: transaction cache fully operational

* fix: type ignores for properties from transaction api

* fix: solve missing data by setting to None

* fix: unused import

* fix: mypy issue with table columns

* fix: black

* fix: wrong order of arguments for estimate query call

* fix: fetch estimate properly in CacheQueryProvider

* fix: wrong estimates in DefaultQueryProvider

estimates fixed for BlockTransactionQuery and ContractEventsQuery

* fix: unable to unpack the two different returns here

* fix: linters

* fix: blocks queries properly

* refactor: replace info with success messages

* refactor: adding singledispatchmethod for perform_query

* feat: transactions and blocks fully operational

* feat: handle individual column queries for blocks

* fix: mypy

* feat: properly calculate query time in DefaultQueryProvider

* feat: one liner for the estimate of query

* fix: cache init and purge messages in test

* fix: HexBytesString schema type doesn't handle string inputs

* fix: updates to ape_ethereum.Block to ensure that values always exist

also added some notes to describe this behavior

* fix: docstring format in HexByteString

* feat: added docstrings to all cli methods for ape cache

* feat: added beta release note to cache markdown file

* fix: query and purge docstrings

* fix: need to convert hash and parent hash to hex in decode_block

* fix: models must return bytes properly from new data type

* refactor: docstrings added to ape_cache.query and cleaned

* fix: database_connection should not be hidden

* fix: mypy issue

* fix: added minor changes to documentation

* fix: comments and some logic tweaks

* fix: perform_transaction_query uses a map

* feat: remove perform_query_clause

* fix: add required to network_option for init and purge

* refactor: raise Error instead of returning None when no cache clause

* feat: better bypass of database when corrupted or not initialized

* fix(HACK): bypass BlockAPI.transactions requests until #994

* fix: bypass database connection whenever connected to local

Co-authored-by: Just some guy <[email protected]>
johnson2427 and fubuloubu authored Aug 18, 2022
1 parent 39c99ea commit 656b987
Showing 15 changed files with 776 additions and 26 deletions.
1 change: 1 addition & 0 deletions docs/index.md
@@ -10,6 +10,7 @@
userguides/installing_plugins
userguides/projects
userguides/compile
userguides/cache
userguides/networks
userguides/developing_plugins
userguides/config
69 changes: 69 additions & 0 deletions docs/userguides/cache.md
@@ -0,0 +1,69 @@
# Cache

**Note: This plugin is in beta release. The functionality is in constant development and many features are still in the planning stages.**

Use the cache plugin to store provider data in a SQLite database.

```bash
ape cache init --network <ecosystem-name>:<network-name>
```

If you want to set up your network connection for caching, use [this guide](./networks.html).

```bash
ape cache init --network ethereum:mainnet
```

This creates a SQLite database file in the hidden `.ape` data folder.

## Get data from the provider

Use `ape console`:

```bash
ape console --network ethereum:mainnet:infura
```

Run a few queries:

```python
In [1]: chain.blocks.query("*", stop_block=20)
In [2]: chain.blocks[-2].transactions
```

On a deployed contract, you can also query events. Below, `FooHappened` is the event from your contract instance that you want to query:

```python
contract_instance.FooHappened.query("*", start_block=-1)
```

where `contract_instance` is the return value of `owner.deploy(Contract)`.

See [this guide](../userguides/contracts.html) for more information on how to get a contract instance.

Exit the IPython interpreter.

You can query the cache database directly for debugging purposes. For example, to get the `blocks` table data from the SQLite database:

```bash
ape cache query --network ethereum:mainnet:infura "SELECT * FROM blocks"
```

Returns:

```bash
hash num_transactions number parent_hash size timestamp gas_limit gas_used base_fee difficulty total_difficulty
0 b'\xd4\xe5g@\xf8v\xae\xf8\xc0\x10\xb8j@\xd5\xf... 0 0 b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00... 540 0 5000 0 None 17179869184 17179869184
1 b'\x88\xe9mE7\xbe\xa4\xd9\xc0]\x12T\x99\x07\xb... 0 1 b'\xd4\xe5g@\xf8v\xae\xf8\xc0\x10\xb8j@\xd5\xf... 537 1438269988 5000 0 None 17171480576 34351349760
2 b'\xb4\x95\xa1\xd7\xe6f1R\xae\x92p\x8d\xa4\x84... 0 2 b'\x88\xe9mE7\xbe\xa4\xd9\xc0]\x12T\x99\x07\xb... 544 1438270017 5000 0 None 17163096064 51514445824
```

To get `transactions` or `contract_events`:

```bash
ape cache query --network ethereum:mainnet:infura "SELECT * FROM transactions"
```

or

```bash
ape cache query --network ethereum:mainnet:infura "SELECT * FROM contract_events"
```
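The `ape cache query` subcommand is a thin wrapper around the engine's database connection (see `src/ape_cache/_cli.py` below). As a minimal sketch of reaching the same connection programmatically from an `ape console` session — assuming the database has been initialized for the active network, and noting that the column list passed to the DataFrame is an assumption that must match your `SELECT` statement:

```python
import pandas as pd

from ape.utils import ManagerAccessMixin

# The cache plugin registers its engine under the "cache" key.
engine = ManagerAccessMixin.query_manager.engines["cache"]

# database_connection is a context manager over the underlying SQLite connection.
with engine.database_connection as conn:
    rows = conn.execute("SELECT number, num_transactions FROM blocks").fetchall()

print(pd.DataFrame(rows, columns=["number", "num_transactions"]))
```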
4 changes: 4 additions & 0 deletions setup.py
@@ -24,10 +24,12 @@
"types-requests", # NOTE: Needed due to mypy typeshed
"types-pkg-resources", # NOTE: Needed due to mypy typeshed
"pandas-stubs==1.2.0.62", # NOTE: Needed due to mypy typeshed
"types-SQLAlchemy>=1.4.49",
"flake8>=4.0.1,<5.0", # Style linter
"flake8-breakpoint>=1.1.0,<2.0.0", # detect breakpoints left in code
"flake8-print>=4.0.0,<5.0.0", # detect print statements left in code
"isort>=5.10.1,<6.0", # Import sorting linter
"pandas-stubs==1.2.0.62", # NOTE: Needed due to mypy types
],
"doc": [
"myst-parser>=0.17.0,<0.18", # Tools for parsing markdown files in the docs
@@ -103,6 +105,7 @@
"pydantic>=1.9.2,<2",
"pygit2>=1.7.2,<2",
"PyGithub>=1.54,<2",
"SQLAlchemy>=1.4.35",
"pytest>=6.0,<8.0",
"python-dateutil>=2.8.2,<3",
"pyyaml>=6.0,<7",
@@ -129,6 +132,7 @@
"pytest11": ["ape_test=ape.pytest.plugin"],
"ape_cli_subcommands": [
"ape_accounts=ape_accounts._cli:cli",
"ape_cache=ape_cache._cli:cli",
"ape_compile=ape_compile._cli:cli",
"ape_console=ape_console._cli:cli",
"ape_plugins=ape_plugins._cli:cli",
1 change: 1 addition & 0 deletions src/ape/__modules__.py
@@ -1,6 +1,7 @@
__modules__ = [
"ape",
"ape_accounts",
"ape_cache",
"ape_compile",
"ape_console",
"ape_ethereum",
5 changes: 4 additions & 1 deletion src/ape/api/providers.py
@@ -58,9 +58,12 @@ class BlockAPI(BaseInterfaceModel):
An abstract class representing a block and its attributes.
"""

# NOTE: All fields in this class (and its subclasses) should not be `Optional`
# except the edge cases noted below

num_transactions: int = 0
hash: Optional[Any] = None # NOTE: pending block does not have a hash
number: Optional[int] = None
number: Optional[int] = None # NOTE: pending block does not have a number
parent_hash: Any = Field(
EMPTY_BYTES32, alias="parentHash"
) # NOTE: genesis block has no parent hash
2 changes: 1 addition & 1 deletion src/ape/api/query.py
@@ -139,7 +139,7 @@ def perform_query(self, query: QueryType) -> Iterator:
Iterator
"""

def update_cache(self, query: QueryType, result: Iterator):
def update_cache(self, query: QueryType, result: Iterator[BaseInterfaceModel]):
"""
Allows a query plugin the chance to update any cache using the results obtained
from other query plugins. Defaults to doing nothing, override to store cache data.
5 changes: 2 additions & 3 deletions src/ape/managers/chain.py
@@ -146,12 +146,11 @@ def query(
)

blocks = self.query_manager.query(query, engine_to_use=engine_to_use)
data = map(lambda val: val.dict(by_alias=False), blocks)

# NOTE: Allow any columns from ecosystem's BlockAPI class
# TODO: fetch the block fields from EcosystemAPI
columns = validate_and_expand_columns(columns, list(self.head.__fields__)) # type: ignore
return pd.DataFrame(columns=columns, data=data)
blocks = map(lambda val: val.dict(by_alias=False), blocks) # type: ignore
return pd.DataFrame(columns=columns, data=blocks)

def range(
self,
47 changes: 31 additions & 16 deletions src/ape/managers/query.py
@@ -1,9 +1,11 @@
from itertools import tee
from typing import Dict, Iterator, Optional

from ape.api import QueryAPI, QueryType
from ape.api.query import BlockQuery, BlockTransactionQuery, ContractEventQuery
from ape.api import QueryAPI, QueryType, TransactionAPI
from ape.api.query import BaseInterfaceModel, BlockQuery, BlockTransactionQuery, ContractEventQuery
from ape.contracts.base import ContractLog, LogFilter
from ape.exceptions import QueryEngineError
from ape.logging import logger
from ape.plugins import clean_plugin_name
from ape.utils import ManagerAccessMixin, cached_property, singledispatchmethod

@@ -26,13 +28,13 @@ def estimate_block_query(self, query: BlockQuery) -> Optional[int]:

@estimate_query.register
def estimate_block_transaction_query(self, query: BlockTransactionQuery) -> int:

return 100
# NOTE: Very loose estimate of 100ms per transaction for this query.
return self.provider.get_block(query.block_id).num_transactions * 100

@estimate_query.register
def estimate_contract_events_query(self, query: ContractEventQuery) -> int:
# NOTE: Very loose estimate of 100ms per block for this query.
return 100
return (query.stop_block - query.start_block) * 100

@singledispatchmethod
def perform_query(self, query: QueryType) -> Iterator: # type: ignore
Expand All @@ -43,12 +45,14 @@ def perform_block_query(self, query: BlockQuery) -> Iterator:
return map(
self.provider.get_block,
# NOTE: the range stop block is a non-inclusive stop.
# Where as the query method is an inclusive stop.
# Where the query method is an inclusive stop.
range(query.start_block, query.stop_block + 1, query.step),
)

@perform_query.register
def perform_block_transaction_query(self, query: BlockTransactionQuery) -> Iterator:
def perform_block_transaction_query(
self, query: BlockTransactionQuery
) -> Iterator[TransactionAPI]:
return self.provider.get_transactions_by_block(query.block_id)

@perform_query.register
@@ -91,30 +95,35 @@ def engines(self) -> Dict[str, QueryAPI]:

engines: Dict[str, QueryAPI] = {"__default__": DefaultQueryProvider()}

for plugin_name, (engine_class,) in self.plugin_manager.query_engines:
for plugin_name, engine_class in self.plugin_manager.query_engines:
engine_name = clean_plugin_name(plugin_name)
engines[engine_name] = engine_class()
engines[engine_name] = engine_class() # type: ignore

return engines

def query(self, query: QueryType, engine_to_use: Optional[str] = None) -> Iterator[QueryAPI]:
def query(
self,
query: QueryType,
engine_to_use: Optional[str] = None,
) -> Iterator[BaseInterfaceModel]:
"""
Args:
query (``QueryType``): The type of query to execute
engine_to_use (Optional[str]): Short-circuit selection logic using
a specific engine. Defaults to None.
a specific engine. Default is set by performance-based selection logic.
Raises: :class:`~ape.exceptions.QueryEngineError`: When given an
invalid or inaccessible ``engine_to_use`` value.
Returns:
Iterator[QueryAPI]
Iterator[BaseInterfaceModel]
"""

if engine_to_use:
if engine_to_use not in self.engines:
raise QueryEngineError(f"Query engine `{engine_to_use}` not found.")

engine = self.engines[engine_to_use]
selected_engine = self.engines[engine_to_use]

else:
# Get heuristics from all the query engines to perform this query
@@ -126,15 +135,21 @@ def query(self, query: QueryType, engine_to_use: Optional[str] = None) -> Iterat
try:
# Find the "best" engine to perform the query
# NOTE: Sorted by fastest time heuristic
engine, _ = min(valid_estimates, key=lambda qe: qe[1]) # type: ignore
selected_engine, _ = min(valid_estimates, key=lambda qe: qe[1]) # type: ignore

except ValueError as e:
raise QueryEngineError("No query engines are available.") from e

# Go fetch the result from the engine
result = engine.perform_query(query)
result = selected_engine.perform_query(query)

# Update any caches
for engine in self.engines.values():
engine.update_cache(query, result)
if not isinstance(engine, selected_engine.__class__):
result, cache_data = tee(result)
try:
engine.update_cache(query, cache_data)
except QueryEngineError as err:
logger.error(str(err))

return result
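Because `perform_query` returns a lazy iterator, the manager cannot hand the same iterator to both the caller and each engine's `update_cache` — whichever consumed it first would exhaust it. That is why the loop above splits the stream with `itertools.tee` once per caching engine. A self-contained sketch of that fan-out pattern (the block data and cache sink here are hypothetical stand-ins):

```python
from itertools import tee


def fan_out(results, cache_sinks):
    """Yield results to the caller while feeding an independent copy to each sink."""
    for sink in cache_sinks:
        # tee() buffers items so both twins can be consumed independently.
        results, cache_copy = tee(results)
        sink(cache_copy)
    return results


blocks = ({"number": i} for i in range(3))  # stand-in for a query result
seen = []
remaining = fan_out(blocks, [seen.extend])

print(list(remaining))  # [{'number': 0}, {'number': 1}, {'number': 2}]
print(len(seen))        # 3 -- the cache sink saw every block
```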
18 changes: 18 additions & 0 deletions src/ape_cache/__init__.py
@@ -0,0 +1,18 @@
from ape import plugins
from ape.api import PluginConfig

from .query import CacheQueryProvider


class CacheConfig(PluginConfig):
size: int = 1024**3 # 1gb


@plugins.register(plugins.Config)
def config_class():
return CacheConfig


@plugins.register(plugins.QueryPlugin)
def query_engines():
return CacheQueryProvider
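The `query_engines` hook above is all it takes to surface an engine in `QueryManager.engines`. For a sense of the surface a third-party engine must cover, here is a hedged sketch of a minimal engine — the class name is hypothetical, and the three methods mirror the `QueryAPI` interface used in `src/ape/managers/query.py` above:

```python
from typing import Iterator, Optional

from ape.api import QueryAPI, QueryType
from ape.exceptions import QueryEngineError


class NullQueryProvider(QueryAPI):
    """Hypothetical engine that never bids on a query and caches nothing."""

    def estimate_query(self, query: QueryType) -> Optional[int]:
        # Returning None tells the query manager this engine cannot handle `query`.
        return None

    def perform_query(self, query: QueryType) -> Iterator:
        # Should never be selected, since estimate_query always declines.
        raise QueryEngineError("NullQueryProvider cannot perform queries.")

    def update_cache(self, query: QueryType, result: Iterator):
        # Default behavior: do nothing (see QueryAPI.update_cache docstring).
        pass
```

It would be registered exactly like the cache engine above, via `@plugins.register(plugins.QueryPlugin)`.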
82 changes: 82 additions & 0 deletions src/ape_cache/_cli.py
@@ -0,0 +1,82 @@
import click
import pandas as pd

from ape import networks
from ape.cli import NetworkBoundCommand, network_option
from ape.logging import logger
from ape.utils import ManagerAccessMixin


def get_engine():
return ManagerAccessMixin.query_manager.engines["cache"]


@click.group(short_help="Query from caching database")
def cli():
"""
Manage query caching database (beta).
"""


@cli.command(short_help="Initialize a new cache database")
@network_option(required=True)
def init(network):
"""
Initializes an SQLite database and creates a file to store data
from the provider.
Note that ape cannot store local data in this database. You have to
give an ecosystem name and a network name to initialize the database.
"""

provider = networks.get_provider_from_choice(network)
ecosystem_name = provider.network.ecosystem.name
network_name = provider.network.name

get_engine().init_database(ecosystem_name, network_name)
logger.success(f"Caching database initialized for {ecosystem_name}:{network_name}.")


@cli.command(
cls=NetworkBoundCommand,
short_help="Call and print SQL statement to the cache database",
)
@network_option()
@click.argument("query_str")
def query(query_str, network):
"""
Allows for a query of the database from an SQL statement.
Note that without an SQL statement, this method will not return
any data from the caching database.
Also note that an ecosystem name and a network name are required
to make the correct connection to the database.
"""

with get_engine().database_connection as conn:
results = conn.execute(query_str).fetchall()
if results:
click.echo(pd.DataFrame(results))


@cli.command(short_help="Purges entire database")
@network_option(required=True)
def purge(network):
"""
Purges data from the selected database instance.
Note that this is a destructive purge, and will remove the database file from disk.
If you want to store data in the caching system, you will have to
re-initialize the database following a purge.
Note that an ecosystem name and network name are required to
purge the database of choice.
"""

provider = networks.get_provider_from_choice(network)
ecosystem_name = provider.network.ecosystem.name
network_name = provider.network.name

get_engine().purge_database(ecosystem_name, network_name)
logger.success(f"Caching database purged for {ecosystem_name}:{network_name}.")
17 changes: 17 additions & 0 deletions src/ape_cache/base.py
@@ -0,0 +1,17 @@
from typing import Any

from sqlalchemy.ext.declarative import as_declarative, declared_attr


@as_declarative()
class Base:
"""
Base class to generate ``__tablename__`` automatically
"""

id: Any
__name__: str

@declared_attr
def __tablename__(cls) -> str:
return cls.__name__.lower()
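Concrete table models subclass this `Base` and inherit the generated `__tablename__`. As a hedged illustration — the column names below are taken from the `blocks` table output in the cache user guide above, while the SQLAlchemy column types and the primary-key choice are assumptions, not the commit's actual model — a blocks model might look like:

```python
from sqlalchemy import Column, Integer, LargeBinary

from .base import Base


class Blocks(Base):
    # __tablename__ resolves to "blocks" via the declared_attr on Base.
    hash = Column(LargeBinary, primary_key=True)  # block hashes are stored as bytes
    num_transactions = Column(Integer)
    number = Column(Integer)
    parent_hash = Column(LargeBinary)
    size = Column(Integer)
    timestamp = Column(Integer)
    gas_limit = Column(Integer)
    gas_used = Column(Integer)
    base_fee = Column(Integer)
    difficulty = Column(Integer)
    total_difficulty = Column(Integer)
```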