Preprocessing? As in... more programmatic fetching? (For paginated endpoints) #38

felixakiragreen · 2021-06-02T19:59:27Z

Problem: Inefficient fetching of data from multiple sources, particularly with paginated endpoints.

Inefficient by:

- needing to manually type out pagination
- requiring manual updating to the action when there are more pages in the future
- doing extra fetching that isn't required (some of the data I'm fetching is static, so data that has been fetched previously doesn't need refetching, only new data)

Context:
I'm working on an interactive data visualization that is getting data from 3 different endpoints.

I figured out how to run multiple steps by following the COVID example.

And... while I do have it working right now, it's a little less than ideal, because 2 of them are fetching data from paginated endpoints, meaning I have a bunch of this happening:

# 0000-0099
- name: Fetch data 0000-0049
  uses: githubocto/flat@v3
  with:
    http_url: https://api.opensea.io/api/v1/assets?asset_contract_address=0x585a2c37858d3b03824bc683829e4dbbf58969ee&order_direction=asc&offset=0&limit=50
    downloaded_filename: flatdata/opensea_junks_00a.json
- name: Fetch data 0050-0099
  uses: githubocto/flat@v3
  with:
    http_url: https://api.opensea.io/api/v1/assets?asset_contract_address=0x585a2c37858d3b03824bc683829e4dbbf58969ee&order_direction=asc&offset=50&limit=50
    downloaded_filename: flatdata/opensea_junks_00b.json
# this endpoint only returns 50, and I need to fetch 2000... XD
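
A minimal sketch of how the repeated steps above could be generated rather than typed out by hand. It assumes the endpoint keeps accepting `offset` and `limit` query parameters as in the URLs above, and that the total (2000) is known ahead of time. `buildPageUrls`, `OPENSEA_BASE`, and `pageUrls` are hypothetical names used only for illustration; none of this is part of Flat itself.

```typescript
// Hypothetical helper: one URL per page for an offset/limit style endpoint.
function buildPageUrls(baseUrl: string, total: number, limit: number): string[] {
  const urls: string[] = [];
  for (let offset = 0; offset < total; offset += limit) {
    // Assumes baseUrl already contains a query string, as in the workflow above.
    urls.push(`${baseUrl}&offset=${offset}&limit=${limit}`);
  }
  return urls;
}

// Base URL taken from the workflow above, minus the offset/limit parameters.
const OPENSEA_BASE =
  "https://api.opensea.io/api/v1/assets?asset_contract_address=0x585a2c37858d3b03824bc683829e4dbbf58969ee&order_direction=asc";

// 2000 items at 50 per page gives 40 URLs, covering offsets 0 through 1950.
const pageUrls = buildPageUrls(OPENSEA_BASE, 2000, 50);
```

A script could then loop over `pageUrls` and fetch each one, so that new pages only require changing the total instead of adding workflow steps.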

The second one is more tedious because I have to manually specify the pagination using block numbers:

# 12153412-12355753
- name: Fetch data etherscan logs
  uses: githubocto/flat@v3
  with:
    http_url: https://api.etherscan.io/api?module=logs&action=getLogs&fromBlock=12153412&toBlock=12355753&address=0x585A2C37858D3B03824BC683829e4DBbF58969ee&apikey=APIKEY
    downloaded_filename: flatdata/etherscan_logs_00.json
# this returns 1000, so I only have a couple of them
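
The block-number pagination could be computed in a similar way. The sketch below splits a block range into fixed-size windows. The window size of 100000 is an arbitrary assumption, and `splitBlockRange` and `blockRanges` are hypothetical names. A window can still contain more results than the endpoint returns in one response, so a real script would likely need to check the result count and narrow the window when it hits the cap.

```typescript
type BlockRange = { fromBlock: number; toBlock: number };

// Hypothetical helper: split [start, end] into inclusive, non-overlapping windows.
function splitBlockRange(start: number, end: number, windowSize: number): BlockRange[] {
  const ranges: BlockRange[] = [];
  for (let from = start; from <= end; from += windowSize) {
    // Clamp the last window so it never runs past the requested end block.
    ranges.push({ fromBlock: from, toBlock: Math.min(from + windowSize - 1, end) });
  }
  return ranges;
}

// Block range taken from the workflow above; the window size is an assumption.
const blockRanges = splitBlockRange(12153412, 12355753, 100000);

// Each range maps onto the fromBlock/toBlock parameters of the URL above.
const logUrls = blockRanges.map(
  (r) =>
    `https://api.etherscan.io/api?module=logs&action=getLogs&fromBlock=${r.fromBlock}&toBlock=${r.toBlock}&address=0x585A2C37858D3B03824BC683829e4DBbF58969ee&apikey=APIKEY`,
);
```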

Then I have several postprocessing steps to parse, extract, filter, and merge the data into the shape I need.

I understand that I am... slightly pushing the limits of what flat data was meant to do. But what I would really like to be able to do is write up a JS file, kind of like the post processing one, that readJSONs the files I already have, and then performs only the fetches it needs to.

(Then run the postprocessing normally, to merge it all back together)
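
One way the "only fetch what is missing" idea might look, as a sketch under stated assumptions: each page is saved to its own file named after its offset, and the script is given the list of files that already exist. The filename scheme, `planFetches`, and `PlannedFetch` are hypothetical and differ from the `00a`/`00b` names used above. The function only plans the work; the actual fetching and writing (for example with `fetch` and Deno's file APIs) is left out so the logic stays easy to check.

```typescript
type PlannedFetch = { offset: number; url: string; filename: string };

// Hypothetical planner: list only the pages that have no file on disk yet.
function planFetches(
  baseUrl: string,
  total: number,
  limit: number,
  existingFiles: string[],
): PlannedFetch[] {
  const plan: PlannedFetch[] = [];
  for (let offset = 0; offset < total; offset += limit) {
    // Assumed naming scheme: zero-padded offset, e.g. flatdata/opensea_junks_0050.json
    const filename = "flatdata/opensea_junks_" + ("0000" + offset).slice(-4) + ".json";
    if (existingFiles.indexOf(filename) !== -1) continue; // already downloaded, skip
    plan.push({ offset, url: `${baseUrl}&offset=${offset}&limit=${limit}`, filename });
  }
  return plan;
}

// Example: 4 pages in total, the first 2 were fetched on an earlier run.
const plan = planFetches(
  "https://api.opensea.io/api/v1/assets?asset_contract_address=0x585a2c37858d3b03824bc683829e4dbbf58969ee&order_direction=asc",
  200,
  50,
  ["flatdata/opensea_junks_0000.json", "flatdata/opensea_junks_0050.json"],
);
// plan now holds only the pages at offsets 100 and 150.
```

This only suits data that is static once fetched, as described above; pages whose contents can change would still need refetching.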

Any input or advice is very welcome!

PS. I was also hoping to specify different schedules for each of the endpoints... so I tried making several yml files, but it created some issues with git commit/merging so I moved them back into a single file with one schedule.

PPS. @Wattenberger big fan of your work! I absolutely love this flat data project & I already have 7! different use-cases for it across multiple projects.

Answered by felixakiragreen

felixakiragreen · 2021-06-02T23:56:03Z

Author

I think I'm going to try writing my own action that runs Deno, like this: https://github.com/githubocto/flat-demo-covid-dashboard/blob/main/.github/workflows/flat.yml#L50-L51, and that chains together several programmatic fetches.

1 reply

felixakiragreen Jun 4, 2021
Author

This worked. So I'm not directly using Flat Data anymore, but it helped inspire what was possible.

siddhpant · 2022-01-11T16:17:26Z

You can just have Flat fetch a small file (like a 1-byte file) and discard it. The actual fetching can then be done from the postprocessing code itself. It works for me.

I know I am quite late, but thought I should put it out here.
