Weekly Status Log: Caching Project #1

domoritz · 2024-09-11T14:18:04Z

This is for uwdata#385. Code is in https://github.com/cmudig/mosaic/tree/cache.

domoritz · 2024-09-11T14:18:51Z

September 11, 2024

Discussed plan for setting up experiment
We need to log: query, size (use Apache Arrow blob size), query time/latency, hit rate ...
Eventually we want a cache that works well across different scenarios: latency, workload, cache sizes, etc.

audilin · 2024-09-12T20:34:56Z

September 12, 2024

current goal: create a script to collect data on current LRU caching strategy
the script will:

run commands on dataset locally
keep track of cache & queries → log important data somehow

in order to do this, we first:

need to figure out what happens on the server side
need to figure out how queries work

side note: need to figure out how to measure latency
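
One possible way to measure latency is to take a timestamp before and after the call that runs a query. A minimal sketch, assuming queries go through an async function; `timed` and `runQuery` are hypothetical names for illustration and not part of Mosaic:

```javascript
// Hypothetical helper: wraps any async function and reports how long it took.
async function timed(fn, ...args) {
  const start = performance.now(); // high-resolution timestamp in milliseconds
  const result = await fn(...args);
  const latencyMs = performance.now() - start;
  return { result, latencyMs };
}

// Stand-in for a real query call (assumption: the real one is also async).
async function runQuery(sql) {
  await new Promise(resolve => setTimeout(resolve, 20)); // simulate work
  return [{ sql }];
}

timed(runQuery, 'SELECT 1').then(({ latencyMs }) => {
  console.log(`query took ${latencyMs.toFixed(1)} ms`);
});
```

Measured this way, the latency includes everything between issuing the query and receiving the result, so where the wrapper sits (client vs. server) changes what is being measured.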

questions:

should the script be inside pre-existing files or should it be separate?
how does the logging work, i.e. can we write to a txt file or something other than the default JS log?

audilin · 2024-09-18T14:29:25Z

September 18, 2024

what we have:

logging integrated into current cache file, gets current log: queries, latency, cache size, etc.

next steps:

make download button for current log (write a new, custom logger), only saves if key is Query
separate folder/webapp for analysis, maybe use observable framework
figure out how to download to another file using JSON object (tutorial) (example)
see what pre-existing "Log Queries" feature does, and determine how to best merge it into what we're doing
make instructions on how to collect logs, and add features such as reset button, etc in readme.
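
For the download step, one common browser approach is to serialize the log to JSON, wrap it in a `Blob`, and trigger a temporary link. This is only a sketch: `logToJson` and `downloadJson` are hypothetical names, and the shape of the log entries is an assumption.

```javascript
// Serialize log entries (assumed to be plain objects) to pretty-printed JSON.
function logToJson(entries) {
  return JSON.stringify(entries, null, 2);
}

// Browser-only: offer the JSON string as a file download.
function downloadJson(filename, json) {
  const blob = new Blob([json], { type: 'application/json' });
  const url = URL.createObjectURL(blob);
  const a = document.createElement('a'); // temporary, never attached to the page
  a.href = url;
  a.download = filename;
  a.click();
  URL.revokeObjectURL(url); // release the object URL once the download starts
}

// Example (in a browser): downloadJson('cache.json', logToJson(entries));
```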

AllllenLuo · 2024-09-25T14:23:54Z

September 25th

What We Have:

A web interface that allows the user to upload a JSON file and displays all the logs in a table view.

What's Next:

Download the cache as a JSON file, via a button shown when the user checks the "Log Query" checkbox.
Move the webapp to the package folder and allow it to both upload new files and read existing files. Checkboxes can be used to select and compare data across several log files.
Build a cache interface that supports adding an item and checking for an item. It should be generic and apply to different cache algorithms.
On the webapp page, compute the hit rate and miss rate over items (or, more accurately, over size) and create corresponding plots.
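
The generic cache interface and the hit/miss computation could look roughly like the sketch below. It assumes each log entry has the shape `{ query, size }`; `LRUCache` and `replay` are illustrative names, not existing Mosaic code. Any other algorithm would only need to provide the same `has` and `add` methods.

```javascript
// Size-bounded LRU cache exposing the two generic operations: has(key) and add(key, size).
class LRUCache {
  constructor(capacity) {
    this.capacity = capacity; // total size budget, e.g. bytes
    this.used = 0;
    this.items = new Map();   // Map preserves insertion order: oldest entry first
  }
  has(key) {
    if (!this.items.has(key)) return false;
    const size = this.items.get(key);
    this.items.delete(key);   // re-insert to mark as most recently used
    this.items.set(key, size);
    return true;
  }
  add(key, size) {
    if (size > this.capacity) return; // too large to cache at all
    if (this.items.has(key)) {
      this.used -= this.items.get(key);
      this.items.delete(key);
    }
    while (this.used + size > this.capacity) {
      const [oldest, oldSize] = this.items.entries().next().value;
      this.items.delete(oldest);      // evict least recently used entry
      this.used -= oldSize;
    }
    this.items.set(key, size);
    this.used += size;
  }
}

// Replay a log against a cache and compute rates over items and over size.
function replay(log, cache) {
  let hits = 0, hitBytes = 0, totalBytes = 0;
  for (const { query, size } of log) {
    totalBytes += size;
    if (cache.has(query)) { hits++; hitBytes += size; }
    else cache.add(query, size);
  }
  const n = log.length || 1; // avoid dividing by zero on an empty log
  return {
    hitRate: hits / n,
    missRate: 1 - hits / n,
    byteHitRate: totalBytes ? hitBytes / totalBytes : 0
  };
}
```

Weighting by size (`byteHitRate`) reflects how much data the cache saved, while the item-based rate counts every query equally.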

AllllenLuo · 2024-10-02T04:43:42Z

October 2nd

What's New This Week:

A cache webapp that allows the user to upload a single JSON file and enter a custom cache size.
Displays the hit rate based on the input (currently supports an LRU cache)

Questions:

Do we need a plot of hit rate vs. cache size? If so, how do we pick the range of cache sizes?

What's Next:

To be discussed during regular meeting
Remove the get/set distinction; rely instead on whether the query is executed on the server side.
Create a plot of cache size vs. hit rate. Use the sum of all distinct query sizes as the upper end of the cache-size range, so that the hit rate reaches 100% at the end of the plot.
Create a plot of what percentage of the cache is occupied.
Create another branch for a size-based cache.
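
The cache-size sweep might be sketched as follows: simulate an LRU cache at several sizes between 0 and the total size of all distinct queries. This assumes log entries of the form `{ query, size }`; `lruHitRate`, `distinctSize`, and `hitRateCurve` are hypothetical names. At the largest size nothing is ever evicted, so the curve ends at the highest hit rate the log allows.

```javascript
// Simulate a size-bounded LRU cache over the log; return hits / total queries.
function lruHitRate(log, capacity) {
  const cache = new Map(); // insertion order doubles as recency order, oldest first
  let used = 0, hits = 0;
  for (const { query, size } of log) {
    if (cache.has(query)) {
      hits++;
      cache.delete(query);   // refresh recency
      cache.set(query, size);
      continue;
    }
    if (size > capacity) continue; // result cannot fit in the cache
    while (used + size > capacity) {
      const [oldest, oldSize] = cache.entries().next().value;
      cache.delete(oldest);
      used -= oldSize;
    }
    cache.set(query, size);
    used += size;
  }
  return log.length ? hits / log.length : 0;
}

// Upper end of the x-axis: total size of all distinct queries.
function distinctSize(log) {
  const sizes = new Map();
  for (const { query, size } of log) sizes.set(query, size);
  return [...sizes.values()].reduce((a, b) => a + b, 0);
}

// Sweep cache sizes from 0 to distinctSize(log) in `steps` equal increments.
function hitRateCurve(log, steps = 10) {
  const max = distinctSize(log);
  const points = [];
  for (let i = 0; i <= steps; i++) {
    const cacheSize = Math.round((max * i) / steps);
    points.push({ cacheSize, hitRate: lruHitRate(log, cacheSize) });
  }
  return points;
}
```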

audilin · 2024-10-08T14:46:09Z

October 8, 2024

What we have:

Created a working download button for cache.json
Created two graphs on the cache webapp: one for cache size vs. hit rate, and another for log index vs. cache usage rate

Questions:

what do the pre-existing buttons do? specifically "Log Queries, Query Cache, Query Consolidation"
how should these buttons affect the downloaded cache

Next steps:

what other information about the cache should be collected, and how do I do this
will need to update what information is downloaded based on pre-existing buttons

domoritz · 2024-10-08T17:25:58Z

I think Log queries and log cache should be the same.

Also, shouldn't the hit rate be 1 at some cache size?

AllllenLuo · 2024-10-08T17:33:59Z

Please correct me if I'm wrong, but I think hit rate = number of hit queries / total number of queries. For example, the first query is guaranteed to miss since nothing is in the cache yet, so I don't think the hit rate can reach 1.

domoritz · 2024-10-08T18:54:19Z

Maybe we should measure the hit rate relative to repeated queries so that 1 == perfect cache. But maybe we should measure both?
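
Measuring both could be done in a single pass over the log. The sketch below assumes an exact-match cache, where only a query that was issued before can be a hit; `hitRates` is a hypothetical helper taking the queries in log order plus a parallel array of hit flags.

```javascript
// queries: query strings in log order
// hits: booleans in the same order (true = served from cache)
function hitRates(queries, hits) {
  const seen = new Set();
  let hitCount = 0, repeated = 0;
  queries.forEach((q, i) => {
    if (seen.has(q)) repeated++; // this query was issued earlier in the log
    else seen.add(q);
    if (hits[i]) hitCount++;
  });
  return {
    overall: queries.length ? hitCount / queries.length : 0, // hits / all queries
    repeatedOnly: repeated ? hitCount / repeated : 0         // 1 means a perfect cache
  };
}
```

With this definition, `overall` can never reach 1 because first occurrences always miss, while `repeatedOnly` reaches 1 when every repeated query is a hit.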

AllllenLuo · 2024-10-09T14:13:44Z

What's next:

Separate the different graphs by sections, highlight the correlation between the input & graph
Drop down for file selection
Add checkbox for the log queries, show whether each query is hit/miss
UI improvement: use a slider for cache size instead of free-form user input
instead of looking at what’s in the cache, look at what’s queried and submitted in the query manager (QueryManager.js)
- can use pre-existing record() function
log queries checkbox: prints out queries that are sent to the back end
record queries checkbox: save all queries in an array
add download button to download recorded queries

AllllenLuo · 2024-10-22T17:23:17Z

October 23, 2024

What's New:

File dropdown selection
New plot showing the hit rate for repeated queries specifically (right graph; its maximum is 100%). The left graph keeps the formula hit rate = hit query count / total query count
Separated by sections: all plots related to the custom cache size are placed in the section called "Plots Based On Cache Size"
Improved UI: the user can now set the cache size through a slider
The log queries table now indicates whether each query is a hit or a miss

What's Next:

Make the plots interactive (e.g., draw lines in the hit rate graph that change the cache size in the later section)
Implement different cache algorithms
Open a pull request to make the cache size-based in the main branch

audilin · 2024-11-12T21:43:27Z

November 12, 2024

add new record result function to query manager, that'll record when the result of a query comes back
want result, time it took to run the query, and the size of the result
alternate solution: write proxy for connector?? since we only care about the consolidated queries sent
- something like this:

so many questions about the pre-existing record function, its purpose, and why there are multiple recorders

domoritz · 2024-11-12T21:45:15Z

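// Assumes coordinator, socketConnector, restConnector, and wasmConnector are
// imported from Mosaic, and that `wasm` is declared at module level.
export async function setDatabaseConnector(type, addLogger) {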
  let connector;
  switch (type) {
    case 'socket':
      connector = socketConnector();
      break;
    case 'rest':
      connector = restConnector();
      break;
    case 'rest_https':
      connector = restConnector('https://localhost:3000/');
      break;
    case 'wasm':
      connector = wasm || (wasm = wasmConnector());
      break;
    default:
      throw new Error(`Unrecognized connector type: ${type}`);
  }
  console.log('Database Connector', type);

  if (addLogger) {
    connector = loggerConnector(connector);
  }

  coordinator.databaseConnector(connector);
}

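// Wraps a connector and records every query together with its result.
export function loggerConnector(connector) {
  const logs = [];

  return {
    // Returns the entries recorded so far.
    snapshot() {
      return logs;
    },
    // Forwards the query to the wrapped connector and logs the outcome.
    async query(query) {
      const result = await connector.query(query);
      logs.push({ query, result }); // one entry per query
      return result;
    }
  };
}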
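
To also capture the time a query took and the size of its result, the same proxy idea could be extended as sketched below. `timingConnector` and `sizeOf` are hypothetical names. How to obtain the byte size of an Arrow result is left to the caller, and the JSON-length default is only a placeholder, not a real Arrow blob size.

```javascript
// Sketch: connector proxy that records latency and result size per query.
// `sizeOf` is a caller-supplied function; the default is a rough placeholder.
function timingConnector(connector, sizeOf = r => JSON.stringify(r).length) {
  const logs = [];
  return {
    // Return a copy so callers cannot mutate the internal log.
    snapshot() {
      return logs.slice();
    },
    async query(query) {
      const start = performance.now();
      const result = await connector.query(query);
      logs.push({
        query,
        latencyMs: performance.now() - start, // time until the result came back
        size: sizeOf(result)
      });
      return result;
    }
  };
}
```

Because the proxy sits at the connector, it only sees the queries actually sent to the database, which matches the interest in consolidated queries.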

domoritz assigned audilin and AllllenLuo Sep 11, 2024

mhli1260 pushed a commit that referenced this issue Oct 2, 2024

add comments (#1)

aacac75
