
auto-batching functionality #71

Draft · wants to merge 8 commits into master

Conversation

ShinyMeta commented:

Added a "pool" to the client to hold static auto-batching endpoints.
Endpoints can now call enableAutoBatch(), which creates a set for queueing ids for the next batch.

Example:
client.autoBatch('items', 500).get(100)
client.autoBatch('items', 500).get(100)
client.autoBatch('items', 500).get(101)
client.autoBatch('items', 500).many([100,101,102])
client.autoBatch('items', 500).get(101)
client.autoBatch('items', 500).get(101)
// When these requests are made in sequence (or from multiple places in the code), they result in a single API request for ids 100, 101, 102, and each call resolves with its requested ids
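For context, here is a minimal sketch of how such a queue-and-flush mechanism could work. This is illustrative only, not the PR's exact code; the idsForNextBatch/nextBatchPromise shape follows what is discussed later in this thread.

// Illustrative sketch, not the PR's exact implementation:
// ids requested within one delay window are merged into a single many() call.
_autoBatchMany (ids) {
  // First id of a new window: schedule a flush after the batch delay
  if (this._autoBatch.idsForNextBatch.size === 0) {
    this._autoBatch.nextBatchPromise = new Promise((resolve) => {
      setTimeout(() => {
        const batchedIds = Array.from(this._autoBatch.idsForNextBatch)
        this._autoBatch.idsForNextBatch = new Set()
        // Pass skipAutoBatch = true so the flush itself is not re-queued
        resolve(this.many(batchedIds, true))
      }, this._autoBatch.autoBatchDelay)
    })
  }

  // Queue this call's ids, then resolve the caller with only its own entries
  ids.forEach((id) => this._autoBatch.idsForNextBatch.add(id))
  return this._autoBatch.nextBatchPromise
    .then((items) => items.filter((item) => ids.includes(item.id)))
}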

codecov-commenter commented on Jun 14, 2021:

Codecov Report

Merging #71 (063a10c) into master (b2ff20a) will decrease coverage by 3.11%.
The diff coverage is 13.33%.

❗ Current head 063a10c differs from the pull request's most recent head fc1fa42. Consider uploading reports for the commit fc1fa42 to get more accurate results.
Impacted file tree graph

@@             Coverage Diff             @@
##            master      #71      +/-   ##
===========================================
- Coverage   100.00%   96.89%   -3.11%     
===========================================
  Files           62       62              
  Lines         1854     1898      +44     
  Branches        92      104      +12     
===========================================
- Hits          1854     1839      -15     
- Misses           0       50      +50     
- Partials         0        9       +9     
Impacted Files Coverage Δ
src/client.js 93.67% <9.09%> (-6.33%) ⬇️
src/endpoint.js 83.10% <14.70%> (-16.90%) ⬇️

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b2ff20a...fc1fa42. Read the comment docs.

queicherius (Owner) left a comment:


Thank you for your contribution 🙂 I have flagged some issues in the code, please have a look through the comments.

ShinyMeta (Author) commented:

I moved most of the logic in Client to a new class, AutoBatchNode, which is essentially a wrapper class for Client/Endpoint. This changes the syntax to match the current library as follows:

Example:

api.autoBatch().items().get(100)
api.autoBatch().items().get(100)
api.autoBatch().items().get(101)
api.autoBatch().items().many([100,101,102])
api.autoBatch().items().get(101)
api.autoBatch().items().get(101)

This now also works when chaining with settings/child endpoints:
api.autoBatch().authenticate('api key').account().achievements().many([1,2,3])

Also, if it is desired for ALL bulk-expanding requests to autobatch, the AutoBatch node can be stored and used almost identically to a normal client:
api = client().autoBatch()

CONSIDERATIONS:
Since the auto-batching endpoints are static, their settings such as API key and schema should not be changed individually. Thus there is a line of code to force their settings to match the source client on every subsequent method call:

this.parent
      .schema(this.client.schemaVersion)
      .language(this.client.lang)
      .authenticate(this.client.apiKey)
      .debugging(this.client.debug)

Auto-batching endpoints also override the _skipCache setting, which is normally changed by .live(), to force cache usage.

The default batch delay is 50 ms, but it CAN be changed per endpoint through subsequent calls to the enableAutoBatch(batchDelay) method.

The chain after .autoBatch() will continue to return AutoBatchNodes wrapping the client/endpoint until a method is called that would return anything other than a Client/Endpoint (e.g. a Promise, boolean, or string).
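Roughly, the wrapper behaves like the following sketch. This is a simplified illustration, not the PR's exact code; only this.parent, this.client, and the children cache appear in the actual diff, and the _child helper name is hypothetical.

// Simplified sketch of the AutoBatchNode idea, not the PR's exact code.
class AutoBatchNode {
  constructor (client, parent) {
    this.client = client   // the source client (for re-inheriting settings)
    this.parent = parent   // the wrapped Client/Endpoint
    this.children = {}     // cached child nodes, so repeated calls
  }                        // return the same endpoint objects

  // Child endpoint accessors (items(), account(), ...) return a cached,
  // wrapped endpoint instead of a fresh one (hypothetical helper name)
  _child (childName) {
    if (!this.children[childName]) {
      this.children[childName] = new AutoBatchNode(this.client, this.parent[childName]())
    }
    return this.children[childName]
  }

  // Any other method call is forwarded to the wrapped object; if it returns
  // a Client/Endpoint the chain stays wrapped, otherwise (Promise/bool/string)
  // the raw value is returned as-is
}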

ShinyMeta requested a review from queicherius on June 28, 2021.
@@ -74,6 +77,23 @@ module.exports = class AbstractEndpoint {
return this
}

// Turn on auto-batching for this endpoint
enableAutoBatch (autoBatchDelay) {
queicherius (Owner) commented on Jul 4, 2021:


It would be nice to also call this autoBatch, like in the client, so it's consistent (similar to language, debugging, etc.).

(Expanded on more below)

return this
}
this._autoBatch = {
// autoBatchDelay: autoBatchDelay,
queicherius (Owner) commented on Jul 4, 2021:


Please remove commented-out code.

Comment on lines +289 to +293
if (this._autoBatch !== null) {
  if (!skipAutoBatch) {
    return this._autoBatchMany(ids)
  }
}
queicherius (Owner):

Suggested change:
- if (this._autoBatch !== null) {
-   if (!skipAutoBatch) {
-     return this._autoBatchMany(ids)
-   }
- }
+ if (this._autoBatch !== null && !skipAutoBatch) {
+   return this._autoBatchMany(ids)
+ }

@@ -230,9 +283,17 @@ module.exports = class AbstractEndpoint {
}

// Get multiple entries by ids from the live API
_many (ids, partialRequest = false) {
_many (ids, partialRequest = false, skipAutoBatch) {
queicherius (Owner):

Suggested change:
- _many (ids, partialRequest = false, skipAutoBatch) {
+ _many (ids, partialRequest = false, skipAutoBatch = false) {

// Get multiple entries by ids
many (ids) {
many (ids, skipAutoBatch) {
queicherius (Owner):

Suggested change:
- many (ids, skipAutoBatch) {
+ many (ids, skipAutoBatch = false) {

@@ -0,0 +1,86 @@

const Endpoint = require('./endpoint')
const _ = require('lodash')
queicherius (Owner):

(You don't use it here so it can be removed, but just FYI in 2021 using Lodash is pretty much always an antipattern)

return this.children[childName]
}

}
queicherius (Owner) commented on Jul 4, 2021:

I have to say the entire wrapping logic here feels a little over-engineered. What benefits does it have over just passing an autoBatching = true flag, just like we pass flags for e.g. debug?

It feels like that would ultimately achieve the same functionality and would e.g. also make it possible to pass the autoBatchDelay to the client if we did something like this:

// Client

autoBatch (flag = true, autoBatchDelay = 50) {
  this.autoBatching = flag
  this.autoBatchDelay = autoBatchDelay
  return this
}
// Endpoint

constructor (parent) {
  // ...
  this.autoBatching = parent.autoBatching
  this.autoBatchDelay = parent.autoBatchDelay

  this._autoBatch = {
    idsForNextBatch: new Set(),
    nextBatchPromise: null,
  }
}

autoBatch (flag = true, autoBatchDelay = 50) {
  // We could also print the debug warning message here, if this.isBulk = false
  // that the flag will just be ignored. But in some cases it might make sense anyway,
  // like this one where it gets passed to a child:
  // client().account().autoBatch(true).characters().many([...])
  // So IMO it would be fine to just set the flag and silently ignore it (see below)
  // if the endpoint is not bulk.

  this.autoBatching = flag
  this.autoBatchDelay = autoBatchDelay
  return this
}

_many (ids, partialRequest = false, skipAutoBatch = false) {
  if (this.isBulk && this.autoBatching && !skipAutoBatch) {
    return this._autoBatchMany(ids)
  }
}
// Usage
const api = client().autoBatch(true, 50)
api.items().many([12]) // -> autobatches
api.account().bank().get() // -> doesn't
api.items().autoBatch(false).many([12]) // -> doesn't
client().items().autoBatch(true).get(12) // -> does

Since the autobatching endpoints are static, their settings such as apikey and schema should not be changed individually.

I don't think this is an issue we need to handle. If the user mutates the language on an already-existing endpoint with autobatching enabled while a request is in-flight, I feel like that's on them.

Auto batching endpoints also override the _skipCache setting which is normally changed by .live(), to force cache usage.

live() should win over everything; it is supposed to overwrite the cache usage and immediately fire the request.
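One way to encode that precedence in the flag-based sketch above (illustrative only): let the _skipCache flag set by live() bypass the batching path entirely.

// Illustrative: live() sets _skipCache, which should bypass auto-batching
_many (ids, partialRequest = false, skipAutoBatch = false) {
  if (this.isBulk && this.autoBatching && !this._skipCache && !skipAutoBatch) {
    return this._autoBatchMany(ids)
  }
  // ... otherwise fall through to the normal live request
}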

ShinyMeta (Author) commented on Jul 6, 2021:

I agree it would be cleaner if autobatching were handled via a settings flag; however, I think it could be confusing. For example:

let api = client().autoBatch()
api.items().get(101)
api.items().get(102)

The normal behavior of the library is to return a new endpoint from items(), which would have separate autoBatch sets of ids with my currently proposed endpoint edit.
In my mind, if the client is set to autoBatch, then subsequent calls for the same endpoint should also autoBatch. The main thing the wrapper is doing is caching child endpoints so that subsequent calls return the same endpoint objects. For the most part, other method calls are passed through without interference.

The logic I added to re-inherit settings from the parent was probably over-cautious and unnecessary. I was worried that people might try to change settings on the endpoint like this:

await api.items().live().get(101)
await api.items().get(101)

This is fine with the current library because it creates a new endpoint on each line, so the second line will use the cache. But if the API returns saved endpoints from a pool for autobatching, then the setting persists. This can be avoided with proper usage of the library, and removing the re-inherit logic cuts some fluff from my edit.

queicherius (Owner):

The normal behavior of the library is to return a new endpoint from items(), which would have separate autoBatch sets of ids with my currently proposed endpoint edit. In my mind, if the client is set to autoBatch, then subsequent calls for the same endpoint should also autoBatch.

Ah! I didn't think about batching being shared between endpoints of the same type. That's very valid, and I agree that it should work how you described.

The main thing the wrapper is doing is caching child endpoints so that subsequent calls return the same endpoint objects. For the most part, other method calls are passed through without interference.

Instead of wrapping the client and returning the same endpoint objects, what do you think about a singleton shared data object that holds the ID list and timeout based on the endpoint URL? That should solve the issues around the shared settings and still have the simple interface I suggested.

Something like this?

const autoBatchSharedData = {}

module.exports = class AbstractEndpoint {
  constructor(parent) {
    // ...

    this.setupAutoBatchSharedData()
  }

  // Setup the shared data for auto batching
  setupAutoBatchSharedData() {
    const autoBatchId = this.baseUrl + this.url + ':' + this.schemaVersion
    if (!autoBatchSharedData[autoBatchId]) {
      autoBatchSharedData[autoBatchId] = {
        idsForNextBatch: new Set(),
        nextBatchPromise: null,
      }
    }

    this._autoBatch = autoBatchSharedData[autoBatchId]
  }

  // Set the schema version
  schema(schema) {
    this.schemaVersion = schema
    this.setupAutoBatchSharedData()
    this.debugMessage(`set the schema to ${schema}`)
    return this
  }
}
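With a shared data object like that, two independently created endpoint objects for the same URL would feed one batch, e.g. (assuming auto-batching is enabled via the interface suggested above):

// Both endpoint instances resolve to the same autoBatchSharedData entry,
// so these ids end up in a single batched request
const api = client().autoBatch(true, 50)
api.items().many([100, 101])
api.items().get(102)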

The logic I added to re-inherit settings from the parent was probably over-cautious and unnecessary. I was worried that people might try to change settings on the endpoint like this:

await api.items().live().get(101)
await api.items().get(101)

Yeah, I would also expect the first one to go through and the second one to hit the cache.

ShinyMeta (Author):

I've been staring at this and wondering why I didn't think of that design before.

Should language/API key be added to the batchId along with schema, or should schema be removed altogether?
If we care, it should probably match _cacheHash. However, this will only cause an issue if the user changes a setting inside the batch window, which is 50 ms by default... That would probably just be a result of irresponsible use, and could be a consideration for the documentation about autoBatching.

queicherius (Owner):

Should language/API key be added to the batchId along with schema, or should schema be removed altogether?
If we care, it should probably match _cacheHash. However, this will only cause an issue if the user changes a setting inside the batch window, which is 50 ms by default...

I think using _cacheHash is a good idea! Then it'd behave correctly in all instances (without race conditions), and the only overhead would be the extra function call in the setters for language & API key. Then, if a user changes the settings in the batch window, the request doesn't get batched, but it gets the correct result.
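A sketch of what keying by the cache hash could look like (assuming a _cacheHash helper as used by the library's caching layer; its exact signature is an assumption here):

// Illustrative sketch: key the shared batch data by the cache hash instead of
// baseUrl + url + schema, so language/API key changes get their own batch.
// _cacheHash is assumed to incorporate url, schema, language, and API key.
setupAutoBatchSharedData () {
  const autoBatchId = this._cacheHash(this.url)
  if (!autoBatchSharedData[autoBatchId]) {
    autoBatchSharedData[autoBatchId] = {
      idsForNextBatch: new Set(),
      nextBatchPromise: null
    }
  }
  this._autoBatch = autoBatchSharedData[autoBatchId]
}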

queicherius marked this pull request as draft on October 13, 2024.