Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Windows spot instances can fail to register due to incorrect tagging #350

Open
chadlwilson opened this issue Aug 3, 2024 · 0 comments
Open

Comments

@chadlwilson
Copy link
Member

There seems to be some race conditions possible when creating and tagging spot instance requests which can lead to a Windows spot instance being created with name ${cluster}_LINUX_SPOT_INSTANCE despite being a Windows instance.

image

At root this seems to be due to the LINUX fallback at

return Arrays.stream(values()).filter(p -> p.name().equalsIgnoreCase(platform)).findFirst().orElse(LINUX);

This seems to be used by the code here during server ping

public void tagSpotInstances(PluginSettings pluginSettings) {
List<SpotInstanceRequest> spotRequests = spotInstanceHelper.getSpotRequestsWithARunningSpotInstance(pluginSettings, pluginSettings.getClusterName());
LOG.debug("[server-ping] There are total of: '{}' Spot Requests with Spot-Request-Ids: '{}' which have a running Spot Instance. Starting tagging of Spot Instance.",
spotRequests.size(), spotRequestIds(spotRequests));
Map<String, List<SpotInstanceRequest>> requestsByPlatform = groupSpotRequestsByPlatform(spotRequests);
requestsByPlatform.forEach((platform, spotInstanceRequests) -> {
List<String> spotInstanceIds = spotInstanceRequests.stream().map(SpotInstanceRequest::getInstanceId).collect(toList());
spotInstanceHelper.tagSpotResources(pluginSettings, spotInstanceIds, Platform.from(platform));
});
}

Somewhere along the line, the platform tag is either empty (rather than not found?) or has got some default value of Linux set. Not quite sure of the root here, but likely some kind of race condition when creating new spot requests and/or tagging untagged instances via the server ping.

There is also some hard-to-grok code here that is outside the synchronized block and suspicious looking synchronization in general:

/*
A spot instance request is tagged post creation. AWS takes time to sync up the the tags on the spot request, hence
querying aws for the spot requests with tag filters does not yield results. Hence the SpotInstance service maintains
a list of new spot instance requests and removes from the list in future calls or as part of server ping.
*/
if (isSpotRequestValid(spotInstanceRequest)) {
untaggedSpotRequests.add(spotInstanceRequest);
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant