[ML] Protect against multiple concurrent downloads of the same model #116869

Draft: wants to merge 3 commits into base: main

davidkyle (Member)

When a model deployment is started, the Tasks API is used to check for an existing download of that model. However, tasks that have started (or are about to start) are not immediately visible in the Tasks API. This creates a race condition: successive calls to start a model deployment may each see no download in progress and each trigger a download. This happens in the Inference API when the default endpoints are used, because multiple calls to inference will each trigger the model download.
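The race is a classic check-then-act problem. The deterministic simulation below is a minimal sketch, not Elasticsearch code: the class `RacyDeploymentStarter` and its methods are hypothetical. A download only becomes "visible" after `publishPendingTasks()` runs, standing in for the delay before a new task appears in the Tasks API. Two calls that arrive inside that window both start a download.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the check-then-act race. The "visible" set lags behind
// the downloads that have actually started, mimicking the delay before a newly
// started task shows up in the Tasks API.
public class RacyDeploymentStarter {
    private final Set<String> visibleDownloads = new HashSet<>();
    private final Set<String> pendingDownloads = new HashSet<>();
    private final AtomicInteger downloadsStarted = new AtomicInteger();

    /** Checks the (possibly stale) view, then starts a download if none is visible. */
    public void startDeployment(String modelId) {
        if (!visibleDownloads.contains(modelId)) { // check against a stale view
            pendingDownloads.add(modelId);         // act: not yet visible to other callers
            downloadsStarted.incrementAndGet();
        }
    }

    /** Makes started downloads visible, as the Tasks API eventually would. */
    public void publishPendingTasks() {
        visibleDownloads.addAll(pendingDownloads);
        pendingDownloads.clear();
    }

    public int downloadsStarted() {
        return downloadsStarted.get();
    }

    public static void main(String[] args) {
        RacyDeploymentStarter starter = new RacyDeploymentStarter();
        // Two inference calls arrive before the first download task is visible.
        starter.startDeployment("my-model");
        starter.startDeployment("my-model");
        starter.publishPendingTasks();
        // A later call sees the task and does not start another download.
        starter.startDeployment("my-model");
        System.out.println(starter.downloadsStarted()); // prints 2
    }
}
```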

Download is a master node action, so there is only one node in the cluster where the download can occur. Once execution reaches the download action and has access to the local taskManager, the check for an existing download can be made more atomically. This change looks for a matching download task in the taskManager and, if one is found, registers a listener for that task's completion instead of starting a second download.
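The dedup-and-listen idea can be sketched with a concurrent map of in-flight downloads. This is a minimal sketch under stated assumptions, not the PR's implementation and not the Elasticsearch `TaskManager` API: `DedupingModelDownloader`, `download`, and `doDownload` are hypothetical names, and a `ConcurrentHashMap` of `CompletableFuture`s stands in for the master node's local task registry. `putIfAbsent` makes the check and the registration a single atomic step, so exactly one caller starts the download and every other caller receives the same future to wait on.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicInteger;

public class DedupingModelDownloader {
    // Hypothetical stand-in for the master node's local task registry:
    // one entry per model whose download is currently in flight.
    private final ConcurrentMap<String, CompletableFuture<Void>> inFlight = new ConcurrentHashMap<>();
    private final AtomicInteger downloadsStarted = new AtomicInteger();
    private final Executor executor;

    public DedupingModelDownloader(Executor executor) {
        this.executor = executor;
    }

    /** Starts a download of modelId, or joins the download already running. */
    public CompletableFuture<Void> download(String modelId) {
        CompletableFuture<Void> mine = new CompletableFuture<>();
        // Atomic check-and-register: exactly one caller wins for a given model id.
        CompletableFuture<Void> existing = inFlight.putIfAbsent(modelId, mine);
        if (existing != null) {
            return existing; // listen for the running download's completion
        }
        downloadsStarted.incrementAndGet();
        try {
            executor.execute(() -> {
                try {
                    doDownload(modelId);
                    mine.complete(null);
                } catch (RuntimeException e) {
                    mine.completeExceptionally(e);
                } finally {
                    // Deregister so a later call (e.g. a retry after failure) can start afresh.
                    inFlight.remove(modelId, mine);
                }
            });
        } catch (RuntimeException e) {
            // The executor rejected the task: deregister and fail every waiter.
            inFlight.remove(modelId, mine);
            mine.completeExceptionally(e);
        }
        return mine;
    }

    public int downloadsStarted() {
        return downloadsStarted.get();
    }

    // Hypothetical placeholder for fetching and storing the model.
    private void doDownload(String modelId) {
    }
}
```

Callers that lose the race get the winner's future, which plays the role of registering a listener on the existing download task. Using a manual executor such as `queued::add` in tests keeps the download from finishing until the test runs it, which makes the deduplication observable deterministically.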
