[ML] Protect against multiple concurrent downloads of the same model #116869

davidkyle · 2024-11-15T13:15:06Z

The Tasks API is used to check for an existing model download when starting a model deployment, but started tasks (or tasks about to start) are not immediately visible in the Tasks API. This leads to a race condition where successive calls to start a model deployment may trigger multiple downloads. This is what happens in the Inference API when the default endpoints are used: multiple calls to inference can each trigger the model download.

Download is a master node action, so there is only one node in the cluster where the download can occur. Once we get to the download action and have access to the local taskManager, the check for an existing download can be made more atomically. The change here is to look for a matching download task in the taskManager and, if one is found, register a listener on that task's completion.
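
To illustrate the idea (this is not the code in this PR), here is a minimal, self-contained sketch of the "find the in-flight download or start one, then listen for its completion" pattern. It assumes a single coordinating node and uses a plain `ConcurrentHashMap` in place of the taskManager; `DownloadCoordinator`, `downloadOnce` and `startDownload` are hypothetical names, not Elasticsearch APIs.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical helper for illustration only; not an Elasticsearch class. */
final class DownloadCoordinator {

    // At most one in-flight future per model id. This map stands in for the
    // task registry on the node that performs the download (an assumption).
    private final Map<String, CompletableFuture<Void>> inFlight = new ConcurrentHashMap<>();

    /**
     * Starts a download for modelId unless one is already running, in which
     * case the caller is attached to the running download instead.
     */
    CompletableFuture<Void> downloadOnce(String modelId, Supplier<CompletableFuture<Void>> startDownload) {
        CompletableFuture<Void> created = new CompletableFuture<>();

        // putIfAbsent makes "check for existing" and "register mine" one atomic step.
        CompletableFuture<Void> existing = inFlight.putIfAbsent(modelId, created);
        if (existing != null) {
            return existing; // a download is already running: just listen for it
        }

        // This caller owns the download. Deregister when it finishes, on
        // success or failure, so that a later call can start a fresh download.
        created.whenComplete((result, error) -> inFlight.remove(modelId, created));

        try {
            startDownload.get().whenComplete((result, error) -> {
                if (error != null) {
                    created.completeExceptionally(error);
                } else {
                    created.complete(null);
                }
            });
        } catch (RuntimeException e) {
            // The download could not even be started; fail every listener.
            created.completeExceptionally(e);
        }
        return created;
    }
}
```

The point of the sketch is that the lookup and the registration happen in a single atomic operation, which is what a separate "list tasks, then start" sequence cannot guarantee. Callers that arrive while a download is running receive the same future and are notified when it completes.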

davidkyle added 2 commits November 15, 2024 13:00

Check for existing download

7d98561

Test concurrent inference on default endpoint

4605650

elasticsearchmachine added the v9.0.0 label Nov 15, 2024

Allow for deployments already started

2509f92
