
Pod disruption schedule #1719

Open · jukie opened this issue Sep 29, 2024 · 9 comments · May be fixed by #1720
Labels: kind/feature, needs-triage

Comments

jukie commented Sep 29, 2024

Description

What problem are you trying to solve?
I have some workloads that are sensitive to interruptions at certain points of the day and are therefore using the karpenter.sh/do-not-disrupt annotation. I'd like the ability to allow disruptions to these pods at specific times via a cron-format schedule.

How important is this feature to you?
To allow reclaiming nodes for expiration or underutilization, I'm currently running my own controller that watches DisruptionBlocked events and removes the do-not-disrupt annotation if the pods carry another annotation indicating the schedule for when disruptions are allowed. I'd like something similar to be added upstream so I can get rid of my own controller.

Proposed annotations:

  1. karpenter.sh/disruption-schedule - cron expression for when disruptions are allowed (e.g. 0 14 * * 6)
  2. karpenter.sh/disruption-schedule-duration - duration for which the schedule window stays active (e.g. 3h)
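As a minimal sketch of how these proposed annotations could look on a pod (hypothetical, since the API exists only in this proposal), allowing disruptions on Saturdays at 14:00 for three hours:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: queue-worker        # illustrative
  annotations:
    # existing annotation: block voluntary disruption
    karpenter.sh/do-not-disrupt: "true"
    # proposed: disruptions allowed Saturdays at 14:00, for a 3h window
    karpenter.sh/disruption-schedule: "0 14 * * 6"
    karpenter.sh/disruption-schedule-duration: "3h"
spec:
  containers:
    - name: worker
      image: example.com/worker:latest
```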
  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
jukie added the kind/feature label on Sep 29, 2024
k8s-ci-robot (Contributor) commented:

This issue is currently awaiting triage.

If Karpenter contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and providing further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot added the needs-triage label on Sep 29, 2024
jukie changed the title from "Pod disruption window" to "Pod disruption schedule" on Sep 29, 2024
jukie linked pull request #1720 on Sep 29, 2024 that will close this issue
njtran (Contributor) commented Oct 1, 2024

Is this for your job/task-related pods? Would it be sufficient for you if the do-not-disrupt annotation respected a duration string for how long the pod couldn't be disrupted, after which it would be fine to ignore?

jukie (Author) commented Oct 2, 2024

This would be for always-running job/task workers or singleton services. terminationGracePeriod solves the duration piece for do-not-disrupt and gives us a guaranteed maximum lifetime for a node, but the use case here is workloads that want to allow disruption only at specific times of day, such as a legacy monolith that only runs during business hours. In that scenario I want an extension of do-not-disrupt so that, if a node is marked for disruption while the disruption-schedule is active, it's safe to disrupt immediately.

redhug1 commented Oct 7, 2024

Would you be able to share your "own controller" code?
Thanks.

njtran (Contributor) commented Oct 7, 2024

@jukie what I'm wondering is whether this is a function of the lifetime of the pod in any way. What sort of workloads only want to be disrupted at a certain time, as opposed to on some other signal in the cluster (like other pods going away)? I'm not sure I like encoding this sort of API surface onto the pod itself. It's very loosely defined, easier to run into validation issues with, and doesn't promote elasticity.

On top of this, Karpenter has to reason about when it's fine to enqueue a disruption vs. when it's fine to actually drain the pod. Let's say I had this schedule + duration: do I want to nominate the node for consolidation if it's not in its disruptable period? Or do I wait for it to be in a disruptable period before I nominate it? If so, and it then falls out of its disruptable period, I'm now left with a pod I can't evict until my TGP, which could be higher cost overall.

jukie (Author) commented Oct 8, 2024

@njtran I'll try to expand a bit on the long-running task executor example: these don't execute as Jobs or ephemeral pods but constantly poll a queue for available work. Some of these tasks could take minutes, some might even take days, and they're expensive to re-execute, so I want to prevent disruption to these workers outside of a "maintenance window" such as between 12am and 3am on a particular day of the week (see the sketch after the list below).

  • Configuring terminationGracePeriod and setting do-not-disrupt on the pod won't allow for this, as the pods will stay running until TGP is reached, at which point they'll be forcefully killed at any time of day.
  • NodePool disruption budgets are also insufficient since Karpenter would still reclaim Expired NodeClaims outside the budget window.
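For concreteness, a 12am-3am Wednesday maintenance window under the proposed (still hypothetical) annotations would look roughly like:

```yaml
metadata:
  annotations:
    karpenter.sh/do-not-disrupt: "true"
    # proposed: disruptions allowed Wednesdays between 00:00 and 03:00
    karpenter.sh/disruption-schedule: "0 0 * * 3"
    karpenter.sh/disruption-schedule-duration: "3h"
```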

For your other questions:

> Let's say I had this schedule + duration: do I want to nominate the node for consolidation if it's not in its disruptable period?

My PR (#1720) reuses the existing do-not-disrupt logic by updating podutil.IsEvictable() and podutil.IsDisruptable() to consider the new annotation. If the window is inactive, this leads to the same DisruptionBlocked events, refreshing every ~5 minutes, until the window becomes active.
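Not the actual PR code, but a minimal Go sketch of the kind of window check this implies, assuming github.com/robfig/cron/v3 for cron parsing (the function name and shape are illustrative):

```go
package podutil

import (
	"time"

	"github.com/robfig/cron/v3"
	corev1 "k8s.io/api/core/v1"
)

// disruptionScheduleActive reports whether the proposed
// karpenter.sh/disruption-schedule window contains "now", i.e. whether
// a scheduled start has fired within the last schedule-duration.
func disruptionScheduleActive(pod *corev1.Pod, now time.Time) bool {
	spec, ok := pod.Annotations["karpenter.sh/disruption-schedule"]
	if !ok {
		return false
	}
	dur, err := time.ParseDuration(pod.Annotations["karpenter.sh/disruption-schedule-duration"])
	if err != nil {
		return false
	}
	sched, err := cron.ParseStandard(spec)
	if err != nil {
		return false
	}
	// Earliest scheduled start strictly after (now - dur); if it has
	// already happened, "now" falls inside an active window.
	start := sched.Next(now.Add(-dur))
	return !start.After(now)
}
```

IsEvictable()/IsDisruptable() could then treat do-not-disrupt as non-blocking whenever a check like this returns true.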

> If so, and it then falls out of its disruptable period, I'm now left with a pod I can't evict until my TGP, which could be higher cost overall.

Wouldn't the higher-cost scenario already be the default? Adding the ability to consider a disruptable window would lower overall cost by making it possible to disrupt nodes before TGP. It'd probably be a good idea to set a minimum window duration to avoid the scenario you describe, though.

zack-johnson5455 commented:
I'd like to add support to this issue. My use case:

We have services with varying tolerance for disruption. We'd like to allow services to express their own requirements: "I can tolerate X restarts every Y days and, optionally, only within the Z timeframe."

We'd prefer to limit the number of different nodepools we manage.

We're currently running 0.36 and are having to implement something similar to https://github.com/jukie/karpenter-deprovision-controller (where we strategically remove do-not-disrupt annotations).

Our understanding is that in 1.0 we can take advantage of expiration + terminationGracePeriod to enforce a maximum node age, regardless of the do-not-disrupt annotation. But that still means any service that uses the do-not-disrupt annotation is subject to the same frequency of disruption.
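For reference, that 1.0 combination looks roughly like this on a NodePool (values are illustrative):

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      # nodes are considered expired after this age...
      expireAfter: 720h
      # ...and once draining, pods (including do-not-disrupt ones)
      # are forcibly terminated after this grace period
      terminationGracePeriod: 48h
```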

I think the proposal described in this issue would give us what we want. Any thoughts or recommendations?

jukie (Author) commented Oct 17, 2024

@njtran any more thoughts on this one?
