Improve dynamic server count detect logic of agent #358
Comments
I'd be interested to know more about what others in the community feel regarding this, and whether there is scope/priority for a better solution. I'd like to collaborate with the relevant folks and formulate an approach.
I agree that the current approach is problematic. Introducing a new lightweight method makes sense to me, but I'd like someone with more experience on this project to weigh in.
I'd be OK with adding a `ServerCount` method to the server. Presumably the agent would then poll the server. I'm not sure whether we would need to secure the `ServerCount` method; if we did, I'm not sure it would save us much over calling `Connect`.

Another option would be to add an `UpdateMetaData` packet to our protocol. Then the servers could push an updated server count to the agents.

One thing we haven't addressed yet is how the server gets that value. Today it gets it from https://github.com/kubernetes-sigs/apiserver-network-proxy/blob/master/cmd/server/app/options/options.go#L103. One advantage of the method you mentioned is that you can just add a new server with the new value to our LB and eventually the agent will notice. It does mean that the agent has to assume that if it gets conflicting answers it should always use the larger of the two values.

Extending the protocol means you would want a mechanism to update the serverCount on the running server. That is more work, but it means the agent would respond more quickly, and also that you can support downsizing. Mechanisms for lowering could be an admin command to the server, dynamic config, or loading it from a CRD on the KAS.
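The "always use the larger of the two values" rule described above can be sketched in a few lines of Go. This is an illustrative sketch, not the project's actual API; `ReconcileServerCount` is a hypothetical name.

```go
package main

import "fmt"

// ReconcileServerCount applies the rule discussed above: when an agent
// hears conflicting server counts (e.g. during a rolling scale-up behind
// an LB), it keeps the larger value so it never drops connections it may
// still need. Hypothetical name, for illustration only.
func ReconcileServerCount(current, reported int) int {
	if reported > current {
		return reported
	}
	return current
}

func main() {
	count := 2
	// Simulate answers from a mixed fleet during a scale-up from 2 to 3:
	// old servers still report 2, a new server reports 3.
	for _, reported := range []int{2, 3, 2} {
		count = ReconcileServerCount(count, reported)
	}
	fmt.Println(count) // prints 3
}
```

Note the trade-off the comment points out: this monotone "take the max" rule is what makes the LB approach work for scale-up, but it is also why downsizing needs a separate push mechanism.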
Good point. I've noticed that there is another rotting issue, #273, tracking this. We may consider it together with this issue.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to its standard lifecycle rules. Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to its standard lifecycle rules. Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten
/remove-lifecycle rotten
/lifecycle frozen
We could keep this issue open, to improve "how the agents learn server count". There are some ideas here that weren't considered there. #273 can track the remaining server half of the feature.
FYI, if an agent receives an increased server count from a server it is already connected to, it doesn't reset the backoff and just uses the last set duration. I'm guessing that this isn't the desired behavior, and that the agent should immediately reset the backoff and shift into "fast sync" mode if it is supposed to be connected to more servers than it currently is. This will only be a problem once #273 is implemented, as there is currently no way to update the server count on the server side.

EDIT: Here are some logs. Note that the connection attempts are separated by 30 seconds even after the server count gets updated to be greater than the client count.
Fixed by #643!
Currently, if we want the agent to dynamically detect server count changes, we have to enable the `syncForever` mode. However, with this mode on, the agent just calls the server's `Connect` method constantly, and in most cases it just grabs the server count and closes the connection (since the server count does not change that frequently). This causes a lot of garbage error logs (specifically `server.go:767] "stream read failure" err="rpc error: code = Canceled desc = context canceled"` on the server side) and wastes networking and computing resources.

So would it be applicable to introduce a lightweight method `ServerCount` on the server and let the agent call that method to sync the server count, instead of the heavier `Connect`?
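The proposed lightweight poll can be sketched like this. Everything here is hypothetical: the `counter` interface stands in for the suggested `ServerCount` RPC (which would really be a new gRPC method on the proxy server), and `syncOnce` and `fakeServer` are illustrative names only.

```go
package main

import "fmt"

// counter abstracts the proposed lightweight call. In the real proposal
// this would be a new unary RPC on the proxy server, far cheaper than
// opening and immediately tearing down a Connect stream.
type counter interface {
	ServerCount() (int, error)
}

// fakeServer is a stand-in for a proxy server reporting its count.
type fakeServer struct{ n int }

func (f fakeServer) ServerCount() (int, error) { return f.n, nil }

// syncOnce polls the server count once and reports whether the agent
// should open more connections, without a full Connect round trip.
func syncOnce(c counter, current int) (target int, needMore bool, err error) {
	n, err := c.ServerCount()
	if err != nil {
		return current, false, err
	}
	return n, n > current, nil
}

func main() {
	target, needMore, _ := syncOnce(fakeServer{n: 3}, 2)
	fmt.Println(target, needMore) // prints: 3 true
}
```

Because the poll never opens a proxy stream, it would avoid the `context canceled` noise quoted above: the server sees an ordinary unary request complete normally instead of a stream being torn down mid-read.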