Support both local kernels and remote (via kernel gateway) at the same time. #1187

ojarjur · 2023-01-25T21:45:55Z

Problem

I love the option of connecting a Jupyter server to a kernel gateway, but it is currently an all-or-nothing experience; either all of your kernels run locally or they all run using the kernel gateway.

I would like it if I could pick either local or remote when I am selecting a kernelspec.

For example, I want to be able to have two notebooks open in JupyterLab, and be able to run one of them using a kernel started by my local server, and have the other one using a kernel started by a kernel gateway.

Proposed Solution

It is possible to solve this by using some sort of an intermediary proxy as a kernel gateway, which is responsible for deciding whether to run the kernels locally or remotely.

In fact, I have a proof-of-concept implementation of this and was able to verify that it works as you might hope.

However, this approach has a big drawback in that you have to then run two separate instances of the jupyter server locally; one for creating kernels and one for connecting to this proxy, and those two different jupyter servers have to use different configs (telling them where to run the kernels).

It would be much simpler (both in the sense of being cleaner and easier to use) if the jupyter server was able to do this switching natively instead of relying on an intermediary proxy.

Additional context

The proof of concept I linked to above is very specific to my use case and not a general solution to this problem (e.g. it assumes a specific, hard-coded form of auth, etc).

The general approach, however, should be reusable and work in an in-jupyter-server based solution:

For kernelspecs, take both the local and remote kernelspecs and combine them into a unified view... adding a prefix onto each kernelspec name to identify if it is local or remote.
For creating kernels, figure out if the kernelspec is local or remote, strip off the prefix, and then forward the request to the corresponding backend.
For switching kernels, send a delete request to the old backend and then a create request to the new one.
Keep a map in memory from kernel IDs to the backend that holds them.
Forward all other kernel requests to the corresponding backend.
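
The kernelspec-merging and prefix-stripping steps above can be sketched as follows. This is a minimal illustration rather than the proof-of-concept's actual code: the prefix strings `local-` and `remote-` and both function names are assumptions made for the example.

```python
from __future__ import annotations

# Hypothetical prefixes; the thread does not fix their exact form.
LOCAL_PREFIX = "local-"
REMOTE_PREFIX = "remote-"


def merge_kernelspecs(local_specs: dict, remote_specs: dict) -> dict:
    """Combine both sets of kernelspecs into one view with prefixed names."""
    merged = {}
    for name, spec in local_specs.items():
        merged[LOCAL_PREFIX + name] = spec
    for name, spec in remote_specs.items():
        merged[REMOTE_PREFIX + name] = spec
    return merged


def split_kernelspec_name(prefixed_name: str) -> tuple[str, str]:
    """Return (backend, original_name) for a prefixed kernelspec name."""
    if prefixed_name.startswith(LOCAL_PREFIX):
        return "local", prefixed_name[len(LOCAL_PREFIX):]
    if prefixed_name.startswith(REMOTE_PREFIX):
        return "remote", prefixed_name[len(REMOTE_PREFIX):]
    raise KeyError(f"unknown kernelspec prefix: {prefixed_name}")


# The same kernelspec name on both backends no longer collides.
merged = merge_kernelspecs(
    {"python3": {"display_name": "Python 3"}},
    {"python3": {"display_name": "Python 3"}},
)
backend, original_name = split_kernelspec_name("remote-python3")
```

A create request for `remote-python3` would then be forwarded to the remote backend with the name `python3`.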

kevin-bates · 2023-01-25T23:09:56Z

Hi @ojarjur - thank you for opening this issue. I'm unable to respond to this right now but have spent a fair amount of time thinking about this and sharing some of those ideas with others and can see your general approach is similar. (I don't think number 4 is necessary as that functionality should "just work" via the existing Change Kernel behavior.)

I hope to be able to circle back to this in a few days (hopefully sooner).

kevin-bates · 2023-01-27T22:37:43Z

The general approach I've been mulling over is to introduce the notion of a Kernel Server, where a single Jupyter server could be configured with one or more Kernel Servers. A Kernel Server would consist of a MappingKernelManager, a KernelSpecManager and, in some fashion, a KernelWebsocketHandler. In essence, a KernelServer is a GatewayClient, with a special "local" KernelServer that exists by default.

I'm not sure you had joined the Server/Kernels Team Meeting by this time, but I raised the question about how traits could be configured to apply to multiple instances where each instance had potentially different values. Since traits are class-based, specifying configuration settings for each KernelServer (besides the "local" KernelServer, since it really wouldn't require any configuration for backward-compatibility purposes) would probably require some kind of "config loader" mechanism, which I'm sure we could tackle. (Thinking about a kernel_servers subdirectory in jupyter_server_config.d that contains a set of "named" files.)
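
One way such a "config loader" could look is sketched below. Everything about the file layout is an assumption for illustration: the thread only proposes a kernel_servers subdirectory of "named" files, so the JSON format, the `url` key, the file name `gpu-cluster.json` and the function `load_kernel_server_configs` are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def load_kernel_server_configs(config_dir):
    """Map each file's stem (the KernelServer name) to its parsed settings."""
    configs = {}
    for path in sorted(Path(config_dir).glob("*.json")):
        with path.open() as f:
            configs[path.stem] = json.load(f)
    return configs


# Demonstrate against a throwaway directory instead of a real config tree.
with tempfile.TemporaryDirectory() as tmp:
    kernel_servers_dir = Path(tmp) / "jupyter_server_config.d" / "kernel_servers"
    kernel_servers_dir.mkdir(parents=True)
    # Hypothetical per-server settings; the key name is an assumption.
    (kernel_servers_dir / "gpu-cluster.json").write_text(
        json.dumps({"url": "http://gateway.example:8888"})
    )
    loaded = load_kernel_server_configs(kernel_servers_dir)
```

Each entry in `loaded` would then be used to instantiate one KernelServer, so per-instance values live in files rather than in class-level traits.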

Each KernelServer has a name that, as you also intimated, would be used to prefix that server's kernelspecs. We may also want to adjust the Display Names since these are what the user sees and these would require uniqueness.

The handlers would essentially do what they do today but call into the KernelServers (rather than MappingKernelManager) and the KernelServers would act as a broker taking a kernel name, locating its prefix in the set of KernelServers, and forwarding the request to that KernelServer.

As you also intimated, a second index, keyed by kernel_id, would map to the applicable KernelServer instance. It would be used when a WebSocket request (or any lifecycle request), which carries the kernel_id, is submitted: the KernelServer would be identified and the request forwarded to that KernelServer's KernelWebsocketHandler, or server, etc.
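
The brokering idea, with one index by kernelspec prefix and a second by kernel_id, might look roughly like this. It is a sketch under stated assumptions, not jupyter_server code: `InMemoryKernelServer` is a stand-in backend invented for the example, the `-` separator is hypothetical, and server names are assumed not to be prefixes of one another.

```python
import uuid


class InMemoryKernelServer:
    """Stand-in backend; a real one would wrap a kernel manager or gateway client."""

    def __init__(self, name):
        self.name = name
        self.kernels = {}  # kernel_id -> kernel name

    def start_kernel(self, kernel_name):
        kernel_id = str(uuid.uuid4())
        self.kernels[kernel_id] = kernel_name
        return kernel_id

    def shutdown_kernel(self, kernel_id):
        del self.kernels[kernel_id]


class KernelServers:
    """Broker that routes by kernelspec prefix first, then by kernel_id."""

    def __init__(self, servers):
        self._by_name = {server.name: server for server in servers}
        self._by_kernel_id = {}  # second index: kernel_id -> owning server

    def _locate(self, prefixed_name):
        # Find the server whose name prefixes the kernelspec, then strip it.
        for name, server in self._by_name.items():
            prefix = name + "-"
            if prefixed_name.startswith(prefix):
                return server, prefixed_name[len(prefix):]
        raise KeyError(f"no kernel server for kernelspec {prefixed_name!r}")

    def start_kernel(self, prefixed_name):
        server, kernel_name = self._locate(prefixed_name)
        kernel_id = server.start_kernel(kernel_name)
        self._by_kernel_id[kernel_id] = server
        return kernel_id

    def server_for(self, kernel_id):
        """Used by WebSocket and lifecycle requests, which only carry kernel_id."""
        return self._by_kernel_id[kernel_id]

    def shutdown_kernel(self, kernel_id):
        server = self._by_kernel_id.pop(kernel_id)
        server.shutdown_kernel(kernel_id)


local = InMemoryKernelServer("local")
remote = InMemoryKernelServer("remote")
broker = KernelServers([local, remote])
kernel_id = broker.start_kernel("remote-python3")
owner = broker.server_for(kernel_id)
```

Because the broker exposes the same method names as the backends, handlers could call it in place of a single kernel manager.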

So I think this becomes a matter of the following (at a high level of course):

Introduce a brokering layer - KernelServers - which is essentially injected in front of the MappingKernelManager.
Address how specific configurations can be loaded into the KernelServers. Thinking that this would iterate the previously mentioned files in jupyter_server_config.d/kernel_servers and instantiate instances when KernelServers is instantiated - in addition to the default "local" KernelServer (which could be configured to be off if any deployments didn't want to support any local kernels).
Update the handlers to forward the requests to the KernelServers broker. This may be as easy as supporting the same methods, requiring minimal changes.
We'd probably want to introduce KernelSpecCaching (could port over EG's) since these would get pounded. This would be a great place to introduce events so the front end isn't requesting these every 10 seconds like it does today.
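
To illustrate the caching point, here is a minimal time-based cache in front of a kernelspec fetch. It is not a port of Enterprise Gateway's cache; a simple TTL strategy is swapped in for the example, and the class name, the `fetch` callable and the injectable `clock` are all assumptions.

```python
import time


class KernelSpecCache:
    """Cache the result of an expensive kernelspec fetch for a fixed time."""

    def __init__(self, fetch, ttl_seconds=60.0, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._value = None
        self._expires_at = None  # None means nothing is cached yet

    def get(self):
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value

    def invalidate(self):
        """Force the next get() to refetch, e.g. when a change event arrives."""
        self._expires_at = None


calls = []


def fetch_kernelspecs():
    calls.append("fetch")  # count how often the backend is actually hit
    return {"python3": {"display_name": "Python 3"}}


fake_now = [0.0]  # fake clock so the example does not need to sleep
cache = KernelSpecCache(fetch_kernelspecs, ttl_seconds=10.0, clock=lambda: fake_now[0])
cache.get()
cache.get()          # served from the cache
fake_now[0] = 11.0   # move past the TTL
specs = cache.get()  # refetched
```

With events in place, `invalidate()` could be driven by a change notification instead of relying on the TTL alone.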

At any rate, I think we're on the same page here. At a high level, this is doable and would be a useful addition. By default, the server would behave just as today - supporting only local kernels. I think we could also accomplish backward compatibility for single-gateway configs keying off --gateway-url or the single configuration items in the server config file itself.

Thoughts?

ojarjur · 2023-01-31T00:50:49Z

@kevin-bates thanks for the detailed and thoughtful response, and for bringing the topic up in the weekly meeting.

Also, thanks for the class references, that will help if I try to prototype this by converting my existing proof-of-concept into a Jupyter server extension.

> The general approach I've been mulling over is to introduce the notion of a Kernel Server, where a single Jupyter server could be configured with one or more Kernel Servers. A Kernel Server would consist of a MappingKernelManager, a KernelSpecManager and, in some fashion, a KernelWebsocketHandler. In essence, a KernelServer is a GatewayClient, with a special "local" KernelServer that exists by default.

I like this approach and I had considered something along these lines but I wasn't sure if the jupyter-server project was the right home for that level of configurability.

Conceptually, if we take it as granted that we always want to support local kernels, then this can be viewed as an instance of the "zero, one, or infinitely-many" design question in terms of kernel gateways... by default there are zero kernel gateways supported, you can currently opt into supporting one kernel gateway, and the approach you've described extends that to infinitely-many kernel gateways.

The "infinitely-many" case is the most general, but supporting it inside of jupyter-server itself opens up a huge dimension of complexity that can be hard to manage.

The question of configuration that you mentioned is one example of this complexity, but it's not the only one. There's also complexity in terms of the question of how multiple backends are managed, and I don't think that a single approach to that will satisfy all users.

For example, the simplest approach would be a fixed set of backends. However, I suspect most users would be better served by having some sort of automated discovery mechanism that dynamically finds all of the Jupyter servers available to them. That's inherently specific to the user's environment so we can't build a one-size-fits-all solution to it.

After you mentioned this at the weekly meeting I had some time to mull it over, and wanted to present another option:

What if, instead of a set of static configs, we defined a base class for providing the set of KernelServers? It could have just a single method that returns a set of KernelServer instances.

We could provide canned implementations of this class for the local-only use case, the one-remote-only use case, and the both-one-local-and-one-remote use case.

Then, users who wanted to use arbitrarily many backends could provide their own implementation of this base class that took advantage of what they know about their particular environment (e.g. knowing how to look up backends and which configs are common to all of them).
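
The base-class option could be sketched like this. The names `KernelServerProvider`, `get_kernel_servers` and the minimal `KernelServer` record are hypothetical, chosen only to show one abstract method plus the three canned implementations described above.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class KernelServer:
    """Minimal stand-in: a name plus an optional gateway URL."""

    name: str
    url: Optional[str] = None  # None is used here to mean "run kernels locally"


class KernelServerProvider(ABC):
    """Single-method interface that returns the available KernelServers."""

    @abstractmethod
    def get_kernel_servers(self) -> List[KernelServer]:
        ...


class LocalOnlyProvider(KernelServerProvider):
    def get_kernel_servers(self):
        return [KernelServer("local")]


class RemoteOnlyProvider(KernelServerProvider):
    def __init__(self, gateway_url):
        self.gateway_url = gateway_url

    def get_kernel_servers(self):
        return [KernelServer("remote", self.gateway_url)]


class LocalAndRemoteProvider(KernelServerProvider):
    def __init__(self, gateway_url):
        self.gateway_url = gateway_url

    def get_kernel_servers(self):
        return [KernelServer("local"), KernelServer("remote", self.gateway_url)]


providers = [
    LocalOnlyProvider(),
    RemoteOnlyProvider("http://gateway.example:8888"),
    LocalAndRemoteProvider("http://gateway.example:8888"),
]
names = [[server.name for server in p.get_kernel_servers()] for p in providers]
```

A site-specific subclass could override `get_kernel_servers` to perform discovery on each call.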

What do you think of that option? Would that still line up with what you wanted?

> Each KernelServer has a name that, as you also intimated, would be used to prefix that server's kernelspecs. We may also want to adjust the Display Names since these are what the user sees and these would require uniqueness.

Yes, you are right. I forgot to mention it, but that is exactly what I've been doing. I add a suffix on each display name which defaults to " (local)" for local kernelspecs and " (remote)" for remote ones. I wanted these suffixes to be configurable so that the user can change them to their local language.
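
A small sketch of that display-name adjustment follows. The default suffixes come from the description above; the function name and the way the suffix is passed in are assumptions, and a real implementation would presumably expose the suffixes as configurable traits.

```python
DEFAULT_LOCAL_SUFFIX = " (local)"
DEFAULT_REMOTE_SUFFIX = " (remote)"


def with_display_name_suffix(spec: dict, suffix: str) -> dict:
    """Return a copy of a kernelspec with the suffix appended to its display name."""
    renamed = dict(spec)  # shallow copy so the caller's spec is left untouched
    renamed["display_name"] = spec.get("display_name", "") + suffix
    return renamed


spec = {"display_name": "Python 3", "language": "python"}
local_spec = with_display_name_suffix(spec, DEFAULT_LOCAL_SUFFIX)
# A user could substitute a suffix in their own language.
localized_spec = with_display_name_suffix(spec, " (distant)")
```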

> The handlers would essentially do what they do today but call into the KernelServers (rather than MappingKernelManager) and the KernelServers would act as a broker taking a kernel name, locating its prefix in the set of KernelServers, and forwarding the request to that KernelServer.
>
> As you also intimated, a second index, keyed by kernel_id, would map to the applicable KernelServer instance. It would be used when a WebSocket request (or any lifecycle request), which carries the kernel_id, is submitted: the KernelServer would be identified and the request forwarded to that KernelServer's KernelWebsocketHandler, or server, etc.

That sounds good to me.

kevin-bates · 2023-01-31T16:56:44Z

Hi @ojarjur - thanks for the response. I think there's some alignment here (in the majority) but I'm hoping we could perhaps have a call together because I believe there may be some "terminology disconnects" that I'd like to iron out. Could you please contact me via email (posted on my GH profile) and we can set something up?

In the meantime, I would like to respond to some items.

> if we take it as granted that we always want to support local kernels

I don't think we should take this for granted. Users configuring the use of a gateway today essentially disable their local kernels and I don't think we should assume that every installation tomorrow will want local kernel support. Many will, as evidenced by those that have asked those kinds of questions, but I think there's some value to operators to know there won't be kernels running locally. Nevertheless, this is still a zero, one, or many proposition with respect to gateway servers.

One thing I think we can take for granted is that a given Jupyter Server deployment will have at least one server against which kernels can be run. I also think we can assume that that server (against which kernels are run) will be managed via the existing api/kernels (and api/kernelspecs) REST API.

> I suspect most users would be better served by having some sort of automated discovery mechanism that dynamically finds all of the Jupyter servers available to them. That's inherently specific to the user's environment so we can't build a one-size-fits-all solution to it.

I'd like to better understand how this discovery mechanism would work. (I'm assuming by "Jupyter servers" [and elsewhere "backends"] you mean "Gateway servers" - or "Kernel servers".) In this particular instance, I think operators would prefer explicit configurations that are loaded at startup, so they know exactly where their requests are going. But, again, I may not be understanding how discovery would work.

> users who wanted to use arbitrarily many backends could provide their own implementation of this base class that took advantage of what they know about their particular environment (e.g. knowing how to look up backends and which configs are common to all of them)

I agree that any class we provide should be extensible and substitutable, provided there's a well-known interface that the server can interact with. However, given the assumption above that all kernel servers will honor the existing REST APIs, I think a single implementation would be sufficient (at least for a vast majority of cases). Today's GatewayClient is essentially a class that communicates with a server to manage kernels. Yes, each destination would require a separate configuration, but, from an implementation standpoint, I think it's relatively straightforward. Perhaps I'm not understanding your comment correctly, but I would definitely like to avoid the need for operators to have to implement their own "KernelServer" class in order to connect to a different server via the REST API.

> e.g. knowing how to look up backends and which configs are common to all of them

I think this is driving at the discovery stuff, and I could see an implementation of KernelServers (note the plurality) that "discovers" its KernelServer instances, not by reading files in a directory and loading each instance, but by discovering them in some other way, like hitting a clearinghouse (DB) of sorts, for example.

This is an interesting conversation. Please reach out to me via my email. If we're in widely separate TZs I'd still like to have the conversation.

If others are interested in joining our sidebar, please let me know.

Zsailer · 2023-01-31T19:39:02Z

Just got back from parental leave and catching up on this thread.

If y'all have a meeting, I'd love to join (or at least see some notes! 😎).

kevin-bates · 2023-01-31T20:41:03Z

@ojarjur - could you please send Zach an invite for our 2 pm (PST) chat today?

This change adds support for kernel spec managers that rename kernel specs based on configured traits. This is a necessary step in the work to support multiplexing between multiple kernel spec managers (jupyter-server#1187), as we need to be able to rename kernel specs in order to prevent collisions between the kernel specs provided by multiple kernel spec managers.

ojarjur added the enhancement label Jan 25, 2023

ojarjur mentioned this issue Apr 7, 2023

Gateway manager retry kernel updates #1256

Merged

ojarjur mentioned this issue Apr 20, 2023

Merge the gateway handlers into the standard handlers. #1261

Merged

ojarjur mentioned this issue May 2, 2023

Add support for renaming kernelspecs on the fly. #1267

Open
