Memory leak if writing to a channel that is never read from later #384
Comments
Right, yes. 🙂
I have discovered that RedisChannelLayer already has a concept of a "group expiry", which we may want to hook into for the "set a default TTL" scenario. This is alluded to briefly in the Channels documentation, but I was unable to find documentation for changing the group_expiry value itself.
Do you think it would be appropriate to hook into the existing "group_expiry" concept, or do you think a separate "message_expiry" (or similar) would be more appropriate?
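For reference, a minimal settings sketch showing where group_expiry is configured on the built-in Redis backend (86400 seconds is the documented default, shown here only as a placeholder):

```python
# settings.py -- configuring the built-in Redis backend.
# "group_expiry" is the existing channels_redis option discussed above;
# group memberships that are not refreshed within this window expire.
CHANNEL_LAYERS = {
    "default": {
        "BACKEND": "channels_redis.core.RedisChannelLayer",
        "CONFIG": {
            "hosts": [("localhost", 6379)],
            "group_expiry": 86400,  # seconds; the library default
        },
    },
}
```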
No, let's use what's there.
Ultimately I think this is the issue: I end up having to look at the channels_redis source to work out what the options are. So first step would be a PR explaining what's already possible. (Maybe that'd be enough)
Agreed that the documentation should be extended to show how to configure the built-in backend types. Unfortunately documentation alone won't eliminate the leak, so here are two fix sketches.
Fix 1 Sketch
I propose that RedisChannelLayer gain an expire_unread_channels() method that discards buffered messages for channels that have not been read within the expiry window.
Fix 2 Sketch
Extend Fix 1 above so that RedisChannelLayer spawns a thread that calls expire_unread_channels() on itself every M seconds, for some M.
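To make the two sketches concrete, here is a rough, hypothetical illustration. The expire_unread_channels() method, the _last_read bookkeeping, and the timing logic are invented for this comment and do not exist in channels_redis; it also uses an asyncio task where the text above says thread.

```python
import asyncio
import time

class ExpiringBufferMixin:
    """Hypothetical sketch of Fix 1 + Fix 2; not existing channels_redis API."""

    expiry_interval = 60  # the "M seconds" from Fix 2 (illustrative value)

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # receive() would be expected to record self._last_read[channel] = time.time()
        self._last_read = {}

    def expire_unread_channels(self):
        # Fix 1: drop buffered messages for channels that have not been
        # read within the group_expiry window.
        cutoff = time.time() - self.group_expiry
        for channel in list(self.receive_buffer):
            if self._last_read.get(channel, 0) < cutoff:
                self.receive_buffer.pop(channel, None)
                self._last_read.pop(channel, None)

    async def _expiry_loop(self):
        # Fix 2: run the cleanup periodically in the background.
        while True:
            await asyncio.sleep(self.expiry_interval)
            self.expire_unread_channels()
```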
Prior art: Sessions has a similar problem with clearing expired session records. It provides the following API: a clear_expired() classmethod on the session store, normally invoked via the clearsessions management command.
We could spell expire_unread_channels() instead as clear_expired() for consistency, although I like the original name a bit better. :)
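For context, the Sessions API being referenced is the clear_expired() classmethod plus the clearsessions management command, both part of Django itself:

```python
# Django's prior art for passive cleanup: each session backend exposes
# a clear_expired() classmethod...
from django.contrib.sessions.backends.db import SessionStore

SessionStore.clear_expired()  # deletes expired session records

# ...which is normally run via the bundled management command,
# e.g. from cron:  python manage.py clearsessions
```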
Yes, good. I like the build-up. If you're keen here, can I ask you to look at #165 first? It looks good at first sight, but I haven't had the bandwidth to sit down with it properly, so another pair of eyes there would be super handy!
@carltongibson Is #165 potentially related to this bug? Reviewing that would require me to learn in detail how the current and proposed receiving systems work, which takes a block of time (2hrs) that is difficult for me to schedule in a timely fashion.
Hey @davidfstr.
Yep. Open source. :) (This is the exact hold-up there... BANDWIDTH). No stress, no rush — if you don't have capacity to look at it, no problem! But... if you can slowly have it on the back burner, that would be a great help. (As I say, no stress about it!!!)
Is it related? To the extent that it's on the short list of improvements to be made here, and we're looking at touching the receive buffer, yes. But... have a glance at it and see what you think. I'm happy to accept your input as it's available, so you tell me. 🙂
Got it. Touches the receive buffer. So we'd have to reconcile these changes eventually anyway. I'll keep the review in my queue. Projections at work have me looking at this again in 3 weeks, but I may get lucky and find some time earlier.
OK, thanks @davidfstr! (As I say, no stress! It's OSS 🙂)
Hello all! What's the status here? And what's the (average?) severity of it not being fixed (if that's even a reasonable question)? All of the interactions with channels_redis on my project happen on a Celery worker, so worst case the worker fails (as opposed to ballooning RAM on an API machine), but again, I'd rather find a way around that!
Status: the bug doesn't hurt enough that anyone has put in the effort to fix it yet. To work around this (and other kinds of slowish memory leaks), I've configured most of my web services to restart after serving (N + random jitter) requests. Gunicorn in particular has a config option for this.
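The Gunicorn options referred to are max_requests and max_requests_jitter; for example, in gunicorn.conf.py (values are illustrative):

```python
# gunicorn.conf.py -- recycle each worker after a randomized request count
# to bound slow memory leaks (the workaround described above).
max_requests = 1000        # N: restart a worker after this many requests
max_requests_jitter = 100  # random jitter so workers don't all restart at once
```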
@jheld You can use https://github.com/CJWorkbench/channels_rabbitmq (disclaimer: I'm the author) instead of channels_redis. It uses message TTLs, limits buffer sizes, warns on buffer overflow and recovers gracefully. @davidfstr this bug did hurt enough that we put in the effort to fix it. The fix is channels_rabbitmq itself.
A version of this bug that I've also seen: you don't just see this if the channel is never read from later. You can also see it if the channel is read from too slowly.
I'd argue that the per-channel capacity limits (https://github.com/django/channels_redis#capacity) are relevant here. I do think a goal of passive cleanup makes sense, but I think a reasonable upper bound on queue size would likely prevent many people from getting into bad situations in the first place.
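For reference, the capacity options documented at that README link look like this in settings (whether they should also bound the local receive_buffer is exactly the question raised above):

```python
# channels_redis capacity options from the linked README section.
CHANNEL_LAYERS = {
    "default": {
        "BACKEND": "channels_redis.core.RedisChannelLayer",
        "CONFIG": {
            "hosts": [("localhost", 6379)],
            "capacity": 100,  # default per-channel message limit
            "channel_capacity": {
                "http.request": 200,    # exact channel name
                "http.response!*": 10,  # glob pattern, per the docs
            },
        },
    },
}
```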
Hi @ryanpetrello -- Thanks for your work looking at this. Your points all seem reasonable. If you want to block out proof-of-concept sketches, really happy to discuss those. I'm working on the v3 update at the moment, and won't have capacity to cycle back to this immediately, so all help very much appreciated!
I think this issue is on the wrong project. It should be on channels_redis. channels_rabbitmq doesn't have this bug. It has local_capacity and local_expiry to solve this problem.
I have helpful (but abrasive!) comments about the Redis layer's architecture -- and Channels' architecture in general. The Channels layer spec -- and, heck, Channels' main loop -- has a big flaw: clients poll for messages instead of subscribing to messages. By definition, a client calls receive() for a channel name, so the layer has no way of knowing which channels will ever actually be read.
That leads to a corollary on the delivery side. When a Channels layer is delivering a message, it must deliver to non-existent channels!
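For comparison, a channels_rabbitmq configuration sketch. The option names (host, local_capacity, local_expiry) are as I understand them from that project's README, and the values are illustrative:

```python
# channels_rabbitmq settings sketch: local_capacity bounds the in-process
# buffer, and local_expiry drops messages that sit unread for too long.
CHANNEL_LAYERS = {
    "default": {
        "BACKEND": "channels_rabbitmq.core.RabbitmqChannelLayer",
        "CONFIG": {
            "host": "amqp://guest:guest@localhost/",
            "local_capacity": 100,  # max unread messages held in memory
            "local_expiry": 60,     # seconds before an unread message is dropped
        },
    },
}
```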
Hey @adamhooper — We like the comments 😀 Yes, I know it should be on channels_redis, but I don't want to close the issues until I've got the bandwidth to resolve them.
Really happy to look at sketches of a reworking there.
A subscription-based API, then: I think it would be easier to write against this API. I don't have time to actually submit and test a pull request, though :).
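For illustration only, a toy sketch of the kind of subscribe-style layer being hinted at; every name below (subscribe, unsubscribe, deliver) is invented and none of it is part of the channel layer spec:

```python
class SubscriptionLayer:
    """Hypothetical push-style layer: readers register interest instead of polling."""

    def __init__(self):
        self._subscribers = {}  # channel name -> set of async callbacks

    def subscribe(self, channel, callback):
        # A reader declares interest, so the layer knows the channel is live.
        self._subscribers.setdefault(channel, set()).add(callback)

    def unsubscribe(self, channel, callback):
        # When the last reader leaves, buffered state for the channel can be freed.
        callbacks = self._subscribers.get(channel, set())
        callbacks.discard(callback)
        if not callbacks:
            self._subscribers.pop(channel, None)

    async def deliver(self, channel, message):
        # Messages for channels with no subscribers can be dropped or expired
        # instead of accumulating in an unbounded buffer.
        for callback in self._subscribers.get(channel, ()):
            await callback(message)
```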
Hello, everybody! Please tell me, is there any progress on this issue? Maybe close the issue if it's abandoned. I'm also experiencing a memory leak in a daphne-redis-websockets stack. Can't find the cause though.
When somebody steps forward to fix the issue, it will be closed.
Got it! Thank you. |
Is there a recommended workaround in the meantime?
I work around it by restarting the worker that runs Channels, as described in a previous comment of mine above.
It appears that if you write a message to a channel, for example via group_send, and no reader ever appears on that channel, the messages will remain in the in-memory queue channels.layers.channel_layers.backends['default'].receive_buffer indefinitely when using the RedisChannelLayer backend. In particular I have captured a server that has over 100k items in that dictionary.
One way to avoid this problem would be to extend the API for group_send with a time-to-live parameter so that messages would expire over time if they weren't read. Thoughts?
My pip freeze, in case it's useful:
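To illustrate the proposal, a hypothetical call site: the ttl argument does not exist in the current group_send signature, so this snippet is only a sketch of the suggested API and would not run against today's Channels.

```python
from channels.layers import get_channel_layer

async def notify():
    channel_layer = get_channel_layer()
    # Proposed (hypothetical) extension: messages that nobody reads within
    # `ttl` seconds would be dropped instead of sitting in receive_buffer forever.
    await channel_layer.group_send(
        "notifications",
        {"type": "notify.user", "text": "hello"},
        ttl=300,  # hypothetical parameter; not part of the current group_send API
    )
```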