LNC Receive #1342
base: master
Conversation
We have a …
I've been using …
Yep, but what do you need a voltage node for if you're doing that?
I've tested it on the dev environment and on voltage, just to test it on another environment.
Ah I see!
Just a heads up: I'll try to review this tomorrow.
Fell behind today with meetings. I'll pull this first thing tomorrow.
I've done a QA pass. It works really well! A lot of the code has great workarounds for lnc's bugs - I've never used …
I'll begin looking closer now, but two main things I'll be looking at:
So, still unsure what the right move is, but things we can do for the above:
If you'd like to do these revisions, I can promote this issue to …
Thank you for your fairness in handling this issue, but I wouldn't feel comfortable accepting a reward for incomplete work. So, unless you prefer to take over, I'd like to make these revisions myself 🫡.
My understanding is that we are going to need a new pg session for each lnc call for this to work, since session-level advisory locks do not prevent the session that acquired them from acquiring them again, so async tasks that share the same session will run at the same time. Instead we could use a combination of:
Does this make sense?
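To make the re-entrancy point concrete, here's a minimal sketch using the pg client directly; the connection string and lock key are placeholders, not SN's actual setup:

```js
// Sketch: session-level advisory locks are re-entrant within the same session,
// so two async tasks sharing one connection won't serialize each other.
const { Client } = require('pg')

async function demo () {
  const client = new Client({ connectionString: process.env.DATABASE_URL })
  await client.connect()

  await client.query('SELECT pg_advisory_lock(42)') // acquired
  await client.query('SELECT pg_advisory_lock(42)') // same session: returns immediately

  // a *different* session calling pg_advisory_lock(42) would block here
  // until both holds are released (or the session ends)
  await client.query('SELECT pg_advisory_unlock(42)')
  await client.query('SELECT pg_advisory_unlock(42)')
  await client.end()
}

demo().catch(console.error)
```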
Not knowing much about the production environment, I went for vm because it allows isolating the context without booting a new nodejs instance or requiring IPC, so performance-wise it is much better. But if spawning a new runtime for every LNC call is not an issue, then …
In general, I prefer not to be concerned with hypothetical performance problems before a feature has seen production performance problems, and while spawning subprocesses can be CPU intensive, ime this will probably be manageable (our prod instances only average ~15% cpu consumption).
Sort of. There might be a better way to use advisory locks than I suggested, but we shouldn't need in-memory mutexes except to maybe limit us to one subprocess running at a time. The way I imagine it: subprocess opens a db connection (session), grabs a session-level advisory lock, returns bolt11/errors via ipc, exits. To be honest, I'm uncertain about how to handle this even knowing the full production context. The only thing I'm certain about is that we'll have to use an advisory lock. (We could also use a SKIP LOCKED postgres queue, but lnc-web's namespace pollution and closures (ime we can't reuse an existing lnc instance) and bugs will probably lead us back to vm/subprocesses anyway.) I'd like to avoid you having to guess about stuff. You seem super capable, so if you're hacking on SN I don't want stuff out of your control to be a bottleneck. Still, if you want to keep at this problem, I'm happy to support it until we can get something merged.
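A rough sketch of that flow, where the file name, lock key and invoice call are hypothetical and the real code would run the lnc-web/wasm work where the placeholder is:

```js
// Worker sketch (hypothetical lnc-worker.cjs): one subprocess per lnc call,
// started with child_process.fork so process.send is available. The advisory
// lock lives and dies with this process's db session.
const { Client } = require('pg')

const LNC_LOCK_KEY = 4242 // hypothetical app-wide lock id

// placeholder for the actual lnc-web/wasm invoice creation
async function createInvoice () {
  return 'lnbc1...' // hypothetical bolt11
}

async function main () {
  const db = new Client({ connectionString: process.env.DATABASE_URL })
  await db.connect()
  try {
    // blocks until no other lnc subprocess holds the lock
    await db.query('SELECT pg_advisory_lock($1)', [LNC_LOCK_KEY])
    const bolt11 = await createInvoice()
    process.send({ bolt11 })
  } catch (err) {
    process.send({ error: err.message })
  } finally {
    await db.end() // ending the session releases the session-level lock
    process.exit(0)
  }
}

main()
```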
Makes sense, I'll adapt the code as you proposed 👍
Ok, I was thinking about using the advisory lock for queuing the launch of the lnc subprocesses from the main sn process. If you are ok with it, I will first try to implement it the way I've proposed (which should work fine with the same connection pool used by the backend), and then, once we can see the whole code, if it ends up being suboptimal for whatever reason, I can move the lock to the subprocess as you proposed, or whatever we figure makes sense.
Not if we only let one subprocess run at a time per backend, which is what I think I'd do, but again, I'm not sure what will still work here without being some huge hack.
I've ended up with an implementation that is something in-between what was discussed here.

Subprocess instead of vm
The child_process (aka worker.cjs) accepts a list of actions via IPC, runs them sequentially, then returns an output and quits.

Locking
The lock is acquired with pg_advisory_lock, or by polling pg_try_advisory_lock if PARALLEL=true and SINGLE_SESSION=true. The in-memory mutex is used:
This behavior is toggled with PARALLEL=false, but imo once you have a bunch of wallets triggering autowithdraw it might degrade pretty quickly: a queue that gets too long causes calls to testCreateInvoice (for wallet attachment) or createInvoice to time out or wait for a very long time. The code is currently set to run with PARALLEL=true SINGLE_SESSION=true, which should be the most performant, imo. We can test all the modes and decide which one to keep in the final PR, or just keep it configurable so you can easily switch if you notice performance degradation in production.
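For reference, the serialized (roughly PARALLEL=false) variant of the parent side could look something like the sketch below; the names, file path and message shape are illustrative, not the PR's actual protocol:

```js
// Illustrative parent-side sketch: fork the worker, send it the list of
// actions over IPC, and use a naive promise chain as an in-memory mutex so
// only one subprocess runs at a time per backend instance.
const { fork } = require('child_process')

let queue = Promise.resolve()

function runLncActions (actions) {
  const run = () => new Promise((resolve, reject) => {
    const worker = fork('./worker.cjs') // the child process entrypoint
    worker.once('message', msg => msg.error ? reject(new Error(msg.error)) : resolve(msg))
    worker.once('exit', code => { if (code !== 0) reject(new Error(`worker exited with ${code}`)) })
    worker.send({ actions }) // worker runs these sequentially, replies once, quits
  })
  const result = queue.then(run, run) // wait for the previous call, success or failure
  queue = result.catch(() => {})      // keep the chain alive after errors
  return result
}
```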
Man you turned this out fast. It's really impressive considering how hamstrung LNC is. I'll take a closer look this weekend, but tbh, it's looking like lnc receives may be more trouble than they're worth. If LNC only had this single connection limitation, it'd be worth complex code to give people the option of using it, but it's also extremely slow and flaky. This isn't your fault obviously - LNC is what it is. Again, I'll take a look this weekend and pay you regardless of what we decide.
Just brainstorming ... If LNC is built assuming a single long-lived connection on an end-user device, which seems to be the case given the way lnc and its libraries are written, perhaps the ideal solution will be wrapping https://github.com/lightninglabs/lightning-node-connect into a daemon. Currently, it only builds wasm and mobile libraries, but all the logic is there - we're just missing the right interface. We'd basically just write another …
Another option is building https://github.com/lightninglabs/lightning-node-connect into a binary with a cli. We'd still need mutexes, but it'd be way more nimble than creating a nodejs subprocess just to use wasm.
I am not 100% sure that keeping a long-lived connection to the mailbox on the server, for each lnc user, is scalable. We should at least figure out how many concurrent connections from the same host the mailbox will accept.
It probably would improve boot time a bit. That said, before thinking about maintaining a fork of this library, which already doesn't seem very stable, do we have an idea of how slow/heavy the current implementation is and whether an improvement is needed?
Most of my concerns relate to:
So you're right that a go subprocess wouldn't be much better. Re: too many connections - I'd be surprised if more than 10 people use LNC receives to start, let alone so many it's overwhelming. It'd be pretty trivial to create a FIFO to keep it manageable too. I'm still chewing on all of this though. I've been really unhappy with our lnc sending (very slow and unreliable).
Description
LNC-web based LNC receive attachment.
Solves #1141.
Additional Context
The lnc-web library, the lnc wasm client and the Go wasm_exec runtime are designed for browser environments, not for backend services. As a result, I had to implement some workarounds, which are less than ideal. I've been invited to send the PR anyway, and the implementation appears to be stable overall, so I leave it to your judgment whether it is worth merging or not.
Workaround 1: Global context isolation and polyfill
The lnc-web and wasm_exec libraries heavily rely on browser-standard APIs and the global context to store their state. To protect the backend global context from being polluted and to avoid lnc calls conflicting with each other, this PR runs every call inside an isolated node:vm instance.
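Roughly, the isolation idea looks like the sketch below; the polyfilled surface shown here is only indicative, since the PR polyfills whatever lnc-web/wasm_exec actually touch:

```js
// Sketch only: run lnc-related code inside a throwaway vm context so that the
// globals created by lnc-web/wasm_exec never leak into the backend's global
// scope. The polyfilled APIs below are examples, not the PR's exact list.
const vm = require('node:vm')
const { webcrypto } = require('node:crypto')

function createIsolatedContext () {
  const sandbox = {
    console,
    crypto: webcrypto,            // wasm_exec needs crypto.getRandomValues
    performance: globalThis.performance,
    TextEncoder,
    TextDecoder,
    setTimeout,
    clearTimeout,
    // minimal stand-in for the browser storage lnc-web expects
    localStorage: { getItem: () => null, setItem: () => {}, removeItem: () => {} }
  }
  sandbox.self = sandbox          // some browser libs reference `self`
  return vm.createContext(sandbox)
}

// each lnc call gets a fresh context that is simply discarded afterwards
const ctx = createIsolatedContext()
vm.runInContext('self.__scratch = { polluted: true }', ctx)
console.log(typeof globalThis.__scratch) // 'undefined' - nothing leaked
```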
Workaround 2: Force kill the go runtime
There is a bug in lnc-web that causes an infinite loop when the connection can't be established (lightninglabs/lnc-web#83). To work around this issue I made a kill() function that sets the internal exited flag, causing the go runtime to kill all pending tasks with the "Go program has already exited" exception. This happens inside the node:vm runtime, which is thrown away after every call, so it shouldn't leave anything dangling.
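The gist of the workaround is something like the following sketch, where go is assumed to be the wasm_exec Go instance living inside the vm context (the PR's actual wiring differs):

```js
// Sketch of the force-kill idea: wasm_exec's Go class checks its `exited`
// flag when resuming, so flipping it makes every pending callback fail fast
// with "Go program has already exited" instead of looping forever.
function kill (go) {
  go.exited = true
  // let whoever awaited go.run() settle too, if the runtime exposes it
  if (typeof go._resolveExitPromise === 'function') go._resolveExitPromise()
}
```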
Checklist
Are your changes backwards compatible? Please answer below:
yes
On a scale of 1-10 how well and how have you QA'd this change and any features it might affect? Please answer below:
7
tested with a local connection created with
litcli sessions add --type custom --label <your label> --account_id <account_id> --uri /lnrpc.Lightning/AddInvoice
and with voltage (but I had to disable strict permissions for that)
Did you introduce any new environment variables? If so, call them out explicitly here:
no