
[RESEARCH] Performance of Foam in large projects #1375

Open · pderaaij opened this issue Jul 24, 2024 · 25 comments

@pderaaij (Collaborator)

Describe the bug

Various users have reported performance issues with large notes or with projects containing many notes. This issue collects those reports and documents the ongoing performance research.

Small Reproducible Example

No response

Steps to Reproduce the Bug or Issue

..

Expected behavior

We want to optimise the performance of Foam, even in large projects.

Screenshots or Videos

No response

Operating System Version

All OS

Visual Studio Code Version

Latest at least

Additional context

No response

@pderaaij (Collaborator, Author)

For now, I am doing some tests and research with https://github.com/github/docs, a relatively large project with many markdown files.

@pderaaij (Collaborator, Author)

Just came across https://github.com/rcvd/interconnected-markdown/tree/main/Markdown. This contains not only many notes, but they are also highly linked. Great for researching the performance of Foam.

@pderaaij (Collaborator, Author)

Some initial investigation points to the function listByIdentifier in workspace.ts. This function is used both in find and in wikilink-diagnostics.ts.

The function is defined as:

  public listByIdentifier(identifier: string): Resource[] {
    const needle = normalize('/' + identifier);
    const mdNeedle =
      getExtension(needle) !== this.defaultExtension
        ? needle + this.defaultExtension
        : undefined;
    const resources: Resource[] = [];
    for (const key of this._resources.keys()) {
      if (key.endsWith(mdNeedle) || key.endsWith(needle)) {
        resources.push(this._resources.get(normalize(key)));
      }
    }
    return resources.sort(Resource.sortByPath);
  }

For my test repo, this._resources is a Map of 10,000 entries. This function is called every time a wikilink is processed, both on boot and on graph updates. I suspect the for loop is too inefficient for large projects, whether they have many notes or many links.

I will do some experiments with optimising the for loop in this area and see if that boosts performance.
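One possible direction, sketched here purely as an illustration (the SuffixIndex class and the Resource shape are hypothetical, not Foam's actual types): index resources by their basename, so a lookup only scans the few candidates that share the identifier's last path segment instead of every key in the workspace.

```typescript
// Hypothetical sketch: index resources by basename so identifier lookup
// scans only the handful of candidates sharing the last path segment,
// instead of iterating over all workspace keys.
type Resource = { path: string };

class SuffixIndex {
  private byBasename = new Map<string, Resource[]>();

  add(resource: Resource): void {
    const base = resource.path.split('/').pop() ?? resource.path;
    const bucket = this.byBasename.get(base) ?? [];
    bucket.push(resource);
    this.byBasename.set(base, bucket);
  }

  // Resources whose path ends with the identifier, e.g. 'notes/todo.md'
  // matches '/vault/notes/todo.md'.
  lookup(identifier: string): Resource[] {
    const base = identifier.split('/').pop() ?? identifier;
    const candidates = this.byBasename.get(base) ?? [];
    return candidates.filter(r => r.path.endsWith(identifier));
  }
}
```

Such an index would have to be maintained by the same updates that already keep this._resources current, and the sort by path from the original function would then run on a much smaller candidate list.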

@DrakeWhu

I am interested in this. I have a graph of approximately 3k notes and 8k links. I also have some Python scripts I am using to run community detection over the graph. It results in around 50–100 or more communities. In the visualization, everything displays without problems.

The problem arises if I want to explore only one community, or a couple of them. The community of each note is saved as its type, so I can filter communities out in the Foam viz. If I only want to show one community of around 50 notes, it shows, but then the physics break; the force-directed approach can't handle it for some reason. If I try to select two communities, the moment I select the second one, the physics break and the links disappear. I don't know what the reason might be, but maybe the nodes that are not shown are still loaded and the physics engine tries to calculate them anyway? Whatever it is, the more types exist on the graph, the worse the performance gets.

I've been thinking about solutions to this for some time and have come up with a couple:

  • There could be a command where you plot only a given set of types.
  • A force-directed approach is not ideal for big networks. Asking the webview to calculate approximately 1600 forces each frame and apply the corresponding displacements is overkill. We could try a more considered approach, such as clustering far-away nodes or something similar. Even getting rid of the physics altogether might be a solution, although it is always nice to have. Something like a statistical approach would make things smoother.

I can share some plots I've made, but using Python. Not that we should change the viz, but the physics engine of the plots I mention is less realistic and aimed more towards aesthetics than physical realism. I know that D3.js has a lot of options for visualization and that Foam uses force-graph, which in turn uses D3, but I've never personally used it.

Anyway I will gladly help to research this.

@pderaaij (Collaborator, Author)

pderaaij commented Sep 4, 2024

I've been looking at the initial workspace loading time. At first sight, there is not much to be gained in this area. Most of the time is spent reading the files from the datastore. Perhaps the parser could be made more efficient eventually, but I don't see an opportunity here in the short term.

@theAkito

theAkito commented Nov 3, 2024

Second this.

Long story short, it is so extremely terrible that I have no choice but to switch to Dendron and see if it at least works with that framework.

Whenever I want to write any document, the editor lags extremely. This extension makes it unusable.
The second Foam is disabled, everything is as smooth as gliding down butter.

@theAkito

theAkito commented Nov 6, 2024

To make this "research" succeed, one has to strip the whole project of all advanced features. I would personally start with only letting it generate files from templates. Then, proceed by enabling single features inside the same huge repository of files and directories. Then, once a specific feature is added and performance starts to degrade, we know the culprit. Maybe it's also due to a set of features as a whole.

@riccardoferretti (Collaborator)

@theAkito - The work that @pderaaij has done to improve performance has just been released (0.26.2); I would love to hear how things are for you now.

@theAkito

theAkito commented Nov 6, 2024

@riccardoferretti

Nice, good timing. Yesterday, I was trying to figure out how to quickly build the extension locally with the most recent commits, but then my limited time ran out.

I will check out the update today!

@theAkito

theAkito commented Nov 6, 2024

@riccardoferretti

I just used it in production and it is not unbearably slow anymore; it is only very slow now. Lots of lag still bothers me, but it is manageable to some degree.

@pderaaij (Collaborator, Author)

pderaaij commented Nov 6, 2024

Could you elaborate on which actions are slow? That would help me dive into the specifics. Is the project publicly available? Perhaps I can open it on my machine and do some research.

@theAkito

theAkito commented Nov 6, 2024

Could you elaborate which actions are slow?

Typing is super slow. Typing anything, anywhere in the project. It does not matter.
It gets especially, extremely slow when pressing Enter for a newline or pressing Backspace to delete characters. It feels like I'm typing in nano through an SSH tunnel, connected via a dial-up modem.

My gut tells me that it constantly scans all text for links/highlights or whatever feature needs to scan all text.

I'm additionally 95% sure that it gets worse with very large markdown files, again pointing at the scanning idea.
It was a long time ago, but I had some notes into which I copy-pasted a lot of script dumps, and these were particularly slow. Hence, I stopped doing that back then, as it became unbearable quickly.

That said, I just wish there were a lot of feature flags. For me, the most important feature is creating notes from templates.
I never use the graph for seeing linked notes, ever, as its performance was always abysmally atrocious, especially when, back then, you weren't able to ignore specific folders.
I never use the right sidebar for selecting chapters.
I set up tags, but do not actively use them.
I rarely link notes between each other.
However, I really need the templating feature. It's the big selling point for me.

So, I'd already be happy if I could just strip down the extension to this basic feature of allowing me to generate new notes from a template. With only this feature enabled, there shouldn't be any way to slow anything down, as it's only a one-shot feature which never requires any kind of constant or frequent document scanning.

I would go as far as saying that a stripped-down extension like that could easily be published as a separate extension.
Yesterday, I was researching Visual Studio Code extensions that provide just this templating feature and found exactly zero results. Any results that came close are not maintained anymore.

Is the project publicly available? Perhaps I can open it on my machine and do some research.

It's my private knowledge base. You would be looking into my brain if you saw it.

However, I'm pretty sure that if you continue to test with those huge projects you already mentioned, you should be able to roughly replicate what I'm going through.

Here is some further data, in case it helps.

  • I have a big PC. Not the freshest one, but 64GB of high-speed DDR4 RAM with a pretty nice i7 CPU from just over 2 years ago should be enough for a Visual Studio Code note-taking extension.
  • I was experimenting with GitDoc a while ago, so the project has almost 4000 commits. I am not using it anymore, though, so it is not the culprit right now.
  • I have a directory hierarchy in my project. It's not super deep or anything, but I imagine it is probably deeper than what most people have. I guess 3 levels of directory depth is about the average; if I count the top folder where all my relevant notes live, you can call it roughly 5 levels on average.
  • For troubleshooting this madness, I had turned off all editor suggestions, which I thought were the culprit for document scanning. That did not help; performance remained the same. Only disabling the Foam extension helped.
  • I have plenty of extensions, but none of them are document-related. They are all related to technologies like Docker, Terraform, Ansible, etc. So, at most, related to YAML documents.
  • My project is based on the official Foam template for Foam projects.

If you have any further questions that would help you fix this performance hell, I would be glad to be of assistance.

@theAkito

theAkito commented Nov 6, 2024

Typing is super slow. Typing anything, anywhere in the project. It does not matter.

Let me elaborate a tiny bit on what "slow" means.

It's like packet loss.
You type and type, nothing is shown, and then 5(!!!) seconds later all the typed text suddenly appears.
It feels like you have 900ms ping in Counter-Strike or whatever.
This does not happen rarely or sometimes. It always happens, constantly and reproducibly.
As already mentioned multiple times, disabling the Foam extension fixes all these issues within the blink of an eye.

@pderaaij (Collaborator, Author)

pderaaij commented Nov 6, 2024

Thanks for the info. I'll use it to have a look at things later this week. To be honest, I am not sure that it is some continuous scanning process, especially as Foam builds the graph on load and after that uses file watchers to monitor changes. But I might be missing something here.

Just wanted to check: do you see anything strange in CPU usage? For example, the symptoms mentioned in #1161.

@riccardoferretti (Collaborator)

riccardoferretti commented Nov 7, 2024

I just used it in production and it is not unbearably slow anymore, it is only very slow now.

@theAkito I will take that as a win for now :) glad to hear that @pderaaij 's work made quite an improvement.

But you are right that there is also some live parsing we do on the document itself; see document-decorator.ts, hover-provider.ts, navigation-provider.ts and wikilink-diagnostic.ts.
Currently each of these processes parses the file independently. This is inefficient, and we could optimize by sharing the parse in some way.
I haven't thought this through, nor do I know how much it would improve the situation (I guess the best-case scenario is 75%, but there are lots of caveats there), but I would expect the improvement to be quite noticeable for large documents.

Regarding your point on feature flags, I understand what you mean, but I am not planning to implement that just yet.
I see how it could be a good workaround, but most of the computation is necessary for Foam regardless, so I believe the performance improvements we are discussing here are the right approach at this stage.
I would prefer to disable long processes for large files rather than turning off a feature altogether. I want things to work out of the box; I don't want users to tinker just to get Foam to work with their repo. I am not idealistic about it, but this is my current thinking.

Thanks for sharing your thoughts and pointing things out to us!

@pderaaij (Collaborator, Author)

Diving into the issue brought me to WikilinkCompletionProvider.provideCompletionItems. In this method, we list all resources of the workspace and iterate over each resource.

const resources = this.ws.list().map(resource => {
  const resourceIsDocument =
    ['attachment', 'image'].indexOf(resource.type) === -1;

  const identifier = this.ws.getIdentifier(resource.uri);
  // ...
});

This is not very performant in large workspaces, and that causes the delay. It seems vscode simply hangs on this function and stalls the editor.

Will tinker on a solution here.
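One direction, sketched under assumptions (the function name and the Resource shape here are illustrative, not the actual provider code): filter resources against the text the user has already typed and cap the number of candidates, so each keystroke does bounded work instead of materializing a completion item for every resource in the workspace.

```typescript
// Hypothetical sketch: filter by the typed prefix and cap the result
// size, so a completion request does bounded work per keystroke.
type Resource = { type: string; identifier: string };

function completionCandidates(
  resources: Resource[],
  typedPrefix: string,
  limit = 50
): Resource[] {
  const needle = typedPrefix.toLowerCase();
  const out: Resource[] = [];
  for (const r of resources) {
    // Same exclusion as the snippet above: attachments and images are
    // not wikilink completion targets.
    if (r.type === 'attachment' || r.type === 'image') continue;
    if (needle === '' || r.identifier.toLowerCase().includes(needle)) {
      out.push(r);
      if (out.length >= limit) break;
    }
  }
  return out;
}
```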

@theAkito

Small update report incoming.
On average, the performance has improved noticeably following the most recent update.
I do not have to necessarily disable the extension, anymore.

Keep going and please reduce more overhead in whatever the extension does!

@theAkito

Diving into the issue brought me to WikilinkCompletionProvider.provideCompletionItems. In this method, we list all resources of the workspace and iterate over each resource.

This sounds like what I initially expected to be the culprit. My initial debugging step was to disable all completion methods, as it seemed to hang the most, and the hardest, whenever it tried to give me 100 useless suggestions I neither needed nor wanted in that particular case.

However, disabling all auto-completions known to me did not help. Maybe this type of completion isn't even configurable via settings...

@pderaaij (Collaborator, Author)

Could you do a little check for me? Just open a fresh vscode session. After Foam is loaded, open a note and hover over a wikilink. I would expect this to show a tooltip quickly.

@theAkito

we list all resources of the workspace and iterate over each resource.

One workaround for this would be to implement some type of cache, so it iterates only on certain occasions, which should be configurable via settings.
For example, it iterates at least on project startup, when launching Visual Studio Code.
Then, it may be configured to index every hour, or whatever interval the user chooses.
Or, its schedule can be disabled altogether and the user has to manually run the Visual Studio Code command "Foam: Re-index workspace" or something like that.
Of the two, the latter is the one I would personally prefer, because it is again such rarely used functionality that I do not even need it to run every 8 hours. Maybe on project launch, at most.

@theAkito

I would expect this would show a tooltip quickly.

Yes, although I would debate the term "quickly" in this context. ;)

@pderaaij (Collaborator, Author)

Thanks for testing. It helps me to validate my hypothesis.

My current hypothesis is that the problem is the collection of resources in WikilinkCompletionProvider.provideCompletionItems. We never cancel this operation, for example after the user types the next character, so stale requests pile up and overload the process. On top of that, it seems we are not filtering the resources based on the current input.

I want to see if we can improve this.
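A minimal sketch of the cancellation part (the token is modeled as a plain interface here so the example is self-contained; in VS Code, provideCompletionItems receives a real CancellationToken from the editor): check the token inside the loop and abandon the request once a newer one supersedes it.

```typescript
// Hypothetical sketch: honor a cancellation token while building
// completion items, so a stale request (the user already typed the next
// character) stops early instead of running to completion.
interface CancellationToken {
  isCancellationRequested: boolean;
}

function collectItems<T>(
  resources: T[],
  makeItem: (r: T) => string,
  token: CancellationToken
): string[] | undefined {
  const items: string[] = [];
  for (const r of resources) {
    if (token.isCancellationRequested) {
      return undefined; // abandon: a newer request supersedes this one
    }
    items.push(makeItem(r));
  }
  return items;
}
```

Returning undefined from a completion provider tells the editor there is nothing to show for that (now obsolete) request.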

@pderaaij (Collaborator, Author)

@theAkito I am a bit out on a limb here, but I am quite certain the work in #1411 will improve the performance in your situation. Please let us know after the PR is released. I am curious!

@theAkito

theAkito commented Nov 14, 2024

@theAkito I am a bit out on a limb here, but I am quite certain the work in #1411 will improve the performance in your situation. Please let us know after the PR is released. I am curious!

I just used it in production twice, today.

First, a document, which started at 79 lines and grew to 510 lines @ 20535 characters over 3 hours.

Second, a document which grew from ~30 mostly empty lines to 82 lines @ 3068 characters.

Both times, the lag was much worse, right from the beginning. At some point, I wanted to disable the Foam extension to work around the problem, as I needed a quick response; however, I did not manage to find the extension within the few seconds I had to do it.

I cannot explain why the performance got worse. The only thing I can think of is that, as far as I remember, Visual Studio Code had already been on and idling for about 10 hours before I started to write those documents.
That does not seem related, but I have no other hint or remotely reasonable explanation for why the performance suddenly degraded with the new version, instead of at least staying the same.

For next time, I can specifically make sure to restart Visual Studio Code before using it in production, just to be sure.

Maybe, I will find a way to test again soon, however I won't use it in production until next week, so the real life test will have to wait.

The types of lag were very familiar. Pressing Enter and Backspace lagged especially heavily. Sometimes, I typed a word like "word", but it came out as "wrod" or something, because some characters are apparently delayed in such laggy situations.

That said, I have only minimal word completion enabled. I can only tab-complete Markdown snippets. No completion for files, or anything requiring traversal of a hierarchy, is enabled.

@theAkito

Used it in production with a fresh Visual Studio Code instance for only ~15 minutes. Performance seemed as it was before this patch; however, the timespan was too short to give a clear view of the situation.
Still, it seems like a fresh instance works better right away.
