BIDSification of Winkler et al. dataset and other datasets for training/validation of ICA labeling models #8
For that matter, BIDSify as many datasets as possible, organized semantically according to the criterion specified in #5 (comment) (i.e., different recording montages, hardware, etc.).
I can look into BIDSifying this dataset, since I will have to do the ANT and EGI ones anyway.
Some things to think about:
For the raw EEG datasets that can be used for benchmarking, I have:
It would be good to centralize those datasets. Any idea where? Additional points to keep in mind when processing those datasets into labeled ICs:
For Winkler et al., I took a quick look, and the ICA decomposition seems to use: https://www.jmlr.org/papers/volume5/ziehe04a/ziehe04a.pdf
We basically have 30 labeled ICs per file. Anyway, for BIDSification, I looked into the BIDS specification for derivatives of electrophysiological data, and the only reference I found is https://docs.google.com/document/d/1PmcVs7vg7Th-cGC-UrX8rAhKUHIzOI-uIOh69_mvdlw/edit#heading=h.f548zgpgxhiu, so is it still at the extension proposal stage? I'll follow that for now, but if you know of another specification for ICA, please let me know.
For now, perhaps we can share via OneDrive or Dropbox? I still have institutional access to OneDrive and can set that up if you don't have Dropbox Pro. Open to other ideas too. Long term, I think we want to store the data on openneuro.org, if that's okay? We might even be able to use openneuro.org right away: we can create private BIDSified datasets there and then pull from them. I think programmatic access would require the dataset to be public(?)
Yeah, there is no agreed-upon spec for ICA yet, but I think we should follow the "format" suggested in the derivatives proposal for ICA, at the very minimum for file naming and directory structure. We can store the files in the ICA format written by MNE-Python.
Good idea for
Other datasets: https://github.com/agramfort/artifact-learn/tree/master/1-%20extract%20basic%20info%20from%20databases. It looks like we can probably download these online?
For more data and inspiration, we can look at this review paper: https://iopscience.iop.org/article/10.1088/1741-2560/12/3/031001/pdf
For reference, ANT to BIDS: https://gist.github.com/mscheltienne/fe3dcc7dafef7539018a6a00ba73afed
See https://github.com/adam2392/improve_icalabel; for now, the scripts are centralized in this one repo.
Some open questions that we can, of course, defer until later:
@anandsaini024 do you have any existing code for building benchmark models that you want to push up to the …? Is there anything you can work on while we sort out the GUI and pipeline for annotating the data?
+1 for storing the IC time series as a RawArray, with an extension, e.g. … In this conversion function: https://github.com/adam2392/improve_icalabel/blob/96522dacd045a5caa50f7f4653d9fd988a29bfa1/mnestudy/ica_to_bids/mara.py#L35-L44 @anandsaini024, this is what I briefly described to you this evening. It would be great if you could finish this conversion function.
Alright, I will pick this up.
@anandsaini024 have you already preprocessed the ANT dataset? I am going to use one of the subjects as a test subject for the high-school student to QA his annotations. If you did, would you mind pushing the script up to …?
@adam2392 You should have received an email from OpenNeuro at your Gmail address for the dataset … Note: I deleted the old dataset containing only raw data and replaced it with this one. I couldn't figure out how to easily update the existing dataset 🤯
Can you share it with me again? Yeah, updating is a pain: adding files is easy, deleting files is kind of a pain, and modifying files is a huge pain. My plan for the high-school student is then to:
It seems we might not be able to get him to fully annotate the ICA components as desired, but hopefully we can get at least some of the raw data annotated.
So... the dataset does not even appear on my account, unless I explicitly enter the corresponding URL.
Unfortunately, I did not. Perhaps it just hasn't finished uploading yet? OpenNeuro, even though it's "nice", seems pretty buggy -__-
Yep, I've had multiple issues with it recently.
Oh, I see it on OpenNeuro now :p
Same, it finally popped up on my account.
Agenda for tomorrow so I don't forget:
Outcome action items:
Hello, thank you for mne-icalabel! I would like to test a new IC feature, extracted from the IC time series, for training a multi-class IC classifier (ideally with more than two IC types).
Thank you!! 😃
@chmendoza As far as I know, there is no large publicly available dataset for IC classification. We were working on processing a dataset (referenced above) to test IC classification. The feature/label dataset for ICLabel is available, but the original ICs are not. That dataset is available here: https://github.com/lucapton/ICLabel-Dataset.
@chmendoza Feel free to open a separate GH issue/PR if/when your model is ready for review. We would love to include it in MNE-ICALabel to make it available to the MNE community.
The dataset referenced in https://github.com/agramfort/artifact-learn/issues/1#issuecomment-906141483 has good and bad components labeled after ICA.
To facilitate training/testing, it would be good to write a BIDSification script that converts the dataset to BIDS format using mne-bids.
cc: @jacobf18 @mscheltienne @anandsaini024