
open_datatree fails for a HDF5 file over network if import netCDF4 is included #9743

Open
DFEvans opened this issue Nov 7, 2024 · 5 comments
DFEvans commented Nov 7, 2024

What happened?

Attempting to open a (specific?) HDF5 file over the network via fsspec and xarray.open_datatree() fails with an error inside h5py if import netCDF4 occurs in the script.

What did you expect to happen?

The dataset opens successfully

Minimal Complete Verifiable Example

import earthaccess
import netCDF4  # unused, but causes the bug
import xarray

# Requires the `earthaccess` package and a NASA Earthdata login configured
earthaccess.login()

url = "https://data.laadsdaac.earthdatacloud.nasa.gov/prod-lads/CLDMSK_L2_VIIRS_SNPP/CLDMSK_L2_VIIRS_SNPP.A2023316.0124.001.2023316135139.nc"

fs = earthaccess.get_fsspec_https_session()
with fs.open(url) as f:
    xarray.open_datatree(f)

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/danielevans/repositories/image-processing/viirs/viirs/query_and_geolocate_viirs.py", line 409, in <module>
    main()
  File "/home/danielevans/repositories/image-processing/viirs/viirs/query_and_geolocate_viirs.py", line 385, in main
    get_cloudfree_area_con(
  File "/home/danielevans/repositories/image-processing/viirs/viirs/query_and_geolocate_viirs.py", line 197, in get_cloudfree_area_con
    with open_viirs_file(i.data_links()[0], data_dir) as cloudmask_datatree:
  File "/usr/lib/python3.10/contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "/home/danielevans/repositories/image-processing/viirs/viirs/query_and_geolocate_viirs.py", line 142, in open_viirs_file
    yield xarray.open_datatree(f, engine="h5netcdf")
  File "/home/danielevans/.cache/pypoetry/virtualenvs/image-processing-LWpwzOMN-py3.10/lib/python3.10/site-packages/xarray/backends/api.py", line 1089, in open_datatree
    backend_tree = backend.open_datatree(
  File "/home/danielevans/.cache/pypoetry/virtualenvs/image-processing-LWpwzOMN-py3.10/lib/python3.10/site-packages/xarray/backends/h5netcdf_.py", line 478, in open_datatree
    groups_dict = self.open_groups_as_dict(
  File "/home/danielevans/.cache/pypoetry/virtualenvs/image-processing-LWpwzOMN-py3.10/lib/python3.10/site-packages/xarray/backends/h5netcdf_.py", line 549, in open_groups_as_dict
    group_ds = store_entrypoint.open_dataset(
  File "/home/danielevans/.cache/pypoetry/virtualenvs/image-processing-LWpwzOMN-py3.10/lib/python3.10/site-packages/xarray/backends/store.py", line 43, in open_dataset
    vars, attrs = filename_or_obj.load()
  File "/home/danielevans/.cache/pypoetry/virtualenvs/image-processing-LWpwzOMN-py3.10/lib/python3.10/site-packages/xarray/backends/common.py", line 231, in load
    (_decode_variable_name(k), v) for k, v in self.get_variables().items()
  File "/home/danielevans/.cache/pypoetry/virtualenvs/image-processing-LWpwzOMN-py3.10/lib/python3.10/site-packages/xarray/backends/h5netcdf_.py", line 256, in get_variables
    return FrozenDict(
  File "/home/danielevans/.cache/pypoetry/virtualenvs/image-processing-LWpwzOMN-py3.10/lib/python3.10/site-packages/xarray/core/utils.py", line 417, in FrozenDict
    return Frozen(dict(*args, **kwargs))
  File "/home/danielevans/.cache/pypoetry/virtualenvs/image-processing-LWpwzOMN-py3.10/lib/python3.10/site-packages/xarray/backends/h5netcdf_.py", line 257, in <genexpr>
    (k, self.open_store_variable(k, v)) for k, v in self.ds.variables.items()
  File "/home/danielevans/.cache/pypoetry/virtualenvs/image-processing-LWpwzOMN-py3.10/lib/python3.10/site-packages/xarray/backends/h5netcdf_.py", line 205, in open_store_variable
    dimensions = var.dimensions
  File "/home/danielevans/.cache/pypoetry/virtualenvs/image-processing-LWpwzOMN-py3.10/lib/python3.10/site-packages/h5netcdf/core.py", line 260, in dimensions
    self._dimensions = self._lookup_dimensions()
  File "/home/danielevans/.cache/pypoetry/virtualenvs/image-processing-LWpwzOMN-py3.10/lib/python3.10/site-packages/h5netcdf/core.py", line 154, in _lookup_dimensions
    if _unlabeled_dimension_mix(self._h5ds) == "labeled":
  File "/home/danielevans/.cache/pypoetry/virtualenvs/image-processing-LWpwzOMN-py3.10/lib/python3.10/site-packages/h5netcdf/core.py", line 453, in _unlabeled_dimension_mix
    dimset = set([len(j) for j in dimlist])
  File "/home/danielevans/.cache/pypoetry/virtualenvs/image-processing-LWpwzOMN-py3.10/lib/python3.10/site-packages/h5netcdf/core.py", line 453, in <listcomp>
    dimset = set([len(j) for j in dimlist])
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/home/danielevans/.cache/pypoetry/virtualenvs/image-processing-LWpwzOMN-py3.10/lib/python3.10/site-packages/h5py/_hl/dims.py", line 60, in __len__
    return h5ds.get_num_scales(self._id, self._dimension)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5ds.pyx", line 72, in h5py.h5ds.get_num_scales
  File "h5py/defs.pyx", line 4461, in h5py.defs.H5DSget_num_scales
RuntimeError: Unspecified error in H5DSget_num_scales (return value <0)

Anything else we need to know?

I see the same issue attempting to read the dataset over HTTPS and directly from S3, both on NASA's own server and in my own S3 bucket. The file opens successfully locally, and via xarray.open_dataset.

Give me a shout if you need the file in question put somewhere, with a suggestion of a location - it's ~100MB in size.

Environment

INSTALLED VERSIONS

commit: None
python: 3.10.12 (main, Jul 29 2024, 16:56:48) [GCC 11.4.0]
python-bits: 64
OS: Linux
OS-release: 5.15.153.1-microsoft-standard-WSL2
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.2
libnetcdf: 4.9.4-development

xarray: 2024.10.0
pandas: 2.2.2
numpy: 1.26.4
scipy: 1.14.1
netCDF4: 1.7.2
pydap: None
h5netcdf: 1.3.0
h5py: 3.11.0
zarr: 2.18.3
cftime: 1.6.4
nc_time_axis: None
iris: None
bottleneck: None
dask: 2024.8.2
distributed: None
matplotlib: 3.9.2
cartopy: None
seaborn: None
numbagg: None
fsspec: 2023.12.2
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 74.1.2
pip: 24.1.2
conda: None
pytest: 7.4.4
mypy: 1.11.2
IPython: 8.27.0
sphinx: 7.4.7

@DFEvans DFEvans added the bug and needs triage labels Nov 7, 2024


kmuehlbauer commented Nov 7, 2024

@DFEvans The traceback indicates it used h5netcdf. Did you also use the h5netcdf engine when opening locally, or did you use open_dataset? Could you explicitly request the h5netcdf and netcdf4 engines, to see whether this is an h5netcdf-only issue or whether both engines are affected?

I remember there have been issues with fsspec-backed files working locally but not over the network, but I can't find them at the moment.

@kmuehlbauer kmuehlbauer added the topic-backends label and removed the needs triage label Nov 7, 2024
@DFEvans DFEvans changed the title open_datatree fails for a HDF5 file over network open_datatree fails for a HDF5 file over network if import netCDF4 is included Nov 8, 2024

DFEvans commented Nov 8, 2024

So, a few more bits:

  • The netcdf4 engine requires a filepath, not an opened file handle (whether from fsspec or plain open). I've forced the engine to h5netcdf locally just in case, and the file still opens successfully.
  • The bug actually occurs only if import netCDF4 appears in the script - it isn't used in the code I'm working on, but was a leftover from a pre-DataTree implementation. This suggests some nasty interaction between the various libraries!
  • Without import netCDF4, the file opens successfully, but a segfault occurs when the script ends. I'll open a separate issue for that if it turns out to be a real problem (I think it's to do with the order in which closes are called).

@kmuehlbauer

Thanks for the details, @DFEvans.

  • Without import netCDF4, the file opens successfully, but a segfault occurs when the script ends. I'll open a separate issue for that if it seems an actual problem (I think it's to do with the order in which closes are called).

Not sure, but you might also use the with-contextmanager with xr.open_datatree; that should normally take care of closing.

@headtr1ck

Might this also be caused by lazy importing in xarray?
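One way to probe that hypothesis is to check which backend modules are actually present after importing xarray (a diagnostic sketch, assuming a recent xarray with entry-point-based backends):

```python
import sys

import xarray as xr

# Backends are registered via entry points, so importing xarray alone
# should not (in recent versions) import netCDF4 or h5netcdf themselves.
print("netCDF4 imported:", "netCDF4" in sys.modules)
print("h5netcdf imported:", "h5netcdf" in sys.modules)

# list_engines() reports the available backends without importing them.
print(sorted(xr.backends.list_engines()))
```

If netCDF4 only ends up in sys.modules once a file is opened, the order in which the two HDF5-based libraries are initialized could differ between the with- and without-import cases.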


3 participants