Ray OutOfDiskError even when disk space is available

I’m using Ray on my Mac to run some Python computations. The code writes large HDF files, each between 25 and 50 GB in size. I received the following error while running it:

(raylet) [2024-05-16 11:56:24,761 E 1443 16308] (raylet) file_system_monitor.cc:111: /tmp/ray is over 95% full, available space: 18296090624; capacity: 494384795648. Object creation will fail if spilling is required.

Error executing job with overrides: []

Traceback (most recent call last):
  File "/Users/***/1_Projects/***/t1_main.py", line 107, in main
    outputs = ray.get(futures)
              ^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/env_030/lib/python3.11/site-packages/ray/_private/auto_init_hook.py", line 21, in auto_init_wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/env_030/lib/python3.11/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/env_030/lib/python3.11/site-packages/ray/_private/worker.py", line 2623, in get
    values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/env_030/lib/python3.11/site-packages/ray/_private/worker.py", line 861, in get_objects
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(OutOfDiskError): ray::process_hdf() (pid=1462, ip=127.0.0.1)
ray.exceptions.OutOfDiskError: Local disk is full
The object cannot be created because the local object store is full and the local disk's utilization is over capacity (95% by default).Tip: Use `df` on this node to check disk usage and `ray memory` to check object store memory usage.
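The tip at the end of the error suggests checking disk usage with `df`. Below is a minimal standard-library sketch of the same check for each location involved. It assumes that Ray's test amounts to comparing the available space with the total capacity of the volume holding each path, as the `available space` / `capacity` figures in the warning suggest. `disk_report` is a hypothetical helper, and the paths are the ones from this question.

```python
import os
import shutil

THRESHOLD = 0.95  # the "95% by default" limit quoted in the error message


def disk_report(path: str, threshold: float = THRESHOLD) -> dict:
    """Report free/total space and utilization for the volume holding `path`.

    Walks up to the nearest existing parent directory, so it also works
    for a directory that has not been created yet.
    """
    probe = os.path.abspath(path)
    while not os.path.exists(probe):
        parent = os.path.dirname(probe)
        if parent == probe:  # reached the filesystem root
            break
        probe = parent
    usage = shutil.disk_usage(probe)  # the same free/total figures `df` reports
    used_fraction = 1 - usage.free / usage.total
    return {
        "path": path,
        "free_gb": usage.free / 1e9,
        "total_gb": usage.total / 1e9,
        "used_fraction": used_fraction,
        "over_threshold": used_fraction > threshold,
    }


if __name__ == "__main__":
    # /tmp/ray is the path named in the raylet warning; the second is the spill dir.
    for p in ["/tmp/ray", "/Volumes/T7/Ray_Spill"]:
        r = disk_report(p)
        print(f"{r['path']}: {r['free_gb']:.1f} GB free of {r['total_gb']:.1f} GB "
              f"({r['used_fraction']:.1%} used, over threshold: {r['over_threshold']})")
```

Running it while the job is active shows whether the volume behind `/tmp/ray` is the one crossing the threshold, independently of the spill drive.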

I get this error even though there is plenty of free storage: about 161 GB on the Mac’s internal drive, about 538 GB on the separate drive I allocated for spilling, and about 1 TB where the output data is stored. I’m trying to understand why this error happens and how I can fix it.
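For comparison with those figures, the byte counts printed in the raylet warning can be converted directly. This check uses only the two numbers from the log and the 95% default quoted in the error message. It does not explain why the numbers differ from the free space listed above.

```python
# Byte counts copied from the raylet warning about /tmp/ray.
AVAILABLE_BYTES = 18_296_090_624
CAPACITY_BYTES = 494_384_795_648
THRESHOLD = 0.95  # the "95% by default" limit quoted in the error message

# Fraction of the volume in use when the warning was logged.
used_fraction = 1 - AVAILABLE_BYTES / CAPACITY_BYTES

print(f"available: {AVAILABLE_BYTES / 1e9:.1f} GB of {CAPACITY_BYTES / 1e9:.1f} GB")
print(f"used: {used_fraction:.1%} (threshold {THRESHOLD:.0%})")
print("over threshold:", used_fraction > THRESHOLD)
```

This prints `available: 18.3 GB of 494.4 GB`, `used: 96.3% (threshold 95%)` and `over threshold: True`. So when the warning was logged, the volume holding `/tmp/ray` reported roughly 18.3 GB (decimal) available, not ~161 GB.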

Snippets from my code:

import json
import ray

ray.init(
    _system_config={
        "automatic_object_spilling_enabled": True,
        "object_spilling_config": json.dumps({
            "type": "filesystem",
            "params": {"directory_path": "/Volumes/T7/Ray_Spill"},
        }),
    }
)
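The warning refers to `/tmp/ray`, whereas this configuration only redirects spilled objects to `/Volumes/T7/Ray_Spill`. Ray’s session directory stays at its default. Below is a minimal sketch that keeps both locations, and optionally the capacity threshold, in one place so they can be inspected before calling `ray.init`.

It assumes two settings that are not in the original code:

- the `_temp_dir` argument of `ray.init`, which sets Ray’s session directory (`/tmp/ray` by default);
- the `local_fs_capacity_threshold` system-config key, which Ray’s object-spilling documentation describes for the 95% limit.

Both are underscore-prefixed or internal options whose behavior may vary between Ray versions. `build_ray_init_kwargs` and `/Volumes/T7/ray_tmp` are hypothetical names.

```python
import json


def build_ray_init_kwargs(spill_dir, temp_dir=None, fs_capacity_threshold=None):
    """Assemble keyword arguments for ray.init (hypothetical helper)."""
    system_config = {
        "automatic_object_spilling_enabled": True,
        "object_spilling_config": json.dumps({
            "type": "filesystem",
            "params": {"directory_path": spill_dir},
        }),
    }
    if fs_capacity_threshold is not None:
        # Assumption: overrides the 95% default named in the error message.
        system_config["local_fs_capacity_threshold"] = fs_capacity_threshold
    kwargs = {"_system_config": system_config}
    if temp_dir is not None:
        # Assumption: moves Ray's session directory away from /tmp/ray.
        kwargs["_temp_dir"] = temp_dir
    return kwargs


kwargs = build_ray_init_kwargs(
    spill_dir="/Volumes/T7/Ray_Spill",
    temp_dir="/Volumes/T7/ray_tmp",  # hypothetical path on the external drive
)
print(json.dumps(kwargs, indent=2))
```

The resulting dictionary would be passed as `ray.init(**kwargs)`. Building it in a plain function makes it easy to log exactly which directories a given run used. Whether moving the session directory or raising the threshold is appropriate for this workload is not verified here.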

Processing and writing the data:

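from itertools import repeat

# Launch one Ray task per item in the batch; time_stamps and cell_centers are shared by every task.
futures = [process_hdf.remote(x, y, z)
           for x, y, z in zip(batch, repeat(time_stamps), repeat(cell_centers))]
# Block until every task finishes and collect all results in the driver.
outputs = ray.get(futures)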

log.info('Writing the files to disk...')

# Save the data with compression
file_path = f'{file_storage_path}/model_runs_chunk_{idx}.h5'
write_to_hdf5(file_path, outputs)