pyarrow.lib.ArrowInvalid: Could not open Parquet input source '<Buffer>': Parquet file size is 0 bytes

oedokumaci · October 22, 2024, 10:02am

I have been getting this error for my submissions.

I tried specifying the library versions I use in my local setup, and also tried not specifying them (the crunch CLI uses pandas and pyarrow, so I suspected I might be messing those up); neither helped.

Could you help me understand what’s going on before I reach the submission quota limits please?

Here’s the longer traceback

(all lines logged at 9:51:49 AM)

  File "/usr/local/lib/python3.10/site-packages/crunch/runner/cloud.py", line 528, in sandbox
    return utils.read(self.prediction_path)
  File "/usr/local/lib/python3.10/site-packages/crunch/utils.py", line 125, in read
    return pandas.read_parquet(path, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/pandas/io/parquet.py", line 667, in read_parquet
    return impl.read(
  File "/usr/local/lib/python3.10/site-packages/pandas/io/parquet.py", line 274, in read
    pa_table = self.api.parquet.read_table(
  File "/usr/local/lib/python3.10/site-packages/pyarrow/parquet/core.py", line 1793, in read_table
    dataset = ParquetDataset(
  File "/usr/local/lib/python3.10/site-packages/pyarrow/parquet/core.py", line 1360, in __init__
    [fragment], schema=schema or fragment.physical_schema,
  File "pyarrow/_dataset.pyx", line 1431, in pyarrow._dataset.Fragment.physical_schema.__get__
  File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Could not open Parquet input source '<Buffer>': Parquet file size is 0 bytes

enzo · October 24, 2024, 9:23pm

Sorry, I just saw your post.

Could you give me your Run ID?

oedokumaci · October 25, 2024, 6:03am

Sure no problem, #22584 is an example.

enzo · October 25, 2024, 9:30pm

The prediction does not seem to have been saved for some reason. Probably just a fluke, or did you manually call exit(0) in your code?

Also, what library are you using for your progress bar? It messes up the logs and I would like to fix it. A minimally reproducible example would be great.

oedokumaci · October 26, 2024, 10:24am

Nope, I didn’t call exit(0). I suspect it has something to do with Dask, as I got the same error all 3 times I tried running Dask. It works fine locally so could be something related to insufficient privileges.
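For anyone hitting the same error, here is a rough way to check locally whether a prediction file actually got written before it is read back. This is only a sketch using the standard library: `check_parquet_file` is a made-up helper name, not part of the crunch CLI, and I am assuming the prediction lands at a path you can inspect. It relies on the fact that an unencrypted Parquet file starts and ends with the 4-byte marker `PAR1`.

```python
import os

PARQUET_MAGIC = b"PAR1"  # unencrypted Parquet files begin and end with these 4 bytes


def check_parquet_file(path):
    """Hypothetical helper: return a short diagnosis string for the file at `path`."""
    if not os.path.exists(path):
        return "missing"
    size = os.path.getsize(path)
    if size == 0:
        # This is the case pyarrow reports as "Parquet file size is 0 bytes".
        return "empty"
    if size < 12:
        # Leading magic + footer length + trailing magic need at least 12 bytes.
        return "truncated"
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-4, os.SEEK_END)
        tail = f.read(4)
    if head != PARQUET_MAGIC or tail != PARQUET_MAGIC:
        return "not parquet"
    return "ok"
```

Calling `check_parquet_file("prediction.parquet")` (the file name is just an example) at the end of a local run tells you whether the file is missing, empty, cut short, or looks structurally plausible. It does not validate the contents, so an "ok" here only rules out the 0-byte case.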

For the progress bar, it should be tqdm.notebook that messed up the logs when running in a .py file. So the following should do the trick to reproduce:

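from tqdm.notebook import tqdm  # notebook-widget progress bar

# Run this from a plain .py file (not a notebook) to reproduce the garbled logs.
for _ in tqdm(range(100)):  # any iterable works here
    ...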

Also, quick question: my last successful run is #22760, which at the time I manually selected as my submission. This will be selected to run against the private test data, correct?
