pyarrow.lib.ArrowInvalid: Could not open Parquet input source '<Buffer>': Parquet file size is 0 bytes

I have been getting this error for my submissions.

I tried pinning the library versions I use in my local setup, and also tried leaving them unpinned (the crunch CLI uses pandas and pyarrow, so I suspected I might be conflicting with its versions). Neither helped.

Could you help me understand what’s going on before I reach the submission quota limits please?

Here’s the longer traceback (every line was logged at 9:51:49 AM):

  File "/usr/local/lib/python3.10/site-packages/crunch/runner/cloud.py", line 528, in sandbox
    return utils.read(self.prediction_path)
  File "/usr/local/lib/python3.10/site-packages/crunch/utils.py", line 125, in read
    return pandas.read_parquet(path, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/pandas/io/parquet.py", line 667, in read_parquet
    return impl.read(
  File "/usr/local/lib/python3.10/site-packages/pandas/io/parquet.py", line 274, in read
    pa_table = self.api.parquet.read_table(
  File "/usr/local/lib/python3.10/site-packages/pyarrow/parquet/core.py", line 1793, in read_table
    dataset = ParquetDataset(
  File "/usr/local/lib/python3.10/site-packages/pyarrow/parquet/core.py", line 1360, in __init__
    [fragment], schema=schema or fragment.physical_schema,
  File "pyarrow/_dataset.pyx", line 1431, in pyarrow._dataset.Fragment.physical_schema.__get__
  File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Could not open Parquet input source '<Buffer>': Parquet file size is 0 bytes

Sorry, I just saw your post.

Could you give me your Run ID?

Sure no problem, #22584 is an example.

The prediction does not seem to have been saved for some reason. Probably just a fluke, or did you manually call exit(0) in your code?
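If you want to rule out an empty or half-written prediction file on your side, something like the sketch below may help. It is only an illustration, not part of the crunch CLI: the helper name `check_prediction_file` and the file name `prediction.parquet` are made up, and where the runner actually stores the prediction is an assumption. The one hard fact it relies on is that a valid Parquet file starts and ends with the 4 bytes `PAR1`.

```python
import os
import tempfile

PARQUET_MAGIC = b"PAR1"  # Parquet files begin and end with these 4 bytes


def check_prediction_file(path):
    """Hypothetical helper: return a short diagnosis for the file at `path`."""
    if not os.path.exists(path):
        return "missing"
    size = os.path.getsize(path)
    if size == 0:
        # This is the case that produces "Parquet file size is 0 bytes".
        return "empty"
    # Header magic (4) + footer length (4) + footer magic (4) = 12 bytes minimum.
    if size < 12:
        return "truncated"
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-4, os.SEEK_END)
        tail = f.read(4)
    if head != PARQUET_MAGIC or tail != PARQUET_MAGIC:
        return "not parquet"
    # "ok" only means the magic bytes are present, not that the data is valid.
    return "ok"


if __name__ == "__main__":
    # Demo on a deliberately empty file; the file name is an assumption.
    with tempfile.TemporaryDirectory() as tmp:
        demo_path = os.path.join(tmp, "prediction.parquet")
        open(demo_path, "wb").close()
        print(check_prediction_file(demo_path))
```

Running this check locally right after your code writes its output would at least show whether the file is empty before the runner tries to read it.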

Also, what library are you using for your progress bar? It messes up the logs and I would like to fix it. A minimal reproducible example would be great.

Nope, I didn’t call exit(0). I suspect it has something to do with Dask, as I got the same error all 3 times I tried running Dask. It works fine locally so could be something related to insufficient privileges.

For the progress bar, it is most likely tqdm.notebook, which messes up the logs when used in a plain .py file. The following should reproduce it:

from tqdm.notebook import tqdm

# Any iterable will do; range(100) is just a stand-in.
for _ in tqdm(range(100)):
    ...

Also, quick question: my last successful run is #22760, which at the time I manually selected as my submission. This will be selected to run against the private test data, correct?