Hi @enzo
I’m struggling to find out why all my submissions have been failing since this morning. Running crunch test locally worked fine. Here are 3 submissions that returned an error at the beginning (after 42 seconds):
Run #78581
Run #78594
Run #78578
Yesterday everything was fine.
The error is: Task state is updated from RUNNING to FAILED on zones/europe-west1-b/instances/6766334117289933714 with exit code 1. with code 1
Thanks.
I think it’s a problem with the underlying infrastructure that fails.
enzo
May 11, 2026, 10:48am
4
Hi @mpware and @salty-francisco,
We apologize for the downtime for the runs using GCP as a compute provider.
We identified the issue and deployed a fix.
Don’t hesitate to reach out to us if you are still experiencing issues.
Hello, staff!
Because of these constant failures (related only to AWS), I have exhausted all my compute time. The runs got stuck, and I could not even pause or stop them. The same code works perfectly the next time.
How can we eliminate such problems?
What can I do about the limit (15 hours) that was exhausted because of this?
enzo
May 14, 2026, 8:55pm
6
Hello Efim,
To avoid people being stuck without any quota by accidentally consuming it all, failed runs do not count in the weekly quota.
Your run, #78934, took over 12 hours by itself. After that, you ran multiple runs at ~30 minutes each.
Technically, you used your quota as expected.
I just checked the provider’s website (it’s GCP this year, not AWS) and the runtime is really 12 hours.
Are you saying that the exact same code that took 12 hours ran in only 30 minutes when you resubmitted it?
Thank you for the quick answer!
Yes, you are right. I missed this very long run. I thought that Terminated runs were also counted in the total.
I did some investigation into this very long run:
As far as I can see, it was caused by a very long determinism check:
Could a very long determinism check (required by the platform) be treated as a failed submission?
Training, inference, and the whole setup were pretty fast. Also, I always run locally before submitting. So it is only because the determinism check got stuck.
Here is the full log:
Created Task: projects/311731030305/locations/europe-west1/jobs/tournament--run-78934--1778706294783
started
downloading runner...
downloading code...
/context/code/catboost_info/time_left.tsv: download from https://crunchdao--competition--production.s3-accelerate.amazonaws.com/submissions/56483/catboost_info/time_left.tsv (16143 bytes)
/context/code/requirements.txt: download from https://crunchdao--competition--production.s3-accelerate.amazonaws.com/submissions/56483/requirements.txt (170 bytes)
/context/code/catboost_info/catboost_training.json: download from https://crunchdao--competition--production.s3-accelerate.amazonaws.com/submissions/56483/catboost_info/catboost_training.json (96759 bytes)
/context/code/catboost_info/learn/events.out.tfevents: download from https://crunchdao--competition--production.s3-accelerate.amazonaws.com/submissions/56483/catboost_info/learn/events.out.tfevents (54870 bytes)
/context/code/notebook.ipynb: download from https://crunchdao--competition--production.s3-accelerate.amazonaws.com/submissions/56483/notebook.ipynb (5878 bytes)
/context/code/main.ipynb: download from https://crunchdao--competition--production.s3-accelerate.amazonaws.com/submissions/56483/main.ipynb (24751 bytes)
/context/code/main.py: download from https://crunchdao--competition--production.s3-accelerate.amazonaws.com/submissions/56483/main.py (14040 bytes)
/context/code/catboost_info/learn_error.tsv: download from https://crunchdao--competition--production.s3-accelerate.amazonaws.com/submissions/56483/catboost_info/learn_error.tsv (16780 bytes)
installing python requirements...
Running pip... Toggle 'Show advanced logs' in order to see more details
installing crunch-cli...
Running pip... Toggle 'Show advanced logs' in order to see more details
Changed status: RUNNING
Running pip... Toggle 'Show advanced logs' in order to see more details
downloading data...
/context/data/X_train.parquet: download from https://crunchdao--competition--production.s3-accelerate.amazonaws.com/data-releases/234/X_train.parquet (218514418 bytes)
/context/data/y_train.parquet: download from https://crunchdao--competition--production.s3-accelerate.amazonaws.com/data-releases/234/y_train.parquet (8356193 bytes)
/context/data/X_test.parquet: download from https://crunchdao--competition--production.s3-accelerate.amazonaws.com/data-releases/234/X_test.parquet (216845130 bytes)
/context/data/y_train_index.parquet: download from https://crunchdao--competition--production.s3-accelerate.amazonaws.com/data-releases/234/y_train_index.parquet (100089 bytes)
downloading model...
/context/code/resources/gru_weights.npz: download from https://crunchdao--competition--production.s3-accelerate.amazonaws.com/models/49450/gru_weights.npz (71992 bytes)
/context/code/resources/model.joblib: download from https://crunchdao--competition--production.s3-accelerate.amazonaws.com/models/49450/model.joblib (539414 bytes)
prepare prediction directory...
executing - command=train
trained LGBM on 5036517 rows x 25 feats; positives=1283914 (25.49%)
GRU weights ready at /context/code/resources/gru_weights.npz
executing - command=infer
/context/code/main.py:355: RuntimeWarning: overflow encountered in exp
z = 1.0 / (1.0 + np.exp(-(iz + hz)))
/context/code/main.py:354: RuntimeWarning: overflow encountered in exp
r = 1.0 / (1.0 + np.exp(-(ir + hr)))
checking determinism by executing the inference again with 30% of the data (tolerance: 1e-08)
executing - command=infer
/context/code/main.py:355: RuntimeWarning: overflow encountered in exp
z = 1.0 / (1.0 + np.exp(-(iz + hz)))
/context/code/main.py:354: RuntimeWarning: overflow encountered in exp
r = 1.0 / (1.0 + np.exp(-(ir + hr)))
determinism check: passed
uploading result...
prediction: found file name=`prediction.parquet` size=22463612
prediction: done walking files.len=1 total_size=22463612
prediction: uploading name=`prediction.parquet`
model: found file name=`model.joblib` size=672822
model: found file name=`gru_weights.npz` size=71992
model: done walking files.len=2 total_size=744814 has_changed=True
model: uploading name=`model.joblib`
model: uploading name=`gru_weights.npz`
result submitted
ended
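A side note on the RuntimeWarning lines in the log: the expression 1.0 / (1.0 + np.exp(-x)) overflows inside np.exp when x is a large negative number. The result still evaluates to 0.0, so the predictions are not affected, but the warning can be avoided by branching on the sign of the input. Below is a minimal sketch, assuming the gate inputs are NumPy float arrays. The name stable_sigmoid is a hypothetical helper, not something taken from main.py, and nothing in the thread suggests this warning is related to the long runtime.

```python
import numpy as np


def stable_sigmoid(x):
    """Sigmoid that never calls np.exp on a large positive argument."""
    x = np.asarray(x, dtype=np.float64)
    out = np.empty_like(x)
    pos = x >= 0
    # For x >= 0, exp(-x) is at most 1, so it cannot overflow.
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))
    # For x < 0, use the equivalent form exp(x) / (1 + exp(x)); exp(x) < 1 here.
    ex = np.exp(x[~pos])
    out[~pos] = ex / (1.0 + ex)
    return out


# Hypothetical usage mirroring the gate computation seen in the log:
# z = stable_sigmoid(iz + hz)
# r = stable_sigmoid(ir + hr)
```

If SciPy is available in the environment, scipy.special.expit computes the same function.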
enzo
May 14, 2026, 9:40pm
9
Hi stormy-efim,
I just invalidated it as an exception. Your account now shows only two hours of consumed quota.
The determinism check is only run on 30% of the data and should not normally take longer than your usual inference. It is mandatory, to ensure your code is deterministic; its output is discarded afterwards.
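For anyone who wants to try something similar locally before submitting, here is a rough sketch of a determinism check. It only borrows the two numbers visible in the log (30% of the data, tolerance 1e-08); how the platform actually selects the subset and compares the outputs is not described in this thread, so everything else is an assumption. The names check_determinism and infer are hypothetical, and infer stands in for whatever callable wraps your own inference.

```python
import time

import numpy as np


def check_determinism(infer, data, fraction=0.3, tolerance=1e-8):
    """Run `infer` twice on the same subset and compare the two outputs.

    Returns (is_deterministic, seconds_taken_by_first_run).
    Assumption: taking the first `fraction` of rows is a fair subset.
    """
    n = max(1, int(len(data) * fraction))
    subset = data[:n]

    start = time.perf_counter()
    first = np.asarray(infer(subset), dtype=np.float64)
    elapsed = time.perf_counter() - start

    second = np.asarray(infer(subset), dtype=np.float64)

    same = first.shape == second.shape and np.allclose(
        first, second, rtol=0.0, atol=tolerance, equal_nan=True
    )
    return bool(same), elapsed


if __name__ == "__main__":
    # Toy stand-in for a real inference function.
    ok, seconds = check_determinism(lambda x: x * 2.0, np.arange(10.0))
    print(f"deterministic={ok}, subset inference took {seconds:.4f}s")
```

Comparing the reported time against a full local inference run gives a rough idea of whether the 30% pass behaves as expected on your machine.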