Variables outside of train() function are not being recognized

Hello team,
I have a problem with the train() function. I was able to test my code locally using crunch.test(), but the cloud run is failing.
ID: #3691

Description:
I defined the model outside the train() function and tested it locally, which works fine.

12:22:21 no forbidden library found
12:22:21 
12:22:21 running local test
12:22:21 internet access isn't restricted, no check will be done
12:22:21 

download data/X_train.parquet from https://datacrunch-com.s3.eu-west-1.amazonaws.com/production/adialab/data-releases/1/X_train.parquet
already exists: file length match
download data/y_train.parquet from https://datacrunch-com.s3.eu-west-1.amazonaws.com/production/adialab/data-releases/1/y_train.parquet
already exists: file length match
download data/X_test.parquet from https://datacrunch-com.s3.eu-west-1.amazonaws.com/production/adialab/data-releases/1/X_test_reduced.parquet
already exists: file length match

12:22:26 ---
12:22:26 loop: moon=269 train=True (1/5)
12:22:26 handler: train(data/X_train.parquet, data/y_train.parquet, resources)

Training model....
Fold: 1
[LightGBM] [Warning] lambda_l1 is set=27.411559849354177, reg_alpha=0.0 will be ignored. Current value: lambda_l1=27.411559849354177
Saving model in resources/model_fold_1.joblib
Fold: 1 --> Score: ....
Fold: 2
[LightGBM] [Warning] lambda_l1 is set=27.411559849354177, reg_alpha=0.0 will be ignored. Current value: lambda_l1=27.411559849354177
Saving model in resources/model_fold_2.joblib
Fold: 2 --> Score: ....
Fold: 3
[LightGBM] [Warning] lambda_l1 is set=27.411559849354177, reg_alpha=0.0 will be ignored. Current value: lambda_l1=27.411559849354177
Saving model in resources/model_fold_3.joblib
Fold: 3 --> Score: .....
Fold: 4
[LightGBM] [Warning] lambda_l1 is set=27.411559849354177, reg_alpha=0.0 will be ignored. Current value: lambda_l1=27.411559849354177
Saving model in resources/model_fold_4.joblib
Fold: 4 --> Score: ....
Fold: 5
[LightGBM] [Warning] lambda_l1 is set=27.411559849354177, reg_alpha=0.0 will be ignored. Current value: lambda_l1=27.411559849354177

12:51:10 handler: infer(data/X_test.parquet, resources)

Saving model in resources/model_fold_5.joblib
Fold: 5 --> Score: .....
Avg. Score: .....

12:51:15 ---
12:51:15 loop: moon=270 train=True (2/5)
12:51:15 handler: train(data/X_train.parquet, data/y_train.parquet, resources)

Processing data
Memory usage of dataframe is 1306.04 MB

But when running the same notebook in the cloud:

1:12:10 pm (user-code)
Training model....
Fold: 1
train took 392.8317 seconds
Matplotlib created a temporary cache directory at /tmp/matplotlib-83smpjhv because the default path (/root/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
Traceback (most recent call last):
  File "/runner/executor.py", line 195, in <module>
    cli()
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/runner/executor.py", line 185, in cli
    timeit_noarg(handler)(x_train, y_train, model_directory_path)
  File "/runner/executor.py", line 49, in wrapper
    return func(**kwargs)
  File "/context/code/main.py", line 287, in train
    trained_model, score = train_model(model_lgb, X_tr, X_val, y_tr, y_val)
NameError: name 'model_lgb' is not defined

1:12:10 pm (runner)
command not exited correctly: 1

Even without looking at the code, I think it's because you declared your model at the top level.
Everything needs to be inside functions.

Take a look at your submission's code (on the ADIA Lab platform): you will see that all top-level code has been commented out.
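Applied to the failing code, the fix is to create the model inside train() instead of at module level. Below is a minimal sketch, assuming the handler signature visible in the log (train(X_train, y_train, model_directory_path)). The names build_model and MeanModel are hypothetical: MeanModel only stands in for the LightGBM model so the sketch runs without LightGBM installed.

```python
# Before (fails in the cloud): the model is built at module level, the platform
# comments that line out, and train() raises NameError: name 'model_lgb' is not defined.
#
#     model_lgb = ...  # a LightGBM estimator
#
#     def train(X_train, y_train, model_directory_path):
#         ...
#         trained_model, score = train_model(model_lgb, X_tr, X_val, y_tr, y_val)


class MeanModel:
    """Hypothetical stand-in for the LightGBM model: predicts the training mean."""

    def __init__(self):
        self.mean_ = None

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


def build_model():
    # Hypothetical factory. In the real notebook this would return the LightGBM
    # estimator; being a function, it survives the platform's parser.
    return MeanModel()


def train(X_train, y_train, model_directory_path="resources"):
    # The model is created inside the function, so it exists at call time even
    # though top-level statements are commented out on the platform.
    model_lgb = build_model()
    model_lgb.fit(X_train, y_train)
    # Returned here only so the sketch can be inspected.
    return model_lgb
```

Keeping construction in a small factory also means the hyperparameters live in one place that both local and cloud runs execute identically.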

So… the parser will comment out everything except functions and imports?

Yes, and it keeps classes too.

Everything inside functions is fine.
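To see why the NameError only shows up in the cloud, here is a rough approximation of the rule described above, written with Python's ast module. This is not the platform's actual parser (its code isn't shown in this thread); it only mimics the stated behavior: imports, functions and classes are kept, and every other top-level statement is commented out.

```python
import ast


def comment_out_top_level(source: str) -> str:
    """Approximation of the described behavior: keep imports, functions and
    classes; prefix every other top-level statement's lines with '#'."""
    tree = ast.parse(source)
    lines = source.splitlines()
    kept_types = (ast.Import, ast.ImportFrom, ast.FunctionDef,
                  ast.AsyncFunctionDef, ast.ClassDef)
    for node in tree.body:
        if isinstance(node, kept_types):
            continue
        # lineno/end_lineno are 1-based and inclusive (Python 3.8+), so
        # multi-line statements are commented out in full.
        for i in range(node.lineno - 1, node.end_lineno):
            lines[i] = "#" + lines[i]
    return "\n".join(lines)
```

Running the transformed source shows the same failure as the cloud log: train() still exists, but the name it refers to was never defined.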

You need to load your model inside the infer function.
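A minimal sketch of that pattern, assuming the train(X_train, y_train, model_directory_path) / infer(X_test, model_directory_path) handlers seen in the log. It mirrors the per-fold files from the local run (model_fold_1 … model_fold_5), but uses pickle instead of joblib so it only needs the standard library, and each "model" is a hypothetical dict holding a target mean in place of a fitted LightGBM fold.

```python
import glob
import os
import pickle


def train(X_train, y_train, model_directory_path="resources", n_folds=5):
    os.makedirs(model_directory_path, exist_ok=True)
    for fold in range(1, n_folds + 1):
        # Hypothetical stand-in for fitting one fold: the "model" is the mean
        # target of the rows not held out in this fold.
        rows = [y for i, y in enumerate(y_train) if i % n_folds != fold - 1]
        model = {"mean": sum(rows) / len(rows)}
        path = os.path.join(model_directory_path, f"model_fold_{fold}.pkl")
        # The original notebook uses joblib; pickle keeps this sketch stdlib-only.
        with open(path, "wb") as f:
            pickle.dump(model, f)


def infer(X_test, model_directory_path="resources"):
    # Load the models here: a module-level `models = ...` line would be
    # commented out by the platform, just like model_lgb was.
    pattern = os.path.join(model_directory_path, "model_fold_*.pkl")
    paths = sorted(glob.glob(pattern))
    if not paths:
        raise FileNotFoundError(f"no fold models found in {model_directory_path}")
    models = []
    for path in paths:
        with open(path, "rb") as f:
            models.append(pickle.load(f))
    # Average the fold predictions.
    fold_mean = sum(m["mean"] for m in models) / len(models)
    return [fold_mean for _ in X_test]
```

Because infer() reads everything it needs from model_directory_path, it works whether or not train() ran in the same process.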