How can I find the number of rows in X_test passed to infer()?

I’m trying to determine how many rows X_test contains when it is passed to infer(). From what I can tell, X_test is an iterable (a generator wrapper) that will not yield the next item until a result has been returned for the current one.

This design makes it difficult to:

  • Parallelize inference of multiple rows across all available CPU cores.
  • Schedule tasks or allocate the 15-hour compute budget effectively, since the total number of iterations is unknown.

Could you clarify how many items infer() is supposed to process, or whether there’s a way to prefetch or inspect the size of X_test?

Any insights would be appreciated.

You are right: this makes parallelization difficult, because you have to yield the result for one dataset before you can get the next. This is by design, to prevent solutions from exploiting correlations between datasets.

However, I will add support for __len__ so that tools like tqdm can properly provide an estimate.
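As a rough illustration of what that enables, here is a minimal sketch, not the actual implementation. SizedStream is a hypothetical stand-in for X_test, total_or_none is a helper invented for this example, and the body of infer() is a placeholder. The only real behavior relied on is that len() raises TypeError on objects without __len__, such as plain generators.

```python
from typing import Iterable, Iterator, Optional


class SizedStream:
    """Hypothetical stand-in for X_test: iterable, with a known length."""

    def __init__(self, items: list):
        self._items = items

    def __len__(self) -> int:
        return len(self._items)

    def __iter__(self) -> Iterator:
        return iter(self._items)


def total_or_none(stream: Iterable) -> Optional[int]:
    """Return len(stream) if the object supports it, otherwise None."""
    try:
        return len(stream)
    except TypeError:  # e.g. a plain generator has no __len__
        return None


def infer(X_test: Iterable) -> Iterator[int]:
    """Yield one result per dataset, reporting progress against the total."""
    total = total_or_none(X_test)
    for i, dataset in enumerate(X_test, start=1):
        prediction = sum(dataset)  # placeholder for the real model
        label = f"{i}/{total}" if total is not None else f"{i}/?"
        print(f"processed dataset {label}")
        yield prediction
```

tqdm picks up the length in the same way: when no total is given, it falls back to len() of the iterable if that is available, so wrapping the loop in tqdm(X_test) should show a proper estimate once __len__ exists.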

The number of items is 10,000 datasets.
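With that count, the 15-hour budget from the question works out to 54,000 seconds / 10,000 datasets = 5.4 seconds per dataset on average. The sketch below does that arithmetic and also recomputes the allowance as the run progresses. It assumes the whole 15 hours is available for the inference loop, which this thread does not confirm; the function names are made up for illustration.

```python
import time
from typing import Optional

BUDGET_SECONDS = 15 * 60 * 60  # 15-hour budget mentioned in the question
N_DATASETS = 10_000            # count given in the reply


def average_allowance(budget_s: float = BUDGET_SECONDS,
                      n_items: int = N_DATASETS) -> float:
    """Average number of seconds available per dataset."""
    return budget_s / n_items


def remaining_allowance(started_at: float,
                        done: int,
                        budget_s: float = BUDGET_SECONDS,
                        n_items: int = N_DATASETS,
                        now: Optional[float] = None) -> float:
    """Seconds available per remaining dataset, given progress so far.

    started_at and now are time.monotonic() readings; now defaults to
    the current reading.
    """
    if now is None:
        now = time.monotonic()
    remaining_items = n_items - done
    if remaining_items <= 0:
        return 0.0
    remaining_time = max(0.0, budget_s - (now - started_at))
    return remaining_time / remaining_items
```

For example, if 10 hours have elapsed after 5,000 datasets, the remaining 18,000 seconds leave 3.6 seconds for each of the remaining 5,000, so a slower fallback model would need to be swapped for a faster one.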