Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

For example, in machine learning, a single time of training set and testing set split is always not representative enough to avoid bias. Even a single time of k-fold cross validation sometimes is still not enough. A solution is to set a loop across some integers, and use each integer as the seed of the random state variable, and run multiple splits using different random seeds. Thus we can simply send out each loop as a sub job using the sub job ID as the random seed.

Using a job array can even do more. If multiple pre-processing methods and different machine learning models need to be compared against each other on the same input data, we can set up those choices as different attributes in our manifest file, and generate sub jobs for each combination.

...