Determining the appropriate statistical test for comparing performance differences between ML/DL models in survival analysis
I ran an experiment in which I trained and tested eight ML and DL models on survival analysis tasks, with hyperparameter optimization for each model. After tuning, each model was trained once on the training data and evaluated once on the test data, giving eight c-index scores, one per model.
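To make the setup concrete, here is a minimal sketch of how one c-index per model comes out of a single test set. It is a plain-Python version of Harrell's concordance index, not the code I actually used: the function name `concordance_index`, the toy data, and the model names are all hypothetical, and pairs with tied observed times are simply skipped.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs that the risk scores order correctly.

    Sketch only: pairs with tied observed times are skipped.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied times are ignored in this sketch
        # Let `a` be the subject with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk had the earlier event
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


# Hypothetical held-out test set: observed times and event indicators (1 = event, 0 = censored).
test_times = [5, 8, 12, 3, 9, 15]
test_events = [1, 0, 1, 1, 1, 0]

# Hypothetical risk predictions from two of the tuned models on that same test set.
predictions = {
    "model_a": [0.8, 0.4, 0.2, 0.9, 0.5, 0.1],
    "model_b": [0.3, 0.6, 0.5, 0.7, 0.2, 0.4],
}

# One evaluation per model gives exactly one c-index per model.
scores = {
    name: concordance_index(test_times, test_events, risk)
    for name, risk in predictions.items()
}
print(scores)
```

Each model contributes a single number here, and all models are scored on the same test subjects, which is the situation my question is about.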