I built a classification model using xgboost in R. While I found plenty of comprehensive tutorials on the whole process (data cleaning, imputation, splitting into training and test sets, tuning, feature importance), I could not find much information on how to use the trained model on a new data set, that is, how to finally apply the model to data for which you don't have the y label.
The procedure turned out to be mostly the same as scoring the test set. The one thing of crucial importance was that the new data matrix be exactly like the training matrix: the same columns in the same order, with the same data types.
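One way to enforce that requirement is a small helper that reorders and coerces the new data before it is turned into a matrix. The sketch below is a minimal, hypothetical example (the function name align_to_training is my own, not part of xgboost); it assumes you pass the training feature columns only (without y), that extra columns in the new data can simply be dropped, and that numeric-like columns may be converted to double.

```r
# Hypothetical helper (not part of xgboost): make `new_df` match the
# feature columns of `train_df` in both order and data type.
align_to_training <- function(new_df, train_df) {
  feature_cols <- names(train_df)

  # Fail loudly if a training column is absent from the new data
  missing_cols <- setdiff(feature_cols, names(new_df))
  if (length(missing_cols) > 0) {
    stop("New data is missing columns: ", paste(missing_cols, collapse = ", "))
  }

  # Same order of columns; any extra columns in new_df are dropped
  out <- new_df[, feature_cols, drop = FALSE]

  # Same data type: coerce each column to match the training column
  for (col in feature_cols) {
    ref <- train_df[[col]]
    if (is.factor(ref)) {
      # Reuse the training levels so factor codes line up
      out[[col]] <- factor(as.character(out[[col]]), levels = levels(ref))
    } else if (is.numeric(ref)) {
      # as.character first, so factors convert by label rather than by code
      out[[col]] <- as.numeric(as.character(out[[col]]))
    }
  }
  out
}

# Usage with toy data: columns arrive in a different order and type
train <- data.frame(age = c(30, 40), income = c(100, 200))
new <- data.frame(income = c("150", "250"), age = c(35L, 45L), id = c(1, 2))
aligned <- align_to_training(new, train)
```

The result can then go through as.matrix() exactly as the training features did.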
In our case, the new data set is stored in the data frame Testing (not to be confused with the test set used to evaluate model performance), and the training set is read into the data frame Training.
After creating the DMatrix objects for the test set and the new set, we make predictions using the parameter and threshold values that we finalized based on cross-validation (looking at AUC, etc.).
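A minimal end-to-end sketch of that last step is shown below. It is not the original code: the toy Training and Testing data frames are synthetic, and the parameter values, nrounds and the 0.5 threshold are placeholders standing in for whatever cross-validation selected. The key detail is that the new-data DMatrix is built from the same feature columns, in the same order, as the training DMatrix, and has no label.

```r
library(xgboost)

set.seed(42)
# Synthetic stand-ins for the Training and Testing data frames
Training <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
Training$y <- as.integer(Training$x1 + Training$x2 + rnorm(200, sd = 0.5) > 0)
# New data without a y label; columns deliberately in a different order
Testing <- data.frame(x2 = rnorm(20), x1 = rnorm(20))

feature_cols <- setdiff(names(Training), "y")

# Training DMatrix: numeric feature matrix plus the label
dtrain <- xgb.DMatrix(data = as.matrix(Training[, feature_cols]),
                      label = Training$y)

# New-data DMatrix: same columns in the same order, and no label
dnew <- xgb.DMatrix(data = as.matrix(Testing[, feature_cols]))

# Placeholder parameters; substitute the values chosen by cross-validation
params <- list(objective = "binary:logistic", eval_metric = "auc",
               eta = 0.1, max_depth = 3)
model <- xgb.train(params = params, data = dtrain, nrounds = 50)

# For binary:logistic, predict() returns the probability of class 1
prob <- predict(model, dnew)

# Placeholder threshold; substitute the cut-off chosen during validation
threshold <- 0.5
Testing$pred_class <- as.integer(prob > threshold)
```

Selecting Testing[, feature_cols] rather than using Testing as-is is what guarantees the column order matches, even though the new data frame lists x2 before x1.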
The code snippet can also be found at: Github_link