As we build a model to fit our data, we use a training sample to train the model. A portion of that sample is held out separately, so that once the model is ready we can check it against this held-out data, for which we already know the true outcomes. By comparing what the model predicts with what we already knew, we get an idea of how good or bad the model is.
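To make this concrete, here is a minimal sketch of simple hold-out validation. The dataset, model choice, and split ratio below are illustrative assumptions, not anything prescribed in this post; it simply uses scikit-learn's train_test_split to set aside a portion of the data and score the model on it.

```python
# Minimal hold-out validation sketch (illustrative dataset and model).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic data standing in for a real training sample.
X, y = make_classification(n_samples=500, random_state=0)

# Hold out 20% of the data; train only on the remaining 80%.
X_train, X_held_out, y_train, y_held_out = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Compare the model's predictions on the held-out data with the known outcomes.
print("held-out accuracy:", accuracy_score(y_held_out, model.predict(X_held_out)))
```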
However, there is a problem with this approach. The held-out data may contain some variation of its own that changes the outcome, leading us to conclude that the model isn't good enough, even when the difference was really caused by an anomaly present only in that small held-out portion. To fix this problem, statisticians came up with the concept of "K-fold Cross Validation".
The steps go like this (a code sketch follows the list).
1. First, 1/k of the training data is held out.
2. The model is trained on the remaining data.
3. The model is applied to the held-out 1/k of the data and the measurement metrics are recorded.
4. The held-out 1/k is restored into the training sample, and a fresh 1/k is picked, excluding data that has already been held out before.
5. Steps 2 through 4 are repeated until every part of the training data has been held out once.
6. The measurement metrics collected across all k folds are averaged (or otherwise combined).
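Below is a sketch of the same procedure in code. It is only one way to do it, assuming scikit-learn's KFold utility to manage the folds; the dataset, the model, k = 5, and the accuracy metric are all illustrative choices, not part of the original description.

```python
# K-fold cross validation sketch mirroring the steps above (illustrative setup).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold

X, y = make_classification(n_samples=500, random_state=0)
k = 5
scores = []

# Each iteration holds out a different 1/k of the data (steps 1 & 4),
# trains on the rest (step 2), and records the metric (step 3).
for train_idx, held_out_idx in KFold(n_splits=k, shuffle=True, random_state=0).split(X):
    model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    scores.append(accuracy_score(y[held_out_idx], model.predict(X[held_out_idx])))

# Step 6: combine the k measurements into a single estimate.
print("per-fold accuracy:", np.round(scores, 3))
print("mean accuracy:", np.mean(scores))
```

Because every observation is held out exactly once, the averaged score is less sensitive to an unlucky anomaly in any single held-out portion than a single train/test split would be.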
Important Notice for readers:
The intention of this website is not to re-invent the wheel, so we recommend that readers do not use these resources as a replacement for a standard book; there are already a number of great books available on these subjects. Rather, our aim is always to give you the best possible intuitive understanding of the concepts used in Data Science, so that the subject becomes easier to grasp with greater clarity as you study it in a regular curriculum or on your own. Thanks for being a splendid reader, and keep supporting our work if you feel our content helps even the slightest.