2026-03-18 14:54 Tags:
Introduction to Cross Validation
Imports

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
```

Data Example

```python
df = pd.read_csv("../DATA/Advertising.csv")
df.head()
```

Train | Test Split Procedure
Workflow

- Clean and adjust data as necessary for `X` and `y`
- Split data into Train/Test for both `X` and `y`
- Fit/Train scaler on training `X` data
- Scale `X_test` data
- Create model
- Fit/Train model on `X_train`
- Evaluate model on `X_test` by creating predictions and comparing to `y_test`
- Adjust parameters as necessary and repeat steps 5 and 6
Create X and y

```python
X = df.drop('sales', axis=1)
y = df['sales']
```

Train Test Split
```python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=101
)
```

Scale Data
```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
```

Note:
- The scaler is fit only on `X_train`
- Then the same fitted scaler is used to transform both `X_train` and `X_test`
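The note above can be checked directly: after fitting, the scaler stores the training statistics and reuses them on whatever data it transforms. A minimal sketch with tiny made-up arrays (invented for illustration, not from the Advertising data):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic "train" and "test" data, invented for illustration
X_tr = np.array([[1.0], [2.0], [3.0], [4.0]])
X_te = np.array([[10.0]])

scaler = StandardScaler()
scaler.fit(X_tr)  # learns mean/std from the training data only

# The fitted scaler stores the TRAINING mean and scale...
print(scaler.mean_)  # [2.5]

# ...and applies them to any data it transforms, including test data:
# (10 - train_mean) / train_std, not the test set's own statistics
print(scaler.transform(X_te))
```

This is why `transform` (not `fit_transform`) is called on the test set: refitting on test data would leak test-set statistics into the preprocessing.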
Create Model

```python
from sklearn.linear_model import Ridge

# Poor alpha choice on purpose!
model = Ridge(alpha=100)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
```

Evaluation
```python
from sklearn.metrics import mean_squared_error

mean_squared_error(y_test, y_pred)
```

Adjust Parameters and Re-evaluate
```python
model = Ridge(alpha=1)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
```

Another Evaluation

```python
mean_squared_error(y_test, y_pred)
```

Observation:
- `alpha=1` performs much better than `alpha=100` in this example
- This process can be repeated until satisfied with performance metrics

Note:

- `RidgeCV` can automate this for Ridge regression
- The purpose here is to understand the general cross-validation process for any model
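As a sketch of that automation: `RidgeCV` scores each candidate alpha with internal (by default leave-one-out-style) cross-validation and exposes the winner as `alpha_`. The data and alpha grid below are arbitrary examples, not recommendations:

```python
import numpy as np
from sklearn.linear_model import RidgeCV

# Synthetic regression data, invented for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=100)

# RidgeCV cross-validates every alpha in the grid and keeps the best one
model = RidgeCV(alphas=[0.1, 1.0, 10.0, 100.0])
model.fit(X, y)

print(model.alpha_)  # the alpha that scored best in cross-validation
```

This replaces the manual fit/evaluate/adjust loop above for Ridge specifically; the manual loop still matters because it works for any model.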
Train | Validation | Test Split Procedure
This is also called a hold-out set approach.
Key idea:

- Do not adjust parameters based on the final test set
- Use the final test set only for reporting final expected performance
Workflow

- Clean and adjust data as necessary for `X` and `y`
- Split data into Train/Validation/Test for both `X` and `y`
- Fit/Train scaler on training `X` data
- Scale evaluation data
- Create model
- Fit/Train model on `X_train`
- Evaluate model on evaluation data by creating predictions and comparing to `y_eval`
- Adjust parameters as necessary and repeat steps 5 and 6
- Get final metrics on test set (not allowed to go back and adjust after this)
Create X and y
```python
X = df.drop('sales', axis=1)
y = df['sales']
```

Split Twice: Train | Validation | Test
```python
from sklearn.model_selection import train_test_split

# 70% of data is training data, set aside other 30%
X_train, X_OTHER, y_train, y_OTHER = train_test_split(
    X, y, test_size=0.3, random_state=101
)

# Remaining 30% is split into evaluation and test sets
# Each is 15% of the original data size
X_eval, X_test, y_eval, y_test = train_test_split(
    X_OTHER, y_OTHER, test_size=0.5, random_state=101
)
```

Scale Data
```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_eval = scaler.transform(X_eval)
X_test = scaler.transform(X_test)
```

Create Model
```python
from sklearn.linear_model import Ridge

# Poor Alpha Choice on purpose!
model = Ridge(alpha=100)
model.fit(X_train, y_train)
y_eval_pred = model.predict(X_eval)
```

Evaluation
```python
from sklearn.metrics import mean_squared_error

mean_squared_error(y_eval, y_eval_pred)
```

Adjust Parameters and Re-evaluate
```python
model = Ridge(alpha=1)
model.fit(X_train, y_train)
y_eval_pred = model.predict(X_eval)
```

Another Evaluation

```python
mean_squared_error(y_eval, y_eval_pred)
```

Final Evaluation
After this step, parameters should no longer be changed.

```python
y_final_test_pred = model.predict(X_test)
mean_squared_error(y_test, y_final_test_pred)
```

Cross Validation with cross_val_score
```python
X = df.drop('sales', axis=1)
y = df['sales']
```

Train Test Split
```python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=101
)
```

Scale Data
```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
```

Create Model
```python
from sklearn.linear_model import Ridge

model = Ridge(alpha=100)
```

Run Cross Validation
```python
from sklearn.model_selection import cross_val_score

# SCORING OPTIONS:
# https://scikit-learn.org/stable/modules/model_evaluation.html
scores = cross_val_score(
    model,
    X_train,
    y_train,
    scoring='neg_mean_squared_error',
    cv=5
)
scores
```

Note:
- `cv=5` means 5-fold cross validation
- For error metrics like MSE, scikit-learn returns the negated value, so a higher (less negative, closer to zero) score corresponds to a lower error
- To interpret the MSE more naturally, take the absolute value of the mean
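To make the "`cv=5` means 5-fold" point concrete, the sketch below reproduces `cross_val_score` by hand: for a regressor and an integer `cv`, scikit-learn splits with an unshuffled `KFold`, fits a fresh copy of the model on each training fold, and scores it on the held-out fold. The data here is synthetic, invented for illustration:

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

# Synthetic regression data, invented for illustration
rng = np.random.default_rng(42)
X = rng.normal(size=(60, 2))
y = X @ np.array([1.5, -2.0]) + rng.normal(scale=0.2, size=60)

model = Ridge(alpha=1)

# Manual 5-fold loop: fit on each train fold, score the held-out fold
manual_scores = []
for train_idx, val_idx in KFold(n_splits=5).split(X):
    model.fit(X[train_idx], y[train_idx])
    preds = model.predict(X[val_idx])
    manual_scores.append(-mean_squared_error(y[val_idx], preds))

# cross_val_score does the same folds and scoring internally
auto_scores = cross_val_score(model, X, y, scoring='neg_mean_squared_error', cv=5)
print(np.allclose(manual_scores, auto_scores))
```

The one-liner is preferred in practice; the loop is only to show what the folds and scores are.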
Average CV Score

```python
abs(scores.mean())
```

Adjust Model Based on Metrics
```python
model = Ridge(alpha=1)
scores = cross_val_score(
    model,
    X_train,
    y_train,
    scoring='neg_mean_squared_error',
    cv=5
)
```

Mean CV Error

```python
abs(scores.mean())
```

Final Evaluation
```python
# Need to fit the model first!
model.fit(X_train, y_train)
y_final_test_pred = model.predict(X_test)
mean_squared_error(y_test, y_final_test_pred)
```

Cross Validation with cross_validate
Difference from cross_val_score
cross_validate differs from cross_val_score in two ways:
- It allows specifying multiple metrics for evaluation
- It returns a dictionary containing:
  - fit times
  - score times
  - test scores
  - optionally training scores and fitted estimators
Return Values
For single metric evaluation
If scoring is a string, callable, or None, the keys will be:

```python
['test_score', 'fit_time', 'score_time']
```

For multiple metric evaluation

The returned dictionary contains keys like:

```python
['test_<scorer1_name>', 'test_<scorer2_name>', 'test_<scorer...>', 'fit_time', 'score_time']
```

Training Scores
- `return_train_score=False` by default
- This saves computation time
- To evaluate training scores too, set `return_train_score=True`
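A minimal sketch of `return_train_score=True` on synthetic data (invented for illustration): with a single scoring string, the per-fold training scores appear under the `'train_score'` key alongside the usual keys.

```python
import numpy as np
from sklearn.model_selection import cross_validate
from sklearn.linear_model import Ridge

# Synthetic regression data, invented for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.1, size=60)

scores = cross_validate(
    Ridge(alpha=1),
    X,
    y,
    scoring='neg_mean_squared_error',
    cv=5,
    return_train_score=True  # adds per-fold training scores to the output
)

print(sorted(scores.keys()))
# ['fit_time', 'score_time', 'test_score', 'train_score']
```

Comparing `'train_score'` against `'test_score'` is a quick way to spot overfitting: a large gap means the model fits the training folds much better than the held-out folds.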
Create X and y
```python
X = df.drop('sales', axis=1)
y = df['sales']
```

Train Test Split
```python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=101
)
```

Scale Data
```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
```

Create Model
```python
from sklearn.linear_model import Ridge

model = Ridge(alpha=100)
```

Run cross_validate
```python
from sklearn.model_selection import cross_validate

# SCORING OPTIONS:
# https://scikit-learn.org/stable/modules/model_evaluation.html
scores = cross_validate(
    model,
    X_train,
    y_train,
    scoring=['neg_mean_absolute_error', 'neg_mean_squared_error', 'max_error'],
    cv=5
)
scores
```

View Results
```python
pd.DataFrame(scores)
```

```python
pd.DataFrame(scores).mean()
```

Adjust Model Based on Metrics
```python
model = Ridge(alpha=1)
scores = cross_validate(
    model,
    X_train,
    y_train,
    scoring=['neg_mean_absolute_error', 'neg_mean_squared_error', 'max_error'],
    cv=5
)
pd.DataFrame(scores).mean()
```

Final Evaluation
```python
# Need to fit the model first!
model.fit(X_train, y_train)
y_final_test_pred = model.predict(X_test)
mean_squared_error(y_test, y_final_test_pred)
```

Summary
Train/Test Split

- Simple
- Fast
- Good starting point
- But model tuning may depend too much on one split

Train/Validation/Test Split

- Separates tuning from final testing
- Test set stays untouched until the end
- More reliable than only train/test

cross_val_score

- Performs cross-validation directly
- Good for one evaluation metric
- Returns an array of scores

cross_validate

- More flexible than `cross_val_score`
- Supports multiple metrics
- Also returns fit time and score time
cross_val_score vs cross_validate
1. Core difference
| Function | Purpose |
|---|---|
| cross_val_score | Simple CV → returns only scores |
| cross_validate | Advanced CV → returns detailed results |
2. cross_val_score
What it does
```python
scores = cross_val_score(model, X, y, cv=5, scoring='neg_mean_squared_error')
```

Output

```python
array([-10.2, -9.8, -11.0, -10.5, -9.9])
```

👉 Only gives:

- test scores for each fold
When to use
- You only care about one metric
- You want something quick and simple
3. cross_validate
What it does
```python
scores = cross_validate(
    model,
    X,
    y,
    cv=5,
    scoring=['neg_mean_squared_error', 'neg_mean_absolute_error']
)
```

Output

```python
{
    'test_neg_mean_squared_error': [...],
    'test_neg_mean_absolute_error': [...],
    'fit_time': [...],
    'score_time': [...]
}
```

👉 Returns a dictionary
What extra info you get
- Multiple metrics
- Fit time
- Score time
- (optional) training scores
4. Side-by-side comparison
| Feature | cross_val_score | cross_validate |
|---|---|---|
| Multiple metrics | ❌ | ✅ |
| Fit time | ❌ | ✅ |
| Score time | ❌ | ✅ |
| Train score | ❌ | ✅ (optional) |
| Output type | array | dict |
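The table can be verified directly: for the same model, data, folds, and a single metric, `cross_validate`'s `'test_score'` array matches `cross_val_score`'s output, so the difference really is packaging, not computation. Synthetic data, invented for illustration:

```python
import numpy as np
from sklearn.model_selection import cross_val_score, cross_validate
from sklearn.linear_model import Ridge

# Synthetic regression data, invented for illustration
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
y = X @ np.array([3.0, -1.0]) + rng.normal(scale=0.3, size=50)

model = Ridge(alpha=1)

arr = cross_val_score(model, X, y, cv=5, scoring='neg_mean_squared_error')
dct = cross_validate(model, X, y, cv=5, scoring='neg_mean_squared_error')

# Same per-fold test scores, just packaged differently (array vs dict)
print(np.allclose(arr, dct['test_score']))
```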
5. Subtle but important
cross_val_score

```python
scores.mean()
```

👉 directly usable

cross_validate

```python
pd.DataFrame(scores).mean()
```

👉 need to extract from dict
6. When YOU should use which
Given your project (ML + model comparison):
Use cross_val_score when:

- quick check of model performance
- tuning one metric (e.g. AUC)

Use cross_validate when:

- comparing multiple metrics
- analyzing model behavior
- debugging performance
7. One line intuition
- `cross_val_score` = just give me the score
- `cross_validate` = give me everything about training + evaluation