2025-10-04 11:20 Tags:
Linear Regression and the Cost Function
1. Prediction Function (Hypothesis)
We assume the target $y$ can be predicted as a linear combination of inputs:
$$\hat{y} = \theta_0 x_0 + \theta_1 x_1 + \dots + \theta_n x_n = \theta^{T} x$$
- $\hat{y}$ = predicted value
- $x_j$ = input features (with $x_0 = 1$ for the intercept term)
- $\theta_j$ = parameters (weights) we want to learn
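A minimal sketch of the prediction step, assuming NumPy and a design matrix `X` whose first column is all ones (the names `predict`, `X`, `theta` are just illustrative):

```python
import numpy as np

def predict(X, theta):
    # Each row of X is one data point; the first column is assumed to be all ones,
    # so theta[0] plays the role of the intercept. Row-wise this computes theta^T x.
    return X @ theta
```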
2. Error (Residual)
For each data point $i$, the error is:
$$e_i = \hat{y}_i - y_i$$
- $y_i$ = actual value
- $\hat{y}_i$ = predicted value
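Continuing the sketch above, the residuals for all data points come from a single vector subtraction (assuming `y` holds the actual values):

```python
def residuals(X, theta, y):
    # e_i = predicted minus actual, one entry per data point
    return predict(X, theta) - y
```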
3. Cost Function (Squared Error)
We want to measure how “bad” our predictions are.
So we square the errors and average them:
Mean Squared Error (MSE):
$$\text{MSE} = \frac{1}{m}\sum_{i=1}^{m}\left(\hat{y}_i - y_i\right)^2$$
where $m$ = number of rows (data points).
To make the derivative math cleaner, we add a factor of $\frac{1}{2}$:
Cost function:
$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(\hat{y}_i - y_i\right)^2$$
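A sketch of the cost under the same assumptions (hypothetical helper names, not a fixed API):

```python
def cost(X, theta, y):
    # J(theta) = (1 / (2m)) * sum of squared residuals
    m = len(y)
    e = predict(X, theta) - y
    return (e @ e) / (2 * m)
```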
4. Why the 1/2m Factor?
- Dividing by $m$ gives us the average error.
- The $\frac{1}{2}$ is just for convenience:
when we differentiate, the “2” from squaring cancels out.
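Spelling out the chain rule for a single parameter makes the cancellation explicit (using $\frac{\partial \hat{y}_i}{\partial \theta_j} = x_{ij}$):
$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{2m}\sum_{i=1}^{m} 2\left(\hat{y}_i - y_i\right)\frac{\partial \hat{y}_i}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\left(\hat{y}_i - y_i\right)x_{ij}$$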
5. Minimization via Calculus
We want to minimize $J(\theta)$.
From calculus: take the derivative and set it equal to 0.
Gradient for parameter $\theta_j$:
$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\left(\hat{y}_i - y_i\right) x_{ij}$$
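A vectorized sketch of that gradient for all parameters at once, under the same assumptions as the snippets above:

```python
def gradient(X, theta, y):
    # dJ/dtheta_j = (1/m) * sum_i (yhat_i - y_i) * x_ij, computed for every j at once
    m = len(y)
    e = predict(X, theta) - y
    return (X.T @ e) / m
```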
Intuition
- Prediction: draw a line $\hat{y} = \theta^{T} x$.
- Error: check how far actual points are from the line.
- Cost function: square errors, average them → get a “badness score”.
- Gradient descent: follow the slope downhill to find the best line.
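Putting the pieces together, a bare-bones gradient descent loop might look like this; the learning rate `alpha`, the iteration count, and the toy data are arbitrary choices for illustration:

```python
def gradient_descent(X, y, alpha=0.05, iters=2000):
    # Start at theta = 0 and repeatedly step downhill along the negative gradient.
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        theta -= alpha * gradient(X, theta, y)
    return theta

# Toy usage on made-up noiseless data: y = 2 + 3x
X = np.column_stack([np.ones(5), np.arange(1.0, 6.0)])
y = 2 + 3 * X[:, 1]
theta_hat = gradient_descent(X, y)  # should end up close to [2, 3]
```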