2025-10-04 11:20 Tags:

Linear Regression and the Cost Function

1. Prediction Function (Hypothesis)

We assume the target $y$ can be predicted as a linear combination of the inputs:

$$\hat{y} = h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \dots + \theta_n x_n = \theta^T x$$

  • $\hat{y}$ = predicted value
  • $x_j$ = input features (with $x_0 = 1$ for the intercept term)
  • $\theta_j$ = parameters (weights) we want to learn
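
A minimal NumPy sketch of this prediction step (the names `predict`, `X`, and `theta` are mine, added for illustration):

```python
import numpy as np

def predict(X, theta):
    """Hypothesis h_theta(x): linear combination of features and parameters.

    X     : (m, n+1) matrix of inputs, first column all ones (intercept term x0 = 1)
    theta : (n+1,) parameter vector
    """
    return X @ theta  # predictions y_hat, shape (m,)

# Tiny example: two features plus the intercept column x0 = 1
X = np.array([[1.0, 2.0, 3.0],
              [1.0, 4.0, 5.0]])
theta = np.array([0.5, 1.0, -1.0])
y_hat = predict(X, theta)   # array([-0.5, -0.5])
```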

2. Error (Residual)

For each data point $i$, the error is:

$$e^{(i)} = y^{(i)} - \hat{y}^{(i)}$$

  • $y^{(i)}$ = actual value
  • $\hat{y}^{(i)}$ = predicted value
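
A quick sketch of the residuals in NumPy, reusing the hypothetical `y_hat` from the previous snippet (the actual targets `y` are made-up example values):

```python
import numpy as np

y = np.array([1.0, 0.0])         # actual values (illustrative)
y_hat = np.array([-0.5, -0.5])   # predictions, e.g. output of predict(X, theta)

errors = y - y_hat               # residuals e_i = y_i - y_hat_i
# errors == array([1.5, 0.5])
```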

3. Cost Function (Squared Error)

We want to measure how “bad” our predictions are.
So we square the errors and average them:

Mean Squared Error (MSE):

$$\text{MSE} = \frac{1}{m} \sum_{i=1}^{m} \big(e^{(i)}\big)^2 = \frac{1}{m} \sum_{i=1}^{m} \big(y^{(i)} - \hat{y}^{(i)}\big)^2$$

where $m$ = number of rows (data points).

To make the derivative math cleaner, we add a factor of $\frac{1}{2}$:

Cost function:

$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \big(\hat{y}^{(i)} - y^{(i)}\big)^2$$
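
A minimal sketch of computing $J(\theta)$ in NumPy (the `cost` helper and the example data are hypothetical, chosen for illustration):

```python
import numpy as np

def cost(X, y, theta):
    """Squared-error cost J(theta) = (1/2m) * sum((y_hat_i - y_i)^2)."""
    m = len(y)
    errors = X @ theta - y          # y_hat_i - y_i for every row
    return (errors @ errors) / (2 * m)

X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([2.0, 3.0, 4.0])
print(cost(X, y, np.array([1.0, 1.0])))  # 0.0  (theta fits y = 1 + x exactly)
print(cost(X, y, np.array([0.0, 1.0])))  # 0.5  (off by 1 everywhere: 3 * 1 / (2 * 3))
```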


4. Why the 1/2m Factor?

  • Dividing by $m$ gives us the average error.
  • The $\frac{1}{2}$ is just for convenience:
    when we differentiate, the “2” from squaring cancels out (worked step below).
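
Worked step for a single term, using the chain rule (with $\hat{y} = \theta^T x$):

$$\frac{\partial}{\partial \theta_j}\left[\tfrac{1}{2}\big(\hat{y} - y\big)^2\right] = \tfrac{2}{2}\big(\hat{y} - y\big)\,\frac{\partial \hat{y}}{\partial \theta_j} = \big(\hat{y} - y\big)\,x_j$$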

5. Minimization via Calculus

We want to minimize $J(\theta)$.
From calculus: take the derivative, set it to 0.

Gradient for parameter $\theta_j$:

$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \big(\hat{y}^{(i)} - y^{(i)}\big)\, x_j^{(i)}$$
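
A minimal NumPy sketch of this gradient in vectorized form (the `gradient` helper and the example data are hypothetical):

```python
import numpy as np

def gradient(X, y, theta):
    """Gradient of J(theta): (1/m) * X^T (X theta - y)."""
    m = len(y)
    return X.T @ (X @ theta - y) / m

X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([2.0, 3.0, 4.0])
print(gradient(X, y, np.array([1.0, 1.0])))  # [0. 0.] -- already at the minimum
```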


Intuition

  1. Prediction: draw a line $\hat{y} = h_\theta(x)$.
  2. Error: check how far actual points are from the line.
  3. Cost function: square errors, average them → get a “badness score”.
  4. Gradient descent: follow the slope downhill to find the best line (full sketch below).
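
Putting the pieces together, a minimal gradient descent sketch (the learning rate `alpha`, iteration count, and example data are arbitrary illustrative choices):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, iters=5000):
    """Repeatedly step theta downhill along the gradient of J(theta)."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        # theta_j := theta_j - alpha * dJ/dtheta_j
        theta -= alpha * (X.T @ (X @ theta - y) / m)
    return theta

# Points that lie exactly on y = 1 + x, so we expect theta close to [1, 1]
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([2.0, 3.0, 4.0])
print(gradient_descent(X, y))   # approximately [1., 1.]
```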