2026-03-24 16:58 Tags:
1. Problem Setup
We are solving a binary classification problem: predict a label $y \in \{0, 1\}$ from features $x$.
Goal:
We want a model that outputs a probability between 0 and 1.
2. Why Linear Regression Fails
Linear regression assumes a continuous, unbounded output: $\hat{y} = w^\top x + b$.
Problems:
- Output is unbounded: $(-\infty, +\infty)$ → invalid for probabilities
- No probabilistic interpretation
- Not suitable for classification boundaries
You are already very close. The missing piece is:
why do we go from probability to odds, and then from odds to log-odds?
Let’s do only that piece.
Step 1: What are we trying to build?
We want a model that takes in features $x$ and produces a raw score: $z = w^\top x + b$.
This $z$ can be anything:
- very negative
- zero
- very positive
But we want the final output to be a probability: $P = P(y=1 \mid x)$,
and probability must be: $0 \le P \le 1$.
So yes: real-number input, probability output.
Step 2: Why not just force z to equal probability?
Suppose we say: $P = z$.
Then if $z = 3$, the probability is 3.
Impossible.
If $z = -2$, the probability is $-2$.
Also impossible.
So we need some transformation $P = f(z)$
that turns any real number into something between 0 and 1.
Step 3: What kind of transformation do we need?
We need a function that does this:
- input: any real number
- output: between 0 and 1
That is the actual problem.
Now here is the key:
there are many functions that can do this.
So the real question is not:
why must it be odds?
It is:
why is odds/log-odds a convenient bridge?
That is the part we now unpack.
Step 4: Start from probability
Probability is bounded: $P \in (0, 1)$.
This bounded interval is annoying for linear modeling.
Why?
Because linear models naturally live on the whole real line: $w^\top x + b$ can be any value in $(-\infty, +\infty)$.
So instead of trying to make linear models live inside $(0,1)$, we do the opposite:
take probability, and transform it into something unbounded.
This is the important move.
We are not yet choosing odds because of magic.
We are choosing a way to “unbound” probability.
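The mismatch can be seen directly in a tiny sketch (the weights `w`, `b` are arbitrary values chosen for illustration): a linear score lands anywhere on the real line, so it cannot serve as a probability on its own.

```python
# Minimal sketch: a linear score z = w*x + b is unbounded in both directions,
# so it cannot be used directly as a probability.
w, b = 2.0, -1.0  # arbitrary illustrative parameters
for x in (-3.0, 0.0, 3.0):
    z = w * x + b
    print(x, "->", z)  # -7.0, -1.0, 5.0: both signs, no cap
```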
Step 5: First transformation: probability to odds
Define: $\text{odds} = \dfrac{P}{1-P}$
Let’s see what this does.
If $P = 0.5$: odds $= 0.5/0.5 = 1$.
If $P = 0.8$: odds $= 0.8/0.2 = 4$.
If $P = 0.2$: odds $= 0.2/0.8 = 0.25$.
So now odds lives in: $(0, +\infty)$.
Good news:
- no longer capped at 1
Bad news:
- still cannot be negative
- still not the full real line
So odds gets us part of the way, but not all the way.
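The numbers above can be checked in a few lines (the helper name `odds` is my own, defined from scratch):

```python
# Sketch of the probability-to-odds transform: odds = P / (1 - P).
def odds(p):
    """Odds = P(event) / P(non-event), defined for p strictly in (0, 1)."""
    return p / (1 - p)

print(odds(0.5))  # 1.0   -> event and non-event equally likely
print(odds(0.8))  # ~4.0  -> event about four times as likely as non-event
print(odds(0.2))  # ~0.25 -> event about a quarter as likely
# Odds can grow without bound above 1, but can never drop below 0.
```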
Step 6: Second transformation: odds to log-odds
Now take the log: $\log\text{-odds} = \log\dfrac{P}{1-P}$
What happens now?
- if odds is very small, log is very negative
- if odds = 1, log = 0
- if odds is very large, log is very positive
Now the range becomes: $(-\infty, +\infty)$.
Perfect.
This is exactly the same range as the linear predictor: $z = w^\top x + b$.
So now we can say: $\log\dfrac{P}{1-P} = w^\top x + b$,
and that is legal, natural, and mathematically clean.
Step 7: So why “go to odds” first?
Because taking the log of the probability directly does not give the right range.
Let’s compare.
If you take just: $\log P$
then since $P \in (0,1)$, we get $\log P \in (-\infty, 0)$.
This only gives you negative numbers.
Not enough.
If you take: $\log(1-P)$,
same problem.
So we need a transformation of probability that:
- respects the fact that probability has two sides: event and non-event
- gives something that can go from very small to very large
- then log makes it span all real numbers
That is why: $\dfrac{P}{1-P}$
is so useful.
It compares:
probability of event / probability of non-event
That is the odds.
Then log turns that positive ratio into the whole real line.
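The comparison in this step can be checked numerically: $\log P$ alone stays negative on $(0,1)$, while the log-odds covers both signs.

```python
import math

# Compare log(P) (one-sided) with log(P / (1 - P)) (two-sided).
for p in (0.01, 0.5, 0.99):
    print(p, math.log(p), math.log(p / (1 - p)))

assert math.log(0.99) < 0          # log P alone never goes positive
assert math.log(0.99 / 0.01) > 0   # log-odds does
assert math.log(0.5 / 0.5) == 0.0  # balanced case sits exactly at zero
```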
Step 8: Intuition in plain language
Probability alone says:
- chance event happens
Odds says:
- how much more likely the event is than the non-event
Log-odds says:
- a version of that comparison that can be modeled linearly
So the path is:
- probability is bounded
- odds removes the upper bound
- log-odds removes the lower bound too
- now it matches a linear expression
Step 9: The key bridge
You were asking:
how did we suddenly turn to odds ratio?
The answer is:
we did not jump there randomly.
We needed to transform probability from a bounded scale to an unbounded scale.
The odds is an intermediate step that compares event vs non-event: $\dfrac{P}{1-P}$.
Then the log of that gives a quantity on the full real line: $\log\dfrac{P}{1-P} \in (-\infty, +\infty)$.
That makes it compatible with: $z = w^\top x + b$.
Step 10: One sentence summary
We go to odds, then log-odds, because probability is bounded in $(0,1)$, while linear models live on $(-\infty, +\infty)$, and log-odds is the simplest transformation that connects those two worlds.