Key Concept
Filter DataFrames using boolean conditions (“masks”) to select specific rows.
Basic Filtering
Single Condition
# Create boolean mask
mask = df['total_bill'] > 30
# Apply mask
large_bills = df[mask]
Equivalent shorthand:
df[df['total_bill'] > 30]
How It Works:
- df['total_bill'] > 30 creates a boolean Series (True/False values)
- The DataFrame returns the rows where the mask is True
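To make the mechanics concrete, here is a minimal sketch on a small made-up DataFrame. The values are hypothetical; only the column names follow the examples in these notes.

```python
import pandas as pd

# Hypothetical sample data; column names mirror the examples in these notes
df = pd.DataFrame({
    'total_bill': [16.99, 48.27, 34.30, 10.34],
    'sex': ['Female', 'Male', 'Female', 'Male'],
    'day': ['Sun', 'Sat', 'Thur', 'Sun'],
})

# The comparison runs element-wise and returns a boolean Series
mask = df['total_bill'] > 30
print(mask.tolist())               # [False, True, True, False]

# Indexing with the mask keeps only the rows where it is True
large_bills = df[mask]
print(large_bills.index.tolist())  # [1, 2]
```

The mask shares the DataFrame's index, which is how pandas matches each True/False value to its row.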
Multiple Conditions
AND (&) Operator
# Both conditions must be True
df[(df['total_bill'] > 30) & (df['sex'] != 'Male')]
OR (|) Operator
# Either condition can be True
df[(df['day'] == 'Sun') | (df['day'] == 'Sat')]
isin() Method
Check if values are in a list of options:
# Weekend filter
options = ['Sat', 'Sun']
# Boolean mask: True where day is Sat or Sun
df['day'].isin(options)
# Use the mask to filter rows
df[df['day'].isin(['Sat', 'Sun'])]
# Equivalent to:
df[(df['day'] == 'Sat') | (df['day'] == 'Sun')]
Why Parentheses Matter
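In Python, & and | bind more tightly than comparison operators such as > and ==, so conditions without parentheses are grouped incorrectly:
# ❌ Wrong - evaluates 30 & df['sex'] first, which raises an error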
df[df['total_bill'] > 30 & df['sex'] == 'Male']
# ✅ Correct
df[(df['total_bill'] > 30) & (df['sex'] == 'Male')]
Filtering Process
- Create Mask: Generate boolean Series from condition
- Apply Mask: Use df[mask] to filter rows
- Combine Masks: Use logical operators (&, |, ~)
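The three steps above can be sketched as follows, including the ~ (NOT) operator, which the earlier examples do not show. The data and the mask names (is_large, is_weekend) are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; column names mirror the examples in these notes
df = pd.DataFrame({
    'total_bill': [16.99, 48.27, 34.30, 10.34, 12.00],
    'day': ['Sun', 'Sat', 'Thur', 'Sun', 'Fri'],
})

# 1. Create masks: each condition yields a boolean Series
is_large = df['total_bill'] > 30
is_weekend = df['day'].isin(['Sat', 'Sun'])

# 2. Apply a mask: keep rows where it is True
weekend_rows = df[is_weekend]
print(weekend_rows.index.tolist())      # [0, 1, 3]

# 3. Combine masks with &, | and ~
large_weekend = df[is_large & is_weekend]     # AND: both True
large_or_weekend = df[is_large | is_weekend]  # OR: at least one True
weekday_rows = df[~is_weekend]                # NOT: inverts the mask
print(large_weekend.index.tolist())     # [1]
print(large_or_weekend.index.tolist())  # [0, 1, 2, 3]
print(weekday_rows.index.tolist())      # [2, 4]
```

Because the masks are stored in variables rather than written inline, no extra parentheses are needed when combining them.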
Key Takeaways
- Boolean Masking: Core pandas filtering technique
- Operator Precedence: Always wrap each condition in parentheses when combining with & or |
- isin() Efficiency: Cleaner than multiple OR conditions
- Readability: Break complex filters into multiple lines
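As a rough illustration of these takeaways, the sketch below checks three things on made-up data: the unparenthesized filter fails, isin() matches the chained OR, and named masks keep a compound filter readable. The exact exception type for the failing case may vary between pandas versions, so the example catches a generic Exception.

```python
import pandas as pd

# Hypothetical sample data; column names mirror the examples in these notes
df = pd.DataFrame({
    'total_bill': [16.99, 48.27, 34.30, 10.34],
    'sex': ['Female', 'Male', 'Female', 'Male'],
    'day': ['Sun', 'Sat', 'Thur', 'Sun'],
})

# Operator precedence: without parentheses, 30 & df['sex'] is evaluated first
try:
    df[df['total_bill'] > 30 & df['sex'] == 'Male']
    failed = False
except Exception as exc:  # exception type depends on the pandas version
    failed = True
    print(type(exc).__name__)

# isin() gives the same mask as the chained OR comparison
weekend_isin = df['day'].isin(['Sat', 'Sun'])
weekend_or = (df['day'] == 'Sat') | (df['day'] == 'Sun')
same_mask = bool((weekend_isin == weekend_or).all())
print(same_mask)               # True

# Readability: name each condition, then combine on a separate line
is_large = df['total_bill'] > 30
is_male = df['sex'] == 'Male'
result = df[is_large & is_male]
print(result.index.tolist())   # [1]
```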