Pivoting data can help clarify relationships and connections by restructuring how information is displayed. This is especially useful for data analysis, visualization, and exploration.
🔗 Full documentation on pivot methods: Pandas User Guide - Reshaping
📌 Data Import
import numpy as np
import pandas as pd
df = pd.read_csv('Sales_Funnel_CRM.csv')
df.head()🔄 pivot() Method
The pivot() method reshapes data based on column values and reassignment of the index. It does not perform aggregation, so it requires a unique index for each combination of values.
❓ When to use pivot()
Before using pivot(), ask yourself: ✅ What question are you trying to answer?
✅ What should the transformed DataFrame look like?
✅ Do you need all the original columns?
Example: How many licenses of each product did Google purchase?
# Take a subset to avoid duplicate row errors
licenses = df[['Company', 'Product', 'Licenses']]
licenses.head()
# Apply pivot
pd.pivot(data=licenses, index='Company', columns='Product', values='Licenses')🛠️ pivot_table() Method
The pivot_table() method is similar to pivot() but allows for aggregation functions to be applied.
Basic Example: Sum by Company
pd.pivot_table(df, index="Company", aggfunc='sum')🔹 Selecting Specific Columns
# Option 1: Selecting columns after pivoting
pd.pivot_table(df, index="Company", aggfunc='sum')[['Licenses', 'Sale Price']]
# Option 2: Specifying columns before pivoting
pd.pivot_table(df, index="Company", aggfunc='sum', values=['Licenses', 'Sale Price'])🔹 Alternative Using groupby()
df.groupby('Company').sum()[['Licenses', 'Sale Price']]🔹 Using Multiple Index Levels
pd.pivot_table(df, index=["Account Manager", "Contact"], values=['Sale Price'], aggfunc='sum')🔹 Adding Columns for Further Segmentation
pd.pivot_table(df, index=["Account Manager", "Contact"], values=["Sale Price"], columns=["Product"], aggfunc=np.sum)🔹 Filling Missing Values with fill_value=0
pd.pivot_table(df, index=["Account Manager", "Contact"], values=["Sale Price"],
columns=["Product"], aggfunc=np.sum, fill_value=0)📌 Multiple Aggregation Functions
pd.pivot_table(df, index=["Account Manager", "Contact"], values=["Sale Price"], columns=["Product"],
aggfunc=[np.sum, np.mean], fill_value=0)📌 Pivoting with Multiple Columns
pd.pivot_table(df, index=["Account Manager", "Contact"], values=["Sale Price", "Licenses"], columns=["Product"],
aggfunc=[np.sum], fill_value=0)📌 Pivoting with Multiple Indexes
pd.pivot_table(df, index=["Account Manager", "Contact", "Product"], values=["Sale Price", "Licenses"],
aggfunc=[np.sum], fill_value=0)📌 Adding a Grand Total (margins=True)
pd.pivot_table(df, index=["Account Manager", "Contact", "Product"], values=["Sale Price", "Licenses"],
aggfunc=[np.sum], fill_value=0, margins=True)or
pd.pivot_table(df, index=["Account Manager", "Status"], values=["Sale Price"],
aggfunc=[np.sum], fill_value=0, margins=True)🎯 Key Takeaways
✔ Use pivot() when you need to restructure data without aggregation.
✔ Use pivot_table() when aggregation (e.g., sum, mean) is required.
✔ Define your question first to avoid confusion with index, columns, and values.
✔ Use fill_value=0 to handle missing values.
✔ Use margins=True to get totals.