Creating a DataFrame from Python Objects
A Pandas DataFrame consists of multiple Pandas Series that share index values.
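As a minimal sketch of that idea, the snippet below builds a DataFrame from two Series and checks that each column comes back as a Series carrying the same index. The names `jan`, `feb`, and `sales`, and all values, are made up for illustration.

```python
import pandas as pd

# Two Series built on the same index labels (values are made up).
states = ['CA', 'NY', 'AZ', 'TX']
jan = pd.Series([10, 20, 30, 40], index=states)
feb = pd.Series([15, 25, 35, 45], index=states)

# A dict of Series becomes a DataFrame with one column per Series.
sales = pd.DataFrame({'Jan': jan, 'Feb': feb})

# Each column is itself a Series, and all columns share the DataFrame's index.
print(type(sales['Jan']))
print(sales['Jan'].index.equals(sales['Feb'].index))
```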
import numpy as np
import pandas as pd
np.random.seed(101)
mydata = np.random.randint(0,101,(4,3))
myindex = ['CA','NY','AZ','TX']
mycolumns = ['Jan','Feb','Mar']
df = pd.DataFrame(data=mydata,index=myindex,columns=mycolumns)
df.info()
Obtaining Basic Information About DataFrame
df.columns
df.index
df.head(3)
df.tail(3)
df.describe() # Statistical summary
df.describe().transpose()
Column Operations
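The snippets in this section assume `df` holds restaurant-tip data with `total_bill`, `tip`, and `size` columns, which the text never loads. A minimal sketch with a small hand-made frame (`tips` and its values are hypothetical stand-ins, not the original dataset) walks through the same selection and modification steps end to end:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the tips data used in this section.
tips = pd.DataFrame({
    'total_bill': [20.0, 10.0, 30.0],
    'tip': [3.0, 2.0, 6.0],
    'size': [2, 1, 4],
})

single = tips['total_bill']            # one column name -> Series
several = tips[['total_bill', 'tip']]  # list of names -> DataFrame

# New columns are computed element-wise over whole columns.
tips['tip_pct'] = 100 * tips['tip'] / tips['total_bill']
tips['price_per_person'] = np.round(tips['total_bill'] / tips['size'], 2)

# drop returns a new DataFrame, so reassign to keep the result.
tips = tips.drop('tip_pct', axis=1)
print(tips)
```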
Selection:
df['total_bill'] # Single column → Series
df[['total_bill','tip']] # Multiple columns → DataFrame
Modification:
# Create new column
df['tip_pct'] = 100 * df['tip'] / df['total_bill']
# Round values
df['price_per_person'] = np.round(df['total_bill']/df['size'], 2)
# Delete column
df = df.drop('tip_pct', axis=1)
Index Management:
df.set_index('Payment ID', inplace=True) # Set custom index
df.reset_index(inplace=True) # Revert to default
Row Operations
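Label-based lookups such as `df.loc['Sun2959']` only work once the `Payment ID` column has been made the index. The sketch below uses a small hypothetical frame (the ID `Sat1111` and all numeric values are invented for illustration) to show position-based and label-based selection side by side, plus dropping and adding a row. It uses `pd.concat` for the addition, since `DataFrame.append` is not available in pandas 2.0 and later.

```python
import pandas as pd

# Hypothetical rows; only the ID format mirrors the text.
bills = pd.DataFrame({
    'total_bill': [16.99, 10.34, 21.01],
    'tip': [1.01, 1.66, 3.50],
    'Payment ID': ['Sun2959', 'Sun5260', 'Sat1111'],
}).set_index('Payment ID')

by_position = bills.iloc[0]        # first row -> Series
by_label = bills.loc['Sun2959']    # same row, looked up by index label
first_two = bills.iloc[0:2]        # position slice excludes the end
picked = bills.loc[['Sun2959', 'Sun5260']]

# drop returns a new DataFrame; bills itself still has three rows.
without_first = bills.drop('Sun2959', axis=0)

# Add a row: turn the Series back into a one-row DataFrame and concatenate.
with_extra = pd.concat([bills, by_position.to_frame().T])
print(with_extra)
```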
Selection:
df.iloc[0] # By position → Series
df.loc['Sun2959'] # By index label → Series
df.iloc[0:4] # Slice by position
df.loc[['Sun2959','Sun5260']] # Specific labels
Modification:
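# Delete row (returns a new DataFrame; df itself is unchanged)
df.drop('Sun2959', axis=0)
# Add row (rarely used). DataFrame.append was removed in pandas 2.0; pd.concat replaces it
new_row = df.iloc[0]
pd.concat([df, new_row.to_frame().T])
Pro Tips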
- Path Check: Use pwd and ls to verify file locations
- Axis Reminder:
  - axis=0 → Rows (vertical)
  - axis=1 → Columns (horizontal)
- Modification Best Practices:
  - Use inplace=True or reassign (df = df.drop(...))
  - Prefer vectorized operations over loops
- Changes such as df.drop(...) are not permanent unless you reassign the result, e.g. df = df.drop('Sun2959', axis=0)
- Selecting the rows you want to keep is sometimes easier than dropping the ones you do not, so consider selection first.
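Several of the tips above can be checked directly. This sketch uses a small hypothetical frame (`orders` and its values are invented) to show that `drop` leaves the original untouched unless the result is reassigned, and that a boolean selection can express the same filter as a drop. It also compares a Python loop with the equivalent vectorized expression.

```python
import pandas as pd

# Hypothetical data for illustration.
orders = pd.DataFrame(
    {'total_bill': [20.0, 10.0, 30.0], 'tip': [3.0, 2.0, 6.0]},
    index=['Sun2959', 'Sun5260', 'Sat1111'],
)

# Without reassignment the original keeps all of its rows.
orders.drop('Sun2959', axis=0)
rows_after_plain_drop = len(orders)

# Dropping vs. selecting: two ways to keep bills of at least 15.
dropped = orders.drop(orders[orders['total_bill'] < 15].index, axis=0)
selected = orders[orders['total_bill'] >= 15]

# Loop vs. vectorized: same numbers, but the second form is shorter
# and lets pandas do the arithmetic on whole columns at once.
looped = [100 * t / b for t, b in zip(orders['tip'], orders['total_bill'])]
vectorized = 100 * orders['tip'] / orders['total_bill']
print(selected)
```

Here the selection version states the condition for rows to keep, which is often easier to read than building the list of labels to drop.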