By Debasis Das (17-Feb-2021)
In this post we will look at different ways of creating a pandas DataFrame and editing it.
Let's first import the pandas and NumPy modules:
import pandas as pd
import numpy as np
import random
pd.set_option('display.width', 1000)
Creating a DataFrame by reading from a CSV file
salesDataDf = pd.read_csv("SalesData.csv",low_memory=False)
print(salesDataDf)
Output:
Region Country Product WK_1 WK_2 WK_3 Wk_4
0 America USA Laptop 1241 1160 1929 1174
1 America USA Phone 1098 1092 1089 1819
2 America Canada Laptop 1441 1099 1950 1394
3 America Canada Phone 1990 1057 1656 1060
4 Europe Belgium Laptop 1084 1116 1002 1566
5 Europe Belgium Phone 1574 1958 1793 1213
6 Europe Finland Laptop 1325 1374 1300 1579
7 Europe Finland Phone 1347 1736 1782 1921
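If SalesData.csv is not at hand, the same call can be tried against an in-memory string. This is only a sketch: csv_text and sample_df are made-up names, and the two rows are copied from the output above with just the first two week columns.

```python
import io
import pandas as pd

# Small stand-in for SalesData.csv (an assumption, not the real file)
csv_text = """Region,Country,Product,WK_1,WK_2
America,USA,Laptop,1241,1160
America,USA,Phone,1098,1092
"""

# read_csv accepts a file-like object as well as a file path
sample_df = pd.read_csv(io.StringIO(csv_text))
print(sample_df)
print(sample_df.dtypes)  # shows which type pandas inferred for each column
```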
Create a DataFrame from a List
seasons = ['Winter','Spring','Summer','Fall']
df = pd.DataFrame(seasons)
print(df)
Output:
0
0 Winter
1 Spring
2 Summer
3 Fall
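The column above is labelled 0 because no name was given. A minimal variation passes a column name through the columns argument; the name "Season" is chosen here purely for illustration.

```python
import pandas as pd

seasons = ['Winter', 'Spring', 'Summer', 'Fall']

# columns takes a list with one label per column
season_df = pd.DataFrame(seasons, columns=['Season'])
print(season_df)
```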
Creating a DataFrame from a List of Lists (each inner list becomes a row):
data = [[100,200,300],[400,500,600],['John','Jane','Mary','Jin']]
df = pd.DataFrame(data)
print(df)
Output:
0 1 2 3
0 100 200 300 None
1 400 500 600 None
2 John Jane Mary Jin
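Since each inner list is treated as a row, data that is held column-wise needs one extra step. One possible approach, sketched below with hypothetical variable and column names, is to zip the lists into rows first.

```python
import pandas as pd

# Three column-wise lists of equal length (sample values for illustration)
names = ['John', 'Jane', 'Mary']
score_a = [100, 200, 300]
score_b = [400, 500, 600]

# zip pairs up the i-th element of every list, producing one tuple per row
rows = list(zip(names, score_a, score_b))
people_df = pd.DataFrame(rows, columns=['name', 'score_a', 'score_b'])
print(people_df)
```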
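Creating a DataFrame by passing a list of rows and a list of column names:
df = pd.DataFrame([['Debasis Das',100],['John Doe',98],['Jane Doe',93]], columns = ["name","score"])
# Cast only the numeric column; dtype=float on the constructor applies to every column
# and can fail in recent pandas versions because the names are not numbers
df = df.astype({"score": float})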
print(df)
Output:
name score
0 Debasis Das 100.0
1 John Doe 98.0
2 Jane Doe 93.0
Create a DataFrame from a Dictionary:
dictionary = {"names":["John Doe","Jane Doe","Mary Jane"], "score":[90,91,93]}
df = pd.DataFrame(dictionary)
print(df)
Output:
names score
0 John Doe 90
1 Jane Doe 91
2 Mary Jane 93
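The dictionary keys become column names and the rows get a default 0..n-1 index. As a small sketch, custom row labels can be supplied with the index argument; the labels s1 to s3 below are hypothetical student IDs.

```python
import pandas as pd

dictionary = {"names": ["John Doe", "Jane Doe", "Mary Jane"], "score": [90, 91, 93]}

# index must have one label per row
df_indexed = pd.DataFrame(dictionary, index=["s1", "s2", "s3"])
print(df_indexed)

# Rows can now be looked up by label
jane_score = df_indexed.loc["s2", "score"]
print(jane_score)
```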
Create a DataFrame from a List of Dictionaries:
listOfDict = [{"names":"John Doe", "age":30},{"names":"Jane Doe", "age":10, "score":98.0}]
df = pd.DataFrame(listOfDict)
print(df)
Output:
names age score
0 John Doe 30 NaN
1 Jane Doe 10 98.0
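The first dictionary has no "score" key, so pandas fills that cell with NaN. A short sketch of how such gaps might be detected and filled follows; defaulting to 0.0 is an assumption made for the example, not a recommendation.

```python
import pandas as pd

listOfDict = [{"names": "John Doe", "age": 30},
              {"names": "Jane Doe", "age": 10, "score": 98.0}]
df = pd.DataFrame(listOfDict)

# True for rows whose dictionary had no "score" key
missing_score = df["score"].isna()
print(df[missing_score])

# fillna with a dict fills only the named columns and returns a new DataFrame
df_filled = df.fillna({"score": 0.0})
print(df_filled)
```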
Add a column to the DataFrame based on a condition:
df = pd.DataFrame({'Score':[100,20,30,80,90]})
print(df)
df.loc[df.Score <= 80, "Grade"] = "B"
df.loc[df.Score > 80, "Grade"] = "A"
df.loc[df.Score < 35, "Grade"] = "F"
print(df)
Output:
Score
0 100
1 20
2 30
3 80
4 90
After adding a new column
Score Grade
0 100 A
1 20 F
2 30 F
3 80 B
4 90 A
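The three loc assignments above depend on their order, because the later ones overwrite the earlier ones. An alternative sketch uses np.select, which checks the conditions in order and takes the first match, so the whole rule sits in one place.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Score': [100, 20, 30, 80, 90]})

# Conditions are evaluated in order; the first one that is True wins
conditions = [df['Score'] < 35, df['Score'] <= 80]
choices = ['F', 'B']

# Rows matching no condition (Score > 80) get the default
df['Grade'] = np.select(conditions, choices, default='A')
print(df)
```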
Creating a DataFrame from a numpy array:
df = pd.DataFrame(np.random.randint(low=80, high=100, size=(3, 5)), columns=['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday'],index=['Temp Morning', 'Temp Afternoon', 'Temp Evening'])
print(df)
Output:
Monday Tuesday Wednesday Thursday Friday
Temp Morning 98 88 91 87 91
Temp Afternoon 92 83 80 83 81
Temp Evening 84 94 92 96 84
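The numbers above change on every run. If repeatable values are needed, one option is a seeded NumPy generator, as in this sketch; the seed 42 is arbitrary.

```python
import numpy as np
import pandas as pd

days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
slots = ['Temp Morning', 'Temp Afternoon', 'Temp Evening']

# The same seed produces the same sequence of numbers on every run
rng = np.random.default_rng(seed=42)
temps = rng.integers(low=80, high=100, size=(3, 5))  # high is exclusive

temp_df = pd.DataFrame(temps, columns=days, index=slots)
print(temp_df)
```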
Creating a DataFrame with Random Values in a Few Columns
salesData = {"Region":["Americas","Americas","Americas","Americas","Europe","Europe","Europe","Europe"],
"Country":["USA","USA","Mexico","Mexico","Belgium","Belgium","Finland","Finland"],
"Product":["Phone","TV","Phone","TV","Phone","TV","Phone","TV"],
"wk1":np.random.uniform(1000,2000,8),
"wk2":np.random.uniform(1000,2000,8),
"wk3":np.random.uniform(1000,2000,8),
"wk4":np.random.uniform(1000,2000,8)
}
salesDF = pd.DataFrame(salesData)
print(salesDF)
Adding New Columns to an Existing DataFrame:
salesDF["Total"] = salesDF['wk1'] + salesDF['wk2'] + salesDF['wk3'] + salesDF['wk4']
salesDF["Total (K)"] = salesDF["Total"] /1000
print(salesDF)
Output: In this sample we added two new columns, Total and Total (K), where Total (K) is the total expressed in thousands.
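Writing out every week column by hand becomes tedious as more weeks are added. A possible alternative is to select the week columns and sum across each row. The frame below is a reduced, hypothetical stand-in for salesDF with an arbitrary seed.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for repeatable sample data
sales = pd.DataFrame({
    "Product": ["Phone", "TV"],
    "wk1": rng.uniform(1000, 2000, 2),
    "wk2": rng.uniform(1000, 2000, 2),
    "wk3": rng.uniform(1000, 2000, 2),
    "wk4": rng.uniform(1000, 2000, 2),
})

week_cols = ["wk1", "wk2", "wk3", "wk4"]

# axis=1 sums across the selected columns, giving one total per row
sales["Total"] = sales[week_cols].sum(axis=1)
sales["Total (K)"] = sales["Total"] / 1000
print(sales)
```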
Creating a DataFrame using the assign Function
# assign returns a new DataFrame that contains the original columns plus the new ones; the original DataFrame is left unchanged.
salesDF1 = salesDF[["Region","Country","Product","wk1"]]
# The lambda receives the DataFrame that assign is called on, so use its argument x
salesDF2 = salesDF1.assign(week1_k=lambda x: x['wk1']/1000)
salesDF2
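To see that assign leaves its input alone, the columns can be compared before and after. This is a minimal sketch with made-up data; base and with_k are illustrative names.

```python
import pandas as pd

base = pd.DataFrame({"Product": ["Phone", "TV"], "wk1": [1500.0, 1200.0]})

# Each keyword argument becomes a new column in the returned copy
with_k = base.assign(wk1_k=lambda d: d["wk1"] / 1000)

print(list(base.columns))    # the original keeps its two columns
print(list(with_k.columns))  # the copy has the extra wk1_k column
```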
Transpose a DataFrame:
data = {'Col1': [1, 2, 3], 'Col2': [4, 5, 6], 'Col3': [7, 8, 9]}
df = pd.DataFrame(data, index=['Row1', 'Row2', 'Row3'])
df_transpose = df.T
print("\nBefore Transpose")
print(df)
print("\nAfter Transpose")
print(df_transpose)
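One way to check what the transpose did is to compare the labels and a single value before and after, as in this small sketch.

```python
import pandas as pd

data = {'Col1': [1, 2, 3], 'Col2': [4, 5, 6], 'Col3': [7, 8, 9]}
df = pd.DataFrame(data, index=['Row1', 'Row2', 'Row3'])
df_transpose = df.T

# Row labels and column labels swap places
print(list(df_transpose.index))    # ['Col1', 'Col2', 'Col3']
print(list(df_transpose.columns))  # ['Row1', 'Row2', 'Row3']

# The value at (row, column) moves to (column, row)
print(df.loc['Row1', 'Col2'], df_transpose.loc['Col2', 'Row1'])  # 4 4
```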

Selecting a Subset of a DataFrame using loc
data = {'Col1': [1, 2, 3], 'Col2': [4, 5, 6], 'Col3': [7, 8, 9]}
df = pd.DataFrame(data, index=['Row1', 'Row2', 'Row3'])
print(df)
df1 = df.loc['Row1':'Row2','Col1':'Col2']
print("\nSubset of the Dataframe")
print(df1)
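Note that label slices with loc include both endpoints, so 'Row1':'Row2' returns two rows. The positional counterpart iloc follows the usual Python rule and excludes the end. A short sketch comparing the two:

```python
import pandas as pd

data = {'Col1': [1, 2, 3], 'Col2': [4, 5, 6], 'Col3': [7, 8, 9]}
df = pd.DataFrame(data, index=['Row1', 'Row2', 'Row3'])

# loc: label-based, both 'Row2' and 'Col2' are included
by_label = df.loc['Row1':'Row2', 'Col1':'Col2']

# iloc: position-based, the end position 2 is excluded
by_position = df.iloc[0:2, 0:2]

print(by_label.equals(by_position))  # True
```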

Merge DataFrames
data1 = {"Country":["USA","Mexico","Brazil"],"Jan_Sales":[1000,2000,3000]}
data2 = {"Country":["Canada","Mexico","Brazil","Belgium"],"Feb_Sales":[4000,5000,6000,7000]}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
# When no key is given, merge joins on the columns the two DataFrames have in common (here, Country)
# The default join type is inner, so only countries present in both DataFrames are kept
df1
Country Jan_Sales
0 USA 1000
1 Mexico 2000
2 Brazil 3000
df2
Country Feb_Sales
0 Canada 4000
1 Mexico 5000
2 Brazil 6000
3 Belgium 7000
df3 = df1.merge(df2)
df3
Country Jan_Sales Feb_Sales
0 Mexico 2000 5000
1 Brazil 3000 6000
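Relying on automatic detection can surprise you if the two DataFrames later gain another shared column. A more explicit sketch names the key with the on parameter; for this data it gives the same result as the call above.

```python
import pandas as pd

df1 = pd.DataFrame({"Country": ["USA", "Mexico", "Brazil"],
                    "Jan_Sales": [1000, 2000, 3000]})
df2 = pd.DataFrame({"Country": ["Canada", "Mexico", "Brazil", "Belgium"],
                    "Feb_Sales": [4000, 5000, 6000, 7000]})

# on names the join key explicitly instead of letting pandas infer it
df3_explicit = df1.merge(df2, on="Country")
print(df3_explicit)
```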
Inner Merge
df4 = df1.merge(df2,how='inner')
df4
Country Jan_Sales Feb_Sales
0 Mexico 2000 5000
1 Brazil 3000 6000
Outer Merge
df5 = df1.merge(df2,how='outer')
df5
Country Jan_Sales Feb_Sales
0 USA 1000.0 NaN
1 Mexico 2000.0 5000.0
2 Brazil 3000.0 6000.0
3 Canada NaN 4000.0
4 Belgium NaN 7000.0
Left Merge
df6 = df1.merge(df2,how='left')
df6
Country Jan_Sales Feb_Sales
0 USA 1000 NaN
1 Mexico 2000 5000.0
2 Brazil 3000 6000.0
Right Merge
df7 = df1.merge(df2,how='right')
df7
Country Jan_Sales Feb_Sales
0 Canada NaN 4000
1 Mexico 2000.0 5000
2 Brazil 3000.0 6000
3 Belgium NaN 7000
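When comparing the join types it can help to see where each row came from. The sketch below uses merge's indicator parameter, which adds a _merge column holding left_only, right_only, or both. The origin dictionary is just a convenience for this example.

```python
import pandas as pd

df1 = pd.DataFrame({"Country": ["USA", "Mexico", "Brazil"],
                    "Jan_Sales": [1000, 2000, 3000]})
df2 = pd.DataFrame({"Country": ["Canada", "Mexico", "Brazil", "Belgium"],
                    "Feb_Sales": [4000, 5000, 6000, 7000]})

# indicator=True adds a _merge column describing each row's source
merged = df1.merge(df2, how="outer", on="Country", indicator=True)
print(merged)

# Map each country to its source for a quick lookup
origin = merged.set_index("Country")["_merge"].astype(str).to_dict()
print(origin["USA"], origin["Canada"], origin["Mexico"])
```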