Python is a versatile, easy-to-learn programming language. Here we concentrate on Pandas, a software library written for Python. It is one of the go-to tools for data analysis and data manipulation, which largely means organizing data into a more user-friendly format. Pandas is built on top of NumPy and works well alongside libraries such as SciPy and Matplotlib. All you need to work with Pandas is given below.
Install Pandas in Python
- Run this command to install Pandas.
$ pip install pandas
- A decent knowledge of Python. If you are new to Python, I would recommend going through some basic Python tutorials before continuing.
Python Basics: Lists, Dictionaries, & Booleans | Python
Goals of this blog
Understanding and playing around with basic commands in Pandas. For better understanding, you can use the repo here.
Before getting into the commands, let’s quickly go through the basic terms used in Pandas. A DataFrame is a two-dimensional table with labelled rows and columns, and a Series is a one-dimensional labelled array, such as a single column of a DataFrame.
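To make the two terms concrete, here is a minimal sketch. The frame is built by hand with illustrative values (it is not read from sample.csv), and the variable names are only for this example.

```python
import pandas as pd

# Illustrative data only; not taken from sample.csv.
df = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur", "Charmander"],
    "Type 1": ["Grass", "Grass", "Fire"],
    "Speed": [45, 60, 65],
})

# Selecting a single column returns a Series.
speed = df["Speed"]

print(type(df).__name__)     # DataFrame
print(type(speed).__name__)  # Series
print(df.shape)              # (3, 3): rows, columns
print(speed.shape)           # (3,): one dimension only
```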
Now, let’s start by reading a CSV file. A CSV (comma-separated values) file is a table stored as plain text: each line is a row, and the values within a row are separated by commas.
Read a CSV file
import pandas as pd

# Load the CSV file into a DataFrame and print all of it
df = pd.read_csv('sample.csv')
print(df)
Output:
       #        Name   Type 1  ...  Speed  Generation  Legendary
0      1   Bulbasaur    Grass  ...     45           1      False
1      2     Ivysaur    Grass  ...     60           1      False
2      3    Venusaur    Grass  ...     80           1      False
3      3    Venusaur    Grass  ...     80           1      False
4      4  Charmander     Fire  ...     65           1      False
..   ...         ...      ...  ...    ...         ...        ...
795  719     Diancie     Rock  ...     50           6       True
796  719     Diancie     Rock  ...    110           6       True
797  720    Confined  Psychic  ...     70           6       True
798  720     Unbound  Psychic  ...     80           6       True
799  721   Volcanion     Fire  ...     70           6       True
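If you do not have sample.csv at hand, the same call can be tried on CSV text held in memory. This is only a sketch: the CSV text below is a made-up stand-in for the real file, and it relies on read_csv accepting a file-like object as well as a path.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for sample.csv.
csv_text = """#,Name,Type 1,Speed
1,Bulbasaur,Grass,45
2,Ivysaur,Grass,60
4,Charmander,Fire,65
"""

# read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (3, 4): three rows, four columns
print(df.dtypes)  # the column types pandas inferred from the text
```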
To read the first few rows, we use the head function.
import pandas as pd

df = pd.read_csv('sample.csv')
print(df.head(3))  # first 3 rows
Output:
   #       Name Type 1  Type 2  ...  Sp. Def  Speed  Generation  Legendary
0  1  Bulbasaur  Grass  Poison  ...       65     45           1      False
1  2    Ivysaur  Grass  Poison  ...       80     60           1      False
2  3   Venusaur  Grass  Poison  ...      100     80           1      False
To read the last few rows, we use the tail function.
import pandas as pd

df = pd.read_csv('sample.csv')
print(df.tail(3))  # last 3 rows
Output:
       #       Name   Type 1  ...  Speed  Generation  Legendary
797  720   Confined  Psychic  ...     70           6       True
798  720    Unbound  Psychic  ...     80           6       True
799  721  Volcanion     Fire  ...     70           6       True
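A small sketch of how head and tail behave when the row count is left out or changed. The ten-row frame is invented for the example; both functions default to 5 rows when no number is given.

```python
import pandas as pd

# Illustrative frame with 10 rows; not taken from sample.csv.
df = pd.DataFrame({"n": range(10)})

first = df.head()      # no argument: the first 5 rows
last_two = df.tail(2)  # the last 2 rows

print(first)
print(last_two)
```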
To read the column names of the CSV file
import pandas as pd

df = pd.read_csv('sample.csv')
print(df.columns)
Output:
Index(['#', 'Name', 'Type 1', 'Type 2', 'HP', 'Attack', 'Defense', 'Sp. Atk',
       'Sp. Def', 'Speed', 'Generation', 'Legendary'],
      dtype='object')
To read a specific column
import pandas as pd

df = pd.read_csv('sample.csv')
print(df['Name'])  # select one column by its name
Output:
0       Bulbasaur
1         Ivysaur
2        Venusaur
3        Venusaur
4      Charmander
          ...
795       Diancie
To read a specific row
import pandas as pd

df = pd.read_csv('sample.csv')
print(df.iloc[1])  # row at position 1 (the second row; positions start at 0)
Output:
#               2
Name      Ivysaur
Type 1      Grass
Type 2     Poison
HP             60
Attack         62
Defense        63
Sp. Atk        80
Sp. Def        80
To print the value at a specific row and column
import pandas as pd

df = pd.read_csv('sample.csv')
print(df.iloc[1, 2])  # row position 1, column position 2 ('Type 1')
Output:
Grass
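iloc selects by position, while loc selects by label. The sketch below shows the difference on a small hand-made frame; the index labels 10, 20, 30 are chosen only to make positions and labels visibly different.

```python
import pandas as pd

# Illustrative frame with non-default row labels.
df = pd.DataFrame(
    {
        "Name": ["Bulbasaur", "Ivysaur", "Charmander"],
        "Type 1": ["Grass", "Grass", "Fire"],
    },
    index=[10, 20, 30],
)

by_position = df.iloc[1, 1]      # second row, second column
by_label = df.loc[20, "Type 1"]  # row labelled 20, column named "Type 1"

print(by_position)  # Grass
print(by_label)     # Grass
```

With the default index (0, 1, 2, ...), row positions and row labels happen to coincide, which is why the two are easy to confuse.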
To view only the rows that match a particular condition
import pandas as pd

df = pd.read_csv('sample.csv')
# Keep only the rows whose 'Type 1' value is 'Grass'
test = df.loc[df['Type 1'] == 'Grass']
print(test)
Output:
     #       Name Type 1  ...  Speed  Generation  Legendary
0    1  Bulbasaur  Grass  ...     45           1      False
1    2    Ivysaur  Grass  ...     60           1      False
2    3   Venusaur  Grass  ...     80           1      False
3    3   Venusaur  Grass  ...     80           1      False
48  43     Oddish  Grass  ...     30           1      False
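It can help to look at the condition on its own. The comparison produces a Series of True/False values (often called a boolean mask), and loc keeps the rows where the mask is True. A minimal sketch on invented data:

```python
import pandas as pd

# Illustrative data only; not taken from sample.csv.
df = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur", "Charmander"],
    "Type 1": ["Grass", "Grass", "Fire"],
})

mask = df["Type 1"] == "Grass"  # one True/False value per row
grass = df.loc[mask]            # rows where the mask is True

print(mask.tolist())  # [True, True, False]
print(grass)
```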
To find the statistics for numerical columns
On pandas 2.0 and later, mean, median, std and corr raise an error on text columns such as Name unless numeric_only=True is passed.
df.mean(numeric_only=True) | Returns the mean of each numeric column |
df.corr(numeric_only=True) | Returns the correlation between the numeric columns of a DataFrame |
df.count() | Returns the number of non-null values in each column |
df.max() | Returns the highest value in each column |
df.min() | Returns the lowest value in each column |
df.median(numeric_only=True) | Returns the median of each numeric column |
df.std(numeric_only=True) | Returns the standard deviation of each numeric column |
The describe function gives all of these details at once.
import pandas as pd

df = pd.read_csv('sample.csv')
print(df.describe())  # summary statistics for the numeric columns
Output:
                #          HP      Attack  ...     Sp. Def       Speed  Generation
count  800.000000  800.000000  800.000000  ...  800.000000  800.000000   800.00000
mean   362.813750   69.258750   79.001250  ...   71.902500   68.277500     3.32375
std    208.343798   25.534669   32.457366  ...   27.828916   29.060474     1.66129
min      1.000000    1.000000    5.000000  ...   20.000000    5.000000     1.00000
25%    184.750000   50.000000   55.000000  ...   50.000000   45.000000     2.00000
50%    364.500000   65.000000   75.000000  ...   70.000000   65.000000     3.00000
75%    539.250000   80.000000  100.000000  ...   90.000000   90.000000     5.00000
max    721.000000  255.000000  190.000000  ...  230.000000  180.000000     6.00000
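When a frame mixes text and numbers, one way to keep the statistics calls working is to restrict them to numeric columns. The sketch below uses a small invented frame and shows two options: the numeric_only argument and select_dtypes.

```python
import pandas as pd

# Illustrative data only; not taken from sample.csv.
df = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur", "Charmander"],
    "HP": [45, 60, 39],
    "Attack": [49, 62, 52],
})

# Option 1: ask the method to skip non-numeric columns.
means = df.mean(numeric_only=True)

# Option 2: select the numeric columns first, then compute.
numeric = df.select_dtypes(include="number")
medians = numeric.median()

print(means)
print(medians)
```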
To add a column to an existing DataFrame
For example, let’s add a column holding the sum of two other columns.
import pandas as pd

df = pd.read_csv('sample.csv')
# New column 'total' = HP + Attack, computed row by row
df['total'] = df['HP'] + df['Attack']
print(df)
The printed DataFrame now has total as its last column.
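As an alternative sketch, assign adds the column on a copy and leaves the original frame untouched, which can be handy when you want to keep the loaded data as it was. The values below are illustrative.

```python
import pandas as pd

# Illustrative data only; not taken from sample.csv.
df = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur"],
    "HP": [45, 60],
    "Attack": [49, 62],
})

# assign returns a new DataFrame; df itself is not modified.
with_total = df.assign(total=df["HP"] + df["Attack"])

print(with_total)
print(list(df.columns))  # the original still has no 'total' column
```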
To delete a column
For example, dropping the Generation column:
import pandas as pd

df = pd.read_csv('sample.csv')
# drop returns a new DataFrame without the listed columns
df = df.drop(columns=['Generation'])
print(df)
Output:
   #        Name Type 1  ...  Sp. Def  Speed  Legendary
0  1   Bulbasaur  Grass  ...       65     45      False
1  2     Ivysaur  Grass  ...       80     60      False
2  3    Venusaur  Grass  ...      100     80      False
3  3    Venusaur  Grass  ...      120     80      False
4  4  Charmander   Fire  ...       50     65      False
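Dropping a column that does not exist raises a KeyError. If that is a possibility (for example, when a cell is run twice), errors="ignore" makes the call a no-op instead. A minimal sketch on invented data:

```python
import pandas as pd

# Illustrative data only; not taken from sample.csv.
df = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur"],
    "total": [94, 122],
})

trimmed = df.drop(columns=["total"])

# 'total' is already gone; errors="ignore" avoids a KeyError here.
again = trimmed.drop(columns=["total"], errors="ignore")

print(list(trimmed.columns))  # ['Name']
print(list(again.columns))    # ['Name']
```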
To sum multiple columns into a new column
In the following code, ‘:’ selects all the rows, and 4:9 selects the columns at positions 4 through 8 (HP to Sp. Def). Positions start at 0 and the end of the slice is not included.
import pandas as pd

df = pd.read_csv('sample.csv')
# Sum across the selected columns for each row (axis=1)
df['total'] = df.iloc[:, 4:9].sum(axis=1)
print(df)
Output:
       #        Name Type 1  ...  Generation  Legendary  total
0      1   Bulbasaur  Grass  ...           1      False    273
1      2     Ivysaur  Grass  ...           1      False    345
2      3    Venusaur  Grass  ...           1      False    445
3      3    Venusaur  Grass  ...           1      False    545
4      4  Charmander   Fire  ...           1      False    244
..   ...         ...    ...  ...         ...        ...    ...
795  719     Diancie   Rock  ...           6       True    550
796  719     Diancie   Rock  ...           6       True    590
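Position-based slices are easy to get wrong by one, so here is a small sketch that compares a positional slice with selecting the same columns by name. The frame and its values are invented for the example.

```python
import pandas as pd

# Illustrative data only; not taken from sample.csv.
df = pd.DataFrame({
    "Name": ["A", "B"],
    "HP": [1, 2],
    "Attack": [10, 20],
    "Speed": [100, 200],
})

# 1:3 covers positions 1 and 2 (HP, Attack); position 3 (Speed) is excluded.
by_position = df.iloc[:, 1:3].sum(axis=1)

# The same columns selected by name.
by_name = df[["HP", "Attack"]].sum(axis=1)

print(by_position.tolist())  # [11, 22]
print(by_name.tolist())      # [11, 22]
```

Selecting by name keeps working if the column order of the CSV changes, at the cost of being a bit more verbose.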
To filter the data on more than one condition
import pandas as pd

df = pd.read_csv('sample.csv')
# Keep rows where 'Type 1' is Grass AND 'Type 2' is Poison
test = df.loc[(df['Type 1'] == "Grass") & (df['Type 2'] == "Poison")]
print(test)
Output:
     #       Name Type 1  ...  Speed  Generation  Legendary
0    1  Bulbasaur  Grass  ...     45           1      False
1    2    Ivysaur  Grass  ...     60           1      False
2    3   Venusaur  Grass  ...     80           1      False
3    3   Venusaur  Grass  ...     80           1      False
..  ..        ...    ...  ...    ...         ...        ...
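Conditions are combined with & (and) and | (or), and each comparison needs its own parentheses because & and | bind more tightly than ==. The sketch below uses a small invented frame in which some rows have no second type.

```python
import pandas as pd

# Illustrative data only; not taken from sample.csv.
df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Oddish", "Tangela"],
    "Type 1": ["Grass", "Fire", "Grass", "Grass"],
    "Type 2": ["Poison", None, "Poison", None],
})

# Both conditions must hold.
both = df.loc[(df["Type 1"] == "Grass") & (df["Type 2"] == "Poison")]

# At least one condition must hold.
either = df.loc[(df["Type 1"] == "Fire") | (df["Type 2"] == "Poison")]

print(both["Name"].tolist())    # ['Bulbasaur', 'Oddish']
print(either["Name"].tolist())  # ['Bulbasaur', 'Charmander', 'Oddish']
```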
This blog covered the basic Pandas commands to help you better understand the library’s essential functions.