
How to Read a .CSV File in Pandas

import pandas as pd
df = pd.read_csv('downloads/adeshbhai.csv')
df.head()
Out[1]:
  | Region | Country | Item Type | Sales Channel | Order Priority | Order Date | Order ID | Ship Date | Units Sold | Unit Price | Unit Cost | Total Revenue | Total Cost | Total Profit
0 | Australia and Oceania | Tuvalu | Baby Food | Offline | H | 5/28/2010 | 669165933 | 6/27/2010 | 9925 | 255.28 | 159.42 | 2533654.00 | 1582243.50 | 951410.50
1 | Central America and the Caribbean | Grenada | Cereal | Online | C | 8/22/2012 | 963881480 | 9/15/2012 | 2804 | 205.70 | 117.11 | 576782.80 | 328376.44 | 248406.36
2 | Europe | Russia | Office Supplies | Offline | L | 5/2/2014 | 341417157 | 5/8/2014 | 1779 | 651.21 | 524.96 | 1158502.59 | 933903.84 | 224598.75
3 | Sub-Saharan Africa | Sao Tome and Principe | Fruits | Online | C | 6/20/2014 | 514321792 | 7/5/2014 | 8102 | 9.33 | 6.92 | 75591.66 | 56065.84 | 19525.82
4 | Sub-Saharan Africa | Rwanda | Office Supplies | Offline | L | 2/1/2013 | 115456712 | 2/6/2013 | 5062 | 651.21 | 524.96 | 3296425.02 | 2657347.52 | 639077.50
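If you don't have the file, you can try the same read_csv call on an in-memory copy of a few of the rows above. This is a minimal sketch. io.StringIO stands in for the real 'downloads/adeshbhai.csv' path, and only some of the columns are kept. The parse_dates argument is an optional extra: it reads the two date columns as datetimes instead of plain strings (object dtype).

```python
import io

import pandas as pd

# Three rows copied from the df.head() output above; StringIO stands in for the CSV file.
csv_text = """Region,Country,Order Date,Order ID,Ship Date,Units Sold,Unit Price
Australia and Oceania,Tuvalu,5/28/2010,669165933,6/27/2010,9925,255.28
Central America and the Caribbean,Grenada,8/22/2012,963881480,9/15/2012,2804,205.70
Europe,Russia,5/2/2014,341417157,5/8/2014,1779,651.21
"""

# parse_dates converts the listed columns to datetime64 values while reading.
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["Order Date", "Ship Date"])
print(sample.dtypes)
print(sample.head())
```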
In [2]:
df.tail()
Out[2]:
   | Region | Country | Item Type | Sales Channel | Order Priority | Order Date | Order ID | Ship Date | Units Sold | Unit Price | Unit Cost | Total Revenue | Total Cost | Total Profit
95 | Sub-Saharan Africa | Mali | Clothes | Online | M | 7/26/2011 | 512878119 | 9/3/2011 | 888 | 109.28 | 35.84 | 97040.64 | 31825.92 | 65214.72
96 | Asia | Malaysia | Fruits | Offline | L | 11/11/2011 | 810711038 | 12/28/2011 | 6267 | 9.33 | 6.92 | 58471.11 | 43367.64 | 15103.47
97 | Sub-Saharan Africa | Sierra Leone | Vegetables | Offline | C | 6/1/2016 | 728815257 | 6/29/2016 | 1485 | 154.06 | 90.93 | 228779.10 | 135031.05 | 93748.05
98 | North America | Mexico | Personal Care | Offline | M | 7/30/2015 | 559427106 | 8/8/2015 | 5767 | 81.73 | 56.67 | 471336.91 | 326815.89 | 144521.02
99 | Sub-Saharan Africa | Mozambique | Household | Offline | L | 2/10/2012 | 665095412 | 2/15/2012 | 5367 | 668.27 | 502.54 | 3586605.09 | 2697132.18 | 889472.91
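Both head() and tail() take an optional row count n, which defaults to 5. The small frame below is hypothetical and uses only two columns of the values shown above. It illustrates the parameter and df.shape, which reports how many rows and columns the full table has.

```python
import pandas as pd

# Hypothetical five-row frame using Country / Units Sold values from the head() output.
df = pd.DataFrame({
    "Country": ["Tuvalu", "Grenada", "Russia", "Sao Tome and Principe", "Rwanda"],
    "Units Sold": [9925, 2804, 1779, 8102, 5062],
})

first_two = df.head(2)  # first n rows; n defaults to 5
last_one = df.tail(1)   # last n rows; n defaults to 5
print(first_two)
print(last_one)
print(df.shape)         # (number of rows, number of columns)
```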
In [7]:
import matplotlib.pyplot as plt  # import the plotting library
x = df['Region']   # store the Region column in x
y = df['Country']  # store the Country column in y
plt.plot(x, y)     # simple line plot of Country against Region (both drawn as categories)
plt.show()         # display the figure (needed in a script; optional in Jupyter)
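The same plot is easier to read with the explicit Figure/Axes interface, which lets you add axis labels, a title, and rotated tick labels. This is a sketch on the five Region/Country pairs from the head() output. The Agg backend line is there only so the example runs without a display; drop it inside Jupyter.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for this sketch; omit it inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Region/Country pairs taken from the head() output above.
df = pd.DataFrame({
    "Region": ["Australia and Oceania", "Central America and the Caribbean", "Europe",
               "Sub-Saharan Africa", "Sub-Saharan Africa"],
    "Country": ["Tuvalu", "Grenada", "Russia", "Sao Tome and Principe", "Rwanda"],
})

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(df["Region"], df["Country"], marker="o")  # string values become categorical axes
ax.set_xlabel("Region")
ax.set_ylabel("Country")
ax.set_title("Country by Region (first five rows)")
ax.tick_params(axis="x", labelrotation=30)  # keep long region names from overlapping
fig.tight_layout()
```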
In [8]:
plt.scatter(x, y)  # scatter plot of the same two columns
Out[8]:
<matplotlib.collections.PathCollection at 0x1fe0742acc8>
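A scatter plot is usually more informative with two numeric columns. The sketch below uses pandas' own DataFrame.plot.scatter wrapper on the Units Sold and Total Revenue values from the head() output. The wrapper labels the axes with the column names automatically. The Agg line is only there for running outside a notebook.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for this sketch; omit it inside Jupyter
import pandas as pd

# Units Sold and Total Revenue values from the head() output above.
df = pd.DataFrame({
    "Units Sold": [9925, 2804, 1779, 8102, 5062],
    "Total Revenue": [2533654.00, 576782.80, 1158502.59, 75591.66, 3296425.02],
})

# DataFrame.plot.scatter draws one point per row and uses the column names as axis labels.
ax = df.plot.scatter(x="Units Sold", y="Total Revenue")
points = ax.collections[0].get_offsets()  # the (x, y) coordinates that were drawn
print(len(points))
```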
In [11]:
plt.bar(x, y)  # bar plot of Country against Region (one bar per row, 100 in total)
Out[11]:
<BarContainer object of 100 artists>
In [13]:
df.index[1]  # df.index holds the row labels; [1] returns the second label
Out[13]:
1
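By default the row labels are a RangeIndex (0, 1, 2, ...), so df.index[1] is simply 1. The sketch below uses three rows from the head() output. It shows that df.index[1] changes once a column is promoted to row labels with set_index. Using Order ID as the index here is only an illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Country": ["Tuvalu", "Grenada", "Russia"],
    "Order ID": [669165933, 963881480, 341417157],
})

print(df.index)     # default RangeIndex with labels 0, 1, 2
print(df.index[1])  # 1, the label of the second row

# Using a column as row labels changes what df.index[1] returns.
by_id = df.set_index("Order ID")
print(by_id.index[1])                    # 963881480
print(by_id.loc[963881480, "Country"])   # Grenada
```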
In [15]:
df.columns[:3]  # the first three column labels
Out[15]:
Index(['Region', 'Country', 'Item Type'], dtype='object')
In [20]:
df.columns[0:]  # all column labels (the same as df.columns)
Out[20]:
Index(['Region', 'Country', 'Item Type', 'Sales Channel', 'Order Priority',
       'Order Date', 'Order ID', 'Ship Date', 'Units Sold', 'Unit Price',
       'Unit Cost', 'Total Revenue', 'Total Cost', 'Total Profit'],
      dtype='object')
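The sliced labels can be passed straight back to the DataFrame to select those columns. tolist() turns the labels into a plain Python list. This is a sketch on a one-row frame taken from the head() output.

```python
import pandas as pd

# One row (Russia) from the head() output, first five columns only.
df = pd.DataFrame(
    [["Europe", "Russia", "Office Supplies", "Offline", "L"]],
    columns=["Region", "Country", "Item Type", "Sales Channel", "Order Priority"],
)

first_three = df[df.columns[:3]]  # select the columns whose labels the slice returned
names = df.columns.tolist()       # labels as a plain Python list
print(first_three)
print(names)
```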
In [24]:
df.dtypes  # the data type of each column
Out[24]:
Region             object
Country            object
Item Type          object
Sales Channel      object
Order Priority     object
Order Date         object
Order ID            int64
Ship Date          object
Units Sold          int64
Unit Price        float64
Unit Cost         float64
Total Revenue     float64
Total Cost        float64
Total Profit      float64
dtype: object
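Once the dtypes are known, select_dtypes can split a frame into its numeric columns and the rest. This is a sketch on two rows from the head()/tail() output. exclude="number" is used for the text side rather than include="object", because newer pandas versions may store strings with a dedicated string dtype.

```python
import pandas as pd

# Two rows (Russia, Malaysia) from the head()/tail() output, four columns only.
df = pd.DataFrame({
    "Region": ["Europe", "Asia"],
    "Country": ["Russia", "Malaysia"],
    "Order ID": [341417157, 810711038],
    "Unit Price": [651.21, 9.33],
})

numeric = df.select_dtypes(include="number")  # int and float columns
text = df.select_dtypes(exclude="number")     # everything else (here, the string columns)
print(numeric.columns.tolist())
print(text.columns.tolist())
```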
In [26]:
df.ftypes  # dtype plus sparse/dense flag for each column (deprecated; removed in pandas 1.0)
C:\ProgramData\Anaconda3\lib\site-packages\ipykernel_launcher.py:1: FutureWarning: DataFrame.ftypes is deprecated and will be removed in a future version. Use DataFrame.dtypes instead.
  """Entry point for launching an IPython kernel.
Out[26]:
Region             object:dense
Country            object:dense
Item Type          object:dense
Sales Channel      object:dense
Order Priority     object:dense
Order Date         object:dense
Order ID            int64:dense
Ship Date          object:dense
Units Sold          int64:dense
Unit Price        float64:dense
Unit Cost         float64:dense
Total Revenue     float64:dense
Total Cost        float64:dense
Total Profit      float64:dense
dtype: object
In [29]:
df.get_dtype_counts()  # number of columns of each dtype (deprecated; removed in pandas 1.0)
C:\ProgramData\Anaconda3\lib\site-packages\ipykernel_launcher.py:1: FutureWarning: `get_dtype_counts` has been deprecated and will be removed in a future version. For DataFrames use `.dtypes.value_counts()
  """Entry point for launching an IPython kernel.
Out[29]:
float64    5
int64      2
object     7
dtype: int64
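The warning above names the replacement: .dtypes.value_counts(). This is a sketch on a few columns from the tail() output. Converting the dtypes to strings first makes the counts easy to look up by name.

```python
import pandas as pd

# Two rows (Russia, Malaysia) with two text, one int and two float columns.
df = pd.DataFrame({
    "Region": ["Europe", "Asia"],
    "Country": ["Russia", "Malaysia"],
    "Units Sold": [1779, 6267],
    "Unit Price": [651.21, 9.33],
    "Unit Cost": [524.96, 6.92],
})

# Replacement suggested by the warning: count how many columns share each dtype.
counts = df.dtypes.value_counts()
print(counts)

# Same counts keyed by dtype name, which is easier to look up.
counts_by_name = df.dtypes.astype(str).value_counts()
print(counts_by_name["float64"])  # 2
```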
In [30]:
df.get_ftype_counts()  # number of columns of each ftype (deprecated; removed in pandas 1.0)
C:\ProgramData\Anaconda3\lib\site-packages\ipykernel_launcher.py:1: FutureWarning: get_ftype_counts is deprecated and will be removed in a future version
  """Entry point for launching an IPython kernel.
Out[30]:
float64:dense    5
int64:dense      2
object:dense     7
dtype: int64
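ftypes and get_ftype_counts no longer exist in current pandas. Roughly the same "dtype:dense" strings can be rebuilt from df.dtypes by checking each dtype against pd.SparseDtype. This is a sketch, not an official replacement. The Flag column is hypothetical and is added only so that one column is actually sparse.

```python
import pandas as pd

df = pd.DataFrame({
    "Units Sold": [1779, 6267, 1485],
    "Unit Price": [651.21, 9.33, 154.06],
    # Hypothetical sparse column, only to show what "sparse" looks like.
    "Flag": pd.arrays.SparseArray([0, 0, 1]),
})

# True for columns stored with a SparseDtype, False for ordinary (dense) columns.
is_sparse = df.dtypes.apply(lambda dt: isinstance(dt, pd.SparseDtype))
storage = is_sparse.map({True: "sparse", False: "dense"})

ftypes_like = df.dtypes.astype(str) + ":" + storage  # roughly what df.ftypes showed
print(ftypes_like)
print(ftypes_like.value_counts())                    # roughly what get_ftype_counts() showed
```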
In [33]:
# get_value(index, col) returns a single cell by row label and column label
df.get_value(1, 'Order ID')  # deprecated; removed in pandas 1.0
C:\ProgramData\Anaconda3\lib\site-packages\ipykernel_launcher.py:2: FutureWarning: get_value is deprecated and will be removed in a future release. Please use .at[] or .iat[] accessors instead
  
Out[33]:
963881480
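As the warning says, .at[] replaces get_value for label-based access to a single cell. This is a sketch on three rows from the head() output. It returns the same Order ID as above.

```python
import pandas as pd

df = pd.DataFrame({
    "Country": ["Tuvalu", "Grenada", "Russia"],
    "Order ID": [669165933, 963881480, 341417157],
})

# .at[row_label, column_label] replaces df.get_value(row_label, column_label).
order_id = df.at[1, "Order ID"]
print(order_id)  # 963881480
```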
In [34]:
# Column position 0 is the "Region" column.
# takeable=True tells get_value to treat 4 and 0 as integer
# positions rather than labels.
df.get_value(4, 0, takeable=True)
C:\ProgramData\Anaconda3\lib\site-packages\ipykernel_launcher.py:4: FutureWarning: get_value is deprecated and will be removed in a future release. Please use .at[] or .iat[] accessors instead
  after removing the cwd from sys.path.
Out[34]:
'Sub-Saharan Africa'
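For position-based access, .iat[] replaces get_value(..., takeable=True). This is a sketch on the Region and Country values from the head() output. columns.get_loc converts a column label into the position that .iat expects.

```python
import pandas as pd

df = pd.DataFrame({
    "Region": ["Australia and Oceania", "Central America and the Caribbean", "Europe",
               "Sub-Saharan Africa", "Sub-Saharan Africa"],
    "Country": ["Tuvalu", "Grenada", "Russia", "Sao Tome and Principe", "Rwanda"],
})

# .iat[row_position, column_position] replaces df.get_value(4, 0, takeable=True).
region = df.iat[4, 0]
print(region)  # Sub-Saharan Africa

# Translate a column label into a position when only the label is known.
col_pos = df.columns.get_loc("Country")
print(df.iat[4, col_pos])  # Rwanda
```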
In [40]:
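df.groupby('Country').mean(numeric_only=True)  # per-country average of each numeric column; pandas 2.0+ raises an error on text columns without numeric_only=True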
Out[40]:
Country | Order ID | Units Sold | Unit Price | Unit Cost | Total Revenue | Total Cost | Total Profit
Albania | 385383069.0 | 2269.000000 | 109.280000 | 35.840 | 2.479563e+05 | 8.132096e+04 | 166635.360000
Angola | 135425221.0 | 4187.000000 | 668.270000 | 502.540 | 2.798046e+06 | 2.104135e+06 | 693911.510000
Australia | 283189761.0 | 4331.666667 | 301.453333 | 224.620 | 8.299778e+05 | 6.377761e+05 | 192201.706667
Austria | 868214595.0 | 2847.000000 | 437.200000 | 263.330 | 1.244708e+06 | 7.497005e+05 | 495007.890000
Azerbaijan | 402861845.0 | 4627.500000 | 544.205000 | 394.145 | 2.239400e+06 | 1.482937e+06 | 756463.415000
... | ... | ... | ... | ... | ... | ... | ...
The Gambia | 800142168.5 | 3703.250000 | 387.785000 | 285.940 | 1.362379e+06 | 1.015909e+06 | 346470.817500
Turkmenistan | 452012574.0 | 4420.000000 | 659.740000 | 513.750 | 2.911018e+06 | 2.277389e+06 | 633629.200000
Tuvalu | 669165933.0 | 9925.000000 | 255.280000 | 159.420 | 2.533654e+06 | 1.582244e+06 | 951410.500000
United Kingdom | 955357205.0 | 282.000000 | 668.270000 | 502.540 | 1.884521e+05 | 1.417163e+05 | 46735.860000
Zambia | 122583663.0 | 4085.000000 | 152.580000 | 97.440 | 6.232893e+05 | 3.980424e+05 | 225246.900000
76 rows × 7 columns
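Averaging every numeric column also averages Order ID, which has no meaning. One way to avoid that is to select the column of interest before aggregating and then sort the result. This is a sketch on the five head() rows, where each country appears once, so each mean is simply that row's profit.

```python
import pandas as pd

# Country, Item Type and Total Profit from the head() output above.
df = pd.DataFrame({
    "Country": ["Tuvalu", "Grenada", "Russia", "Sao Tome and Principe", "Rwanda"],
    "Item Type": ["Baby Food", "Cereal", "Office Supplies", "Fruits", "Office Supplies"],
    "Total Profit": [951410.50, 248406.36, 224598.75, 19525.82, 639077.50],
})

# Selecting the column first keeps text columns (and Order ID) out of the average.
mean_profit = df.groupby("Country")["Total Profit"].mean()
top = mean_profit.sort_values(ascending=False).head(3)
print(top)
```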
In [41]:
df.groupby('Region').mean(numeric_only=True)  # per-region average of each numeric column
Out[41]:
Region | Order ID | Units Sold | Unit Price | Unit Cost | Total Revenue | Total Cost | Total Profit
Asia | 4.980493e+08 | 5451.545455 | 335.809091 | 239.587273 | 1.940645e+06 | 1.384840e+06 | 555804.170000
Australia and Oceania | 4.012882e+08 | 6211.363636 | 222.672727 | 154.744545 | 1.281297e+06 | 8.520096e+05 | 429287.275455
Central America and the Caribbean | 7.164449e+08 | 5110.142857 | 243.172857 | 157.817143 | 1.310055e+06 | 9.033539e+05 | 406701.121429
Europe | 5.843770e+08 | 4459.863636 | 328.979545 | 223.166364 | 1.516770e+06 | 1.013000e+06 | 503769.937727
Middle East and North Africa | 5.028923e+08 | 4867.800000 | 241.506000 | 152.450000 | 1.405271e+06 | 8.291515e+05 | 576119.186000
North America | 6.589260e+08 | 6381.000000 | 277.243333 | 205.293333 | 1.881119e+06 | 1.395138e+06 | 485980.920000
Sub-Saharan Africa | 5.758950e+08 | 5079.722222 | 259.618889 | 183.677500 | 1.102001e+06 | 7.635783e+05 | 338422.538889
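Instead of one statistic for every column, named aggregation (available since pandas 0.25) computes a different statistic per column and lets you name the outputs. This is a sketch on only the ten rows shown by head() and tail(), so its figures are for that sample, not for the full 100-row file. The output names orders, mean_units and total_profit are arbitrary choices.

```python
import pandas as pd

# Region, Units Sold and Total Profit for the ten rows shown by head() and tail().
df = pd.DataFrame({
    "Region": ["Australia and Oceania", "Central America and the Caribbean", "Europe",
               "Sub-Saharan Africa", "Sub-Saharan Africa", "Sub-Saharan Africa", "Asia",
               "Sub-Saharan Africa", "North America", "Sub-Saharan Africa"],
    "Units Sold": [9925, 2804, 1779, 8102, 5062, 888, 6267, 1485, 5767, 5367],
    "Total Profit": [951410.50, 248406.36, 224598.75, 19525.82, 639077.50,
                     65214.72, 15103.47, 93748.05, 144521.02, 889472.91],
})

# Named aggregation: each output column is (input column, aggregation function).
summary = df.groupby("Region").agg(
    orders=("Units Sold", "count"),
    mean_units=("Units Sold", "mean"),
    total_profit=("Total Profit", "sum"),
).sort_values("total_profit", ascending=False)
print(summary)
```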