Additional Handy Functions

A few accessors work on both DataFrames and Series. Here is a rundown:

iloc on DataFrame

On a DataFrame, the iloc accessor retrieves values by integer position. You can pass a row position, a column position, or both:

dataFrame.iloc[<row selection>, <column selection>] selects rows and columns by integer position, in the order they appear in the DataFrame. Positions start at 0 and run to one less than the number of rows or columns. The last row position is therefore:

df.shape[0] - 1

Likewise, the last column position is:

df.shape[1] - 1
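
As a quick check of the arithmetic above, here is a minimal sketch on a small made-up frame (the name scores and its values are illustrative, not part of the lesson's dataset). It suggests that the positions computed from shape select the same data as the negative position -1:

```python
import pandas as pd

# Hypothetical two-column frame, used only for illustration.
scores = pd.DataFrame({"marks": [70, 66, 100, 88], "age": [29, 32, 31, 28]})

n_rows, n_cols = scores.shape      # (4, 2)
last_row_pos = n_rows - 1          # 3
last_col_pos = n_cols - 1          # 1

# The explicit last position and the negative position pick the same data.
same_row = scores.iloc[last_row_pos].equals(scores.iloc[-1])
same_col = scores.iloc[:, last_col_pos].equals(scores.iloc[:, -1])

print(last_row_pos, last_col_pos)  # 3 1
print(same_row, same_col)          # True True
```

In practice, the negative form is usually shorter to write than the shape-based calculation.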

Get the entire first row

import pandas as pd

df = pd.DataFrame(
    {
        "marks": [70, 66, 100, 88],
        "age": [29, 32, 31, 28],
        "sex": ["F", "M", "F", "F"],
        "name": ["Jane", "John", "Sally", "Sandy"],
        "ssn": ["1234", "3456", "4567", "5678"],
    }
)

row = df.iloc[0]  # Gets the first row's values as a Series
print(type(row))
print(row)

Output:

<class 'pandas.core.series.Series'>
marks      70
age        29
sex         F
name     Jane
ssn      1234
Name: 0, dtype: object

 

Related functions for both Series and DataFrames
  • df.iloc[-1] - Gets all the values in the last row of a DataFrame, or the last value of a Series.

 

Related functions for DataFrames Only
  • df.iloc[:, 0] - Gets all the values in the first column.
  • df.iloc[:, -1] - Gets all the values in the last column.
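
The expressions listed above can be tried out with a short sketch like the following. The frame people is a hypothetical, trimmed-down version of the lesson's data, introduced here only for illustration:

```python
import pandas as pd

# Hypothetical frame for illustration.
people = pd.DataFrame(
    {"marks": [70, 66, 100, 88], "name": ["Jane", "John", "Sally", "Sandy"]}
)

last_row = people.iloc[-1]      # Series indexed by the column labels
first_col = people.iloc[:, 0]   # Series named "marks"
last_col = people.iloc[:, -1]   # Series named "name"

print(last_row["name"])         # Sandy
print(first_col.tolist())       # [70, 66, 100, 88]
print(last_col.name)            # name

# On a Series, iloc takes a single position.
print(first_col.iloc[-1])       # 88
```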

More Examples

Get the first row, second column value only

single_value = df.iloc[0, 1]  # Gets the value as a scalar
print(type(single_value))
print(single_value)

Output:

<class 'numpy.int64'>
29

Get multiple row values of a single column

To get all the rows from 0 to 2:

row_0_to_2_of_2nd_column = df.iloc[0:3, 1]  # index 3 is excluded
print(type(row_0_to_2_of_2nd_column))  # The returned type is a Series
print(row_0_to_2_of_2nd_column)

Output:

<class 'pandas.core.series.Series'>
0    29
1    32
2    31
Name: age, dtype: int64

Get multiple rows and multiple columns

row_0_2_and_column_0_3 = df.iloc[0:2, 0:3]  # row index 2 and column index 3 are excluded
print(type(row_0_2_and_column_0_3))
print(row_0_2_and_column_0_3)

Output:

<class 'pandas.core.frame.DataFrame'>
   marks  age sex
0     70   29   F
1     66   32   M
Note
  • iloc returns a scalar, a Series, or a DataFrame depending on the selection. A single cell comes back as a scalar: a NumPy type such as numpy.int64 for a numeric column, or a plain Python object such as str for an object column. A one-dimensional selection (one row or one column) comes back as a Series, and a two-dimensional selection comes back as a DataFrame.
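
To make the note concrete, here is a small sketch that inspects the type of each kind of selection. The frame df_demo is a hypothetical example. It also shows one detail the note does not spell out: passing a list of positions, such as [[0]], keeps the result two-dimensional even when only one row is selected.

```python
import pandas as pd

# Hypothetical frame for illustration.
df_demo = pd.DataFrame({"marks": [70, 66], "name": ["Jane", "John"]})

cell = df_demo.iloc[0, 0]           # single cell -> scalar
row = df_demo.iloc[0]               # one row -> Series
one_row_frame = df_demo.iloc[[0]]   # list of positions -> DataFrame
block = df_demo.iloc[0:2, 0:2]      # slices on both axes -> DataFrame

for selection in (cell, row, one_row_frame, block):
    print(type(selection).__name__)

# A cell from an object (string) column is a plain Python str.
print(type(df_demo.iloc[0, 1]).__name__)  # str
```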

loc

loc retrieves row or column values by label rather than by position. When a DataFrame keeps its default integer index, the row labels are the integers 0, 1, 2, and so on, so loc accepts the same row numbers as iloc. Once the index has been replaced, you must use the custom labels instead.

loc on DataFrame

Here are the various ways you can use loc on a DataFrame:

  1. Use the row's index label (an integer under the default index, just like iloc) to retrieve row values:
df = pd.DataFrame(
    {
        "marks": [70, 66, 100, 88],
        "age": [29, 32, 31, 28],
        "sex": ["F", "M", "F", "F"],
        "name": ["Jane", "John", "Sally", "Sandy"],
        "ssn": ["1234", "3456", "4567", "5678"],
    }
)

row_values = df.loc[0]
print(type(row_values))

Output:

<class 'pandas.core.series.Series'>

Notice that the returned object is a Series. It holds the first row's values, indexed by the column labels.

  2. Use multiple specific row indexes and a column label to get values:
row_column = df.loc[[1, 3], "name"]
print(row_column)  # A Series is returned

Output:

1     John
3    Sandy
Name: name, dtype: object
  3. Use multiple specific column labels with a single row index:
row_column = df.loc[1, ["name", "age"]]
print(row_column)  # A Series is returned

Output:

name    John
age       32
Name: 1, dtype: object
  4. Use multiple specific row indexes and multiple column labels:
row_column = df.loc[[1, 3], ["name", "age"]]
print(type(row_column))  # A DataFrame is returned
print(row_column)

Output:

<class 'pandas.core.frame.DataFrame'>
    name  age
1   John   32
3  Sandy   28
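
One difference from iloc is worth seeing in code: slices passed to loc include the end label, whereas iloc slices exclude the end position. The sketch below uses a hypothetical frame named df_demo to compare the two.

```python
import pandas as pd

# Hypothetical frame with the default integer index.
df_demo = pd.DataFrame(
    {"name": ["Jane", "John", "Sally", "Sandy"], "age": [29, 32, 31, 28]}
)

by_label = df_demo.loc[0:2, "name"]   # labels 0, 1 and 2 (end label included)
by_position = df_demo.iloc[0:2, 0]    # positions 0 and 1 (end position excluded)

print(by_label.tolist())      # ['Jane', 'John', 'Sally']
print(by_position.tolist())   # ['Jane', 'John']
```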

loc will not accept the old row numbers once the default integer index is replaced with a custom one: looking up a label that is not in the index raises a KeyError, so you can only use the custom labels. In the example below, we replace the existing integer index with the ssn column:

df.set_index("ssn", drop=True, inplace=True)
row_values = df.loc["1234"]
print(row_values)

Output:

marks      70
age        29
sex         F
name     Jane
Name: 1234, dtype: object
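
The following sketch, again on a hypothetical frame named df_demo, illustrates what happens after the index is replaced: the integer 0 is no longer a valid label for loc, while iloc keeps working by position.

```python
import pandas as pd

# Hypothetical frame, indexed by the ssn column.
df_demo = pd.DataFrame(
    {"name": ["Jane", "John"], "ssn": ["1234", "3456"]}
).set_index("ssn")

try:
    df_demo.loc[0]              # 0 is not one of the labels "1234", "3456"
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

print(label_lookup_failed)          # True
print(df_demo.iloc[0]["name"])      # Jane (position-based access still works)
print(df_demo.loc["3456", "name"])  # John (label-based access needs the ssn)
```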