Computer Science
Grade 10
20 min
Introduction to Pandas: Working with DataFrames
Introduce the Pandas library and the DataFrame data structure for data manipulation.
Tutorial Preview
1
Introduction & Learning Objectives
Learning Objectives
Create a Pandas DataFrame from a Python dictionary.
Inspect the first few rows, data types, and summary statistics of a DataFrame.
Select a single column (as a Series) and multiple columns (as a DataFrame).
Filter rows in a DataFrame based on a single condition.
Add a new column to an existing DataFrame.
Explain the difference between a DataFrame and a Series.
Ever wonder how YouTube recommends the perfect video or how a sports team analyzes player stats? 📈 They use powerful tools to wrangle massive tables of data, and today you'll learn the basics of one of the most popular tools in Python: Pandas!
In this tutorial, you'll learn about the fundamental data structure in Pandas, the DataFrame. We'll cover how to create, inspect, and manipul...
2
Key Concepts & Vocabulary
Term | Definition | Example
Pandas | A powerful and popular open-source Python library used for data manipulation and analysis. It provides data structures and functions needed to work with structured data seamlessly. | `import pandas as pd` is the standard way to import the library into your Python script.
DataFrame | A 2-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns). It's like a spreadsheet or a SQL table in Python. | A table of students with columns for 'Name', 'Grade', and 'Score'.
Series | A one-dimensional labeled array capable of holding any data type. A single column or a single row of a DataFrame is a Series. | The 'Grade' column from a student DataFrame, which is just a list of all the gr...
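To make the DataFrame/Series distinction concrete, here is a minimal sketch. The student names, grades, and scores are invented for illustration and are not the tutorial's own data.

```python
import pandas as pd

# Hypothetical student data, invented for illustration.
students = pd.DataFrame({
    "Name": ["Ava", "Ben", "Cara"],
    "Grade": [10, 10, 11],
    "Score": [88, 72, 95],
})

grade_column = students["Grade"]  # one column -> Series
first_row = students.iloc[0]      # one row (by position) -> Series

print(type(students).__name__)      # DataFrame
print(type(grade_column).__name__)  # Series
print(type(first_row).__name__)     # Series
print(students.ndim, grade_column.ndim)  # 2 1
```

The DataFrame has two dimensions (rows and columns), while each Series pulled out of it has only one.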
3
Core Syntax & Patterns
Creating a DataFrame
pd.DataFrame(data)
The primary way to create a DataFrame. The `data` is most commonly a Python dictionary where keys become column names and values become the column data.
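As a rough sketch of this pattern, the example below builds a DataFrame from a dictionary and then inspects it. The student names and scores are made up for illustration, and the variable names `data` and `df` are just conventions.

```python
import pandas as pd

# Hypothetical data: each key becomes a column name,
# and each list becomes that column's values.
data = {
    "Student": ["Ava", "Ben", "Cara", "Dev"],
    "Final Score": [88, 65, 95, 70],
}
df = pd.DataFrame(data)

print(df.head(2))     # first two rows
print(df.dtypes)      # data type of each column
print(df.describe())  # summary statistics for the numeric column
```

Note that every list in the dictionary must have the same length; otherwise Pandas raises an error when building the table.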
Selecting Columns
df['ColumnName'] or df[['Col1', 'Col2']]
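Use single square brackets `[]` with one column name to select that column (returns a Series). Use double square brackets `[[]]`, which pass a list of column names, to select one or more columns (returns a DataFrame, even when the list contains only one name).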
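The difference between the two bracket styles is easiest to see by checking the shape of each result. This is a small sketch using invented data; the column names are assumptions chosen to match the vocabulary example.

```python
import pandas as pd

# Hypothetical data, invented for illustration.
df = pd.DataFrame({
    "Student": ["Ava", "Ben", "Cara"],
    "Grade": [10, 10, 11],
    "Score": [88, 72, 95],
})

scores = df["Score"]               # single brackets -> Series
score_table = df[["Score"]]        # double brackets, one column -> DataFrame
subset = df[["Student", "Score"]]  # double brackets, two columns -> DataFrame

print(scores.shape)       # (3,)
print(score_table.shape)  # (3, 1)
print(subset.shape)       # (3, 2)
```

A Series has a one-element shape (just the number of rows), while a DataFrame always reports both rows and columns.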
Filtering Rows (Boolean Indexing)
df[df['ColumnName'] > value]
Place a comparison inside the selection brackets. The comparison produces a boolean Series (True/False for each row), which is then used to select only the rows of the DataFrame where the condition is True.
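Splitting the filter into two steps can make the idea clearer. The sketch below uses invented data, and the intermediate variable `mask` is only a naming choice for illustration.

```python
import pandas as pd

# Hypothetical data, invented for illustration.
df = pd.DataFrame({
    "Student": ["Ava", "Ben", "Cara", "Dev"],
    "Score": [88, 65, 95, 70],
})

mask = df["Score"] > 80  # boolean Series, one True/False per row
high_scores = df[mask]   # keeps only the rows where mask is True

print(mask.tolist())                    # [True, False, True, False]
print(high_scores["Student"].tolist())  # ['Ava', 'Cara']
```

Filtering returns a new DataFrame and leaves `df` unchanged. The surviving rows keep their original index labels (0 and 2 here).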
Sample Practice Questions
Challenging
Imagine a `products_df` with 'Product', 'Cost', and 'Price' columns. Which sequence of operations correctly adds a 'Profit' column and then shows only the products with a profit greater than $50?
A. `products_df['Profit'] = products_df['Price'] - products_df['Cost']; products_df[products_df['Profit'] > 50]`
B. `products_df['Profit'] = products_df['Price'] - products_df['Cost']; filtered_df = products_df[products_df['Profit'] > 50]; print(filtered_df)`
C. `products_df.filter(products_df['Profit'] > 50); products_df['Profit'] = products_df['Price'] - products_df['Cost']`
D. `filtered_df = products_df[products_df['Price'] - products_df['Cost'] > 50]`
Challenging
A student wrote the following code to find RPGs with a rating over 9.0, but it has two errors. What are they? `games_df[games_df['genre'] = 'RPG' & games_df['Rating'] > 9.0]`
A. Case-sensitivity in 'genre' and using `=` instead of `==` for comparison.
B. Using `&` instead of `and`, and forgetting to use double brackets for the columns.
C. The value 'RPG' should not be in quotes, and the condition should be in a separate function.
D. Using `>` is not allowed, and the column 'genre' does not exist.
Challenging
You are given the `students_df` from the tutorial. You want to produce a new DataFrame that contains only the `Student` and `Final Score` for students who passed (Final Score >= 70). Which single line of code achieves this?
A. `students_df.filter(students_df['Final Score'] >= 70).select(['Student', 'Final Score'])`
B. `students_df[['Student', 'Final Score']][students_df['Final Score'] >= 70]`
C. `students_df[students_df['Final Score'] >= 70, ['Student', 'Final Score']]`
D. `students_df[students_df['Final Score'] >= 70][['Student', 'Final Score']]`