Computer Science · Grade 10 · 20 min

Introduction to Pandas: Working with DataFrames

Introduce the Pandas library and the DataFrame data structure for data manipulation.

Tutorial Preview

1. Introduction & Learning Objectives

Learning Objectives

- Create a Pandas DataFrame from a Python dictionary.
- Inspect the first few rows, data types, and summary statistics of a DataFrame.
- Select a single column (as a Series) and multiple columns (as a DataFrame).
- Filter rows in a DataFrame based on a single condition.
- Add a new column to an existing DataFrame.
- Explain the difference between a DataFrame and a Series.

Ever wonder how YouTube recommends the perfect video or how a sports team analyzes player stats? 📈 They use powerful tools to wrangle massive tables of data, and today you'll learn the basics of one of the most popular tools in Python: Pandas!

In this tutorial, you'll learn about the fundamental data structure in Pandas, the DataFrame. We'll cover how to create, inspect, and manipul...

2. Key Concepts & Vocabulary

| Term | Definition | Example |
| --- | --- | --- |
| Pandas | A powerful and popular open-source Python library used for data manipulation and analysis. It provides data structures and functions needed to work with structured data seamlessly. | `import pandas as pd` is the standard way to import the library into your Python script. |
| DataFrame | A 2-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns). It's like a spreadsheet or a SQL table in Python. | A table of students with columns for 'Name', 'Grade', and 'Score'. |
| Series | A one-dimensional labeled array capable of holding any data type. A single column or a single row of a DataFrame is a Series. | The 'Grade' column from a student DataFrame, which is just a list of all the gr... |
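
To make the DataFrame and Series definitions concrete, here is a minimal sketch. The student names and scores are invented for illustration and are not the tutorial's own data. It checks that the whole table is a DataFrame while a single column or a single row is a Series.

```python
import pandas as pd

# Hypothetical sample data, invented for this sketch.
students = pd.DataFrame({
    "Name": ["Ava", "Ben", "Cara"],
    "Grade": [10, 10, 11],
    "Score": [88, 72, 95],
})

grades = students["Grade"]     # one column -> Series
first_row = students.iloc[0]   # one row (by position) -> Series

print(type(students).__name__)   # DataFrame
print(type(grades).__name__)     # Series
print(type(first_row).__name__)  # Series
print(students.shape)            # (3, 3): 3 rows, 3 columns
print(grades.shape)              # (3,): one dimension only
```

The `shape` values show the difference in dimensions: the DataFrame has rows and columns, while the Series has a single axis.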

3. Core Syntax & Patterns

Creating a DataFrame
`pd.DataFrame(data)`
The primary way to create a DataFrame. The `data` is most commonly a Python dictionary where keys become column names and values (typically lists of equal length) become the column data.

Selecting Columns
`df['ColumnName']` or `df[['Col1', 'Col2']]`
Use single square brackets `[]` with one column name to select that column (returns a Series). Use double square brackets `[[]]`, that is, a list of column names inside the selection brackets, to select one or more columns (returns a DataFrame).

Filtering Rows (Boolean Indexing)
`df[df['ColumnName'] > value]`
Place a conditional statement inside the selection brackets. This creates a boolean Series (True/False) that is used to select only the rows from the DataFrame where the condition is True.
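
The three patterns above can be sketched together as follows. The book titles, page counts, and ratings are made up for this example, and the variable names (`ratings`, `subset`, `mask`, `well_rated`) are only illustrative.

```python
import pandas as pd

# Hypothetical data, invented for illustration.
data = {
    "Title": ["Book A", "Book B", "Book C", "Book D"],
    "Pages": [320, 150, 410, 275],
    "Rating": [4.5, 3.8, 4.1, 3.2],
}
df = pd.DataFrame(data)            # keys -> column names, lists -> column data

ratings = df["Rating"]             # single brackets -> Series
subset = df[["Title", "Rating"]]   # double brackets -> DataFrame

mask = df["Rating"] > 4.0          # boolean Series, one True/False per row
well_rated = df[mask]              # keeps only the rows where mask is True

print(mask.tolist())               # [True, False, True, False]
print(well_rated)
```

Storing the condition in `mask` first is optional; writing `df[df["Rating"] > 4.0]` on one line does the same thing, but the two-step version makes the intermediate boolean Series visible.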

Sample Practice Questions

Challenging
Imagine a `products_df` with 'Product', 'Cost', and 'Price' columns. Which sequence of operations correctly adds a 'Profit' column and then shows only the products with a profit greater than $50?
A. `products_df['Profit'] = products_df['Price'] - products_df['Cost']; products_df[products_df['Profit'] > 50]`
B. `products_df['Profit'] = products_df['Price'] - products_df['Cost']; filtered_df = products_df[products_df['Profit'] > 50]; print(filtered_df)`
C. `products_df.filter(products_df['Profit'] > 50); products_df['Profit'] = products_df['Price'] - products_df['Cost']`
D. `filtered_df = products_df[products_df['Price'] - products_df['Cost'] > 50]`
Challenging
A student wrote the following code to find RPGs with a rating over 9.0, but it has two errors. What are they? `games_df[games_df['genre'] = 'RPG' & games_df['Rating'] > 9.0]`
A. Case-sensitivity in 'genre' and using `=` instead of `==` for comparison.
B. Using `&` instead of `and`, and forgetting to use double brackets for the columns.
C. The value 'RPG' should not be in quotes, and the condition should be in a separate function.
D. Using `>` is not allowed, and the column 'genre' does not exist.
Challenging
You are given the `students_df` from the tutorial. You want to produce a new DataFrame that contains only the `Student` and `Final Score` columns for students who passed (Final Score >= 70). Which single line of code achieves this?
A. `students_df.filter(students_df['Final Score'] >= 70).select(['Student', 'Final Score'])`
B. `students_df[['Student', 'Final Score']][students_df['Final Score'] >= 70]`
C. `students_df[students_df['Final Score'] >= 70, ['Student', 'Final Score']]`
D. `students_df[students_df['Final Score'] >= 70][['Student', 'Final Score']]`
