Computer Science
Grade 10
20 min
Data Collection: Gathering Data from Various Sources
Learn about different data sources and techniques for collecting data (e.g., APIs, web scraping, files).
Tutorial Preview
1. Introduction & Learning Objectives
Learning Objectives
Identify and differentiate between primary and secondary data sources.
Explain the purpose and structure of APIs for data collection.
Write a simple Python script to fetch and parse data from a public JSON API.
Write a Python script to read and process data from a CSV file.
Describe the process of web scraping and identify its key ethical considerations.
Retrieve data from a database table using a basic SQL SELECT query.
Compare and contrast the JSON and CSV data formats.
Ever wonder how your favorite weather app gets its real-time data or how a music app recommends new songs? 🌦️ Let's become digital detectives and learn how to gather the data that powers these technologies!
This lesson introduces the essential skill of data collection, the first...
2. Key Concepts & Vocabulary
Term: API (Application Programming Interface)
Definition: A set of rules and tools that allows different software applications to communicate with each other. It acts as a messenger that takes a request, tells a system what you want to do, and then returns the response (data) back to you.
Example: Using the OpenWeatherMap API to send a request for the current temperature in 'Toronto' and receiving back a JSON object with the weather data.

Term: CSV (Comma-Separated Values)
Definition: A simple text file format where data is stored in a table-like structure. Each line is a data record, and each record consists of one or more fields, separated by commas.
Example: A file named `students.csv` containing:
`name,grade,score`
`Alice,10,95`
`Bob,10,88`

Term: JSON (JavaScript Object Notation)
Definition: A lightweight, text-based data format...
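To make the contrast between the two formats concrete, the sketch below writes the same two student records as CSV text and as JSON text using Python's standard library. The record values come from the `students.csv` example; the variable names are chosen for illustration only.

```python
import csv
import io
import json

# The same records as in the students.csv example, held as Python dictionaries.
students = [
    {"name": "Alice", "grade": 10, "score": 95},
    {"name": "Bob", "grade": 10, "score": 88},
]

# CSV: one header line, then one comma-separated line per record.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["name", "grade", "score"], lineterminator="\n"
)
writer.writeheader()
writer.writerows(students)
csv_text = buffer.getvalue()

# JSON: the keys are repeated for every record, and numbers keep their type.
json_text = json.dumps(students)

print(csv_text)
print(json_text)
```

Reading the CSV text back yields strings such as `'95'`, while reading the JSON text back yields the integer `95`. That is one practical difference between the two formats.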
3. Core Syntax & Patterns
API Request Pattern (Python)
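import requests

# Replace 'API_ENDPOINT_URL' with the real endpoint address.
response = requests.get('API_ENDPOINT_URL', timeout=10)
response.raise_for_status()  # raise an error for HTTP failures such as 404
data = response.json()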
This pattern is used to send a GET request to an API endpoint. The `requests.get()` function fetches the data, and `response.json()` converts the JSON response into a Python dictionary or list.
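To see what `response.json()` hands back without needing network access, the sketch below parses a hard-coded JSON string with the standard-library `json` module, which produces the same kind of Python dictionaries and lists. The payload is a made-up weather response for illustration; its field names (`city`, `temperature_c`, `conditions`) are assumptions, not the actual OpenWeatherMap format.

```python
import json

# Hypothetical JSON text, shaped like something a weather API might return.
# The field names are invented for this example.
sample_body = (
    '{"city": "Toronto", "temperature_c": 21.5, '
    '"conditions": ["cloudy", "windy"]}'
)

# json.loads turns JSON text into Python objects:
# JSON object -> dict, JSON array -> list, JSON number -> int or float.
data = json.loads(sample_body)

city = data["city"]
temperature = data["temperature_c"]
first_condition = data["conditions"][0]

print(f"{city}: {temperature} degrees C, {first_condition}")
```

With a real API, `data = response.json()` would take the place of the `json.loads` line, and it is worth checking `response.status_code` (or calling `response.raise_for_status()`) before parsing.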
CSV Reading Pattern (Python)
import csv

with open('filename.csv', 'r', newline='') as file:
    reader = csv.reader(file)
    next(reader)  # skip the header row
    for row in reader:
        print(row)  # process each row (a list of strings)
This is the standard way to open and read a CSV file in Python. The `with` statement ensures the file is closed properly, and the `csv.reader` object iterates over each row in the file as a list of strings.
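As a small worked example of the pattern, the sketch below reads the `students.csv` sample from the vocabulary section and averages the scores. To keep it self-contained it reads from an in-memory string via `io.StringIO` instead of a file on disk; that substitution, and the `average_score` helper name, are assumptions made for this illustration.

```python
import csv
import io

# Stand-in for the contents of students.csv (same rows as the earlier example).
CSV_TEXT = "name,grade,score\nAlice,10,95\nBob,10,88\n"


def average_score(file_obj):
    """Return the mean of the score column (third field) of a CSV file object."""
    reader = csv.reader(file_obj)
    next(reader)  # skip the header row
    scores = []
    for row in reader:
        # Every field arrives as a string, so convert before doing arithmetic.
        scores.append(float(row[2]))
    return sum(scores) / len(scores)


average = average_score(io.StringIO(CSV_TEXT))
print(average)  # 91.5
```

With a real file, the object returned by `open('students.csv', 'r', newline='')` can be passed in place of the `io.StringIO` object.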
Basic SQL SELECT Query
SELECT column1, col...
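A basic `SELECT` can be tried out from Python with the standard-library `sqlite3` module. The sketch below is an illustration built on assumptions: the in-memory database, the `students` table, its columns, and the sample rows are all hypothetical and are not part of the original lesson.

```python
import sqlite3

# Build a throwaway in-memory database so the example is self-contained.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE students (name TEXT, grade INTEGER, score INTEGER)"
)
connection.executemany(
    "INSERT INTO students (name, grade, score) VALUES (?, ?, ?)",
    [("Alice", 10, 95), ("Bob", 10, 88), ("Cara", 11, 91)],  # made-up rows
)

# SELECT picks the columns, FROM names the table, WHERE filters the rows.
cursor = connection.execute(
    "SELECT name, score FROM students WHERE grade = ? ORDER BY name",
    (10,),
)
rows = cursor.fetchall()  # a list of tuples, one per matching row
connection.close()

print(rows)
```

Each returned row is a tuple whose items follow the order of the columns listed after `SELECT`, so only `name` and `score` come back even though the table also has a `grade` column.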
Sample Practice Questions
Challenging
A city planning department wants to analyze traffic patterns. They have access to live data from traffic sensors via an API, historical traffic data in a large CSV archive, and a real-time map website that shows traffic congestion. Which data source should they primarily use to build a predictive model of traffic at different times of the day?
A. The historical traffic data in the CSV archive, as it provides a large dataset of past patterns necessary for training a model.
B. The live data API, because it shows the most current traffic conditions.
C. The real-time map website, by scraping it every few seconds to gather data.
D. A combination of the API and web scraping, ignoring the CSV file because it is outdated.
Challenging
A student writes the following script to process `inventory.csv`. What is the critical error in their logic?
`import csv
total_value = 0
with open('inventory.csv', 'r') as file:
    reader = csv.reader(file)
    for row in reader:
        if row[1] == 'Electronics':
            total_value += row[2] * row[3]`
A. The file is opened in read mode ('r') instead of write mode ('w').
B. The script fails to convert the string values from `row[2]` and `row[3]` to numbers (e.g., int or float) before multiplication.
C. The script does not use `next(reader)` to skip the header, causing an error on the first row.
D. The `if` statement incorrectly uses `==` to compare strings.
Challenging
You are given a Python script that uses a database connector and runs the query `SELECT product_name, price FROM products WHERE category_id = 2;`. The script then successfully prints product names and prices. Based on this, what can you infer about the structure of the `products` table in the database?
A. The table is named `products` and contains exactly three columns: `product_name`, `price`, and `category_id`.
B. The table must have a primary key named `product_name`.
C. The table is named `products` and contains, at a minimum, columns named `product_name`, `price`, and `category_id`, where `category_id` is likely a number.
D. The table contains only products from category 2.