Computer Science Grade 10 20 min

Data Collection: Gathering Data from Various Sources

Learn about different data sources and techniques for collecting data (e.g., APIs, web scraping, files).

Tutorial Preview

1

Introduction & Learning Objectives

Learning Objectives

- Identify and differentiate between primary and secondary data sources.
- Explain the purpose and structure of APIs for data collection.
- Write a simple Python script to fetch and parse data from a public JSON API.
- Write a Python script to read and process data from a CSV file.
- Describe the process of web scraping and identify its key ethical considerations.
- Retrieve data from a database table using a basic SQL SELECT query.
- Compare and contrast the JSON and CSV data formats.

Ever wonder how your favorite weather app gets its real-time data or how a music app recommends new songs? 🌦️ Let's become digital detectives and learn how to gather the data that powers these technologies! This lesson introduces the essential skill of data collection, the first...

2

Key Concepts & Vocabulary

| Term | Definition | Example |
| --- | --- | --- |
| API (Application Programming Interface) | A set of rules and tools that allows different software applications to communicate with each other. It acts as a messenger that takes a request, tells a system what you want to do, and then returns the response (data) back to you. | Using the OpenWeatherMap API to send a request for the current temperature in 'Toronto' and receiving back a JSON object with the weather data. |
| CSV (Comma-Separated Values) | A simple text file format where data is stored in a table-like structure. Each line is a data record, and each record consists of one or more fields, separated by commas. | A file named `students.csv` containing: `name,grade,score` `Alice,10,95` `Bob,10,88` |
| JSON (JavaScript Object Notation) | A lightweight, text-based data format... | |

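To make the difference between the two formats concrete, here is a minimal sketch that reads the same two student records from CSV text and from JSON text. The JSON layout is an assumption made for illustration (the preview does not show one), and the data is held in strings rather than a real `students.csv` file so the example runs anywhere.

```python
import csv
import io
import json

# Hypothetical sample data: the same two student records in each format.
csv_text = "name,grade,score\nAlice,10,95\nBob,10,88\n"
json_text = (
    '[{"name": "Alice", "grade": 10, "score": 95},'
    ' {"name": "Bob", "grade": 10, "score": 88}]'
)

# csv.DictReader uses the header row as keys; every value arrives as a string.
csv_rows = list(csv.DictReader(io.StringIO(csv_text)))

# json.loads keeps the types written in the text, so numbers stay numbers.
json_rows = json.loads(json_text)

# CSV values must be converted before doing arithmetic with them.
csv_average = sum(int(row["score"]) for row in csv_rows) / len(csv_rows)
json_average = sum(row["score"] for row in json_rows) / len(json_rows)

print(type(csv_rows[0]["score"]).__name__)   # str
print(type(json_rows[0]["score"]).__name__)  # int
print(csv_average, json_average)             # 91.5 91.5
```

The takeaway is that CSV carries no type information, so a script has to decide which columns are numbers, while JSON records that distinction in the text itself.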
3

Core Syntax & Patterns

API Request Pattern (Python)

    import requests

    response = requests.get('API_ENDPOINT_URL')
    data = response.json()

This pattern sends a GET request to an API endpoint. The `requests.get()` function fetches the data, and `response.json()` converts the JSON response into a Python dictionary or list.

CSV Reading Pattern (Python)

    import csv

    with open('filename.csv', 'r') as file:
        reader = csv.reader(file)
        next(reader)  # Skip the header row
        for row in reader:
            print(row)  # Process each row here

This is the standard way to open and read a CSV file in Python. The `with` statement ensures the file is closed properly, and the `csv.reader` object iterates over each row in the file as a list of strings.

Basic SQL SELECT Query

    SELECT column1, col...
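The SELECT pattern is cut off in this preview, so the following is only a rough sketch of running a basic query from Python. It assumes the standard-library `sqlite3` module and a hypothetical `students` table modeled on the `students.csv` example; the row for "Cara" is invented so the `WHERE` filter has something to exclude.

```python
import sqlite3

# Build a throwaway in-memory database so the example needs no files.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE students (name TEXT, grade INTEGER, score INTEGER)"
)
connection.executemany(
    "INSERT INTO students (name, grade, score) VALUES (?, ?, ?)",
    [("Alice", 10, 95), ("Bob", 10, 88), ("Cara", 11, 91)],  # hypothetical rows
)

# SELECT picks the columns, FROM names the table, WHERE filters the rows.
# The ? placeholder passes the value separately from the SQL text.
query = "SELECT name, score FROM students WHERE grade = ? ORDER BY score DESC"
rows = connection.execute(query, (10,)).fetchall()

# Each result row is a tuple in the same order as the selected columns.
for name, score in rows:
    print(name, score)

connection.close()
```

Only the two selected columns come back, and only for grade 10, even though the table holds a third column and a third student.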

4 more steps in this tutorial

Sample Practice Questions

Challenging
A city planning department wants to analyze traffic patterns. They have access to live data from traffic sensors via an API, historical traffic data in a large CSV archive, and a real-time map website that shows traffic congestion. Which data source should they primarily use to build a predictive model of traffic at different times of the day?
A. The historical traffic data in the CSV archive, as it provides a large dataset of past patterns necessary for training a model.
B. The live data API, because it shows the most current traffic conditions.
C. The real-time map website, by scraping it every few seconds to gather data.
D. A combination of the API and web scraping, ignoring the CSV file because it is outdated.
Challenging
A student writes the following script to process `inventory.csv`. What is the critical error in their logic?

    import csv

    total_value = 0
    with open('inventory.csv', 'r') as file:
        reader = csv.reader(file)
        for row in reader:
            if row[1] == 'Electronics':
                total_value += row[2] * row[3]

A. The file is opened in read mode ('r') instead of write mode ('w').
B. The script fails to convert the string values from `row[2]` and `row[3]` to numbers (e.g., int or float) before multiplication.
C. The script does not use `next(reader)` to skip the header, causing an error on the first row.
D. The `if` statement incorrectly uses `==` to compare strings.
Challenging
You are given a Python script that uses a database connector and runs the query `SELECT product_name, price FROM products WHERE category_id = 2;`. The script then successfully prints product names and prices. Based on this, what can you infer about the structure of the `products` table in the database?
A. The table is named `products` and contains exactly three columns: `product_name`, `price`, and `category_id`.
B. The table must have a primary key named `product_name`.
C. The table is named `products` and contains, at a minimum, columns named `product_name`, `price`, and `category_id`, where `category_id` is likely a number.
D. The table contains only products from category 2.

