Interpret a scatter plot

Tutorial Preview

1

Introduction & Learning Objectives

Learning Objectives
Identify the independent and dependent variables on a scatter plot.
Describe the correlation between two variables as positive, negative, or no correlation.
Classify the strength of a linear correlation as strong, moderate, or weak.
Distinguish between linear and non-linear relationships.
Identify outliers in a data set.
Use a given line of best fit to make predictions (interpolation and extrapolation).
Explain the difference between correlation and causation.

Does the number of hours you spend on social media affect your grades? 📱 A scatter plot can help us see whether the two are related! This tutorial will teach you how to read and understand scatter plots, which are powerful tools for seeing relationships between two numeric variables. You'll learn how t...

2

Key Concepts & Vocabulary

Term: Scatter Plot
Definition: A graph that uses dots to represent the values of two numeric variables. Each dot represents one observation (data point).
Example: A plot where the x-axis is 'Hours Studied' and the y-axis is 'Test Score'. Each dot on the graph represents one student's data.

Term: Correlation
Definition: The direction and strength of the relationship between two variables. It is often summarized by a single number called the correlation coefficient.
Example: The positive correlation between temperature and ice cream sales means that as the temperature goes up, sales also tend to go up.

Term: Positive Correlation
Definition: When two variables move in the same direction. As the independent variable (x) increases, the dependent variable (y) also tends to increase. The points on the plot generally trend upwards from left to right.
Example: The...
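The terms above describe a correlation by its direction and strength. One standard way to put a number on a linear correlation is Pearson's correlation coefficient r, which ranges from -1 to 1. The sketch below computes r in plain Python and turns it into words. The cut-offs 0.7, 0.4, and 0.1 are an assumed convention (the tutorial does not give numeric thresholds), and the study-hours data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: covariance of x and y divided by the product of their spreads."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)


def describe(r, strong=0.7, moderate=0.4, none_below=0.1):
    """Turn r into words. The default cut-offs are an assumed convention."""
    size = abs(r)
    if size < none_below:
        return "no correlation"
    direction = "positive" if r > 0 else "negative"
    if size >= strong:
        strength = "strong"
    elif size >= moderate:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {direction}"


# Hypothetical data: hours studied (x) and test score (y) for six students.
hours = [1, 2, 3, 4, 5, 6]
scores = [55, 60, 62, 70, 74, 80]
r = pearson_r(hours, scores)
print(round(r, 2), describe(r))  # 0.99 strong positive
```

Because r only measures how well the points follow a straight line, a curved pattern can give a misleading value, so it is still worth looking at the plot itself.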

3

Core Formulas

Line of Best Fit Equation
y = mx + b
This is the equation of a straight line. In a scatter plot context, 'm' is the slope, the rate of change of y for each one-unit increase in x, and 'b' is the y-intercept, the predicted value of y when x is zero. We use this equation to make predictions.

Interpolation
Predicting a 'y' value for an 'x' value that is *within* the range of the original data.
Substitute a given x-value from inside your data's range into the line of best fit equation to find the corresponding y-value. This type of prediction is generally more reliable than extrapolation, because the data support the model across that range.

Extrapolation
Predicting a 'y' value for an 'x' value that is *outside* the range of the original data.
Use the line of b...
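The difference between interpolation and extrapolation comes down to comparing x with the smallest and largest x-values in the data. A minimal sketch, assuming the slope m and intercept b are already given (as in the tutorial) and using a hypothetical line, test score = 5 × hours + 50, for study-hours data that span 1 to 6 hours:

```python
def predict(m, b, x, x_min, x_max):
    """Evaluate y = m*x + b and label the prediction by where x falls."""
    y = m * x + b
    # Inside the observed x-range -> interpolation; outside -> extrapolation.
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return y, kind


# Hypothetical line of best fit: score = 5 * hours + 50, data from 1 to 6 hours.
print(predict(5, 50, 4.5, 1, 6))  # (72.5, 'interpolation')
print(predict(5, 50, 20, 1, 6))   # (150, 'extrapolation')
```

On a test scored out of 100 (an assumption for this example), the extrapolated score of 150 is impossible, which shows how a line can stop describing reality once x leaves the observed range.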


Sample Practice Questions

Challenging

A scatter plot of a plant's height versus days since planting shows a relationship that is initially linear but then levels off, forming a curve. There is also one data point showing a very tall plant after only a few days. Which statement provides the most complete description of the data?

A.There is a strong, positive, linear correlation between the variables.

B.The data shows a weak negative correlation with a significant outlier.

C.The data shows a non-linear correlation and contains no outliers.

D.The data shows a non-linear relationship and includes a potential outlier.
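This question asks you to spot a potential outlier by eye. One common numeric check (an assumption here, since the tutorial does not prescribe a method) measures each point's vertical distance from a reference line, called its residual, and applies the 1.5 × IQR rule to those residuals. The data and the line y = 5x + 50 below are hypothetical and roughly linear; for curved data like the plant heights in this question, the reference should follow the curve rather than a straight line.

```python
import statistics


def residual_outliers(points, m, b, k=1.5):
    """Flag (x, y) points whose residual from y = m*x + b lies beyond the k*IQR fences."""
    # Residual = observed y minus the y that the line predicts.
    residuals = [y - (m * x + b) for x, y in points]
    # First and third quartiles of the residuals ("inclusive" quantile method).
    q1, _, q3 = statistics.quantiles(residuals, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [p for p, r in zip(points, residuals) if r < low or r > high]


# Hypothetical hours-vs-score data near y = 5x + 50, plus one unusual student.
data = [(1, 56), (2, 59), (3, 66), (4, 69), (5, 76), (6, 79), (2.5, 95)]
print(residual_outliers(data, 5, 50))  # [(2.5, 95)]
```

A flagged point is only a candidate outlier: it is worth checking for a data-entry error or an unusual case before deciding whether to keep it.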

Challenging

A city's population (P, in thousands) is modeled by P = 5.2t + 850, where t is years since 2010. Data was collected from 2010 (t=0) to 2020 (t=10). A planner predicts the population in 2018 (t=8) will be 891.6 thousand and the population in 2040 (t=30) will be 1006 thousand. Which is the best evaluation of these predictions?

A.Both predictions are unreliable extrapolations.

B.The 2018 prediction is a reasonable interpolation, while the 2040 prediction is a potentially unreliable extrapolation.

C.The 2040 prediction is more reliable because it is further in the future.

D.Both predictions are reliable because the model is linear.
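Substituting each value of t into the planner's model checks the arithmetic; what remains is whether each t lies inside the collected range 0 ≤ t ≤ 10:

```latex
\begin{aligned}
P(8)  &= 5.2(8) + 850 = 41.6 + 850 = 891.6 && (0 \le 8 \le 10,\ \text{interpolation})\\
P(30) &= 5.2(30) + 850 = 156 + 850 = 1006 && (30 > 10,\ \text{extrapolation})
\end{aligned}
```

Both numbers follow correctly from the model, so any difference in reliability comes from where t falls relative to the data, not from the calculation.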

Challenging

A study finds a strong negative correlation between the number of local libraries in a region and its high school dropout rate. A researcher concludes, 'To lower the dropout rate, we must build more libraries.' What is the fundamental flaw in this conclusion?

A.The researcher should have used a line of best fit to make the conclusion.

B.The correlation is positive, not negative, so the conclusion is backwards.

C.The researcher is assuming causation from correlation; a third factor, such as regional wealth, likely influences both variables.

D.The conclusion is flawed because a non-linear model would be more appropriate.


