CC Statistics Worksheet
Review quiz: Topics 1-2-3
Q1
4 Points
A 2017 study examined a large sample of health records from a California health insurer for children born between 2000 and 2010. The study found that 3,103 children had a diagnosis of autism spectrum disorder (ASD) and the remaining 193,826 did not. Below is a summary table from the publication.
Q1.1
2 Points
The value 82% listed in this table is part of:
Q1.2
2 Points
How common was an ASD diagnosis among the girls in this study? Give your answer as a percentage rounded to 1 decimal place, for ex. 11.1%. (Do include the % symbol but do not spell it out or add text.)
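As a reminder of the arithmetic behind a "how common among a subgroup" question, here is a minimal Python sketch. The counts are hypothetical placeholders and are not taken from the study's table; the function name is also only illustrative. The point it shows is that the denominator is every member of the subgroup (with and without the diagnosis), not every child with the diagnosis.

```python
def percent_with_condition(count_with, count_without):
    """Percent of a subgroup that has the condition, rounded to 1 decimal place."""
    total = count_with + count_without  # everyone in the subgroup
    return round(100 * count_with / total, 1)


# Hypothetical counts, NOT the values from the publication's table.
girls_with_asd = 50
girls_without_asd = 9950

answer = f"{percent_with_condition(girls_with_asd, girls_without_asd)}%"
print(answer)
```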
Q2
4 Points
A 2012 study of freely forming groups in bars all over Europe examined the natural behavior of groups. The researchers recorded the group size (number of individuals in the group) of all 501 groups in the study that were naturally laughing. The findings were displayed in the figure below.
Q2.1
2 Points
Complete this summary sentence by filling in the blanks with numerical values rounded to one decimal place if needed (for ex., 1.1, no text or units).
For groups in European bars observed to be naturally laughing before the pandemic (in 2012), group size ranged from a minimum of
to a maximum of
with a median size of
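When data are shown as a bar chart of counts per value, the minimum, maximum, and median can be read off by treating the chart as a frequency table. The sketch below assumes a hypothetical table of group sizes and counts (not the values in the figure) and an illustrative function name. With 501 groups, the median is the 251st value in sorted order.

```python
def summarize_counts(freq):
    """Return (minimum, maximum, median) for a frequency table.

    freq maps each observed value (here, a group size) to the number
    of observations with that value.
    """
    observed = sorted(v for v, count in freq.items() if count > 0)
    # Expand the table into a sorted list of individual observations.
    data = [v for v in observed for _ in range(freq[v])]
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        median = data[mid]
    else:
        median = (data[mid - 1] + data[mid]) / 2
    return observed[0], observed[-1], median


# Hypothetical counts, NOT read from the study's figure.
example = {2: 5, 3: 3, 4: 2, 5: 1}
smallest, largest, median = summarize_counts(example)
print(smallest, largest, median)
```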
Q2.2
2 Points
Can we conclude from these findings that smaller groups are more likely to be laughing than larger groups? Explain your reasoning in one short sentence.
Q3
4 Points
A 2015 study examined whether light-emitting eReaders at bedtime may impair sleep quality. A random sample of 12 healthy adults slept in the lab on two different days. In random order, participants read a print book for 30 minutes one night and a light-emitting eReader for 30 minutes the other night. Scalp electrodes were used to measure how long (in minutes) it took participants to reach a deep sleep stage. On average, the subjects took 12.3 minutes longer to reach deep sleep with the eReader than with the book.
Q3.1
2 Points
Choose the appropriate layout to record data from this study.
Q3.2
2 Points
On average, it took healthy adults longer to reach deep sleep when they used an eReader compared to a print book (a “significant,” surprisingly substantial, difference). What conclusion can we draw from this study?
Q4
6 Points
In 2018, the data analytics website, The Pudding, published an analysis of men’s and women’s pocket sizes for the most popular American brands of blue jeans. Download the data file below:
https://drive.google.com/file/d/14IT82MGQ6MnLct9ao04BHT6cMldwF5fb/view?usp=sharing
(This file, jeans-pockets2018.csv, is also accessible from Canvas on the Topic 2 Roadmap.) Import this data file into CrunchIt: http://crunchit3.bfwpub.com/psls4e
We will look at the price (in dollars, $) of men’s and women’s jeans. This corresponds to the variables “price” and “menWomen” in the data file.
Q4.1
2 Points
Plot the data for men’s and women’s jeans prices in a side-by-side boxplot. Be sure to label both axes of your graph and include units as needed. Set the title to be your first and last name — understand that you will not get full credit unless your graph has your name in its title. Right-click on your graph to “save as image” on your desktop, then upload this image here.
Q4.2
2 Points
Obtain the 10th percentile of prices for women’s jeans. Enter this value rounded to the nearest dollar (for ex., 11, no text or symbol).
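For reference, a percentile can be computed outside CrunchIt as well. The sketch below is one common convention (linear interpolation between sorted values), written in plain Python. The prices and the group labels "women" and "men" are hypothetical stand-ins, not the contents of jeans-pockets2018.csv; only the column names "price" and "menWomen" come from the data file description. Software packages use slightly different percentile conventions, so CrunchIt may report a slightly different value on the real data.

```python
def percentile(data, p):
    """p-th percentile (0 to 100) by linear interpolation between sorted values."""
    xs = sorted(data)
    if not xs:
        raise ValueError("data must not be empty")
    pos = (len(xs) - 1) * p / 100      # fractional position in the sorted list
    lower = int(pos)
    upper = min(lower + 1, len(xs) - 1)
    frac = pos - lower
    return xs[lower] + frac * (xs[upper] - xs[lower])


# Hypothetical rows; the labels "women" and "men" are assumed, not verified.
rows = [{"menWomen": "women", "price": price} for price in range(20, 121, 10)]
rows += [{"menWomen": "men", "price": price} for price in (25, 45, 65)]

# Keep only the women's jeans before computing the percentile.
womens_prices = [row["price"] for row in rows if row["menWomen"] == "women"]
p10 = round(percentile(womens_prices, 10))
print(p10)
```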
Q4.3
2 Points
The 10th percentile of prices for women’s jeans indicates that 10% of women’s jeans in the data set are
Q5
2 Points
The National Collegiate Athletic Association (NCAA) provides data on college sports injuries. When an injury occurs, the NCAA adds an entry to its database. The database does not track which player experienced the injury.
The figure below summarizes all injuries experienced by women’s collegiate volleyball players from 2005 to 2009. It shows what percent of all injuries were of each type. Injury types that made up less than 2% of all injuries were grouped together as “All other.”
Consider these two statements summarizing the findings:
A) Between 2005 and 2009, the most common type of injury reported by women’s collegiate volleyball players was a sprain.
B) Between 2005 and 2009, nearly half of all women’s collegiate volleyball players experienced either a sprain or a muscle tear.
Q6
2 Points
A study assessed the effects of playing a computer game (solitaire) during lunch on snack behavior. A sample of 44 healthy adults was randomly assigned to eat lunch either with the computer game distraction or without distraction. All participants ate the same breakfast and lunch.
Thirty minutes after lunch, participants were offered cookies as a snack and were instructed to eat as many or as few cookies as they liked. Those who ate with the distraction ate on average 52 grams of cookies, compared with 27 grams for those who ate without the distraction — a surprisingly substantial (statistically significant) difference.
What can we reasonably conclude from this study?
Q7
2 Points
New York State (excluding New York City) reported the deaths of all White men and Black men who died from prostate cancer (PC) in 1994. The data are summarized below.
Overall, the death rates from prostate cancer were very similar for White men and Black men in New York State in 1994. This fact, however, obscures some stark differences between White men and Black men, where Black men clearly suffered disproportionately from prostate cancer. This paradox is due to the fact that
Q8
4 Points
A study published in 2015 examined material-need insecurities (food insecurity, cost-related medication underuse, housing instability, and energy insecurity) among adults diagnosed with diabetes. The study collected data on a random sample of 411 adult diabetic patients treated in medical centers all over Massachusetts. It found that 64.1% of the patients who experienced food insecurity had poor control of their diabetes, whereas 41.6% of the patients who didn’t experience any food insecurity had poor control of their diabetes.
Q8.1
2 Points
Which of the following plots correctly displays the reported study findings?
Q8.2
2 Points
What would be an appropriate conclusion from this study?
Q9
6 Points
The pulse oximeter is an inexpensive and non-invasive device that assesses a person’s blood oxygen level (“oxygen saturation”) by shining light through the skin of a finger. It is commonly used to determine whether a COVID-19 patient should be admitted to the emergency room and how much oxygen to give them.
A 2020 study compared the reading from a pulse oximeter with the actual oxygen level in blood drawn from an artery, in a large sample of hospitalized patients who identified their race as Black or White. The findings are summarized in the figure below. The figure highlights the arterial oxygen saturation value 88% because of its clinical relevance in a number of medical decisions.
Q9.1
2 Points
Select the best option from the listed choices to fill in the “BLANKS” in the following sentence:
Among patients with a pulse oximeter reading of 91%, BLANK-1 of White patients and BLANK-2 of Black patients had actual arterial oxygen saturations below the 88% level.
BLANK-1:
BLANK-2:
Q9.2
2 Points
The pulse oximeter performed worse overall for Black patients than for White patients, with data showing a tendency of the device to
Q9.3
2 Points
In the article, the researchers stated that “Questions about pulse oximeter technology have been raised, given its original development in populations that were not racially diverse.”
To what statistical issue does this statement refer?
Q10
2 Points
A study of the effect of vitamin E on vascular disease in smokers recruited 22 healthy young adult male smokers and randomly assigned them to take for four weeks either a vitamin E supplement pill or an identical-looking pill containing no vitamin E. The study used ultrasound to measure the participants’ arterial diameter (in millimeters) at the end of the study. The ultrasound operator was not told which participant had taken the vitamin E supplement.
Explain why this study does not have a case-control design.
Q11
9 Points
Manatees are large aquatic mammals that were listed until recently as an endangered species in the United States. Florida has collected census data on manatee deaths for all years since 1977, except for 2020 when data collection was interrupted by the pandemic.
Download the manatee-deaths2019.csv data file at
https://drive.google.com/file/d/1fokbY784GSWN4xC4jvdPd62UoeT4x9wf/view?usp=sharing
containing the number of manatees that died because of a powerboat collision and the number of powerboats registered (recorded in thousands) in any given year in Florida between 1977 and 2019. (Beware that this file is different from the one posted on the Topic 3 Roadmap because it has one more row.)
You can upload the data into CrunchIt (http://crunchit3.bfwpub.com/psls4e) either from the ‘Menu / File / Import from file’ or from the ‘Menu / Edit / Paste Spreadsheet’ options.
Q11.1
2 Points
Create a scatterplot of the number of manatees that died because of a powerboat collision and the number of powerboats registered (in thousands) in any given year in Florida between 1977 and 2019. Be sure to properly label the axes and include units as needed. Also enter your first and last name in the title of the graph — understand that you will not get full credit unless your graph has your name in its title. Right-click on your graph to “save as image” on your desktop, then upload this image here.
Q11.2
3 Points
Run a linear regression analysis modeling manatee deaths from collision as a function of the number of powerboats registered (in thousands). In one sentence, interpret the value of the slope in context.
Q11.3
2 Points
Enter the numerical value of r² as a number between 0 and 1 rounded to 2 decimal places (for ex., 0.11, no units, no text or letters).
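For reference, the quantities reported by a simple linear regression (slope, intercept, and r²) can be computed directly from the least-squares formulas. The sketch below uses a small hypothetical data set, not the values in manatee-deaths2019.csv, and illustrative function names. Note that x is kept in thousands of powerboats, so 700,000 registered boats corresponds to x = 700.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x; also returns r squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


def predict(slope, intercept, x):
    """Predicted y at x, where x is in the same units used for the fit."""
    return intercept + slope * x


# Hypothetical data, NOT the Florida records.
boats_thousands = [500, 600, 700, 800, 900]
deaths = [20, 30, 35, 50, 55]

slope, intercept, r_squared = fit_line(boats_thousands, deaths)
predicted = predict(slope, intercept, 700)  # 700 thousand = 700,000 boats
print(round(slope, 2), round(r_squared, 2), round(predicted, 1))
```

The slope is read as the change in predicted deaths per additional thousand powerboats registered, because that is the unit of x in the fit.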
Q11.4
2 Points
Based on this linear regression model, if Florida were to limit the number of powerboats registered in a year to 700,000, what could we expect the number of manatee deaths due to collisions with powerboats to be in such years? We would expect
Q12
5 Points
Here is a figure from a 2014 report by the Pew Research Center based on a random sample of American adults who use the internet.
Q12.1
3 Points
Express the left-most light-blue value on this graph, 22, as a sentence in context.
Q12.2
2 Points
Explain in one sentence why the dark blue bars for male respondents do not add up to 100% in this figure.