Coffee or Tea Specialty Shops
Jim Thorton’s
Executive Team
Georgia Tselikis
March 28, 2020
Overview
The following analysis explores data collected on the market for coffee and tea over the
past 23 years. Measures of central tendency, measures of spread, distributions shown in
histograms and box plots, and linear correlation, including correlation type, are all
analyzed to provide a basis for recommending whether coffee or tea is the better product
for a specialty shop investment.
Measures of Central Tendency
The mean, also referred to as the average, summarizes all values of a data set in a single
number: the sum of the values divided by how many values there are. For the Tea (L per
person) data set, the mean is 70.94478261. For the Coffee (L per person) data set, the
mean is 100.1286957. Looking at both means, it is evident that the mean for Coffee (L per
person) is greater than the mean for Tea (L per person). This indicates that, on average,
more coffee than tea is consumed per person. The mean alone does not show how spread out
either data set is.
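As a minimal sketch of how the two means can be computed and compared, the snippet below uses Python's standard statistics module. The sample values and the helper name compare_means are illustrative assumptions only; they are not the report's 23 years of data.

```python
from statistics import mean

# Hypothetical litres-per-person values for illustration only.
# These are NOT the values from the report's data set.
tea_sample = [60.0, 65.0, 70.0, 75.0, 80.0]
coffee_sample = [98.0, 99.0, 100.0, 101.0, 102.0]


def compare_means(first, second):
    """Return (mean of first, mean of second, second mean minus first mean)."""
    first_mean = mean(first)
    second_mean = mean(second)
    return first_mean, second_mean, second_mean - first_mean


tea_mean, coffee_mean, gap = compare_means(tea_sample, coffee_sample)
print(f"Tea mean: {tea_mean:.2f}  Coffee mean: {coffee_mean:.2f}  Gap: {gap:.2f}")
```

A positive gap means the second data set has the higher average. It says nothing about which data set varies more.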
The median is the data point that lies in the middle of a data set when ordered, and is
also referred to as the 50th percentile. For the Tea (L per person) data set, the median
is 68.31 and the mean, recall, is 70.94478261. The mean is greater than the median, which
suggests the data is positively skewed. For the Coffee (L per person) data set, the median
is 101.31, which is greater than the mean of 100.1286957. This suggests the data is
negatively skewed.
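The mean-versus-median comparison above can be sketched as a small function. The means and medians passed in are the ones stated in this report. The function name skew_direction is an assumption made for illustration, and the comparison is a rule of thumb rather than a formal skewness test.

```python
def skew_direction(mean_value, median_value):
    """Rule of thumb: compare the mean with the median to suggest a skew direction."""
    if mean_value > median_value:
        return "positive"   # mean pulled above the median by high values
    if mean_value < median_value:
        return "negative"   # mean pulled below the median by low values
    return "none indicated"


# Summary statistics as stated in the report.
tea_skew = skew_direction(70.94478261, 68.31)
coffee_skew = skew_direction(100.1286957, 101.31)
print("Tea:", tea_skew)
print("Coffee:", coffee_skew)
```

A histogram or box plot of each data set is still the more reliable check on the direction this rule suggests.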
The modal interval is the interval that contains the most data points. For the Tea (L per
person) data set, the modal interval is 55 - 70. This indicates that more years in the data
set fall between 55 and 70 litres of tea per person than in any other interval. For the
Coffee (L per person) data set, the modal interval is 100 - 105. This indicates that more
years in the data set fall between 100 and 105 litres of coffee per person than in any
other interval.
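One way to find a modal interval is to sort the values into equal-width intervals and pick the interval with the highest count. The sketch below assumes equal-width intervals and uses made-up sample values, a made-up starting point and a made-up width, which may not match how the report's intervals were chosen. The function name modal_interval is also an assumption.

```python
from collections import Counter


def modal_interval(values, start, width):
    """Return (low, high) of the equal-width interval that holds the most values.

    Intervals are [start, start + width), [start + width, start + 2 * width), ...
    If two intervals tie, the one encountered first in the data is returned.
    """
    counts = Counter(int((value - start) // width) for value in values)
    index, _ = max(counts.items(), key=lambda item: item[1])
    low = start + index * width
    return low, low + width


# Hypothetical litres-per-person values for illustration only.
sample = [101.2, 99.5, 103.8, 104.1, 96.0, 102.7, 107.3]
interval = modal_interval(sample, start=95, width=5)
print(f"Modal interval: {interval[0]} - {interval[1]}")
```

The result depends on the chosen start and width, so the same data can give a different modal interval under a different grouping.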
Measures of Spread
Measures of spread help to describe the variability in a data set by summarizing the
extent to which data is clustered around the center. Three measures of spread are
considered: range, interquartile range (IQR), and standard deviation.

The range is the difference between the maximum data point and the minimum data point.
The range for the Tea (L per person) data set is 69.2. This range is neither extremely
small nor extremely large, which indicates a moderate amount of variability. The range for
the Coffee (L per person) data set is 18.68. This is a small range, which indicates that
there is low variability in the data set. Comparing the ranges of the two data sets, it can
be seen that the Tea data set has a higher level of variability than the Coffee data set.
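The range calculation described above can be sketched in a few lines. The sample values and the function name data_range are illustrative assumptions and are not taken from the report's data.

```python
def data_range(values):
    """Return the difference between the largest and smallest value."""
    return max(values) - min(values)


# Hypothetical litres-per-person values for illustration only.
tea_sample = [50.0, 68.0, 90.0]
coffee_sample = [95.0, 100.0, 105.0]

tea_range = data_range(tea_sample)
coffee_range = data_range(coffee_sample)
more_variable = "Tea" if tea_range > coffee_range else "Coffee"
print(f"Tea range: {tea_range}  Coffee range: {coffee_range}  Wider: {more_variable}")
```

Because the range uses only the two extreme values, a single unusual year can change it considerably.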