Statistics 101
Percentiles, Quartiles, Box Plots and Box Plot Construction Using Minitab Release 13, and Time Series Charts and Time Series Chart Construction Using Minitab Release 13
Previous Definitions
4. A population is a set of objects or events being studied.
5. A sample is a subset of the population.
9. Quantitative data are data expressed in units of measure (such as distance).
More Definitions
33. Percentile
a. The percentile of a selected measurement from a quantitative data set is that percent of the measurements of the data set that fall at or below the selected measurement.
b. Also, the percentile of a selected population (or sample) member, with respect to some quantitative characteristic, is the percent of the population (or sample) whose measurements, with respect to the quantitative characteristic, fall at or below the measurement of the selected population (or sample) member.
c. Also, for p > 0 and p ≤ 100, the pth percentile of a data set of measurements is the smallest measurement such that at least p percent of the measurements (data) fall at or below such smallest measurement.
Context will determine which of the above three percentile definitions applies.
With regard to the third percentile definition, it is generally okay to think that if the pth percentile of a data set is measurement x, then p% of the data set falls below measurement x and (100-p)% of the data set falls above measurement x. However, as a word of caution, if we take the extreme case where all measurements of the data set are the same value x, then by the third definition of percentile the 50th percentile (as well as any other percentile) of the data set is x, but 0% of the data set falls below x and 0% of the data set falls above x.
Note that the 50th percentile of a quantitative data set is very often equal to or nearly equal to the median.
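As an illustration only, the first and third percentile definitions can be sketched in Python. The function names percentile_rank and pth_percentile are hypothetical, as is the range check on p (taken here to be 0 < p ≤ 100); the sample lists are made up for the demonstration.

```python
from bisect import bisect_right


def percentile_rank(data, x):
    """Definition (a): percent of the measurements that fall at or below x."""
    ordered = sorted(data)
    return 100 * bisect_right(ordered, x) / len(ordered)


def pth_percentile(data, p):
    """Definition (c): smallest measurement such that at least p percent
    of the measurements fall at or below it."""
    if not 0 < p <= 100:  # assumed range for p
        raise ValueError("p must satisfy 0 < p <= 100")
    ordered = sorted(data)
    n = len(ordered)
    for x in ordered:
        # bisect_right counts how many measurements are <= x
        if 100 * bisect_right(ordered, x) >= p * n:
            return x


# The extreme case from the text: every measurement is the same value.
same = [5, 5, 5, 5]
print(pth_percentile(same, 50))   # 5
print(percentile_rank(same, 5))   # 100.0, although 0% of the data falls below 5

# A made-up data set where the percentiles differ.
print(pth_percentile([1, 2, 3, 4], 50))  # 2
print(pth_percentile([1, 2, 3, 4], 51))  # 3
```

In the second data set the median is 2.5 while the 50th percentile under the third definition is 2, which is one way the two can be "nearly equal" without being equal.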
34. The first quartile or lower quartile of a quantitative data set is the 25th percentile of the data set, i.e., the smallest measurement such that at least 25% of all measurements within the data set fall at or below such smallest measurement. The first quartile is often designated by Q1 or QL; we will use Q1.
35. The third quartile or upper quartile of a quantitative data set is the 75th percentile of the data set, i.e., the smallest measurement such that at least 75% of all measurements within the data set fall at or below such smallest measurement. The third quartile is often designated by Q3 or QU; we will use Q3.
Note that lower-level statistical texts often provide very fuzzy definitions of percentile, first quartile, and third quartile (often so fuzzy that they would violate the conscience and intellectual integrity of any mathematician), so those using these web pages as a course aid should take account of the differences between the above definitions and those provided by the visitor's text.
36. The interquartile range is equal to Q3 - Q1 and will be designated by IQR.
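A minimal sketch of Definitions 34-36 in Python follows. It relies on the observation that, under the third percentile definition, the pth percentile of n measurements is the k-th smallest measurement, where k is p·n/100 rounded up. The function name and the eight-value data set are hypothetical and serve only to show the arithmetic.

```python
from math import ceil


def pth_percentile(data, p):
    """Smallest measurement with at least p percent of the data at or below it."""
    ordered = sorted(data)
    k = ceil(p * len(ordered) / 100)  # how many values must lie at or below
    return ordered[k - 1]


# Hypothetical measurements, for illustration only.
data = [2, 4, 4, 5, 7, 9, 10, 12]

q1 = pth_percentile(data, 25)  # 25% of 8 is 2, so Q1 is the 2nd smallest value
q3 = pth_percentile(data, 75)  # 75% of 8 is 6, so Q3 is the 6th smallest value
iqr = q3 - q1

print(q1, q3, iqr)  # 4 9 5
```

Statistical software often computes quartiles by interpolating between neighboring values, so its Q1 and Q3 may differ slightly from the values this definition gives.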
Box Plots
The first quartile, the median, the third quartile, and the interquartile range are used in the construction of box plots.
We briefly outline the steps for constructing a box plot using Minitab for the data table in Displaying Quantitative Data.
Percentage of Weekly Visitors Visiting one of the Summary Web Pages of a Web-Page-Rich Website between the Weeks of 17-23 April 2005 and 19-25 February 2006
Week                      Percentage
4/17/2005-4/23/2005       1.26
4/24/2005-4/30/2005       1.05
5/01/2005-5/07/2005       1.02
5/08/2005-5/14/2005       0.92
5/15/2005-5/21/2005       1.03
5/22/2005-5/28/2005       0.89
5/29/2005-6/04/2005       0.99
6/05/2005-6/11/2005       0.93
6/12/2005-6/18/2005       1.03
6/19/2005-6/25/2005       0.75
6/26/2005-7/02/2005       0.75
7/03/2005-7/09/2005       0.86
7/10/2005-7/16/2005       *
7/17/2005-7/23/2005       0.82
7/24/2005-7/30/2005       0.81
7/31/2005-8/06/2005       0.82
8/07/2005-8/13/2005       0.84
8/14/2005-8/20/2005       0.62
8/21/2005-8/27/2005       0.83
8/28/2005-9/03/2005       0.82
9/04/2005-9/10/2005       0.65
9/11/2005-9/17/2005       0.52
9/18/2005-9/24/2005       0.58
9/25/2005-10/01/2005      0.81
10/02/2005-10/08/2005     0.72
10/09/2005-10/15/2005     0.64
10/16/2005-10/22/2005     0.70
10/23/2005-10/29/2005     0.75
10/30/2005-11/05/2005     0.77
11/06/2005-11/12/2005     0.71
11/13/2005-11/19/2005     0.76
11/20/2005-11/26/2005     1.09
11/27/2005-12/03/2005     0.94
12/04/2005-12/10/2005     0.66
12/11/2005-12/17/2005     0.85
12/18/2005-12/24/2005     1.01
12/25/2005-12/31/2005     0.98
1/01/2006-1/07/2006       1.05
1/08/2006-1/14/2006       0.86
1/15/2006-1/21/2006       0.80
1/22/2006-1/28/2006       0.76
1/29/2006-2/04/2006       0.61
2/05/2006-2/11/2006       0.99
2/12/2006-2/18/2006       0.77
2/19/2006-2/25/2006       0.91
Step 1. After transferring the data to a Minitab worksheet, click Graph and move the cursor to Boxplot... in the drop down menu. You have:
Step 2. Click Boxplot... to obtain the Boxplot dialog box. Click the variable name in the window to the left and then click the Select button. You have:
Step 3. Click the Edit Attributes... button to obtain the IQRange Box dialog box. Select Solid under Fill Type (using the drop down menu) and select a color under Back Color (using the drop down menu). The former selection activates the Back Color (so that the back color transfers to the final graph) and the latter is important as the color may not be changed after the graph is constructed without losing the median line. (Recall that in all of our previous graph constructions we made the former selection and had no need to make the second selection.) In the center of the screen, you have:
Step 4. Click the OK button in the IQRange Box. In the Annotation drop down menu, click Title... and in the Title dialog box type in a title and click OK. In the Regions drop down menu, click Figure... and in the resulting Figure dialog box, click Solid in the Fill Type: menu window and then the OK button. (Recall that this selection permits us to add background color to the larger area surrounding the central graph area after the graph is constructed.) Also in the Regions drop down menu, click Data... and in the resulting Data dialog box, click Solid in the Fill Type: menu window and then the OK button. (Recall that this selection permits us to add background color to the central area of the graph after the graph is constructed.) Click the OK button in the Boxplot dialog box to obtain the box plot graph. Double click the graph to obtain the Tools and Attributes bars and make your final touchups. (For details on the use of these bars, see the pages on Histogram Construction and Dot Plot construction.)
The final result appears below.
The line within the magenta box depicts the median, the bottom line of the box (called the lower hinge) depicts the lower quartile Q1, and the upper line of the box (called the upper hinge) depicts the upper quartile Q3. The approximate value of each, as well as the approximate value of IQR = Q3 - Q1, may be obtained from the scale on the left.
The two vertical lines extending from the box are called whiskers. The top whisker extends to the largest value within the data set that is less than or equal to Q3 + (1.5 × IQR), and the bottom whisker extends to the smallest value within the data set that is greater than or equal to Q1 - (1.5 × IQR). When there are data values falling above Q3 + (1.5 × IQR) or below Q1 - (1.5 × IQR), they are depicted in the box plot by asterisks or 0s. Using Minitab, we draw the box plot for the data set of Problem 3 for Centrality, Spreads, Normality, and Chebyshev's Rule to illustrate the use of asterisks.
Weekly Referrals from a Forum to a Website for the first 12 weeks of 2006
Week                      Referrals
1/01/2006-1/07/2006       41
1/08/2006-1/14/2006       6
1/15/2006-1/21/2006       9
1/22/2006-1/28/2006       0
1/29/2006-2/04/2006       0
2/05/2006-2/11/2006       0
2/12/2006-2/18/2006       0
2/19/2006-2/25/2006       4
2/26/2006-3/04/2006       5
3/05/2006-3/11/2006       1
3/12/2006-3/18/2006       1
3/19/2006-3/25/2006       1
A value represented by an asterisk is said to be a suspected outlier. What constitutes an outlier (versus merely a suspected outlier) rests outside the realm of the well-defined; however, it may be safely said that if a suspected outlier is proven to be an erroneous value, it is an outlier. Also, note the absence of a bottom whisker for this data set.
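The whisker rule can be checked numerically on the referrals data. The following Python sketch applies this page's quartile definitions (Definitions 34 and 35); the function names are hypothetical, and Minitab may compute Q1 and Q3 by a different rule, so the figures in its output need not match these exactly.

```python
from math import ceil
from statistics import median

# Weekly referrals from the table above.
referrals = [41, 6, 9, 0, 0, 0, 0, 4, 5, 1, 1, 1]


def pth_percentile(data, p):
    """Smallest measurement with at least p percent of the data at or below it."""
    ordered = sorted(data)
    return ordered[ceil(p * len(ordered) / 100) - 1]


def box_plot_summary(data):
    q1 = pth_percentile(data, 25)
    q3 = pth_percentile(data, 75)
    iqr = q3 - q1
    lower_bound = q1 - 1.5 * iqr
    upper_bound = q3 + 1.5 * iqr
    # Whiskers end at the extreme data values still inside the boundaries.
    inside = [x for x in data if lower_bound <= x <= upper_bound]
    outside = sorted(x for x in data if x < lower_bound or x > upper_bound)
    return {
        "median": median(data),
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "lower_whisker": min(inside),
        "upper_whisker": max(inside),
        "suspected_outliers": outside,
    }


summary = box_plot_summary(referrals)
for name, value in summary.items():
    print(name, value)
```

Under these definitions the bottom whisker would end at 0, which is also Q1, so there is nothing to draw below the box; the single value above Q3 + (1.5 × IQR) is 41, the suspected outlier.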
If you wish to draw the box plot by hand (using either a pencil or a software program such as Photo Draw), after transferring the data to Minitab, follow Steps 1-4 of Solution to Problem 5: Centrality, Spreads, Normality, and Chebyshev's Rule to obtain Minitab's Descriptive Statistics table. The Descriptive Statistics table provides the median (the line within the box), Q1 (the bottom line of the box, i.e., lower hinge), and Q3 (the top line of the box, i.e., upper hinge). From these you may compute IQR = Q3 - Q1, which in turn permits you to compute the boundaries for the whiskers, Q1 - (1.5 × IQR) and Q3 + (1.5 × IQR). From these boundaries you may determine the smallest data value above the lower whisker boundary and outside the box (the terminal point for the lower whisker) and the largest data value below the upper whisker boundary and above the box (the terminal point for the upper whisker). Minitab's Descriptive Statistics table for the above referrals table is given below.
Time Series Charts
A time series chart displays values at particular times or for particular time periods. Over a short span of time, say, two years or less, a time series chart of the following type is often used. The chart depicts medians of monthly sale prices of single-family dwellings in or closely about Fergus Falls, Minnesota, for which the sales agreement was signed in the first ten months of 2005.
The steps in the construction of the above time series chart are:
Step 1. Click Graph and move the cursor to Time Series Plot... in the drop down menu. You have:
Step 2. Click Time Series Plot... to obtain the Time Series Plot dialog box. Click the applicable variable name (here, Median) in the window to the left and then click the Select button. You have:
Click the OK button.
Step 3. Click the Edit Attributes... button to select the desired plot symbol, color, and size. For the above graph the selections were:
Click the OK button.
Step 4. Click Annotation or the Annotation button to obtain a drop down menu. On this menu, click Title... to enter your title in the Title dialog box. You have:
Click the OK button.
Step 5. Click Frame or the Frame button to obtain a drop down menu. On this menu, click Tick... in order to select the number of ticks on the Y-axis. For the above graph the selection was 5 ticks, as shown below on the second line of the Number of Major column.
Click the OK button.
Step 6. Again, click Frame or the Frame button to obtain a drop down menu. This time click Grid... in order to add horizontal lines at the tick marks. Enter Y on the first line of the Direction column and make your selections for the next three columns. For the above graph the selections were:
Click the OK button.
Step 7. Again, click Frame or the Frame button to obtain a drop down menu. This time click Min and Max... in order to select the range of your Y-axis. For the above graph the selections were 0 for the minimum and 125000 for the maximum.
Click the OK button.
Step 8. Recall that by selecting in the Regions drop down menu, first Figure... and then Data..., you may access the Figure dialog box and the Data dialog box, respectively, in order to select Solid for the Fill Type, necessary for permitting color selection of the outer area of the graph and central area of the graph, respectively, once the graph is constructed.
Step 9. Click the Options... button to obtain the Time Series Plot Options dialog box. Here you select the tick mark labels for the x-axis. For the above graph the selections were:
Click the OK button in the Time Series Plot Options dialog box and the OK button in the Time Series Plot dialog box to obtain your graph. As always, to do the final touchups, double click the graph to summon the Tools and Attributes bars. (For details on the use of these bars, see the pages on Histogram Construction and Dot Plot construction.) If you wish to add vertical lines corresponding to some or all of the tick marks on the x-axis, this should be done first.
For a short span of time, a time series chart of box plots is very effective for displaying a broader body of information, as demonstrated next. The following chart provides box plots of monthly sale prices of single-family dwellings in or closely about Fergus Falls, Minnesota, that were sold in the first ten months of 2005.
The steps in the construction of the above time series chart are:
Step 1. Click Graph and move the cursor to Boxplot... in the drop down menu. You have:
Step 2. Click Boxplot... to obtain the Boxplot dialog box. For the Y variable, click the applicable variable name (here, Sale Price) in the window to the left and then click the Select button. For the X variable, similarly click the applicable variable name (here, Month of Agreement) in the window to the left and then click the Select button. You have:
Click the OK button.
Step 3. Click the Edit Attributes... button to obtain the IQRange Box. In this box, select Solid in the Fill Type column and a suitable Back Color. For the above graph, we have:
Click the OK button.
Step 4. Click Frame or the Frame button to obtain a drop down menu. On this menu, click Multiple Graphs... to obtain the Multiple Graphs dialog box. For the above graph, the selections appear below.
Step 5. Recall that by selecting in the Regions drop down menu, first Figure... and then Data..., you may access the Figure dialog box and the Data dialog box, respectively, in order to select Solid for the Fill Type, necessary for permitting color selection of the outer area of the graph and central area of the graph, respectively, once the graph is constructed.
Step 6. Click the OK button in the Boxplot dialog box to obtain your graph. As always, to do the final touchups, double click the graph to summon the Tools and Attributes bars. (For details on the use of these bars, see the pages on Histogram Construction and Dot Plot construction.)
For a multi-year span of time, a time series chart with a logarithmic scale is appropriate when there is growth or an appreciation or a depreciation in value. This is because the slope of the line connecting any two points on such a chart mirrors the rate of growth, the rate of appreciation in value (akin to an interest rate), or the rate of depreciation in value.
From the chart it may be seen that the greatest appreciation in urban residential real estate - houses, town houses, and condominiums - in and closely about Fergus Falls, Minnesota, occurred from 1996 to 1997, assuming that a quite similar body of urban residential real estate came onto the Fergus Falls market each year. The moderate increase in home values from 1999 to 2000 likely reflects higher interest rates, and the very modest increase from 2000 to 2001 likely reflects initially higher interest rates and later the broad impact of the collapse of the New York towers. The modest increase from 2004 to 2005 likely reflects rising interest rates and possibly some credit exhaustion.
The steps in the construction of the above time series chart are the same as those for the first time series chart, save that in the Time Series Plot Options dialog box you select years and the logarithm transformation of the Y axis. The appropriate selections are shown below.
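The earlier remark that slope on a logarithmic scale mirrors the rate of growth can be checked with a few lines of Python. The yearly values below are hypothetical (10% growth in each of the first two years, then 5%) and are not taken from the Fergus Falls data.

```python
from math import log10

# Hypothetical yearly values: +10%, +10%, then +5%.
values = [100000, 110000, 121000, 127050]

pairs = list(zip(values, values[1:]))

# Year-over-year change on an ordinary (linear) scale.
linear_steps = [later - earlier for earlier, later in pairs]

# Year-over-year change on a logarithmic scale, i.e. the slope of each
# line segment when the years are equally spaced on the X axis.
log_slopes = [log10(later) - log10(earlier) for earlier, later in pairs]

# Growth rate implied by each pair of consecutive years.
growth_rates = [later / earlier - 1 for earlier, later in pairs]

for step, slope, rate in zip(linear_steps, log_slopes, growth_rates):
    print(f"linear step {step:>6}  log slope {slope:.5f}  growth {rate:.1%}")
```

The first two segments have different heights on a linear scale (10000 versus 11000) but the same slope on the logarithmic scale, because both correspond to 10% growth; the third segment, at 5% growth, is visibly flatter.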
For comparing the performance of several investments over a multi-year span of time, a time series chart displaying normalized or standardized values and employing a logarithmic scale is useful. A normalized or standardized value is one in which all values of a series are divided by the first value of the series, thereby making the first value of each series 1.0. The following time series chart provides a comparison of the appreciation of in-town residential property, rural residential property, and lakeshore residential property in the vicinity of Battle Lake, Minnesota - an area of western Minnesota that is on the western edge of North America's lake region and where summer vacation homes have been common for several decades - and the appreciation of American Century's Real Estate Fund.
From the above chart, it may be seen that lakeshore property in the vicinity of Battle Lake has experienced the greatest appreciation over the given time period, although from 2002-2003 to 2004-2005 American Century's Real Estate Mutual Fund has performed best. These conclusions, of course, assume that for all pairs of years, a quite similar body of properties in each category came onto the Battle Lake real estate market.
Note that here the tick labels of the X-axis must be inserted as part of the touchups after the graph is created, as the Time Series Plot Options dialog box will not accept hyphenated values.
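The normalization described above (dividing every value of a series by the first value of that series) might be sketched in Python as follows. The two series and their names are hypothetical and stand in for any pair of investments measured in different units; they are not the Battle Lake or fund figures.

```python
def normalize(series):
    """Divide every value by the first value, so the series starts at 1.0."""
    first = series[0]
    return [value / first for value in series]


# Hypothetical series in very different units (a sale price and a share price).
series_a = [80000, 88000, 100000]
series_b = [20.0, 23.0, 24.0]

normalized_a = normalize(series_a)
normalized_b = normalize(series_b)

print([round(v, 2) for v in normalized_a])  # [1.0, 1.1, 1.25]
print([round(v, 2) for v in normalized_b])  # [1.0, 1.15, 1.2]
```

Once both series start at 1.0 they can share one Y axis, and the later values read directly as growth factors relative to the first period.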