Module 1.2 Notes
Describing Data with Pictures
 

Index to Module One Notes

1.1: Why Statistics for Mgt 

1.2: Describing Data: Pictures 

1.3: Describing Data: Number Summaries

1.4: Estimating with Confidence 

1.5: Testing Hypothesis

 

This section of Module 1 will get us into Microsoft Excel, and take us to the point of creating a histogram summary of numerical data. Table 1.2.1 shows two and a half years of cycle time data from the manufacturing firm introduced in Module 1.1 Notes.

Table 1.2.1

1997
1998
1999
January
19
20
21
February
21
22
21
March
20
19
29
April
16
20
25
May
18
23
21
June
23
19
21
July
22
27

August
24
22

September
17
18

October
26
20

November
20
19

December
21
18

Enter this data into a single column in a Microsoft Excel spreadsheet. For some extra practice with Excel features, you can place this column to the right of a column for Month/Yr, as shown in Worksheet 1.2.1. For example, I placed the months of the years in column A. Thus, column A, row 1 (cell A1), has the title and the specific months go from cell A2 to A31. In Microsoft Excel 98 and 2000, the default format for month/year is Mon-XX. So, when I typed January 1997 in Cell A2, and hit the enter key, Jan-97 appeared (you can change the default format of any cell by selecting Format on the Standard Toolbar, then Cells from the pulldown menu, then the Number tab, then whatever format is desired from the available list). Now, if you select cell A2, note a small square in the lower right of the highlighted cell border. Click and drag that square down to cell A31 and all of the months/years will enter successively - slick, huh?!

Cell B1 has the title for the variable "Time," and the data is entered from cell B2 to B31. There is no magic way of entering the data for the variable "Time" in column B, at least not until I get voice data entry capability. Of course, the typical FGCU MBA student has data entry employees who can do this for them at work - right?!!

An aside: if you have any questions about Excel as we go through the course, please e-mail/call me so we can get them resolved quickly. Another option for questions: I find the Excel Help feature getting better and better with each version, but it is getting so big that you seem to need basic search engine skills to locate your topic quickly.

I then copied the Time column into Column C and re-titled it "Sorted." One way to copy a range of data: select or highlight cells B1 to B31 by clicking on B1, then dragging to B31, select the Copy Icon on the Formatting Toolbar, select or highlight Cell C1, and click the Paste Icon on the Formatting Toolbar. Sort the data from lowest to highest by selecting or highlighting Cells C1 to C31, then click on the Sort Ascending Icon (A to Z) on the Formatting Toolbar. Note that his technique sorts the data in rows based on the content of one column. To sort the data in rows based on contents of two or more columns, select the rows and columns of your data range, click on Data on the Standard Toolbar, select Sort from the pulldown menu, and then follow the dialog screen entry instructions.

Did you replicate the columns in Worksheet 1.2.1? If not, you should e-mail or phone me for help. If so, save your work to a file before going on. Assignment 1 requires entry of your data, copying to a separate column, then sorting the duplicate column of data (Items 1 and 2 of Project Assignment 1).

Worksheet 1.2.1
Month/Yr
Time
Sorted
Jan-97
19
16
Feb-97
21
17
Mar-97
20
18
Apr-97
16
18
May-97
18
18
Jun-97
23
19
Jul-97
22
19
Aug-97
24
19
Sep-97
17
19
Oct-97
26
20
Nov-97
20
20
Dec-97
21
20
Jan-98
20
20
Feb-98
22
20
Mar-98
19
21
Apr-98
20
21
May-98
23
21
Jun-98
19
21
Jul-98
27
21
Aug-98
22
21
Sep-98
18
22
Oct-98
20
22
Nov-98
19
22
Dec-98
18
23
Jan-99
21
23
Feb-99
21
24
Mar-99
29
25
Apr-99
25
26
May-99
21
27
Jun-99
21
29

Now it's time to create a histogram to summarize the data in a picture. As we will see, the histogram tells us something about the center of the data, the spread of the data, and the shape of the data.

The fastest way to create a histogram is to use the Data Analysis Histogram Tool. To do this, you have to activate the Data Analysis Add-In feature of Excel. Select Tools on the Standard Menu Bar and check to see if Data Analysis is available on the pulldown menu. If not, select Add-Ins and the Add-Ins dialogue box will appear. Select the Analysis ToolPak check box. Click OK and exit Excel (CAUTION: Save your Excel File before exiting). When you open Excel, the Data Analysis Add-In feature should be available through the Tools pulldown menu on the Standard Toolbar.

If the Analysis ToolPak selection is not available in the Add-Ins, the Data Analysis component was probably not included when your copy of Excel was installed on your computer. Select Help on the Standard Toolbar, then select Microsoft Excel Help in the pulldown menu, select the Index tab, type keyword "Installation," select Search, and then select Install or Remove Individual Features, and follow the instructions. The Data Analysis component allows Excel to be somewhat competitive with statistical software packages. The component doesn't have the full functionality of software packages such as Minitab, SPSS or SAS, but it has the features commonly used in business applications... and these are on a spreadsheet used by most of the world.

After you have activated this important "Add In" that we will be using throughout this course, create a histogram. To do this, select Tools on the Standard Toolbar, then Data Analysis from the pulldown menu and respond to the dialog screen. I entered C1:C31 for the input range and checked Label in the Label Checkbox. This means that the first cell of my data range actually is a title rather than a data entry. If I would have entered C2:C31 for the input range, I would not have checked Label (if you are like me, you will make this mistake several times before you get the hang of Excel Data Analysis dialog screens). Skip bin range to let the software create a default set of frequency bars for the histogram. I like to put the histogram on the same page as my data so I select Output Range, then put the cursor in the rectangle next to Output Range, and type in a cell location for placement of my histogram. The cell location will identify the upper left corner of where the histogram will appear in the worksheet, so just make sure the worksheet is clear to the right and down from the selected cell location. I entered D1 as the worksheet was empty to the right and down from D1. Select Chart Output and OK and you should get something like Worksheet 1.2.2.

Worksheet 1.2.2

Bin

Frequency

16

1

18.6

4

21.2

15

23.8

5

26.4

3

More

2

The bins of the histogram represent a convenient way of summarizing the data by showing the frequency of observations at preselected intervals - the family name given to this type of chart then, is the frequency distribution. The histogram shows one observation in the Bin labeled 16. Next, there are 4 observations in the Bin with a value of 18.6 (observations with values of 17, 18, 18, and 18). This means that in the interval of numbers greater than 16 and less than 18.6 there are four numbers: 17, 18, 18, and 18. Likewise, there are 15 numbers in the interval of numbers greater than 18.6 and less than 21.2, and so forth. The bin labels are the upper limits of each class interval chosen by Excel.

How did Excel determine the interval of each bin should be 2.6? It's fairly simple, actually (you can skim this and the next paragraph if you want to go directly to interpretation of Center and Spread below). First, Excel computes the range of the data: highest number minus the lowest number = 29 - 16 = 13. Next, Excel divides the range by the number of intervals or bars desired to summarize and display the data. The default number of intervals is 5, so 13 divided by 5 gives a class interval range of 2.6. Now Excel divides the numbers 16 through 29 into five class intervals, each with a range of 2.6.

- The first interval includes the numbers < 16 (there is one 16)
- The second interval includes numbers > 16 to 18.6
- The third interval includes the numbers > 18.6 to 21.2
- The fourth interval includes the numbers > 21.2 to 23.8
- The fifth interval includes the numbers > 23.8 to 26.4
- The last interval is titled "More" by Excel: and includes all the numbers above the last upper limit (26.4 in this case) for placing in an extra "bucket" or bin.

Some notes on the location and shape of the chart: What if you do not like the location of the histogram? Select the histogram by pointing and clicking anywhere on the white area of the chart. You should see 8 small squares called "handles" around the border of the chart. Now, click again anywhere on the white area of the histogram and drag the chart wherever you wish. You can also change the shape of the histogram. Point and click on the middle handle on the left border and drag to the left to make the historgram longer to the left, or drag into the histogram to make it smaller - you can do the same to the right side. Point and click on the middle handle of the top border and drag up to make the histogram wider to the top - same for the bottom. If you point, click and drag a handle on the corner, the histogram gets larger if you drag out, or smaller if you drag in. If you point and click on the histogram, select File on the Standard Toolbar, and Print, you can print just the histogram. If you do not point and click on the histogram and select File and Print, you can print the histogram and whatever else is on the worksheet, such as the data. I like to select File and Print Preview to see where the page borders are to determine if I have to move the histogram so it doesn't get cut up when printing.

Creating a histogram is sometimes considered more "art" than "science." We are trying to develop a picture that best illustrates the center, spread and shape of the data distribution to communicate information about the underlying process. More intervals provide greater detail at the expense of losing the identification of the process center. Less intervals provide a more concise summary at the expense of losing the identification of the process shape. Excel does a very good job at creating the histogram as long as there is are at least 30 observations. You can change the number of intervals, the bin labels, and the frequency count by manipulating the Bin and Frequency cells of the frequency table that accompanies each histogram.


The Center and Spread of a Set of Data

We can see that the center of the data appears at about 21, and the spread is from 16 to "More." You can change "More" to 29 where 29 is the highest number in the data set by typing 29 over the word "More" in the frequency table next to the histogram. This would then allow the reader to see that the spread of the data is from a low of 16 to a high of 29. Exact numerical measures of the center and spread of the data will be introduced in Module 1.3.


The Shape

One other thing we get from the histogram is the shape - the shape shown in Worksheet 1.2.2 is a symmetric bell-shaped distribution called the Normal Probability Distribution. The histograms of many distributions appear like this, especially if the continuous data in a sample under study is randomly selected, without bias to very high or very low numbers, and without multiple processes mixed together. I will illustrate the effects of very high or very low numbers next.

Suppose for a moment that the last observation (Jun 99) was 38 days rather than 29 days. Now look at the histogram in Worksheet 1.2.3.

Worksheet 1.2.3

Bin

Frequency

16

1

20.4

13

24.8

12

29.2

3

33.6

0

38

1

Note that the 38 lies outside and to the right of the range and center of the data - it is called an outlier. When outliers occur to the right of the range and center, we say the distribution is skewed right. Generally, outliers are events that are not created or expected from a process and should be removed and investigated. In statistical process control, an outlier is called a signal, and is an indication that a process is going out of control and must be shut down and examined. Once the outlier is removed, the histogram will return to its symmetric shape.

What happens if the June 1999 observation was an unexpected short cycle time, such as 7 days. Look at the resulting Histogram in Worksheet 1.2.4.

Worksheet 1.2.4

Bin

Frequency

7

1

11

0

15

0

19

9

23

16

27

4

This histogram shows an outlier or signal event to the left of the range of the data. When outliers occur to the left of the range and center, we say the distribution is skewed left.


Bi-Modal Shape

The next histogram, Worksheet 1.2.5, shows two distinct bell-shaped distributions, one with a center at about 10, the other between 21 and 22. The histogram would suggest that two processes are actually in the set of data - perhaps a set of data representing a process before improvements were made (center at 21.6 days), and one set of data representing a process after improvements were made (center at 10 days). This is a very common phenomenon in business. For example, in season variables may have much higher values than off season. As we will see in Modules 3 and 4, these type of Bi-Modal, or even multi-modal distributions should be separated or stratified before analysis of the data.

Worksheet 1.2.5

Bin

Frequency

3

1

6.7

10.0

10.4

17.0

14.1

2.0

17.9

2.0

21.6

18.0

25.3

7.0

29

3


There are some numerical summary measures that help us measure shape. These will be introduced in Module 1.3. The histogram is the third item required in your analysis of data for Project Assignment 1.

Pause and Reflect

Pictures, such as histograms, quickly communicate information about the center, spread and shape of distributions so that we begin to know about the underlying process that generates the data. This is true whether we are describing data from a population of interest, or data in a sample drawn from a population.


Reference:

Anderson, D., Sweeney, D., & Williams, T. (2001). Contemporary Business Statistics with Microsoft Excel. Cincinnati, OH: South-Western, Chapter 2.


| Return to Module 1 Overview | Return to top of page |

  About the Course
Module Schedule
WebBoard