Main Module 1
Presenting, Describing and Inferring from a Sample of Numerical Data

 

Learning Objectives

At the end of the module, the student will:

  Enter and analyze numerical data using Microsoft Excel

  Present a picture of the shape, center and variability of numerical data

  Present and interpret selected numerical measures of center, spread and shape of numerical data

  Understand appropriate application of measures of center and spread

 Use measures of center and spread to interpret the "Voice of the Process" and distinguish "noise" from process "signals" or outliers

Understand process capability and process variability (compute probabilities)

Use appropriate sample statistics to estimate a population mean

Construct and test hypothesis statements about a population mean

 

Module Notes

The following sub modules contain summary notes for the five topic areas of Module 1.

Module 1.1: Why Statistics for Management?

Module 1.2: Describing Data with Pictures 

Module 1.3: Describing Data with Numerical Summaries 

Module 1.4: Estimating with Confidence 

Module 1.5: Hypothesis-Testing 

Module 1.6: Useful Excel Functions

Assignment 1

Part A.

Select a numerical variable, such as time to complete an activity, days to pay accounts or dollars of profit contribution, etc, from one of the data files supplied by the instructor.  Using Microsoft Excel:

1.  Enter the data.

2.  Sort data from lowest to highest values.

3. Create a histogram.

4.  Report descriptive statistics using Excel Data Analysis.  Report the first quartile, median, third quartile, and minimum and maximum values.

5.  Standardize the data.

6.  Remove outliers, if any, and generate new descriptive statistics if necessary (Item 4).

7. Compute the probability of exceeding the 5th from last observation in the sorted data set.

8. Create a confidence interval for the population mean, using data with outliers (if any) removed.

9. Select a value for a true population mean for which you reject the null hypothesis. The hypothesis test may be one or two-tailed. Using an alpha of 0.05, carry out the hypothesis test in Excel.

10. Select a value for a true population mean for which you do not reject the null hypothesis. The hypothesis test may be one or two-tailed. Using an alpha of 0.05, carry out the hypothesis test in Excel.

Part B.

Answer the following problems (P1 through P3) and show your work:

P1.   As the new analyst hired by Bank Inc, you were asked to determine if the mean of all account balances is significantly different from $1,150.  A sample of 81 account balances showed an average balance of $1,200 with a standard deviation of $126. 

a.            State the hypotheses associated with the analyst’s test.

b.            Conduct the t-test and state your conclusion at alpha equal to 5%

 

P2.  If a real estate market is strong there will be a close relationship between the asking price for homes and the selling price.   Suppose that one analyst believes that the mean difference between asking price and selling price for homes in a particular market area is less than $2,000.   To test this using an alpha level equal to 0.05, a random sample of n = 15 homes that have sold recently was selected.  The sample differences between asking price and selling price are in the following table:

 

$          2,053

 $          1,798

 $          1,996

 $          2,070

 $          1,100

 $          1,496

 $          2,473

 $          2,015

 $          2,303

 $          1,509

 $          1,028

 $          2,155

 $          2,085

 $          1,994

 $          1,465

 

Based on these sample data, what is your conclusion? State the hypotheses, conduct the t-test, and state the 3-part conclusion.       

P3. The CT County Wireless phone company claims in its annual report that ‘the typical customer spends $60 per month on text message and web media communication services’.  A sample of 12 subscribers revealed the following amounts spent last month.

$64

66

64

65

59

64

66

61

64

58

54

66

 

a.         What is the point estimate of the population mean?

b.         Develop a 90% confidence interval for the population mean

c.         Based on your computations in (b), is the company’s claim that the ‘typical customer’ spends $60 per month reasonable with a 90% confidence?

 

Be able to demonstrate your knowledge of the learning objectives, as applied to this assignment, in Exam 1.

 

Practice problems (not for grading)

 

A)   Solve a selected set of problems related to this module in this link and check your answers here.  

 

B)   Analyze a data set related to AJ fitness club by answering the following questions:

           

a.    Sort column Membership Age from lowest to highest values.

b.    Create a histogram for age for classes limits 10, 20, 30, 40, 50, and 60 years of age

c.    Generate the descriptive statistics using Excel Data Analysis tool.

d.    Compute first quartile, third quartile, and inter-quartile range.

e.    Compute the margin of error for a 95 confidence level.

f.     Compute the 95% confidence interval for the population mean.

g.    Test the hypothesis that the mean membership age is not equal to 35. (Ha: mean =/=35 years)

h.    Check if there is any member who is considered an outlier above the mean. That is, if any member has an unusually high age for the club.

 

The solution for this data problem can be found here.

Useful Excel Functions in Module 1

* all functions in Excel start with '=' sign

=AVERAGE(array)

=STDEV(array)

=QUARTILE(array,1)---1 for Q1,3 for Q3, etc.

=PERCENTILE(array,k)----kth percentile in decimal

=STANDARDIZE(x,mean,stdev)----compute Z score

=NORMSINV(probability)---returns Z score, negative or positive

=NORMSINV(confidence level + alpha/2)-- returns Z score for a desired confidence level

=NORMSDIST(Z score)---prob of a less than Z occur

=NORMDIST(x,mean,stdev,TRUE)-----probability of less than x occur

=1-NORMSDIST(Z)---probability of a score exceed  Z

=CONFIDENCE(alpha,stdev,sample size)---sampling error computed for a Normal dist

=TINV(alpha,n-1)---computes t value for a given alpha and sample size n

P-values in Excel

for a Z test

    Lower (Left) tail test:

    =NORMSDIST(Z)

    Upper (Right) tail test:

    =1-NORMSDIST(Z)

    2-tail test:     

    =2*(1-NORMSDIST(ABS(Z))

 

for a t test

    Lower (Left) tail test AND:

    t-value is positive  =1-TDIST(ABS(t-value),n-1*,number of tails**)

    t-value is negative =TDIST(ABS(t-value),n-1,number of tails**)

    Upper (Right) tail test AND: 

    t-value is positive  =TDIST(ABS(t-value),n-1,number of tails**)

    t-value is negative =1-TDIST(ABS(t-value),n-1,number of tails**)

 

    2-tail test:  

    =TDIST(ABS(t-value),n-1, number of tails***)

    *where n  = sample size and n-1 = degrees of freedom

    ** number of tails = 1

    *** number of tails = 2

Optional Text Reading

D. Anderson, D. Sweeney, and T. Williams, “Essentials of Modern Business Statistics with MS Excel”.  3rd Edition, Thompson/Southwestern

D. Groebner, P. Shannon, P. Fry & K. Smith.  Business Statistics: A Decision Making Approach, Seventh Edition, Prentice Hall – Chaps 1, 2, 3, 6, 7, 8, and 9.

Additional text: Ken Black. Business Statistics for Contemporary Decision Making. Fourth Edition, Wiley.