Chapter 11


CHAPTER 11 Chi-Square Test

11.1 Introduction

The chi-square ($\chi^2$, Greek letter chi, pronounced "ki") test looks not at an individual item of data (i.e. a single parameter) but at the whole distribution. As a result, tests of this kind are known as nonparametric or distribution-free tests. They do not require the population to follow the normal distribution.



These tests can be used
i) to test a hypothesis about a single variance or standard deviation (this use, unlike the other two, does assume a normally distributed population);
ii) for tests concerning frequency distributions;
iii) to test the independence of two variables.

Characteristics of the Chi-square Distribution:

i) The value of $\chi^2$ is never negative (i.e. it is always zero or positive).

ii) There is a family of $\chi^2$ distributions, each with a different shape, depending on the number of degrees of freedom (df).

iii) When the number of df is small the distribution is positively skewed, but as the number of df increases it becomes more symmetrical and approaches the normal distribution.
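Characteristic iii) can be checked numerically. The sketch below relies on standard closed forms for the chi-square distribution that are not stated in this chapter (mean = df, variance = 2·df, skewness = sqrt(8/df)); the function name `chi_square_shape` is a hypothetical helper chosen for illustration.

```python
import math

def chi_square_shape(df):
    """Return (mean, variance, skewness) of a chi-square distribution.

    Assumes the standard closed forms: mean = df, variance = 2*df,
    skewness = sqrt(8/df). These are textbook facts, not from this chapter.
    """
    if df <= 0:
        raise ValueError("df must be positive")
    return df, 2 * df, math.sqrt(8 / df)

# The skewness shrinks toward 0 (the value for a normal curve) as df grows.
for df in (1, 4, 10, 30, 100):
    mean, variance, skewness = chi_square_shape(df)
    print(f"df={df:>3}  mean={mean:>3}  variance={variance:>3}  skewness={skewness:.3f}")
```

The skewness is always positive and decreases as df increases, which matches the description of a right-skewed curve that becomes more symmetrical for larger df.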

Figure: The Chi-square Family of Curves.

11.2 A Goodness-of-Fit Test

Definition:

A nonparametric test involving a set of observed frequencies and a corresponding set of expected frequencies.

Purpose:

To determine whether there is a statistically significant difference between two sets of frequencies, one observed and the other expected. The test determines whether the frequencies observed for some categorical variable could have been drawn from a hypothesized population distribution.

The null and alternative hypotheses are usually stated as:
$H_0$: The sample is from the specified population
$H_1$: The sample is not from the specified population

In goodness-of-fit tests the $\chi^2$ distribution is used to determine how well an observed set of data fits an expected set of data.

The frequencies obtained from the performance of an experiment are called the observed frequencies and are denoted by O. The expected frequencies, denoted by E, are the frequencies that we expect to obtain if the null hypothesis is true. The expected frequency for a category is obtained as

$E = np$

where n is the sample size and p is the probability that an element belongs to that category if the null hypothesis is true.

The test statistic for a goodness-of-fit test is

$\chi^2 = \sum \frac{(O - E)^2}{E}$

where
O = observed frequency for a category
E = expected frequency for a category = np

If the observed and expected frequencies are quite close in every category, the resulting statistic will be small and $H_0$ is not rejected. If large differences exist among categories, a large statistic results and $H_0$ will be rejected. Thus, a chi-square goodness-of-fit test is always a right-tailed test.

In a goodness-of-fit test, the degrees of freedom are

$df = k - 1$

where k = number of categories.
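The statistic and its degrees of freedom can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name `chi_square_gof` is hypothetical, and the counts in the usage lines are made up for illustration and do not come from the chapter.

```python
def chi_square_gof(observed, probabilities):
    """Goodness-of-fit statistic: sum of (O - E)^2 / E with E = n * p.

    Returns (statistic, degrees_of_freedom, expected_frequencies).
    """
    if len(observed) != len(probabilities):
        raise ValueError("need one probability per category")
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    n = sum(observed)                              # sample size
    expected = [n * p for p in probabilities]      # E = n * p for each category
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, len(observed) - 1, expected  # df = k - 1

# Hypothetical counts for three categories (illustrative only).
stat, df, expected = chi_square_gof([20, 50, 30], [0.25, 0.50, 0.25])
print(f"expected = {expected}, chi-square = {stat:.3f}, df = {df}")
```

The statistic is then compared with the right-tail critical value from a chi-square table for the chosen significance level and df.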

Example 11.2: A bank has an ATM installed inside the bank, and it is available to its customers only from 7am to 6pm Monday through Friday. The manager of the bank wanted to investigate if the percentage of transactions made on this ATM is the same for each of the five days (Monday through Friday) of the week. She randomly selected one week and counted the number of transactions made on this ATM on each of the five days during this week. The information she obtained is given in the following table, where the number of users represents the number of transactions on this ATM on these days. For convenience, we will refer to these transactions as "people" or "users".

Day               Monday   Tuesday   Wednesday   Thursday   Friday
Number of users   253      197       204         279        267

At the 1% level of significance, can we reject the null hypothesis that the proportion of people who use this ATM each of the five days of the week is the same? Assume that this week is typical of all weeks in regard to the use of this ATM.

Solution:
$H_0: p_1 = p_2 = p_3 = p_4 = p_5 = 0.20$
$H_1$: At least two of the five proportions are not equal to 0.20

Using the $\chi^2$ distribution with $df = 5 - 1 = 4$ and the 1% significance level, the critical value of $\chi^2$ is 13.277.

The sample size is n = 253 + 197 + 204 + 279 + 267 = 1200, so under $H_0$ the expected frequency for each day is E = np = 1200(0.20) = 240.

Test statistic:

$\chi^2 = \sum \frac{(O - E)^2}{E} = 23.184$

Since 23.184 > 13.277, reject $H_0$. We conclude that the proportion of users is not the same for all five days: a higher number of users use this ATM on one or more of these days.
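The arithmetic behind the test statistic in Example 11.2 can be reproduced as follows. This is a sketch rather than the author's worked solution; the critical value 13.277 is simply copied from the text, not computed.

```python
days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
observed = [253, 197, 204, 279, 267]   # transactions from the table in Example 11.2
p = 0.20                               # H0: each day has the same proportion
critical_value = 13.277                # value quoted in the text (df = 4, 1% level)

n = sum(observed)                      # total number of transactions
expected = n * p                       # same expected count for every day under H0

chi_square = 0.0
for day, o in zip(days, observed):
    contribution = (o - expected) ** 2 / expected
    chi_square += contribution
    print(f"{day:<10} O={o:>4}  E={expected:.0f}  (O-E)^2/E={contribution:.3f}")

df = len(observed) - 1                 # k - 1 categories
reject_h0 = chi_square > critical_value
print(f"chi-square = {chi_square:.3f}, df = {df}, reject H0: {reject_h0}")
```

The unrounded total is 5564/240, about 23.183; the 23.184 in the text comes from rounding each day's contribution to three decimals before adding. Either way the statistic is well above 13.277.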

11.3 Tests of Independence

We often have information on more than one variable for each element. Such information can be summarized and presented using a two-way classification table, which is also called a contingency table or cross-tabulation.

Example: Total enrollment at a university

          Full-time   Part-time
Male      3768        2615
Female    4658        3717

A contingency table can be of any size. For example, it can be $2 \times 3$, $3 \times 2$ or $3 \times 3$. In general, the table is made up of r rows and c columns and is designated as an $r \times c$ (number of rows by number of columns) table.

The $\chi^2$ independence test can be used to test the independence of two variables, that is, to determine whether a relationship exists between two variables. In other words, we test the null hypothesis that the two characteristics of the elements of a given population are NOT related (i.e., they are independent) against the alternative that the two characteristics are related (i.e., they are dependent).

E.g.:
(i) Choice of TV program and gender.
(ii) Magazines read and educational background.
(iii) Years of working experience and income.

A test of independence involves a test of the null hypothesis that two characteristics of a population are not related. The degrees of freedom for a test of independence are

$df = (r - 1)(c - 1)$

where r and c are the number of rows and the number of columns, respectively, in the given contingency table.

The test statistic for a test of independence is

$\chi^2 = \sum \frac{(O - E)^2}{E}$

where O and E are the observed and expected frequencies, respectively, for a cell.

Note: Row and column headings do not count in determining r and c. Each block in the table is called a cell.
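This copy of the chapter does not show how E is obtained for a cell. The sketch below assumes the standard rule for a test of independence, E = (row total × column total) / n, which is a textbook fact rather than something stated in the text above. The helper names `expected_frequencies` and `degrees_of_freedom` are hypothetical.

```python
def expected_frequencies(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def degrees_of_freedom(table):
    """df = (r - 1)(c - 1); pass only the cells, without headings or totals."""
    return (len(table) - 1) * (len(table[0]) - 1)

# Enrollment table from this section: rows = Male, Female; columns = Full-time, Part-time.
enrollment = [[3768, 2615],
              [4658, 3717]]
for label, row in zip(["Male", "Female"], expected_frequencies(enrollment)):
    print(label, [round(e, 1) for e in row])
print("df =", degrees_of_freedom(enrollment))
```

Only the cells are passed in, which mirrors the note that headings do not count toward r and c.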

Example 11.4: Violence and lack of discipline have become major problems in schools in the United States. A random sample of 300 adults was selected, and these adults were asked if they favor giving more freedom to schoolteachers to punish students for violence and lack of discipline. The two-way classification of the responses of these adults is presented in the following table. Does the sample provide sufficient evidence to conclude that the two attributes, gender and opinions of adults, are dependent? Use a 1% significance level.

            In Favor (F)   Against (A)   No Opinion (N)
Men (M)     93             70            12
Women (W)   87             32            6

Solution:
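The worked solution to Example 11.4 is missing from this copy. The sketch below is one way to carry out the test, not the author's solution. It assumes the standard expected-frequency rule E = (row total × column total) / n, and it obtains the critical value from the closed form −2 ln(α), which holds only for df = 2 and agrees with the chi-square table value 9.210 at the 1% level.

```python
import math

observed = [[93, 70, 12],   # Men:   in favor, against, no opinion
            [87, 32, 6]]    # Women: in favor, against, no opinion
alpha = 0.01

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n   # expected count under independence
        chi_square += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)
if df != 2:
    raise ValueError("the closed-form critical value below holds only for df = 2")
critical_value = -2 * math.log(alpha)           # right-tail critical value for df = 2

print(f"chi-square = {chi_square:.3f}, df = {df}, critical value = {critical_value:.3f}")
print("reject H0" if chi_square > critical_value else "do not reject H0")
```

With these data the statistic is about 8.253, which is below 9.210, so under these assumptions $H_0$ is not rejected at the 1% level: the sample does not provide sufficient evidence that gender and opinion are dependent.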

11.4 Tests of Homogeneity

A test of homogeneity involves testing the null hypothesis that the proportions of elements with certain characteristics in two or more different populations are the same against the alternative hypothesis that these proportions are not the same.

Example 11.5: Consider the data on income distributions for households in California and Wisconsin given in the table below. Using the 2.5% significance level, test the null hypothesis that the distribution of households with regard to income levels is similar (homogeneous) for the two states.

                California   Wisconsin
High income     70           34
Medium income   80           40
Low income      100          76

Solution:
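The worked solution to Example 11.5 is also missing from this copy. The sketch below assumes that the test of homogeneity uses the same statistic, expected-frequency rule (row total × column total / n) and df = (r − 1)(c − 1) as the test of independence, which is the standard approach but is not spelled out in the text above. Instead of a table lookup it uses the p-value shortcut exp(−χ²/2), which is valid only for df = 2.

```python
import math

incomes = ["High income", "Medium income", "Low income"]
observed = [[70, 34],    # columns: California, Wisconsin
            [80, 40],
            [100, 76]]
alpha = 0.025

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi_square = 0.0
for label, row, r_total in zip(incomes, observed, row_totals):
    expected = [r_total * c_total / n for c_total in col_totals]
    chi_square += sum((o - e) ** 2 / e for o, e in zip(row, expected))
    print(f"{label:<14} observed={row}  expected={expected}")

df = (len(observed) - 1) * (len(observed[0]) - 1)
if df != 2:
    raise ValueError("the p-value shortcut below holds only for df = 2")
p_value = math.exp(-chi_square / 2)   # right-tail area of chi-square with df = 2

print(f"chi-square = {chi_square:.3f}, df = {df}, p-value = {p_value:.3f}")
print("reject H0" if p_value < alpha else "do not reject H0")
```

Under these assumptions the statistic is about 4.339 with a p-value of roughly 0.11, which is larger than 0.025, so the null hypothesis of homogeneous income distributions is not rejected.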
