SAMPLING AND SAMPLING DISTRIBUTION
SAMPLING
• For research work and for sound decision-making in different fields of human activity and business, two methods are used to collect information: (i) the survey or census method and (ii) the sampling method. • In the sampling method, instead of every unit of the population, only a part of the population is studied, and conclusions about the entire population are drawn on that basis. • The idea of sampling is quite old. Ex: A housewife examines a few grains of boiling rice to see whether the pot of rice is ready or not. A doctor examines a few drops of blood and draws conclusions about the blood constitution of the whole body. A businessman places orders for materials after examining only a small sample of them. An interviewer may ask a few questions to find out whether a candidate is suitable for a job.
PURPOSE OF SAMPLING
• The basic objective of a sampling study is to draw inferences about the population. • Sampling is only a tool that helps us learn the characteristics of the universe or population by examining a small part of it.
• Values obtained from the study of a sample, such as the average and the variance, are known as ‘statistics’. • The corresponding values of the population are called ‘parameters’.
SAMPLING • Population – A group that includes all the cases (individuals, objects or groups) in which the researcher is interested. • Finite population: all students in a college. • Infinite population: the total water in the sea, or all the sand particles on the seashore. • Populations are often described by the distributions of their values, and it is common practice to refer to a population in terms of its distribution.
PRINCIPLES OF SAMPLING • 1. Principle of ‘Statistical Regularity’: a sample taken at random from a population is likely to have almost the same characteristics as the population. • 2. Principle of ‘Inertia of Large Numbers’: other things being equal, the larger the sample, the more accurate the results are likely to be.
METHODS OF SAMPLING The various methods of sampling fall under two separate headings:
A. RANDOM SAMPLING METHODS
i. Simple Random Sampling
ii. Stratified Sampling
iii. Systematic Sampling
iv. Multi-stage Sampling
B. NON-RANDOM SAMPLING METHODS
i. Judgement Sampling
ii. Quota Sampling
iii. Convenience Sampling
iv. Snow Ball Sampling
SIMPLE RANDOM SAMPLING • Each and every item of the population is given an equal chance of being included in the sample. • Selection is free from personal bias. • This method resembles the lottery method, in which names are placed in a box, the box is shuffled, and the names of the winners are then drawn out in an unbiased manner.
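The lottery idea can be sketched in a few lines of Python. This is only an illustration: the function name `simple_random_sample`, the list `names` and the seed are assumptions made for the example, not part of the slides.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct items; every item has the same chance of being picked."""
    rng = random.Random(seed)          # seeded only to make the demo repeatable
    return rng.sample(population, n)   # sampling without replacement

# Hypothetical population: 100 names placed "in the box"
names = [f"person_{i}" for i in range(1, 101)]
winners = simple_random_sample(names, 5, seed=42)
print(winners)
```

Because `random.sample` draws without replacement, no name can be selected twice, just as a slip of paper leaves the box once drawn.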
STRATIFIED SAMPLING It is a two-step process. i. It requires division of the parent population into mutually exclusive and exhaustive subsets. ii. A simple random sample of elements is chosen independently from each group or subset. Strata are created by a divider called the stratification variable. This variable divides the population into strata based on homogeneity, heterogeneity, relatedness or cost. Sometimes more than one variable is used for stratification.
SYSTEMATIC SAMPLING • This is another widely used sampling technique. • Here, if the researcher intends to choose 10 elements from a population of 100, he/she will first number the elements from 1 to 100. • The sampling fraction is 10/100 = 1/10, so one element in every 10 is selected: a starting element is chosen at random from the first 10, and every 10th element after it is taken until the sample of 10 is complete.
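A minimal sketch of the 10-out-of-100 case follows. It assumes the population size is an exact multiple of the sample size and that the starting element is picked at random from the first interval; the function name `systematic_sample` is made up for the illustration.

```python
import random

def systematic_sample(population, n, seed=None):
    """Pick a random start within the first interval, then every k-th element."""
    k = len(population) // n            # sampling interval, here 100 // 10 = 10
    rng = random.Random(seed)
    start = rng.randrange(k)            # random position among the first k elements
    return [population[start + i * k] for i in range(n)]

elements = list(range(1, 101))          # elements numbered 1 to 100
sample = systematic_sample(elements, 10, seed=1)
print(sample)
```

Only the starting point is random; once it is fixed, the rest of the sample is fully determined by the interval.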
MULTI-STAGE SAMPLING • As the name suggests, the sampling procedure is carried out in several stages. Ex: Suppose we want a sample of 5000 households from the state of UP. • At the first stage, the state may be divided into a number of districts and a few districts selected at random. • At the second stage, each selected district may be sub-divided into a number of villages and a sample of villages taken at random. • At the third stage, a number of households may be selected from each of the selected villages. • In this way, the sampling units become smaller and smaller at each stage.
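The three stages can be sketched as nested random draws. Everything in this example is hypothetical: the sampling frame `state` (10 districts, 20 villages each, 50 households each), the function name and the stage sizes are chosen only to keep the demo small.

```python
import random

def multi_stage_sample(state, n_districts, n_villages, n_households, seed=None):
    """Stage 1: districts, stage 2: villages, stage 3: households."""
    rng = random.Random(seed)
    selected = []
    for district in rng.sample(sorted(state), n_districts):
        villages = state[district]
        for village in rng.sample(sorted(villages), n_villages):
            households = villages[village]
            selected.extend(rng.sample(households, n_households))
    return selected

# Hypothetical frame: district -> village -> list of household ids
state = {
    f"district_{d}": {
        f"village_{d}_{v}": [f"hh_{d}_{v}_{h}" for h in range(50)]
        for v in range(20)
    }
    for d in range(10)
}

households = multi_stage_sample(state, n_districts=3, n_villages=4,
                                n_households=5, seed=0)
print(len(households))  # 3 * 4 * 5 = 60
```

A list of households is needed only for the villages that were actually selected, which is the practical appeal of the method.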
JUDGEMENT SAMPLING • The sample elements are chosen on the basis of the researcher’s judgement about the prospective individuals, and so this method is also called ‘Purposive Sampling’. • The researcher may have ideas and insights about whether a respondent has the requisite experience and knowledge to offer some perspective on the research question.
QUOTA SAMPLING • Quota sampling is viewed as a two-stage restricted judgemental sampling technique. • The first stage consists of developing control categories, or quotas, of population elements. Control characteristics such as age, sex, and race are identified on the basis of judgement. For example, the researcher may set quotas so that the sample consists of 40% men and 60% women. • In the second stage, sample elements are selected based on convenience or judgement.
CONVENIENCE SAMPLING • Convenience sampling is the least expensive and least time-consuming of all sampling techniques. • Convenience samples are sometimes called accidental samples because the elements included in the sample enter by “accident”. The selection of the respondents is left to the discretion of the interviewer. • Popular examples of convenience sampling include: (a) respondents who gather in a church, (b) students in a classroom, (c) mall-intercept interviews conducted without qualifying the respondents for the study, (d) tear-out questionnaires included in magazines and (e) people on the street. In these examples, the people may not be qualified respondents, but they form part of the sample by virtue of assembling in the place where the researcher is conveniently placed.
SNOW BALL SAMPLING • In this technique, an initial group of respondents is selected, usually at random. • After being interviewed, these respondents are asked to identify others who belong to the target population of interest.
SIZE OF SAMPLE • Experts differ on the appropriate sample size: some suggest 5% of the population, while others suggest 10%. • The following two considerations may be kept in mind when determining the appropriate size of the sample: (i) the size of the sample should increase as the variation among the individual items increases; (ii) the greater the degree of accuracy desired, the larger the sample should be.
MERITS & LIMITATIONS OF SAMPLING METHODS
MERITS
• Less time
• Less cost
• More reliable results
• More detailed information
LIMITATIONS
• Results may be inaccurate and misleading if the survey is not carefully planned and executed.
• In the absence of qualified and experienced persons, the survey cannot be relied upon.
• At times, sampling may be complicated and may require more time, labour and money.
• In some cases, the study may require each and every unit of the domain, i.e. a complete enumeration survey.
SAMPLE • Sample – A relatively small subset of a population. • If a population is infinite, it is impossible to observe all its values, and even if it is finite it may be impractical or uneconomical to observe it in its entirety. Thus it is necessary to use a sample. • A sample is a part of the population collected for investigation; it needs to be representative of the population and large enough to carry the relevant information about it.
SAMPLING DISTRIBUTION There are three distinct types of distribution of data which are – • 1. Population Distribution, characterizes the distribution of elements of a population • 2. Sample Distribution, characterizes the distribution of elements of a sample drawn from a population • 3. Sampling Distribution, describes the expected behavior of a large number of simple random samples drawn from the same population.
SAMPLING DISTRIBUTION Sampling distributions constitute the theoretical basis of statistical inference and are of considerable importance in business decision-making. Sampling distributions are important in statistics because they provide a major simplification on the route to statistical inference. • A sampling distribution is a theoretical probability distribution of a statistic obtained through a large number of samples drawn from a specific population. ( McTavish : 435)
SAMPLING DISTRIBUTION • A sampling distribution is a graph of a statistic (e.g. mean, mean absolute deviation from the mean, range, standard deviation of the sample, unbiased estimate of variance, variance of the sample) computed from sample data. • A sampling distribution is a theoretical distribution of an infinite number of sample means of equal size taken from a population. (Walsh: 95)
CENTRAL LIMIT THEOREM (CLT) • First introduced by De Moivre in the eighteenth century, it is often regarded as the most important theorem in statistics. • According to this theorem, if a large number of simple random samples of size $n$ are selected from any population and the mean of each sample is determined, the distribution of these sample means will tend to be described by the normal probability distribution with mean $\mu$ and variance $\sigma^2/n$.
Central Limit Theorem (continued) • The sampling distribution of sample means approaches a normal distribution, irrespective of the distribution of the population from which the samples are taken. • The Central Limit Theorem (CLT) is a statistical theorem stating that, given a sufficiently large sample size from a population with finite variance, the mean of all samples from the same population will be approximately equal to the mean of the population.
• The central limit theorem tells us what the shape of the distribution of means will be when we draw repeated samples from a given population. Specifically, as the sample size gets larger, the distribution of means calculated from repeated sampling approaches normality. • The central limit theorem says that this sampling distribution is approximately normal, the shape commonly known as a bell curve. The approximation improves as we increase the size of the simple random samples used to produce the sampling distribution. • Even when the population is not normally distributed, the Central Limit Theorem is commonly taken to apply if n ≥ 30.
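Central limit theorem is applicable for sufficiently large sample sizes (n ≥ 30). The formula for the central limit theorem can be stated as follows:
$\mu_{\bar{x}} = \mu$
and
$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
Where, $\mu$ = population mean, $\sigma$ = population standard deviation, $\mu_{\bar{x}}$ = mean of the sample means, $\sigma_{\bar{x}}$ = standard deviation of the sample means (the standard error), $n$ = sample size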
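The claim that sample means centre on the population mean with a spread of σ/√n can be checked by simulation. The sketch below assumes a deliberately skewed population (exponential with mean 1 and standard deviation 1); the sample size of 40, the number of repetitions and the seed are arbitrary choices made for the demo.

```python
import random
import statistics

rng = random.Random(0)    # fixed seed so the run is repeatable
n = 40                    # size of each sample (above the n >= 30 rule of thumb)
num_samples = 5000        # number of repeated samples

# Population: exponential with rate 1, so mu = 1 and sigma = 1 (not normal)
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)   # should be close to mu = 1
sd_of_means = statistics.stdev(sample_means)     # should be close to sigma / sqrt(n)
expected_sd = 1.0 / n ** 0.5

print(round(mean_of_means, 3), round(sd_of_means, 3), round(expected_sd, 3))
```

A histogram of `sample_means` would look roughly bell-shaped even though the individual observations are strongly skewed.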
Solved Examples: Question 1: The weights of a male population follow a normal distribution with a mean of 70 kg and a standard deviation of 15 kg. If a researcher considers the records of 50 males, what are the mean and standard deviation of the sample mean? Solution: Mean of the population $\mu$ = 70 kg; standard deviation of the population $\sigma$ = 15 kg; sample size $n$ = 50. The mean of the sample mean is $\mu_{\bar{x}} = \mu = 70$ kg. The standard deviation of the sample mean is $\sigma_{\bar{x}} = \sigma/\sqrt{n} = 15/\sqrt{50} \approx 2.121 \approx 2.1$ kg.
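The same arithmetic as a small helper, offered as a sketch; the function name `sampling_distribution_of_mean` is made up for this illustration.

```python
import math

def sampling_distribution_of_mean(mu, sigma, n):
    """Return the mean and standard deviation (standard error) of the sample mean."""
    return mu, sigma / math.sqrt(n)

mean_xbar, se_xbar = sampling_distribution_of_mean(mu=70, sigma=15, n=50)
print(mean_xbar, round(se_xbar, 3))  # 70 2.121
```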
Example: Consider a population with a mean of 75 and a standard deviation of 8, and samples of size n = 10. The mean of the sample means is 75 and the standard deviation of the sample means is about 2.53, computed as follows: $\sigma_{\bar{x}} = \sigma/\sqrt{n} = 8/\sqrt{10} \approx 2.53$
Central limit theorem is a concept of probability.
Example: In a survey of a company, the mean salary of employees is 29321 dollars with an SD of 2120 dollars. Consider a sample of 100 employees and find the probability that their mean salary will be less than 29000 dollars. Solution: Sample size $n$ = 100, mean $\mu$ = 29321, standard deviation $\sigma$ = 2120. Substitute the values in the z-formula $z = (\bar{x} - \mu)/(\sigma/\sqrt{n})$: $z = (29000 - 29321)/(2120/\sqrt{100}) = -321/212 \approx -1.51$. From the z-table, the area to the left of z = +1.51 is 0.9345, so the area to the left of z = -1.51 is 1 - 0.9345 = 0.0655. Hence the probability that the mean salary of the sample is less than 29000 dollars is about 0.0655, i.e. 6.55%.
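The salary example can be checked numerically with the standard library's `statistics.NormalDist`, which gives the left-tail area directly instead of a z-table lookup. The variable names are chosen for this sketch only.

```python
import math
from statistics import NormalDist

mu, sigma, n = 29321, 2120, 100
standard_error = sigma / math.sqrt(n)     # 2120 / 10 = 212.0
z = (29000 - mu) / standard_error         # -321 / 212

# Area to the left of z under the standard normal curve
p_less = NormalDist().cdf(z)
print(round(z, 2), round(p_less, 3))      # z is about -1.51, probability about 0.065
```

The small difference from a table value comes from rounding z to two decimals before the lookup.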
Problem 1: The average GPA scored by a class is 4.91 and the standard deviation is 0.72. For a sample of 20 students, find the z-score for a sample average of 5. Solution: Average, $\mu$ = 4.91; standard deviation, $\sigma$ = 0.72; sample size, $n$ = 20.
$z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{5 - 4.91}{0.72/\sqrt{20}} = \dfrac{0.09}{0.161} \approx 0.559$
Hence, the z-score is 0.559.
Problem 2: The mean salary of all employees in a company is 3578, and the standard deviation is 1980. Find the z-score for a sample mean of 3000 from a sample of 18 employees. Solution: Average, $\mu$ = 3578; standard deviation, $\sigma$ = 1980; sample size, $n$ = 18. $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{3000 - 3578}{1980/\sqrt{18}} = \dfrac{-578}{466.69} \approx -1.2385$. Hence, the z-score is -1.2385; it is negative because 3000 lies below the population mean.
Problem 3: The average score in a subject is 2.89 for the whole class, with a standard deviation of 0.63. If a sample of 25 students is taken, find the probability that the average of this sample is more than 3. Solution: Average, $\mu$ = 2.89; standard deviation, $\sigma$ = 0.63; sample size, $n$ = 25. $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{3 - 2.89}{0.63/\sqrt{25}} = \dfrac{0.11}{0.126} \approx 0.87$. Looking up z = 0.87 in the normal curve table, the area to the left is 0.8078. Hence, probability = 1 - 0.8078 = 0.1922.
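The three problems share one formula, so a single helper can recompute them. This sketch assumes n = 25 for Problem 3 (the sample size used in its solution); the function name `z_for_sample_mean` is made up for the illustration.

```python
import math
from statistics import NormalDist

def z_for_sample_mean(x_bar, mu, sigma, n):
    """z-score of a sample mean: (x_bar - mu) / (sigma / sqrt(n))."""
    return (x_bar - mu) / (sigma / math.sqrt(n))

z1 = z_for_sample_mean(5, 4.91, 0.72, 20)       # Problem 1
z2 = z_for_sample_mean(3000, 3578, 1980, 18)    # Problem 2 (negative: below the mean)
z3 = z_for_sample_mean(3, 2.89, 0.63, 25)       # Problem 3

# Probability that the sample mean in Problem 3 exceeds 3 (upper tail)
p3 = 1 - NormalDist().cdf(z3)

print(round(z1, 3), round(z2, 4), round(z3, 3), round(p3, 3))
```

The exact tail probability is slightly lower than a table-based answer because a z-table rounds z to two decimals first.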
RANDOM SAMPLING • Simple Random Sample – A sample designed in such a way as to ensure that (1) every member of the population has an equal chance of being chosen and (2) every combination of N members has an equal chance of being chosen. • This can be done using a computer, calculator, or a table of random numbers
Population inferences can be made by selecting a representative sample from the population.
RANDOM SAMPLING • Systematic random sampling – A method of sampling in which every Kth member of the total population is chosen for inclusion in the sample, after the first member of the sample is selected at random from among the first K members of the population. (K is the ratio obtained by dividing the population size by the desired sample size.)
SYSTEMATIC RANDOM SAMPLING
STRATIFIED RANDOM SAMPLING • Proportionate stratified sample – The size of the sample selected from each subgroup is proportional to the size of that subgroup in the entire population. (Self weighting) • Disproportionate stratified sample – The size of the sample selected from each subgroup is disproportional to the size of that subgroup in the population. (needs weights)
STRATIFIED RANDOM SAMPLING • Stratified random sample – A method of sampling obtained by (1) dividing the population into subgroups based on one or more variables central to our analysis and (2) then drawing a simple random sample from each of the subgroups
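A proportionate stratified sample can be sketched as follows. The strata (three hypothetical year groups of 500, 300 and 200 students), the total sample size and the function name are assumptions for the illustration. Rounding each stratum's share is a simplification that may not add up exactly to the requested total for other stratum sizes.

```python
import random

def proportionate_stratified_sample(strata, total_n, seed=None):
    """Draw a simple random sample from each stratum, sized by its population share."""
    rng = random.Random(seed)
    population_size = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        n_h = round(total_n * len(units) / population_size)  # proportional allocation
        sample[name] = rng.sample(units, n_h)
    return sample

# Hypothetical strata: students grouped by year of study
strata = {
    "first_year": [f"F{i}" for i in range(500)],
    "second_year": [f"S{i}" for i in range(300)],
    "third_year": [f"T{i}" for i in range(200)],
}

chosen = proportionate_stratified_sample(strata, 100, seed=7)
sizes = {name: len(units) for name, units in chosen.items()}
print(sizes)  # {'first_year': 50, 'second_year': 30, 'third_year': 20}
```

A disproportionate design would replace the allocation line with fixed per-stratum sizes and then weight the results at analysis time.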
DISPROPORTIONATE STRATIFIED SAMPLE
AIMS OF SAMPLING • Reduces cost of research (e.g. political polls) • Generalize about a larger population (e.g., benefits of sampling city r/t neighborhood) • In some cases (e.g. industrial production) analysis may be destructive, so sampling is needed
PROBABILITY • Probability: what is the chance that a given event will occur? • Probability is expressed as a number between 0 and 1. Probability = 0 means the event never happens; probability = 1 means it always happens. • The probabilities of all possible outcomes always sum to 1.
Probability distributions: Permutations What is the probability distribution of the number of girls in families with two children?
Num. Girls   Outcome
2            GG
1            BG
1            GB
0            BB
[Figure: Probability Distribution of Number of Girls; x-axis: 0, 1, 2]
How about a family of three?
Num. Girls   child #1   child #2   child #3
0            B          B          B
1            B          B          G
1            B          G          B
1            G          B          B
2            B          G          G
2            G          B          G
2            G          G          B
3            G          G          G
[Figure: Probability distribution of number of girls; x-axis: 0, 1, 2, 3]
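The outcome tables for two and three children can be generated by enumeration. This sketch assumes each child is equally likely to be a boy or a girl and that births are independent; the function name `girls_distribution` is made up for the example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def girls_distribution(num_children):
    """Enumerate all B/G birth orders and count the girls in each one."""
    outcomes = list(product("BG", repeat=num_children))   # equally likely outcomes
    counts = Counter(birth_order.count("G") for birth_order in outcomes)
    return {k: Fraction(counts[k], len(outcomes)) for k in sorted(counts)}

two = girls_distribution(2)
three = girls_distribution(3)
print(two)    # probabilities 1/4, 1/2, 1/4 for 0, 1, 2 girls
print(three)  # probabilities 1/8, 3/8, 3/8, 1/8 for 0, 1, 2, 3 girls
```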
How about a family of 10? [Figure: probability distribution of number of girls; x-axis: 0 to 10]
As family size increases, the binomial distribution looks more and more normal.
[Figure: two binomial distribution plots, with x-axes running 0 to 3 and 0 to 10; axis label: Number of Successes]
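How close the binomial is to the normal for a family of 10 can be checked directly. The sketch assumes p = 0.5 and compares each binomial probability with the area of the matching normal curve between k - 0.5 and k + 0.5 (a continuity correction, which is an addition of this example and not something the slides mention).

```python
import math
from statistics import NormalDist

n, p = 10, 0.5

# Exact binomial probabilities for 0..10 girls
binom = [math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

# Normal curve with the same mean (n*p) and standard deviation sqrt(n*p*(1-p))
normal = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))
approx = [normal.cdf(k + 0.5) - normal.cdf(k - 0.5) for k in range(n + 1)]

for k in range(n + 1):
    print(k, round(binom[k], 4), round(approx[k], 4))

max_gap = max(abs(b - a) for b, a in zip(binom, approx))
```

For n = 10 the largest difference between the two columns is well under 0.01.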
Normal distribution: same shape, if you adjust the scales. [Figure: three normal curves labelled A, B and C]
SAMPLING DISTRIBUTION OF THE MEAN • An example • A die is thrown infinitely many times. Let X represent the number of spots showing on any throw. • The probability distribution of X is
x      1    2    3    4    5    6
p(x)   1/6  1/6  1/6  1/6  1/6  1/6
$E(X) = 1(1/6) + 2(1/6) + 3(1/6) + \dots + 6(1/6) = 3.5$
$V(X) = (1 - 3.5)^2(1/6) + (2 - 3.5)^2(1/6) + \dots + (6 - 3.5)^2(1/6) \approx 2.92$
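The expectation and variance of a fair die can be computed exactly with fractions, as a quick check of the sums. The variable names are chosen for this sketch only.

```python
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)                       # each face is equally likely

expected = sum(x * p for x in faces)                     # E(X)
variance = sum((x - expected) ** 2 * p for x in faces)   # V(X)

print(expected, variance, round(float(variance), 2))  # 7/2 35/12 2.92
```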