Maths Made Easy
By Ashish Pandey

PDF generated using the open source mwlib toolkit. See http://code.pediapress.com/ for more information. PDF generated at: Mon, 27 Sep 2010 12:14:40 UTC

Contents

Articles

ARITHMETIC MEAN
• Arithmetic mean
• Statistics
• Mathematics
• Median
• Mean
• Statistical population
• Sampling (statistics)
• Probability theory
• Normal distribution
• Standard deviation
• Random variable
• Probability distribution
• Real number
• Variance
• Probability density function
• Cumulative distribution function
• Expected value
• Discrete probability distribution
• Continuous probability distribution
• Probability mass function
• Continuous function
• Measure (mathematics)
• Bias of an estimator
• Probability
• Pierre-Simon Laplace
• Integral
• Function (mathematics)
• Calculus
• Average

References
• Article Sources and Contributors
• Image Sources, Licenses and Contributors

Article Licenses
• License

ARITHMETIC MEAN

Arithmetic mean

In mathematics and statistics, the arithmetic mean, often referred to as simply the mean or average when the context is clear, is a measure of the central tendency of a collection of numbers. The term "arithmetic mean" is preferred in mathematics and statistics because it helps distinguish it from other averages, such as the geometric and harmonic mean. In addition to mathematics and statistics, the arithmetic mean is used frequently in fields such as economics, sociology, and history, and it is used to some extent in almost every academic field. For example, per capita GDP gives an approximation of the arithmetic average income of a nation's population. While the arithmetic mean is often used to report central tendencies, it is not a robust statistic, meaning that it is greatly influenced by outliers. Notably, for skewed distributions the arithmetic mean may not accord with one's notion of "middle", and robust statistics such as the median may give a better description of central tendency.

Definition

Suppose we have a data set $\{x_1, x_2, \ldots, x_n\}$. Then the arithmetic mean $\bar{x}$ is defined via the equation

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i.$$

If the list is a statistical population, then the mean of that population is called a population mean. If the list is a statistical sample, we call the resulting statistic a sample mean.
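A minimal sketch in Python (the function name and data are our own illustration, not from the source) computes exactly this sum divided by count:

def arithmetic_mean(values):
    # (1/n) * sum of x_i: add the values, divide by their count.
    return sum(values) / len(values)

print(arithmetic_mean([1, 2, 3, 4]))  # 2.5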

Motivating properties

The arithmetic mean has several properties that make it useful, especially as a measure of central tendency. These include:

• If numbers $x_1, \ldots, x_n$ have mean $\bar{x}$, then $(x_1 - \bar{x}) + (x_2 - \bar{x}) + \cdots + (x_n - \bar{x}) = 0$. Since $x_i - \bar{x}$ is the signed distance from a given number to the mean, one way to interpret this property is as saying that the numbers to the left of the mean are balanced by the numbers to the right of the mean. The mean is the only single number for which the residuals defined this way sum to zero.
• If it is required to use a single number $X$ as an estimate for the value of numbers $x_1, \ldots, x_n$, then the arithmetic mean does this best, in the sense of minimizing the sum of squares $\sum_{i=1}^{n} (x_i - X)^2$ of the residuals. (It follows that the mean is also the best single predictor in the sense of having the lowest root mean squared error.)
• For a normal distribution, the arithmetic mean is equal to both the median and the mode, other measures of central tendency.

Problems

The arithmetic mean may be misinterpreted as the median, implying that most values are higher or lower than is actually the case. If elements in the data set increase arithmetically when placed in some order, then the median and arithmetic average are equal. For example, consider the data set {1, 2, 3, 4}. The average is 2.5, as is the median. However, when we consider a data set that cannot be arranged into an arithmetic progression, such as {1, 2, 4, 8, 16}, the median and arithmetic average can differ significantly. In this case the arithmetic average is 6.2 and the median is 4. When one looks at the arithmetic average of a data set, one must note that the average value can vary significantly from most values in the data set. There are applications of this phenomenon in fields such as economics. For example, since the 1980s in the United States median income has increased more slowly than the arithmetic average of income. Ben Bernanke has speculated that the difference can be accounted for through technology, and less so via the decline in labour unions and other factors.[1]
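A quick check of the {1, 2, 4, 8, 16} example, using Python's standard library:

import statistics

data = [1, 2, 4, 8, 16]
print(statistics.mean(data))    # 6.2
print(statistics.median(data))  # 4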

Angles

Particular care must be taken when using cyclic data such as phases or angles. Naïvely taking the arithmetic mean of 1° and 359° yields a result of 180°. This is incorrect for two reasons:
• Firstly, angle measurements are only defined up to a factor of 360° (or 2π, if measuring in radians). Thus one could as easily call these 1° and −1°, or 1° and 719° – each of which gives a different average.
• Secondly, in this situation, 0° (equivalently, 360°) is geometrically a better average value: there is lower dispersion about it (the points are both 1° from it, and 179° from 180°, the putative average).

In general application such an oversight will lead to the average value artificially moving towards the middle of the numerical range. A solution to this problem is to use the optimization formulation (viz., define the mean as the central point: the point about which one has the lowest dispersion), and redefine the difference as a modular distance (i.e., the distance on the circle: so the modular distance between 1° and 359° is 2°, not 358°).
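One common remedy, sketched below in Python (our own illustration, not from the source), averages the angles as unit vectors and takes the direction of the resulting vector, so that 1° and 359° average to 0° rather than 180°:

import math

def circular_mean_deg(angles):
    # Map each angle to a unit vector, average the vectors,
    # and return the direction of that average as an angle in degrees.
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(s, c)) % 360

print(circular_mean_deg([1, 359]))  # ~0.0, not 180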

See also

• Assumed mean
• Average
• Central tendency
• Empirical measure
• Fréchet mean
• Generalized mean
• Geometric mean
• Inequality of arithmetic and geometric means
• Mean
• Median
• Mode
• Muirhead's inequality
• Sample mean and covariance
• Sample size
• Standard deviation
• Summary statistics
• Variance

Further reading

• Darrell Huff, How to Lie with Statistics, Victor Gollancz, 1954 (ISBN 0-393-31072-8).

External links

• Calculations and comparisons between arithmetic and geometric mean of two numbers [2]
• Mean or Average [3]

References

[1] Ben S. Bernanke. "The Level and Distribution of Economic Well-Being" (http://www.federalreserve.gov/newsevents/speech/bernanke20070206a.htm). Retrieved 23 July 2010.
[2] http://www.sengpielaudio.com/calculator-geommean.htm
[3] http://people.revoledu.com/kardi/tutorial/BasicMath/Average/index.html

Statistics

Statistics is the science of the collection, organization, and interpretation of data.[1] [2] It deals with all aspects of this, including the planning of data collection in terms of the design of surveys and experiments.[1]

A statistician is someone who is particularly well versed in the ways of thinking necessary for the successful application of statistical analysis. Such people have often gained this experience through working in any of a wide number of fields. There is also a discipline called mathematical statistics, which is concerned with the theoretical basis of the subject.

The word statistics can be either singular or plural.[3] When it refers to the discipline, "statistics" is singular, as in "Statistics is an art." When it refers to quantities (such as mean and median) calculated from a set of data,[4] statistics is plural, as in "These statistics are misleading."

Scope

Statistics is considered by some to be a mathematical science pertaining to the collection, analysis, interpretation or explanation, and presentation of data,[5] while others consider it a branch of mathematics[6] concerned with collecting and interpreting data.[7] Because of its empirical roots and its focus on applications, statistics is usually considered to be a distinct mathematical science rather than a branch of mathematics.[8] [9]

[Figure: a normal distribution curve annotated with the scales used in standardized testing assessment: standard deviations, cumulative percentages, percentile equivalents, Z-scores, T-scores, standard nines, and percentages in standard nines. More probability density is found the closer one gets to the expected (mean) value.]

Statisticians improve the quality of data with the design of experiments and survey sampling. Statistics also provides tools for prediction and forecasting using data and statistical models. Statistics is applicable to a wide variety of academic disciplines, including natural and social sciences, government, and business.

Statistical methods can be used to summarize or describe a collection of data; this is called descriptive statistics. This is useful in research, when communicating the results of experiments. In addition, patterns in the data may be modeled in a way that accounts for randomness and uncertainty in the observations, and are then used to draw inferences about the process or population being studied; this is called inferential statistics. Inference is a vital element of scientific advance, since it provides a prediction (based on data) for where a theory logically leads. To further prove the guiding theory, these predictions are tested as well, as part of the scientific method. If the inference holds true, then the descriptive statistics of the new data increase the soundness of that hypothesis. Descriptive statistics and inferential statistics (a.k.a. predictive statistics) together comprise applied statistics.[10]


History

Some scholars pinpoint the origin of statistics to 1663, with the publication of Natural and Political Observations upon the Bills of Mortality by John Graunt.[11] Early applications of statistical thinking revolved around the needs of states to base policy on demographic and economic data, hence its stat- etymology. The scope of the discipline of statistics broadened in the early 19th century to include the collection and analysis of data in general. Today, statistics is widely employed in government, business, and the natural and social sciences.

Its mathematical foundations were laid in the 17th century with the development of probability theory by Blaise Pascal and Pierre de Fermat. Probability theory arose from the study of games of chance. The method of least squares was first described by Carl Friedrich Gauss around 1794. The use of modern computers has expedited large-scale statistical computation, and has also made possible new methods that are impractical to perform manually.

Overview

In applying statistics to a scientific, industrial, or societal problem, it is necessary to begin with a population or process to be studied. Populations can be diverse topics such as "all persons living in a country" or "every atom composing a crystal". A population can also be composed of observations of a process at various times, with the data from each observation serving as a different member of the overall group. Data collected about this kind of "population" constitutes what is called a time series.

For practical reasons, a chosen subset of the population, called a sample, is studied, as opposed to compiling data about the entire group (an operation called a census). Once a sample that is representative of the population is determined, data is collected for the sample members in an observational or experimental setting. This data can then be subjected to statistical analysis, serving two related purposes: description and inference.

• Descriptive statistics summarize the population data by describing what was observed in the sample numerically or graphically. Numerical descriptors include mean and standard deviation for continuous data types (like heights or weights), while frequency and percentage are more useful in terms of describing categorical data (like race).
• Inferential statistics uses patterns in the sample data to draw inferences about the population represented, accounting for randomness. These inferences may take the form of: answering yes/no questions about the data (hypothesis testing), estimating numerical characteristics of the data (estimation), describing associations within the data (correlation), modeling relationships within the data (regression), extrapolation, interpolation, or other modeling techniques like ANOVA, time series, and data mining.

"... it is only the manipulation of uncertainty that interests us. We are not concerned with the matter that is uncertain. Thus we do not study the mechanism of rain; only whether it will rain." – Dennis Lindley, "The Philosophy of Statistics", The Statistician (2000).
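As a small illustration of the descriptive side, a Python sketch (the sample data are invented for illustration):

import statistics

heights_cm = [162, 168, 171, 175, 180]         # hypothetical sample of heights
print(statistics.mean(heights_cm))             # sample mean: 171.2
print(round(statistics.stdev(heights_cm), 2))  # sample standard deviation: ~6.83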

The concept of correlation is particularly noteworthy for the potential confusion it can cause. Statistical analysis of a data set often reveals that two variables (properties) of the population under consideration tend to vary together, as if they were connected. For example, a study of annual income that also looks at age of death might find that poor people tend to have shorter lives than affluent people. The two variables are said to be correlated; however, they may or may not be the cause of one another. The correlation could be caused by a third, previously unconsidered phenomenon, called a lurking variable or confounding variable. For this reason, there is no way to immediately infer the existence of a causal relationship between the two variables. (See Correlation does not imply causation.)
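The strength of such a linear association can be quantified with the sample Pearson correlation coefficient. A self-contained Python sketch (the data are invented; a coefficient near +1 or −1 indicates strong association, but says nothing about causation):

def pearson_r(xs, ys):
    # r = cov(x, y) / (sd(x) * sd(y)), computed from sums of deviations.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

income = [20, 35, 50, 65, 80]    # hypothetical annual income, in $1000s
lifespan = [70, 74, 77, 79, 82]  # hypothetical age at death
print(round(pearson_r(income, lifespan), 3))  # ~0.994: strongly correlated, not necessarily causal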

For a sample to be used as a guide to an entire population, it is important that it is truly representative of that overall population. Representative sampling assures that the inferences and conclusions can be safely extended from the sample to the population as a whole. A major problem lies in determining the extent to which the sample chosen is actually representative. Statistics offers methods to estimate and correct for any random trending within the sample and data collection procedures. There are also methods for designing experiments that can lessen these issues at the outset of a study, strengthening its capability to discern truths about the population. Statisticians describe stronger methods as more "robust". (See experimental design.)

Randomness is studied using the mathematical discipline of probability theory. Probability is used in "mathematical statistics" (alternatively, "statistical theory") to study the sampling distributions of sample statistics and, more generally, the properties of statistical procedures. The use of any statistical method is valid when the system or population under consideration satisfies the assumptions of the method.

Misuse of statistics can produce subtle but serious errors in description and interpretation: subtle in the sense that even experienced professionals make such errors, and serious in the sense that they can lead to devastating decision errors. For instance, social policy, medical practice, and the reliability of structures like bridges all rely on the proper use of statistics. Even when statistics are correctly applied, the results can be difficult to interpret for those lacking expertise. The statistical significance of a trend in the data, which measures the extent to which a trend could be caused by random variation in the sample, may or may not agree with an intuitive sense of its significance. The set of basic statistical skills (and skepticism) that people need to deal with information in their everyday lives properly is referred to as statistical literacy.

Statistical methods

Experimental and observational studies

A common goal for a statistical research project is to investigate causality, and in particular to draw a conclusion on the effect of changes in the values of predictors or independent variables on dependent variables or response. There are two major types of causal statistical studies: experimental studies and observational studies. In both types of studies, the effect of differences of an independent variable (or variables) on the behavior of the dependent variable are observed. The difference between the two types lies in how the study is actually conducted. Each can be very effective. An experimental study involves taking measurements of the system under study, manipulating the system, and then taking additional measurements using the same procedure to determine if the manipulation has modified the values of the measurements. In contrast, an observational study does not involve experimental manipulation. Instead, data are gathered and correlations between predictors and response are investigated.

Experiments

The basic steps of a statistical experiment are:
1. Planning the research, including finding the number of replicates of the study, using the following information: preliminary estimates regarding the size of treatment effects, alternative hypotheses, and the estimated experimental variability. Consideration of the selection of experimental subjects and the ethics of research is necessary. Statisticians recommend that experiments compare (at least) one new treatment with a standard treatment or control, to allow an unbiased estimate of the difference in treatment effects.
2. Design of experiments, using blocking to reduce the influence of confounding variables, and randomized assignment of treatments to subjects to allow unbiased estimates of treatment effects and experimental error. At this stage, the experimenters and statisticians write the experimental protocol that shall guide the performance of the experiment and that specifies the primary analysis of the experimental data.
3. Performing the experiment following the experimental protocol and analyzing the data following the experimental protocol.
4. Further examining the data set in secondary analyses, to suggest new hypotheses for future study.
5. Documenting and presenting the results of the study.

Experiments on human behavior have special concerns. The famous Hawthorne study examined changes to the working environment at the Hawthorne plant of the Western Electric Company. The researchers were interested in

determining whether increased illumination would increase the productivity of the assembly line workers. The researchers first measured the productivity in the plant, then modified the illumination in an area of the plant and checked if the changes in illumination affected productivity. It turned out that productivity indeed improved (under the experimental conditions). However, the study is heavily criticized today for errors in experimental procedures, specifically for the lack of a control group and blindness. The Hawthorne effect refers to finding that an outcome (in this case, worker productivity) changed due to observation itself. Those in the Hawthorne study became more productive not because the lighting was changed but because they were being observed.

Observational study

An example of an observational study is one that explores the correlation between smoking and lung cancer. This type of study typically uses a survey to collect observations about the area of interest and then performs statistical analysis. In this case, the researchers would collect observations of both smokers and non-smokers, perhaps through a case-control study, and then look for the number of cases of lung cancer in each group.

Levels of measurement

There are four main levels of measurement used in statistics:
• nominal,
• ordinal,
• interval, and
• ratio.

They have different degrees of usefulness in statistical research. Ratio measurements have both a meaningful zero value and the distances between different measurements defined; they provide the greatest flexibility in statistical methods that can be used for analyzing the data. Interval measurements have meaningful distances between measurements defined, but the zero value is arbitrary (as in the case with longitude and temperature measurements in Celsius or Fahrenheit). Ordinal measurements have imprecise differences between consecutive values, but have a meaningful order to those values. Nominal measurements have no meaningful rank order among values.

Because variables conforming only to nominal or ordinal measurements cannot be reasonably measured numerically, sometimes they are grouped together as categorical variables, whereas ratio and interval measurements are grouped together as quantitative or continuous variables due to their numerical nature.

Key terms used in statistics

Null hypothesis

Interpretation of statistical information can often involve the development of a null hypothesis: the assumption is that whatever is proposed as a cause has no effect on the variable being measured. The best illustration for a novice is the predicament encountered by a jury trial. The null hypothesis, H0, asserts that the defendant is innocent, whereas the alternative hypothesis, H1, asserts that the defendant is guilty. The indictment comes because of suspicion of the guilt. The H0 (status quo) stands in opposition to H1 and is maintained unless H1 is supported by evidence "beyond a reasonable doubt". However, "failure to reject H0" in this case does not imply innocence, but merely that the evidence was insufficient to convict. So the jury does not necessarily accept H0 but fails to reject H0. While to the casual observer the difference appears moot, misunderstanding the difference is one of the most common and arguably most serious errors made by non-statisticians. Failure to reject H0 does NOT prove that H0 is true, as any crook with a good lawyer who gets off because of insufficient evidence can attest. While one cannot "prove" a null hypothesis, one can test how close it is to being true with a power test, which tests for type II errors.
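To make the reject/fail-to-reject logic concrete, here is a minimal sketch (our own invented example, not from the source): testing H0 "the coin is fair" against H1 "the coin is biased towards heads" with an exact binomial tail probability.

from math import comb

def p_value_heads(heads, flips):
    # One-sided p-value under H0: P(X >= heads) with X ~ Binomial(flips, 0.5).
    return sum(comb(flips, k) for k in range(heads, flips + 1)) / 2 ** flips

p = p_value_heads(16, 20)  # 16 heads out of 20 flips
print(round(p, 4))  # 0.0059: small enough to reject H0 at the 5% level
# A larger p would mean "fail to reject H0", which is not the same as proving the coin fair.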

Error

Working from a null hypothesis, two basic forms of error are recognised:
• Type I errors, where the null hypothesis is falsely rejected, giving a "false positive".
• Type II errors, where the null hypothesis fails to be rejected and an actual difference between populations is missed.

Error also refers to the extent to which individual observations in a sample differ from a central value, such as the sample or population mean. Many statistical methods seek to minimize the mean-squared error, and these are called "methods of least squares." Measurement processes that generate statistical data are also subject to error. Many of these errors are classified as random (noise) or systematic (bias), but other types of errors (e.g., blunder, such as when an analyst reports incorrect units) can also be important.

Confidence intervals

Most studies will only sample part of a population, and then the result is used to interpret the null hypothesis in the context of the whole population. Any estimates obtained from the sample only approximate the population value. Confidence intervals allow statisticians to express how closely the sample estimate matches the true value in the whole population. Often they are expressed as 95% confidence intervals. Formally, a 95% confidence interval of a procedure is a range where, if the sampling and analysis were repeated under the same conditions, the interval would include the true (population) value 95% of the time. This does not imply that the probability that the true value is in the confidence interval is 95%. One quantity that is a probability for an estimated value is the credible interval from Bayesian statistics. (A small numeric sketch of a confidence interval follows at the end of this subsection.)

Significance

Statistics rarely give a simple yes/no answer to the question asked of them. Interpretation often comes down to the level of statistical significance applied to the numbers, and often refers to the probability of a value accurately rejecting the null hypothesis (sometimes referred to as the p-value). Referring to statistical significance does not necessarily mean that the overall result is significant in real-world terms. For example, in a large study of a drug it may be shown that the drug has a statistically significant but very small beneficial effect, such that the drug will be unlikely to help the patient in a noticeable way.
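The promised sketch: a rough 95% confidence interval for a mean, using the normal approximation in Python (data invented; for a sample this small a Student t quantile would be more appropriate than 1.96):

import statistics

sample = [4.8, 5.1, 5.3, 4.9, 5.2, 5.0, 5.4, 4.7]
n = len(sample)
mean = statistics.mean(sample)
sem = statistics.stdev(sample) / n ** 0.5          # standard error of the mean
low, high = mean - 1.96 * sem, mean + 1.96 * sem   # 1.96 = 97.5th percentile of N(0, 1)
print(f"95% CI: ({low:.3f}, {high:.3f})")          # roughly (4.880, 5.220)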

Examples

Some well-known statistical tests and procedures are:
• Analysis of variance (ANOVA)
• Chi-square test
• Correlation
• Factor analysis
• Mann–Whitney U
• Mean square weighted deviation (MSWD)
• Pearson product-moment correlation coefficient
• Regression analysis
• Spearman's rank correlation coefficient
• Student's t-test
• Time series analysis


Specialized disciplines

Some fields of inquiry use applied statistics so extensively that they have specialized terminology. These disciplines include:
• Actuarial science
• Applied information economics
• Biostatistics
• Business statistics
• Chemometrics (for analysis of data from chemistry)
• Data mining (applying statistics and pattern recognition to discover knowledge from data)
• Demography
• Econometrics
• Energy statistics
• Engineering statistics
• Epidemiology
• Geography and Geographic Information Systems, specifically in Spatial analysis
• Image processing
• Psychological statistics
• Reliability engineering
• Social statistics

In addition, there are particular types of statistical analysis that have also developed their own specialised terminology and methodology:
• Bootstrap & Jackknife Resampling
• Statistical classification
• Statistical surveys
• Structured data analysis (statistics)
• Survival analysis
• Statistics in various sports, particularly baseball and cricket

Statistics is a key tool in business and manufacturing as well. It is used to understand variability in measurement systems, to control processes (as in statistical process control or SPC), to summarize data, and to make data-driven decisions. In these roles, it is a key tool, and perhaps the only reliable tool.


Statistical computing

The rapid and sustained increases in computing power starting from the second half of the 20th century have had a substantial impact on the practice of statistical science. Early statistical models were almost always from the class of linear models, but powerful computers, coupled with suitable numerical algorithms, caused an increased interest in nonlinear models (such as neural networks) as well as the creation of new types, such as generalized linear models and multilevel models.

[Figure: gretl, an example of an open source statistical package.]

Increased computing power has also led to the growing popularity of computationally intensive methods based on resampling, such as permutation tests and the bootstrap, while techniques such as Gibbs sampling have made use of Bayesian models more feasible. The computer revolution has implications for the future of statistics, with a new emphasis on "experimental" and "empirical" statistics. A large number of both general and special purpose statistical software packages are now available.
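To illustrate the resampling idea, a minimal percentile-bootstrap sketch in Python (the data and iteration count are arbitrary choices for illustration):

import random
import statistics

random.seed(0)  # reproducible resampling
data = [2.1, 2.9, 3.4, 3.8, 4.2, 4.9, 5.3]
boot_means = sorted(
    statistics.mean([random.choice(data) for _ in data])  # resample with replacement
    for _ in range(10000)
)
# Percentile bootstrap 95% interval for the mean: the 2.5th and 97.5th percentiles.
print(boot_means[249], boot_means[9749])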

Misuse

There is a general perception that statistical knowledge is all too frequently intentionally misused, by finding ways to interpret only the data that are favorable to the presenter. The famous saying "There are three kinds of lies: lies, damned lies, and statistics",[12] which was popularized in the USA by Samuel Clemens and incorrectly attributed by him to Disraeli (1804–1881), has come to represent the general mistrust (and misunderstanding) of statistical science. Harvard President Lawrence Lowell wrote in 1909 that statistics, "...like veal pies, are good if you know the person that made them, and are sure of the ingredients."

If various studies appear to contradict one another, then the public may come to distrust such studies. For example, one study may suggest that a given diet or activity raises blood pressure, while another may suggest that it lowers blood pressure. The discrepancy can arise from subtle variations in experimental design, such as differences in the patient groups or research protocols, which are not easily understood by the non-expert. (Media reports usually omit this vital contextual information entirely, because of its complexity.)

By choosing (or rejecting, or modifying) a certain sample, results can be manipulated. Such manipulations need not be malicious or devious; they can arise from unintentional biases of the researcher. The graphs used to summarize data can also be misleading. Deeper criticisms come from the fact that the hypothesis testing approach, widely used and in many cases required by law or regulation, forces one hypothesis (the null hypothesis) to be "favored," and can also seem to exaggerate the importance of minor differences in large studies. A difference that is highly statistically significant can still be of no practical significance. (See criticism of hypothesis testing and controversy over the null hypothesis.)

One response is to give greater emphasis to the p-value than to simply reporting whether a hypothesis is rejected at the given level of significance. The p-value, however, does not indicate the size of the effect. Another increasingly common approach is to report confidence intervals. Although these are produced from the same calculations as those of hypothesis tests or p-values, they describe both the size of the effect and the uncertainty surrounding it.


Statistics applied to mathematics or the arts

Traditionally, statistics was concerned with drawing inferences using a semi-standardized methodology that was "required learning" in most sciences. This has changed with the use of statistics in non-inferential contexts. What was once considered a dry subject, taken in many fields as a degree requirement, is now viewed enthusiastically. Initially derided by some mathematical purists, it is now considered essential methodology in certain areas.

• In number theory, scatter plots of data generated by a distribution function may be transformed with familiar tools used in statistics to reveal underlying patterns, which may then lead to hypotheses.
• Methods of statistics, including predictive methods in forecasting, are combined with chaos theory and fractal geometry to create video works that are considered to have great beauty.
• The process art of Jackson Pollock relied on artistic experiments whereby underlying distributions in nature were artistically revealed. With the advent of computers, methods of statistics were applied to formalize such distribution-driven natural processes, in order to make and analyze moving video art.
• Methods of statistics may be used predictively in performance art, as in a card trick based on a Markov process that only works some of the time, the occasion of which can be predicted using statistical methodology.
• Statistics is used to predictively create art, as in applications of statistical mechanics with the statistical or stochastic music invented by Iannis Xenakis, where the music is performance-specific. Though this type of artistry does not always come out as expected, it does behave within a range predictable using statistics.

See also

• Glossary of probability and statistics
• Forecasting
• Index of statistics articles
• Foundations of statistics
• List of academic statistical associations
• Multivariate statistics
• List of national and international statistical services
• Official statistics
• List of important publications in statistics
• Regression analysis
• List of statistical packages (software)
• Statistical consultants
• Notation in probability and statistics
• Statistician, List of statisticians
• Structural equation modeling
• Statistical literacy
• Statistical modeling

Related disciplines

• Biostatistics
• Computational biology
• Computational sociology
• Network biology
• Social science
• Sociology
• Positivism
• Social research

References

• Best, Joel (2001). Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists. University of California Press. ISBN 0-520-21978-3.
• Desrosières, Alain (2004). The Politics of Large Numbers: A History of Statistical Reasoning. Trans. Camille Naish. Harvard University Press. ISBN 0-674-68932-1.
• Hacking, Ian (1990). The Taming of Chance. Cambridge University Press. ISBN 0-521-38884-8.
• Lindley, D.V. (1985). Making Decisions (2nd ed.). John Wiley & Sons. ISBN 0-471-90808-8.
• Tijms, Henk (2004). Understanding Probability: Chance Rules in Everyday Life. Cambridge University Press. ISBN 0-521-83329-9.

External links

Online non-commercial textbooks
• "A New View of Statistics" [13], by Will G. Hopkins, AUT University
• "NIST/SEMATECH e-Handbook of Statistical Methods" [14], by U.S. National Institute of Standards and Technology and SEMATECH
• "Online Statistics: An Interactive Multimedia Course of Study" [15], by David Lane, Joan Lu, Camille Peres, Emily Zitek, et al.
• "The Little Handbook of Statistical Practice" [16], by Gerard E. Dallal [17], Tufts University
• "StatSoft Electronic Textbook" [18], by StatSoft [19]

Other non-commercial resources
• Statistics [20] (OECD)
• Probability Web [21] (Carleton College)
• Free online statistics course with interactive practice exercises [22] (Carnegie Mellon University)
• Resources for Teaching and Learning about Probability and Statistics [23] (ERIC)
• Rice Virtual Lab in Statistics [24] (Rice University)
• Statistical Science Web [25] (University of Melbourne)
• Applied statistics applets [26]
• Statlib: data and software archives [27]
• StatProb [28] – a peer-reviewed statistics and probability wiki, sponsored by a collaborative of statistics and probability societies [29]

References

[1] Dodge, Y. (2003) The Oxford Dictionary of Statistical Terms, OUP. ISBN 0-19-920613-9
[2] The Free Online Dictionary (http://www.thefreedictionary.com/dict.asp?Word=statistics)
[3] "Statistics" (http://www.merriam-webster.com/dictionary/statistics). Merriam-Webster Online Dictionary.
[4] "Statistic" (http://www.merriam-webster.com/dictionary/statistic). Merriam-Webster Online Dictionary.
[5] Moses, Lincoln E. Think and Explain with Statistics, pp. 1–3. Addison-Wesley, 1986.
[6] Hays, William Lee, Statistics for the Social Sciences, Holt, Rinehart and Winston, 1973, p. xii, ISBN 978-0-03-077945-9
[7] Statistics at Encyclopedia of Mathematics (http://us.oocities.com/mathfair2002/school/plans.htm)
[8] Moore, David (1992). "Teaching Statistics as a Respectable Subject". Statistics for the Twenty-First Century. Washington, DC: The Mathematical Association of America. pp. 14–25.
[9] Chance, Beth L.; Rossman, Allan J. (2005). "Preface" (http://www.rossmanchance.com/iscam/preface.pdf). Investigating Statistical Concepts, Applications, and Methods. Duxbury Press. ISBN 978-0495050643.
[10] Anderson, D.R.; Sweeney, D.J.; Williams, T.A. Statistics: Concepts and Applications, pp. 5–9. West Publishing Company, 1986.
[11] Willcox, Walter (1938) The Founder of Statistics. (http://www.jstor.org/stable/1400906) Review of the International Statistical Institute 5(4):321–328.
[12] Leonard H. Courtney (1832–1918), in a speech at Saratoga Springs, New York, August 1895, in which this sentence appeared: "After all, facts are facts, and although we may quote one to another with a chuckle the words of the Wise Statesman, 'Lies – damned lies – and statistics,' still there are some easy figures the simplest must understand, and the astutest cannot wriggle out of."; the earliest documented use of the exact phrase.
[13] http://sportsci.org/resource/stats/
[14] http://www.itl.nist.gov/div898/handbook/
[15] http://onlinestatbook.com/index.html
[16] http://www.StatisticalPractice.com
[17] http://www.tufts.edu/~gdallal/
[18] http://www.statsoft.com/textbook/stathome.html
[19] http://www.statsoft.com/index.htm
[20] http://stats.oecd.org/Index.aspx
[21] http://www.mathcs.carleton.edu/probweb/probweb.html
[22] http://oli.web.cmu.edu/openlearning/forstudents/freecourses/statistics
[23] http://www.ericdigests.org/2000-2/resources.htm
[24] http://www.onlinestatbook.com/rvls.html
[25] http://www.statsci.org
[26] http://www.mbhs.edu/~steind00/statistics.html
[27] http://lib.stat.cmu.edu/
[28] http://statprob.com/encyclopedia
[29] http://statprob.com/?op=about

Mathematics

Mathematics is the study of quantity, structure, space, and change. Mathematicians seek out patterns,[2] [3] formulate new conjectures, and establish truth by rigorous deduction from appropriately chosen axioms and definitions.[4] There is debate over whether mathematical objects such as numbers and points exist naturally or are human creations. The mathematician Benjamin Peirce called mathematics "the science that draws necessary conclusions".[5] Albert Einstein, on the other hand, stated that "as far as the laws of mathematics refer to reality, they are not certain; and as far as they are certain, they do not refer to reality."[6]

[Figure: Euclid, Greek mathematician, 3rd century BC, as imagined by Raphael in this detail from The School of Athens.[1]]

Through the use of abstraction and logical reasoning, mathematics evolved from counting, calculation, measurement, and the systematic study of the shapes and motions of physical objects. Practical mathematics has been a human activity for as far back as written records exist. Rigorous arguments first appeared in Greek mathematics, most notably in Euclid's Elements. Mathematics continued to develop, for example in China in 300 BC, in India in AD 100, and in the Muslim world in AD 800, until the Renaissance, when mathematical innovations interacting with new scientific discoveries led to a rapid increase in the rate of mathematical discovery that continues to the present day.[7] Mathematics is used throughout the world as an essential tool in many fields, including natural science, engineering, medicine, and the social sciences. Applied mathematics, the branch of mathematics concerned with application of mathematical knowledge to other fields, inspires and makes use of new mathematical discoveries and sometimes leads to the development of entirely new mathematical disciplines, such as statistics and game theory. Mathematicians also engage in pure mathematics, or mathematics for its own sake, without having any application in mind, although practical applications for what began as pure mathematics are often discovered.[8]


Etymology

The word "mathematics" comes from the Greek μάθημα (máthēma), which means learning, study, science, and additionally came to have the narrower and more technical meaning "mathematical study", even in Classical times.[9] Its adjective is μαθηματικός (mathēmatikós), related to learning, or studious, which likewise further came to mean mathematical. In particular, μαθηματικὴ τέχνη (mathēmatikḗ tékhnē), Latin: ars mathematica, meant the mathematical art.

The apparent plural form in English, like the French plural form les mathématiques (and the less commonly used singular derivative la mathématique), goes back to the Latin neuter plural mathematica (Cicero), based on the Greek plural τα μαθηματικά (ta mathēmatiká), used by Aristotle, and meaning roughly "all things mathematical"; although it is plausible that English borrowed only the adjective mathematic(al) and formed the noun mathematics anew, after the pattern of physics and metaphysics, which were inherited from the Greek.[10] In English, the noun mathematics takes singular verb forms. It is often shortened to maths or, in English-speaking North America, math.

History

The evolution of mathematics might be seen as an ever-increasing series of abstractions, or alternatively an expansion of subject matter. The first abstraction, which is shared by many animals,[11] was probably that of numbers: the realization that a collection of two apples and a collection of two oranges (for example) have something in common, namely quantity of their members. In addition to recognizing how to count physical objects, prehistoric peoples also recognized how to count abstract quantities, like time – days, seasons, years.[12] Elementary arithmetic (addition, subtraction, multiplication and division) naturally followed.

[Figure: Pythagoras (c.570–c.495 BC) has commonly been given credit for discovering the Pythagorean theorem. Well-known figures in Greek mathematics also include Euclid, Archimedes, and Thales.]

Since numeracy pre-dated writing, further steps were needed for recording numbers such as tallies or the knotted strings called quipu used by the Inca to store numerical data. Numeral systems have been many and diverse, with the first known written numerals created by Egyptians in Middle Kingdom texts such as the Rhind Mathematical Papyrus.

The earliest uses of mathematics were in trading, land measurement, painting and weaving patterns and the recording of time. More complex mathematics did not appear until around 3000 BC, when the Babylonians and Egyptians began using arithmetic, algebra and geometry for taxation and other financial calculations, for building and construction, and for astronomy.[13] The systematic study of mathematics in its own right began with the Ancient Greeks between 600 and 300 BC.[14]

[Figure: Mayan numerals.]

Mathematics has since been greatly extended, and there has been a fruitful interaction between mathematics and science, to the benefit of both. Mathematical discoveries continue to be made today. According to Mikhail B. Sevryuk, in the January 2006 issue of the Bulletin of the American Mathematical Society, "The number of papers and books included in the Mathematical Reviews database since 1940 (the first year of operation of MR) is now more than 1.9 million, and more than 75 thousand items are added to the database each year. The overwhelming majority of works in this ocean contain new mathematical theorems and their proofs."[15]

Inspiration, pure and applied mathematics, and aesthetics

Mathematics arises from many different kinds of problems. At first these were found in commerce, land measurement, architecture and later astronomy; nowadays, all sciences suggest problems studied by mathematicians, and many problems arise within mathematics itself. For example, the physicist Richard Feynman invented the path integral formulation of quantum mechanics using a combination of mathematical reasoning and physical insight, and today's string theory, a still-developing scientific theory which attempts to unify the four fundamental forces of nature, continues to inspire new mathematics.[16] Some mathematics is only relevant in the area that inspired it, and is applied to solve further problems in that area. But often mathematics inspired by one area proves useful in many areas, and joins the general stock of mathematical concepts.

[Figure: Sir Isaac Newton (1643–1727), an inventor of infinitesimal calculus.]

A distinction is often made between pure mathematics and applied mathematics. However, pure mathematics topics often turn out to have applications, e.g. number theory in cryptography. This remarkable fact, that even the "purest" mathematics often turns out to have practical applications, is what Eugene Wigner has called "the unreasonable effectiveness of mathematics".[17] As in most areas of study, the explosion of knowledge in the scientific age has led to specialization: there are now hundreds of specialized areas in mathematics, and the latest Mathematics Subject Classification runs to 46 pages.[18] Several areas of applied mathematics have merged with related traditions outside of mathematics and become disciplines in their own right, including statistics, operations research, and computer science.

For those who are mathematically inclined, there is often a definite aesthetic aspect to much of mathematics. Many mathematicians talk about the elegance of mathematics, its intrinsic aesthetics and inner beauty. Simplicity and generality are valued. There is beauty in a simple and elegant proof, such as Euclid's proof that there are infinitely


many prime numbers, and in an elegant numerical method that speeds calculation, such as the fast Fourier transform. G. H. Hardy in A Mathematician's Apology expressed the belief that these aesthetic considerations are, in themselves, sufficient to justify the study of pure mathematics. He identified criteria such as significance, unexpectedness, inevitability, and economy as factors that contribute to a mathematical aesthetic.[19] Mathematicians often strive to find proofs of theorems that are particularly elegant, a quest Paul Erdős often referred to as finding proofs from "The Book" in which God had written down his favorite proofs.[20] [21] The popularity of recreational mathematics is another sign of the pleasure many find in solving mathematical questions.

Notation, language, and rigor

Most of the mathematical notation in use today was not invented until the 16th century.[22] Before that, mathematics was written out in words, a painstaking process that limited mathematical discovery.[23] Euler (1707–1783) was responsible for many of the notations in use today. Modern notation makes mathematics much easier for the professional, but beginners often find it daunting. It is extremely compressed: a few symbols contain a great deal of information. Like musical notation, modern mathematical notation has a strict syntax (which to a limited extent varies from author to author and from discipline to discipline) and encodes information that would be difficult to write in any other way.

[Figure: Leonhard Euler, who created and popularized much of the mathematical notation used today.]

Mathematical language can also be hard for beginners. Words such as or and only have more precise meanings than in everyday speech. Moreover, words such as open and field have been given specialized mathematical meanings. Mathematical jargon includes technical terms such as homeomorphism and integrable. But there is a reason for special notation and technical jargon: mathematics requires more precision than everyday speech. Mathematicians refer to this precision of language and logic as "rigor".

Mathematical proof is fundamentally a matter of rigor. Mathematicians want their theorems to follow from axioms by means of systematic reasoning. This is to avoid mistaken "theorems", based on fallible intuitions, of which many instances have occurred in the history of the subject.[24] The level of rigor expected in mathematics has varied over time: the Greeks expected detailed arguments, but at the time of Isaac Newton the methods employed were less rigorous. Problems inherent in the definitions used by Newton would lead to a resurgence of careful analysis and formal proof in the 19th century. Misunderstanding the rigor is a cause for some of the common misconceptions of mathematics. Today, mathematicians continue to argue among themselves about computer-assisted proofs. Since large computations are hard to verify, such proofs may not be sufficiently rigorous.[25]

[Figure: The infinity symbol ∞ in several typefaces.]

Axioms in traditional thought were "self-evident truths", but that conception is problematic. At a formal level, an axiom is just a string of symbols, which has an intrinsic meaning only in the context of all derivable formulas of an axiomatic system. It was the goal of Hilbert's program to put all of mathematics on a firm axiomatic basis, but according to Gödel's incompleteness theorem every (sufficiently powerful) axiomatic system has undecidable formulas; and so a final axiomatization of mathematics is impossible.

Nonetheless, mathematics is often imagined to be (as far as its formal content) nothing but set theory in some axiomatization, in the sense that every mathematical statement or proof could be cast into formulas within set theory.[26]

Mathematics as science

[Figure: Carl Friedrich Gauss, himself known as the "prince of mathematicians",[27] referred to mathematics as "the Queen of the Sciences".]

Carl Friedrich Gauss referred to mathematics as "the Queen of the Sciences".[28] In the original Latin Regina Scientiarum, as well as in German Königin der Wissenschaften, the word corresponding to science means (field of) knowledge. Indeed, this is also the original meaning in English, and there is no doubt that mathematics is in this sense a science. The specialization restricting the meaning to natural science is of later date. If one considers science to be strictly about the physical world, then mathematics, or at least pure mathematics, is not a science. Albert Einstein stated that "as far as the laws of mathematics refer to reality, they are not certain; and as far as they are certain, they do not refer to reality."[6]

Many philosophers believe that mathematics is not experimentally falsifiable, and thus not a science according to the definition of Karl Popper.[29] However, in the 1930s important work in mathematical logic convinced many mathematicians that mathematics cannot be reduced to logic alone, and Karl Popper concluded that "most mathematical theories are, like those of physics and biology, hypothetico-deductive: pure mathematics therefore turns out to be much closer to the natural sciences whose hypotheses are conjectures, than it seemed even recently."[30] Other thinkers, notably Imre Lakatos, have applied a version of falsificationism to mathematics itself.

An alternative view is that certain scientific fields (such as theoretical physics) are mathematics with axioms that are intended to correspond to reality. In fact, the theoretical physicist J. M. Ziman proposed that science is public knowledge and thus includes mathematics.[31] In any case, mathematics shares much in common with many fields in the physical sciences, notably the exploration of the logical consequences of assumptions. Intuition and experimentation also play a role in the formulation of conjectures in both mathematics and the (other) sciences. Experimental mathematics continues to grow in importance within mathematics, and computation and simulation are playing an increasing role in both the sciences and mathematics, weakening the objection that mathematics does not use the scientific method. In his 2002 book A New Kind of Science, Stephen Wolfram argues that computational mathematics deserves to be explored empirically as a scientific field in its own right.

The opinions of mathematicians on this matter are varied. Many mathematicians feel that to call their area a science is to downplay the importance of its aesthetic side, and its history in the traditional seven liberal arts; others feel that to ignore its connection to the sciences is to turn a blind eye to the fact that the interface between mathematics and its applications in science and engineering has driven much development in mathematics. One way this difference of viewpoint plays out is in the philosophical debate as to whether mathematics is created (as in art) or discovered (as in science). It is common to see universities divided into sections that include a division of Science and Mathematics, indicating that the fields are seen as being allied but that they do not coincide. In practice, mathematicians are typically grouped with scientists at the gross level but separated at finer levels. This is one of many issues considered in the philosophy of mathematics.
Mathematical awards are generally kept separate from their equivalents in science. The most prestigious award in mathematics is the Fields Medal,[32] [33] established in 1936 and now awarded every 4 years. It is often considered


the equivalent of science's Nobel Prizes. The Wolf Prize in Mathematics, instituted in 1978, recognizes lifetime achievement, and another major international award, the Abel Prize, was introduced in 2003. These are awarded for a particular body of work, which may be innovation, or resolution of an outstanding problem in an established field. A famous list of 23 such open problems, called "Hilbert's problems", was compiled in 1900 by German mathematician David Hilbert. This list achieved great celebrity among mathematicians, and at least nine of the problems have now been solved. A new list of seven important problems, titled the "Millennium Prize Problems", was published in 2000. Solution of each of these problems carries a $1 million reward, and only one (the Riemann hypothesis) is duplicated in Hilbert's problems.

Fields of mathematics

[Figure: An abacus, a simple calculating tool used since ancient times.]

Mathematics can, broadly speaking, be subdivided into the study of quantity, structure, space, and change (i.e. arithmetic, algebra, geometry, and analysis). In addition to these main concerns, there are also subdivisions dedicated to exploring links from the heart of mathematics to other fields: to logic, to set theory (foundations), to the empirical mathematics of the various sciences (applied mathematics), and more recently to the rigorous study of uncertainty.

Quantity

The study of quantity starts with numbers, first the familiar natural numbers and integers ("whole numbers") and arithmetical operations on them, which are characterized in arithmetic. The deeper properties of integers are studied in number theory, from which come such popular results as Fermat's Last Theorem. Number theory also holds two problems widely considered to be unsolved: the twin prime conjecture and Goldbach's conjecture. As the number system is further developed, the integers are recognized as a subset of the rational numbers ("fractions"). These, in turn, are contained within the real numbers, which are used to represent continuous quantities. Real numbers are generalized to complex numbers. These are the first steps of a hierarchy of numbers that goes on to include quaternions and octonions. Consideration of the natural numbers also leads to the transfinite numbers, which formalize the concept of "infinity". Another area of study is size, which leads to the cardinal numbers and then to another conception of infinity: the aleph numbers, which allow meaningful comparison of the size of infinitely large sets.

• Natural numbers
• Integers
• Rational numbers
• Real numbers
• Complex numbers

Structure

Many mathematical objects, such as sets of numbers and functions, exhibit internal structure as a consequence of operations or relations that are defined on the set. Mathematics then studies properties of those sets that can be expressed in terms of that structure; for instance number theory studies properties of the set of integers that can be expressed in terms of arithmetic operations. Moreover, it frequently happens that different such structured sets (or structures) exhibit similar properties, which makes it possible, by a further step of abstraction, to state axioms for a class of structures, and then study at once the whole class of structures satisfying these axioms. Thus one can study groups, rings, fields and other abstract systems; together such studies (for structures defined by algebraic operations) constitute the domain of abstract algebra. By its great generality, abstract algebra can often be applied to seemingly unrelated problems; for instance a number of ancient problems concerning compass and straightedge constructions


were finally solved using Galois theory, which involves field theory and group theory. Another example of an algebraic theory is linear algebra, which is the general study of vector spaces, whose elements called vectors have both quantity and direction, and can be used to model (relations between) points in space. This is one example of the phenomenon that the originally unrelated areas of geometry and algebra have very strong interactions in modern mathematics. Combinatorics studies ways of enumerating the number of objects that fit a given structure.

• Combinatorics
• Number theory
• Group theory
• Graph theory
• Order theory

Space

The study of space originates with geometry – in particular, Euclidean geometry. Trigonometry is the branch of mathematics that deals with relationships between the sides and the angles of triangles and with the trigonometric functions; it combines space and numbers, and encompasses the well-known Pythagorean theorem. The modern study of space generalizes these ideas to include higher-dimensional geometry, non-Euclidean geometries (which play a central role in general relativity) and topology.

Quantity and space both play a role in analytic geometry, differential geometry, and algebraic geometry. Within differential geometry are the concepts of fiber bundles and calculus on manifolds, in particular, vector and tensor calculus. Within algebraic geometry is the description of geometric objects as solution sets of polynomial equations, combining the concepts of quantity and space, and also the study of topological groups, which combine structure and space. Lie groups are used to study space, structure, and change.

Topology in all its many ramifications may have been the greatest growth area in 20th century mathematics; it includes point-set topology, set-theoretic topology, algebraic topology and differential topology. In particular, instances of modern day topology are metrizability theory, axiomatic set theory, homotopy theory, and Morse theory. Topology also includes the now solved Poincaré conjecture and the controversial four color theorem, whose only proof, by computer, has never been verified by a human.

Geometry · Trigonometry · Differential geometry · Topology · Fractal geometry · Measure theory

Change
Understanding and describing change is a common theme in the natural sciences, and calculus was developed as a powerful tool to investigate it. Functions arise here, as a central concept describing a changing quantity. The rigorous study of real numbers and functions of a real variable is known as real analysis, with complex analysis the equivalent field for the complex numbers. Functional analysis focuses attention on (typically infinite-dimensional) spaces of functions. One of many applications of functional analysis is quantum mechanics. Many problems lead naturally to relationships between a quantity and its rate of change, and these are studied as differential equations. Many phenomena in nature can be described by dynamical systems; chaos theory makes precise the ways in which many of these systems exhibit unpredictable yet still deterministic behavior.


Calculus · Vector calculus · Differential equations · Dynamical systems · Chaos theory · Complex analysis

Foundations and philosophy
In order to clarify the foundations of mathematics, the fields of mathematical logic and set theory were developed. Mathematical logic includes the mathematical study of logic and the applications of formal logic to other areas of mathematics; set theory is the branch of mathematics that studies sets or collections of objects. Category theory, which deals in an abstract way with mathematical structures and relationships between them, is still in development. The phrase "crisis of foundations" describes the search for a rigorous foundation for mathematics that took place from approximately 1900 to 1930.[34] Some disagreement about the foundations of mathematics continues to the present day. The crisis of foundations was stimulated by a number of controversies at the time, including the controversy over Cantor's set theory and the Brouwer–Hilbert controversy. Mathematical logic is concerned with setting mathematics within a rigorous axiomatic framework, and studying the implications of such a framework. As such, it is home to Gödel's incompleteness theorems, which (informally) imply that any formal system that contains basic arithmetic, if sound (meaning that all theorems that can be proven are true), is necessarily incomplete (meaning that there are true theorems which cannot be proved in that system). Whatever finite collection of number-theoretical axioms is taken as a foundation, Gödel showed how to construct a formal statement that is a true number-theoretical fact, but which does not follow from those axioms. Therefore no formal system is a complete axiomatization of full number theory. Modern logic is divided into recursion theory, model theory, and proof theory, and is closely linked to theoretical computer science.

Mathematical logic · Set theory · Category theory

Theoretical computer science
Theoretical computer science includes computability theory, computational complexity theory, and information theory. Computability theory examines the limitations of various theoretical models of the computer, including the most powerful known model – the Turing machine. Complexity theory is the study of tractability by computer; some problems, although theoretically solvable by computer, are so expensive in terms of time or space that solving them is likely to remain practically unfeasible, even with the rapid advance of computer hardware. A famous problem is the "P = NP?" problem, one of the Millennium Prize Problems.[35] Finally, information theory is concerned with the amount of data that can be stored on a given medium, and hence deals with concepts such as compression and entropy.

Theory of computation · Cryptography


Applied mathematics
Applied mathematics considers the use of abstract mathematical tools in solving concrete problems in the sciences, business, and other areas. Applied mathematics has significant overlap with the discipline of statistics, whose theory is formulated mathematically, especially with probability theory. Statisticians (working as part of a research project) "create data that makes sense" with random sampling and with randomized experiments; the design of a statistical sample or experiment specifies the analysis of the data (before the data become available). When reconsidering data from experiments and samples or when analyzing data from observational studies, statisticians "make sense of the data" using the art of modelling and the theory of inference – with model selection and estimation; the estimated models and consequential predictions should be tested on new data.[36] Computational mathematics proposes and studies methods for solving mathematical problems that are typically too large for human numerical capacity. Numerical analysis studies methods for problems in analysis using ideas of functional analysis and techniques of approximation theory; numerical analysis includes the study of approximation and discretization broadly, with special concern for rounding errors. Other areas of computational mathematics include computer algebra and symbolic computation.

Mathematical physics · Fluid dynamics · Numerical analysis · Optimization · Probability theory · Statistics · Financial mathematics · Game theory · Mathematical biology · Mathematical chemistry · Mathematical economics · Control theory


See also
• Definitions of mathematics
• Dyscalculia
• Iatromathematicians
• Logics
• Mathematical anxiety
• Mathematical game
• Mathematical model
• Mathematical problem
• Mathematical structure
• Mathematics and art
• Mathematics competitions
• Mathematics education
• Mathematics portal
• Pattern
• Philosophy of mathematics
• Pseudomathematics

References
• Benson, Donald C., The Moment of Proof: Mathematical Epiphanies, Oxford University Press, USA; New Ed edition (December 14, 2000). ISBN 0-19-513919-4.
• Boyer, Carl B., A History of Mathematics, Wiley; 2 edition (March 6, 1991). ISBN 0-471-54397-7. — A concise history of mathematics from the Concept of Number to contemporary Mathematics.
• Courant, R. and H. Robbins, What Is Mathematics?: An Elementary Approach to Ideas and Methods, Oxford University Press, USA; 2 edition (July 18, 1996). ISBN 0-19-510519-2.
• Davis, Philip J. and Hersh, Reuben, The Mathematical Experience. Mariner Books; Reprint edition (January 14, 1999). ISBN 0-395-92968-7. — A gentle introduction to the world of mathematics.
• Einstein, Albert (1923). Sidelights on Relativity (Geometry and Experience). P. Dutton., Co.
• Eves, Howard, An Introduction to the History of Mathematics, Sixth Edition, Saunders, 1990, ISBN 0-03-029558-0.
• Gullberg, Jan, Mathematics — From the Birth of Numbers. W. W. Norton & Company; 1st edition (October 1997). ISBN 0-393-04002-X. — An encyclopedic overview of mathematics presented in clear, simple language.
• Hazewinkel, Michiel (ed.), Encyclopaedia of Mathematics. Kluwer Academic Publishers 2000. — A translated and expanded version of a Soviet mathematics encyclopedia, in ten (expensive) volumes, the most complete and authoritative work available. Also in paperback and on CD-ROM, and online [37].
• Jourdain, Philip E. B., The Nature of Mathematics, in The World of Mathematics, James R. Newman, editor, Dover Publications, 2003, ISBN 0-486-43268-8.
• Kline, Morris, Mathematical Thought from Ancient to Modern Times, Oxford University Press, USA; Paperback edition (March 1, 1990). ISBN 0-19-506135-7.
• Monastyrsky, Michael (2001) (PDF). Some Trends in Modern Mathematics and the Fields Medal [38]. Canadian Mathematical Society. Retrieved 2006-07-28.
• Oxford English Dictionary, second edition, ed. John Simpson and Edmund Weiner, Clarendon Press, 1989, ISBN 0-19-861186-2.
• The Oxford Dictionary of English Etymology, 1983 reprint. ISBN 0-19-861112-9.
• Pappas, Theoni, The Joy Of Mathematics, Wide World Publishing; Revised edition (June 1989). ISBN 0-933174-65-9.

• Peirce, Benjamin (1882). "Linear Associative Algebra" [39]. American Journal of Mathematics, Vol. 4, No. 1/4 (1881).
• Peterson, Ivars, Mathematical Tourist, New and Updated Snapshots of Modern Mathematics, Owl Books, 2001, ISBN 0-8050-7159-8.
• Paulos, John Allen (1996). A Mathematician Reads the Newspaper. Anchor. ISBN 0-385-48254-X.
• Popper, Karl R. (1995). "On knowledge". In Search of a Better World: Lectures and Essays from Thirty Years. Routledge. ISBN 0-415-13548-6.
• Riehm, Carl (August 2002). "The Early History of the Fields Medal" [40] (PDF). Notices of the AMS (AMS) 49 (7): 778–782.
• Sevryuk, Mikhail B. (January 2006). "Book Reviews" [41] (PDF). Bulletin of the American Mathematical Society 43 (1): 101–109. doi:10.1090/S0273-0979-05-01069-4. Retrieved 2006-06-24.
• Waltershausen, Wolfgang Sartorius von (1856, repr. 1965). Gauss zum Gedächtniss [42]. Sändig Reprint Verlag H. R. Wohlwend. ISBN 3-253-01702-8.
• Ziman, J. M., F.R.S. (1968). Public Knowledge: An essay concerning the social dimension of science [43].

External links
• Free Mathematics books [44] – free mathematics books collection.
• Encyclopaedia of Mathematics, online encyclopaedia from Springer [45] – graduate-level reference work with over 8,000 entries, illuminating nearly 50,000 notions in mathematics.
• HyperMath site at Georgia State University [46]
• FreeScience Library [47] – the mathematics section of the FreeScience library.
• Rusin, Dave: The Mathematical Atlas [48] – a guided tour through the various branches of modern mathematics. (Can also be found at NIU.edu [49].)
• Polyanin, Andrei: EqWorld: The World of Mathematical Equations [50] – an online resource focusing on algebraic, ordinary differential, partial differential (mathematical physics), integral, and other mathematical equations.
• Cain, George: Online Mathematics Textbooks [51], available free online.
• Tricki [52] – wiki-style site that is intended to develop into a large store of useful mathematical problem-solving techniques.
• Mathematical Structures [53] – lists information about classes of mathematical structures.
• Math & Logic: The history of formal mathematical, logical, linguistic and methodological ideas [54] – in The Dictionary of the History of Ideas.
• Mathematician Biographies [55] – the MacTutor History of Mathematics archive, with extensive history and quotes from all famous mathematicians.
• Metamath [56] – a site and a language that formalize mathematics from its foundations.
• Nrich [57] – a prize-winning site for students from age five, from Cambridge University.
• Open Problem Garden [58] – a wiki of open problems in mathematics.
• Planet Math [59] – an online mathematics encyclopedia under construction, focusing on modern mathematics. Uses the Attribution-ShareAlike license, allowing article exchange with Wikipedia. Uses TeX markup.
• Some mathematics applets, at MIT [60]
• Weisstein, Eric et al.: MathWorld: World of Mathematics [61] – an online encyclopedia of mathematics.
• Patrick Jones' Video Tutorials [62] on mathematics.
• Citizendium: Theory (mathematics) [63].


References
[1] No likeness or description of Euclid's physical appearance made during his lifetime survived antiquity. Therefore, Euclid's depiction in works of art depends on the artist's imagination (see Euclid).
[2] Steen, L.A. (April 29, 1988). The Science of Patterns. Science, 240: 611–616. Summarized at Association for Supervision and Curriculum Development (http://www.ascd.org/portal/site/ascd/template.chapter/menuitem.1889bf0176da7573127855b3e3108a0c/?chapterMgmtId=f97433df69abb010VgnVCM1000003d01a8c0RCRD), ascd.org
[3] Devlin, Keith, Mathematics: The Science of Patterns: The Search for Order in Life, Mind and the Universe (Scientific American Paperback Library) 1996, ISBN 978-0-7167-5047-5
[4] Jourdain.
[5] Peirce, p. 97.
[6] Einstein, p. 28. The quote is Einstein's answer to the question: "how can it be that mathematics, being after all a product of human thought which is independent of experience, is so admirably appropriate to the objects of reality?" He, too, is concerned with The Unreasonable Effectiveness of Mathematics in the Natural Sciences.
[7] Eves
[8] Peterson
[9] Both senses can be found in Plato. Liddell and Scott, s.voce μαθηματικός
[10] The Oxford Dictionary of English Etymology, Oxford English Dictionary, sub "mathematics", "mathematic", "mathematics"
[11] S. Dehaene; G. Dehaene-Lambertz; L. Cohen (Aug 1998). "Abstract representations of numbers in the animal and human brain". Trends in Neuroscience 21 (8): 355–361. doi:10.1016/S0166-2236(98)01263-6.
[12] See, for example, Raymond L. Wilder, Evolution of Mathematical Concepts; an Elementary Study, passim
[13] Kline 1990, Chapter 1.
[14] "A History of Greek Mathematics: From Thales to Euclid" (http://books.google.com/books?id=drnY3Vjix3kC&pg=PA1&dq&hl=en#v=onepage&q=&f=false). Thomas Little Heath (1981). ISBN 0-486-24073-8
[15] Sevryuk
[16] Johnson, Gerald W.; Lapidus, Michel L. (2002). The Feynman Integral and Feynman's Operational Calculus. Oxford University Press. ISBN 0821824139.
[17] Eugene Wigner, 1960, "The Unreasonable Effectiveness of Mathematics in the Natural Sciences" (http://www.dartmouth.edu/~matc/MathDrama/reading/Wigner.html), Communications on Pure and Applied Mathematics 13(1): 1–14.
[18] Mathematics Subject Classification 2010 (http://www.ams.org/mathscinet/msc/pdfs/classification2010.pdf)
[19] Hardy, G. H. (1940). A Mathematician's Apology. Cambridge University Press. ISBN 0521427061.
[20] Gold, Bonnie; Simons, Rogers A. (2008). Proof and Other Dilemmas: Mathematics and Philosophy. MAA.
[21] Aigner, Martin; Ziegler, Gunter M. (2001). Proofs from the Book. Springer. ISBN 3540404600.
[22] Earliest Uses of Various Mathematical Symbols (http://jeff560.tripod.com/mathsym.html) (Contains many further references).
[23] Kline, p. 140, on Diophantus; p. 261, on Vieta.
[24] See false proof for simple examples of what can go wrong in a formal proof. The history of the Four Color Theorem contains examples of false proofs accidentally accepted by other mathematicians at the time.
[25] Ivars Peterson, The Mathematical Tourist, Freeman, 1988, ISBN 0-7167-1953-3. p. 4 "A few complain that the computer program can't be verified properly" (in reference to the Haken–Apple proof of the Four Color Theorem).
[26] Patrick Suppes, Axiomatic Set Theory, Dover, 1972, ISBN 0-486-61630-4. p. 1, "Among the many branches of modern mathematics set theory occupies a unique place: with a few rare exceptions the entities which are studied and analyzed in mathematics may be regarded as certain particular sets or classes of objects."
[27] Zeidler, Eberhard (2004). Oxford User's Guide to Mathematics. Oxford, UK: Oxford University Press. p. 1188. ISBN 0198507631.
[28] Waltershausen
[29] Shasha, Dennis Elliot; Lazere, Cathy A. (1998). Out of Their Minds: The Lives and Discoveries of 15 Great Computer Scientists. Springer. p. 228.
[30] Popper 1995, p. 56
[31] Ziman
[32] "The Fields Medal is now indisputably the best known and most influential award in mathematics." Monastyrsky
[33] Riehm
[34] Luke Howard Hodgkin & Luke Hodgkin, A History of Mathematics, Oxford University Press, 2005.
[35] Clay Mathematics Institute (http://www.claymath.org/millennium/P_vs_NP/), P=NP, claymath.org
[36] Like other mathematical sciences such as physics and computer science, statistics is an autonomous discipline rather than a branch of applied mathematics. Like research physicists and computer scientists, research statisticians are mathematical scientists. Many statisticians have a degree in mathematics, and some statisticians are also mathematicians.
[37] http://eom.springer.de/default.htm
[38] http://www.fields.utoronto.ca/aboutus/FieldsMedal_Monastyrsky.pdf
[39] http://books.google.com/?id=De0GAAAAYAAJ&pg=PA1&dq=Peirce+Benjamin+Linear+Associative+Algebra+&q=
[40] http://www.ams.org/notices/200207/comm-riehm.pdf

[41] http://www.ams.org/bull/2006-43-01/S0273-0979-05-01069-4/S0273-0979-05-01069-4.pdf
[42] http://www.amazon.de/Gauss-Ged%e4chtnis-Wolfgang-Sartorius-Waltershausen/dp/3253017028
[43] http://info.med.yale.edu/therarad/summers/ziman.htm
[44] http://freebookcentre.net/SpecialCat/Free-Mathematics-Books-Download.html
[45] http://eom.springer.de
[46] http://hyperphysics.phy-astr.gsu.edu/Hbase/hmat.html
[47] http://www.freescience.info/mathematics.php
[48] http://www.math-atlas.org/
[49] http://www.math.niu.edu/~rusin/known-math/index/index.html
[50] http://eqworld.ipmnet.ru/
[51] http://www.math.gatech.edu/~cain/textbooks/onlinebooks.html
[52] http://www.tricki.org/
[53] http://math.chapman.edu/cgi-bin/structures?HomePage
[54] http://etext.lib.virginia.edu/DicHist/analytic/anaVII.html
[55] http://www-history.mcs.st-and.ac.uk/~history/
[56] http://metamath.org/
[57] http://www.nrich.maths.org/public/index.php
[58] http://garden.irmacs.sfu.ca
[59] http://planetmath.org/
[60] http://www-math.mit.edu/daimp
[61] http://www.mathworld.com/
[62] http://www.youtube.com/user/patrickJMT

[63] http://en.citizendium.org/wiki/Theory_(mathematics)

Median

In probability theory and statistics, a median is the numeric value separating the higher half of a sample, a population, or a probability distribution from the lower half. The median of a finite list of numbers can be found by arranging all the observations from lowest value to highest value and picking the middle one. If there is an even number of observations, then there is no single middle value; the median is then usually defined to be the mean of the two middle values.[1] [2] In a sample of data, or a finite population, there may be no member of the sample whose value is identical to the median (in the case of an even sample size) and, if there is such a member, there may be more than one, so that the median may not uniquely identify a sample member. Nonetheless the value of the median is uniquely determined with the usual definition. A related concept, in which the outcome is forced to correspond to a member of the sample, is the medoid. At most half the population have values less than the median and at most half have values greater than the median. If both groups contain less than half the population, then some of the population is exactly equal to the median. For example, if a < b < c, then the median of the list {a, b, c} is b, and if a < b < c < d, then the median of the list {a, b, c, d} is the mean of b and c, i.e. it is (b + c)/2. The median can be used as a measure of location when a distribution is skewed, when end values are not known, or when one requires reduced importance to be attached to outliers, e.g. because they may be measurement errors. A disadvantage of the median is the difficulty of handling it theoretically.


Notation
The median of a variable x is denoted either as $\tilde{x}$ or as $\operatorname{med}(x)$.[3]

Measures of statistical dispersion
When the median is used as a location parameter in descriptive statistics, there are several choices for a measure of variability: the range, the interquartile range, the mean absolute deviation, and the median absolute deviation. Since the median is the same as the second quartile, its calculation is illustrated in the article on quartiles.

Medians of probability distributions
For any probability distribution on the real line with cumulative distribution function F, regardless of whether it is any kind of continuous probability distribution (in particular an absolutely continuous distribution, which has a probability density function) or a discrete probability distribution, a median m satisfies the inequalities
$P(X \le m) \ge \tfrac{1}{2}$ and $P(X \ge m) \ge \tfrac{1}{2}$,
or, equivalently,
$\int_{(-\infty,\,m]} dF(x) \ge \tfrac{1}{2}$ and $\int_{[m,\,\infty)} dF(x) \ge \tfrac{1}{2}$,
in which a Lebesgue–Stieltjes integral is used. For an absolutely continuous probability distribution with probability density function f, we have
$P(X \le m) = P(X \ge m) = \int_{-\infty}^{m} f(x)\,dx = \tfrac{1}{2}.$

Medians of particular distributions
The medians of certain types of distributions can be easily calculated from their parameters:
• The median of a normal distribution with mean μ and variance σ² is μ. In fact, for a normal distribution, mean = median = mode.
• The median of a uniform distribution on the interval [a, b] is (a + b)/2, which is also the mean.
• The median of a Cauchy distribution with location parameter x0 and scale parameter y is x0, the location parameter.
• The median of an exponential distribution with rate parameter λ is the natural logarithm of 2 divided by the rate parameter: λ⁻¹ ln 2.
• The median of a Weibull distribution with shape parameter k and scale parameter λ is λ(ln 2)^{1/k}.
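The exponential entry, for instance, can be checked by setting the exponential cumulative distribution function F(m) = 1 − e^{−λm} equal to one half and solving for m:
$1 - e^{-\lambda m} = \tfrac{1}{2} \;\Longrightarrow\; e^{-\lambda m} = \tfrac{1}{2} \;\Longrightarrow\; m = \frac{\ln 2}{\lambda} = \lambda^{-1}\ln 2.$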

Medians in descriptive statistics
The median is primarily used for skewed distributions, which it summarizes differently than the arithmetic mean. Consider the multiset {1, 2, 2, 2, 3, 14}. The median is 2 in this case, as is the mode, and it might be seen as a better indication of central tendency than the arithmetic mean of 4. The median is a popular summary statistic because it is simple to understand and easy to calculate, while also giving a measure that is more robust in the presence of outlier values than is the mean.

Theoretical properties

An optimality property
A median is also a central point that minimizes the average of the absolute deviations. In the example above, the median value of 2 minimizes the average of the absolute deviations (1 + 0 + 0 + 0 + 1 + 12)/6 = 2.33; in contrast, the mean value of 4 minimizes the average of the squared deviations (9 + 4 + 4 + 4 + 1 + 100)/6 = 20.33. In the language of statistics, a value of c that minimizes
$E(|X - c|)$
is a median of the probability distribution of the random variable X. However, a median c need not be uniquely defined. Where exactly one median exists, statisticians speak of "the median" correctly; even when no unique median exists, some statisticians speak of "the median" informally.

An inequality relating means and medians
For continuous probability distributions, the difference between the median and the mean is less than or equal to one standard deviation. See an inequality on location and scale parameters.

The sample median

Efficient computation of the sample median
Even though sorting n items generally requires O(n log n) operations, the median of n items can be computed with only O(n) operations. In fact, one can always find the kth smallest of n items with an O(n)-operations selection algorithm.
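The article does not single out a particular selection algorithm; the sketch below uses quickselect, one standard randomized selection method with expected O(n) running time (the function name and data are illustrative only):

    import random

    def quickselect(values, k):
        # Return the k-th smallest element (k = 0 gives the minimum)
        # in expected O(n) time, without fully sorting the data.
        values = list(values)
        while True:
            pivot = random.choice(values)
            lows = [x for x in values if x < pivot]
            pivots = [x for x in values if x == pivot]
            if k < len(lows):
                values = lows                     # answer lies below the pivot
            elif k < len(lows) + len(pivots):
                return pivot                      # the pivot is the k-th smallest
            else:
                k -= len(lows) + len(pivots)      # answer lies above the pivot
                values = [x for x in values if x > pivot]

    data = [1, 5, 2, 8, 7]
    print(quickselect(data, len(data) // 2))      # 5, the median of these five values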

Easy explanation of the sample median

For an odd number of values
As an example, we will calculate the sample median for the following set of observations: 1, 5, 2, 8, 7. Start by sorting the values: 1, 2, 5, 7, 8. In this case, the median is 5, since it is the middle observation in the ordered list.

For an even number of values
As an example, we will calculate the sample median for the following set of observations: 1, 5, 2, 8, 7, 2. Start by sorting the values: 1, 2, 2, 5, 7, 8. In this case, the average of the two middlemost terms is (2 + 5)/2 = 3.5. Therefore, the median is 3.5, since it is the average of the middle observations in the ordered list.
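A minimal Python sketch of this rule, reproducing both worked examples above (the function name is illustrative):

    def median(values):
        # Sort, then take the middle value, or the mean of the two
        # middle values when the number of observations is even.
        ordered = sorted(values)
        n = len(ordered)
        mid = n // 2
        if n % 2 == 1:
            return ordered[mid]
        return (ordered[mid - 1] + ordered[mid]) / 2

    print(median([1, 5, 2, 8, 7]))      # 5
    print(median([1, 5, 2, 8, 7, 2]))   # 3.5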

Other estimates of the median
If data are represented by a statistical model specifying a particular family of probability distributions, then estimates of the median can be obtained by fitting that family of probability distributions to the data and calculating the theoretical median of the fitted distribution. See, for example, Pareto interpolation.


Median-unbiased estimators, and bias with respect to loss functions
Any mean-unbiased estimator minimizes the risk (expected loss) with respect to the squared-error loss function, as observed by Gauss. A median-unbiased estimator minimizes the risk with respect to the absolute-deviation loss function, as observed by Laplace. Other loss functions are used in statistical theory, particularly in robust statistics. The theory of median-unbiased estimators was revived by George W. Brown [4] in 1947:
An estimate of a one-dimensional parameter θ will be said to be median-unbiased, if for fixed θ, the median of the distribution of the estimate is at the value θ; i.e., the estimate underestimates just as often as it overestimates. This requirement seems for most purposes to accomplish as much as the mean-unbiased requirement and has the additional property that it is invariant under one-to-one transformation. [page 584]
Further properties of median-unbiased estimators have been noted by Lehmann, Birnbaum, van der Vaart and Pfanzagl. In particular, median-unbiased estimators exist in cases where mean-unbiased and maximum-likelihood estimators do not exist. Besides being invariant under one-to-one transformations, median-unbiased estimators have surprising robustness.

In image processing
In monochrome raster images there is a type of noise, known as salt-and-pepper noise, in which each pixel independently becomes black (with some small probability) or white (with some small probability), and is unchanged otherwise (with probability close to 1). An image constructed of median values of neighborhoods (like a 3×3 square) can effectively reduce noise in this case.
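A minimal sketch of such a filter in plain Python, assuming a grayscale image stored as a list of rows (real implementations would normally use an optimized library routine):

    def median_filter(image, radius=1):
        # Replace each pixel by the median of its (2*radius+1) x (2*radius+1)
        # neighborhood, clamped at the image borders.
        h, w = len(image), len(image[0])
        out = [[0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                window = sorted(image[j][i]
                                for j in range(max(0, y - radius), min(h, y + radius + 1))
                                for i in range(max(0, x - radius), min(w, x + radius + 1)))
                out[y][x] = window[len(window) // 2]  # upper median for even-sized border windows
        return out

    noisy = [[10, 10, 10],
             [10, 255, 10],    # one "salt" pixel
             [10, 10, 10]]
    print(median_filter(noisy)[1][1])  # 10: the outlier is removed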

In multidimensional statistical inference
In multidimensional statistical inference, the value c that minimizes
$E(\lVert X - c \rVert)$
is also called a centroid.[5] In this case $\lVert \cdot \rVert$ indicates a norm for the vector difference, such as the Euclidean norm, rather than the one-dimensional case's use of an absolute value. (Note that in some other contexts a centroid is more like a multidimensional mean than the multidimensional median described here.) Like a centroid, a medoid minimizes $E(\lVert X - c \rVert)$, but c is restricted to be a member of a specified set. For instance, the set could be a sample of points drawn from some distribution.

History
Gustav Fechner popularized the use of the median in the formal analysis of data, although it had been used previously by Laplace.[6]

See also
• Order statistic
• Quantile – a median is the 2nd quartile, 5th decile, and 50th percentile
• A sample median is median-unbiased but can be a mean-biased estimator
• Absolute deviation
• Concentration of measure for Lipschitz functions
• An inequality on location and scale parameters
• Median voter theory
• Median graph
• The centerpoint, a generalization of the median for data in higher dimensions
• Median search
• Hinges

References
[1] Weisstein, Eric W. "Statistical Median." From MathWorld – A Wolfram Web Resource. http://mathworld.wolfram.com/StatisticalMedian.html
[2] Simon, Laura J. "Descriptive statistics", Statistical Education Resource Kit, Penn State Department of Statistics. http://www.stat.psu.edu/old_resources/ClassNotes/ljs_07/sld008.htm
[3] http://mathworld.wolfram.com/StatisticalMedian.html
[4] http://www.universityofcalifornia.edu/senate/inmemoriam/georgewbrown.htm
[5] Carvalho, Luis; Lawrence, Charles (2008), "Centroid estimation in discrete high-dimensional spaces with applications in biology", Proc Natl Acad Sci U S A 105 (9): 3209–3214, doi:10.1073/pnas.0712329105
[6] Keynes, John Maynard; A Treatise on Probability (1921), Pt II Ch XVII §5 (p 201).
• Brown, George W. (http://www.universityofcalifornia.edu/senate/inmemoriam/georgewbrown.htm) "On Small-Sample Estimation." The Annals of Mathematical Statistics, Vol. 18, No. 4 (Dec., 1947), pp. 582–585.
• Lehmann, E. L. "A General Concept of Unbiasedness", The Annals of Mathematical Statistics, Vol. 22, No. 4 (Dec., 1951), pp. 587–592.
• Birnbaum, Allan. 1961. "A Unified Theory of Estimation, I", The Annals of Mathematical Statistics, Vol. 32, No. 1 (Mar., 1961), pp. 112–135.
• van der Vaart, H. R. 1961. "Some Extensions of the Idea of Bias", The Annals of Mathematical Statistics, Vol. 32, No. 2 (Jun., 1961), pp. 436–447.
• Pfanzagl, Johann; with the assistance of R. Hamböker (1994). Parametric Statistical Theory. Walter de Gruyter. ISBN 3-11-01-3863-8. MR1291393

External links
• A Guide to Understanding & Calculating the Median (http://stats4students.com/measures-of-central-tendency-2.php)
• Median as a weighted arithmetic mean of all Sample Observations (http://www.accessecon.com/pubs/EB/2004/Volume3/EB-04C10011A.pdf)
• On-line calculator (http://www.poorcity.richcity.org/cgi-bin/inequality.cgi)
• Calculating the median (http://www.statcan.ca/english/edu/power/ch11/median/median.htm)
• A problem involving the mean, the median, and the mode (http://mathschallenge.net/index.php?section=problems&show=true&titleid=average_problem)
• MathWorld: Statistical Median (http://mathworld.wolfram.com/StatisticalMedian.html)
• Python script (http://www.poorcity.richcity.org/oei/#GiniHooverTheil) for median computations and income-inequality metrics
This article incorporates material from Median of a distribution on PlanetMath, which is licensed under the Creative Commons Attribution/Share-Alike License.


Mean

In statistics, mean has two related meanings:
• the arithmetic mean (as distinguished from the geometric mean or harmonic mean), and
• the expected value of a random variable, which is also called the population mean.
Other sample statistics, such as the median and the mode, are sometimes confused with the mean; other simple statistical analyses use measures of spread, such as the range, interquartile range, or standard deviation. For a real-valued random variable X, the mean is the expectation of X. Note that not every probability distribution has a defined mean (or variance); see the Cauchy distribution for an example. For a data set, the mean is the sum of the values divided by the number of values. The mean of a set of numbers x1, x2, ..., xn is typically denoted by $\bar{x}$, pronounced "x bar". This mean is a type of arithmetic mean. If the data set was based on a series of observations obtained by sampling a statistical population, this mean is termed the "sample mean" to distinguish it from the "population mean". The mean is often quoted along with the standard deviation: the mean describes the central location of the data, and the standard deviation describes the spread. An alternative measure of dispersion is the mean deviation, equivalent to the average absolute deviation from the mean. It is less sensitive to outliers, but less mathematically tractable. If a series of observations is sampled from a larger population (measuring the heights of a sample of adults drawn from the entire world population, for example), or from a probability distribution which gives the probabilities of each possible result, then the larger population or probability distribution can be used to construct a "population mean", which is also the expected value for a sample drawn from this population or probability distribution. For a finite population, this would simply be the arithmetic mean of the given property for every member of the population. For a probability distribution, this would be a sum or integral over every possible value weighted by the probability of that value. It is a universal convention to represent the population mean by the symbol μ.[1] In the case of a discrete probability distribution, the mean of a discrete random variable x is given by taking the product of each possible value of x and its probability P(x), and then adding all these products together, giving $\mu = \sum x P(x)$.[2] The sample mean may differ from the population mean, especially for small samples, but the law of large numbers dictates that the larger the size of the sample, the more likely it is that the sample mean will be close to the population mean.[3] As well as statistics, means are often used in geometry and analysis; a wide range of means have been developed for these purposes, which are not much used in statistics. These are listed below.

Examples of means

Arithmetic mean (AM)
The arithmetic mean is the "standard" average, often simply called the "mean":
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i.$
The mean may often be confused with the median, mode or range. The mean is the arithmetic average of a set of values, or distribution; however, for skewed distributions, the mean is not necessarily the same as the middle value (median), or the most likely value (mode). For example, mean income is skewed upwards by a small number of people with very large incomes, so that the majority have an income lower than the mean. By contrast, the median income is the level at which half the population is below and half is above. The mode income is the most likely income, and favors the larger number of people with lower incomes. The median or mode are often more intuitive measures of such data. Nevertheless, many skewed distributions are best described by their mean – such as the exponential and Poisson distributions. For example, the arithmetic mean of the six values 34, 27, 45, 55, 22, 34 is
$\frac{34 + 27 + 45 + 55 + 22 + 34}{6} = \frac{217}{6} \approx 36.167.$

Geometric mean (GM)
The geometric mean is an average that is useful for sets of positive numbers that are interpreted according to their product and not their sum (as is the case with the arithmetic mean), e.g. rates of growth:
$\bar{x} = \left( \prod_{i=1}^{n} x_i \right)^{1/n}.$
For example, the geometric mean of the six values 34, 27, 45, 55, 22, 34 is
$(34 \cdot 27 \cdot 45 \cdot 55 \cdot 22 \cdot 34)^{1/6} \approx 34.545.$

Harmonic mean (HM)
The harmonic mean is an average which is useful for sets of numbers which are defined in relation to some unit, for example speed (distance per unit of time):
$\bar{x} = n \Big/ \sum_{i=1}^{n} \frac{1}{x_i}.$
For example, the harmonic mean of the six values 34, 27, 45, 55, 22, and 34 is
$6 \Big/ \left( \tfrac{1}{34} + \tfrac{1}{27} + \tfrac{1}{45} + \tfrac{1}{55} + \tfrac{1}{22} + \tfrac{1}{34} \right) \approx 33.018.$

Relationship between AM, GM, and HM
AM, GM, and HM satisfy the inequalities
$\mathrm{AM} \ge \mathrm{GM} \ge \mathrm{HM}.$
Equality holds only when all the elements of the given sample are equal.
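A quick numerical check of the three means and the inequality, using the six values from the examples above (Python 3.8+ for math.prod):

    from math import prod

    values = [34, 27, 45, 55, 22, 34]
    n = len(values)

    am = sum(values) / n                    # arithmetic mean
    gm = prod(values) ** (1 / n)            # geometric mean
    hm = n / sum(1 / x for x in values)     # harmonic mean

    print(round(am, 3), round(gm, 3), round(hm, 3))  # 36.167 34.545 33.018
    assert am >= gm >= hm                   # the AM-GM-HM inequality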

Generalized means

Power mean
The generalized mean, also known as the power mean or Hölder mean, is an abstraction of the quadratic, arithmetic, geometric and harmonic means. It is defined for a set of n positive numbers xi by
$M(m) = \left( \frac{1}{n} \sum_{i=1}^{n} x_i^{m} \right)^{1/m}.$
By choosing the appropriate value for the parameter m we get:
• m → ∞: maximum
• m = 2: quadratic mean
• m = 1: arithmetic mean
• m → 0: geometric mean
• m = −1: harmonic mean
• m → −∞: minimum
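A short sketch illustrating these limiting cases numerically (the function name is illustrative; m = 0 is handled via its geometric-mean limit):

    from math import exp, log

    def power_mean(xs, m):
        # Hölder mean with exponent m; m = 0 is the geometric-mean limit.
        if m == 0:
            return exp(sum(log(x) for x in xs) / len(xs))
        return (sum(x ** m for x in xs) / len(xs)) ** (1 / m)

    xs = [34, 27, 45, 55, 22, 34]
    for m in (100, 2, 1, 0, -1, -100):
        print(m, round(power_mean(xs, m), 3))
    # Large positive m approaches max(xs) = 55; large negative m approaches min(xs) = 22.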

ƒ-mean
This can be generalized further as the generalized f-mean
$M_f = f^{-1}\!\left( \frac{1}{n} \sum_{i=1}^{n} f(x_i) \right),$
and again a suitable choice of an invertible ƒ will give the harmonic mean (f(x) = 1/x), the power mean (f(x) = x^m), and the geometric mean (f(x) = ln x).
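A direct translation of the definition into Python; with f = ln the generalized f-mean reduces to the geometric mean, and with f(x) = 1/x to the harmonic mean:

    from math import exp, log

    def f_mean(xs, f, f_inverse):
        # Generalized f-mean: transform, average, then invert.
        return f_inverse(sum(f(x) for x in xs) / len(xs))

    xs = [34, 27, 45, 55, 22, 34]
    print(round(f_mean(xs, log, exp), 3))                          # 34.545 (geometric)
    print(round(f_mean(xs, lambda x: 1 / x, lambda y: 1 / y), 3))  # 33.018 (harmonic)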

Weighted arithmetic mean
The weighted arithmetic mean is used if one wants to combine average values from samples of the same population with different sample sizes:
$\bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}.$
The weights $w_i$ represent the sizes of the partial samples. In other applications, they represent a measure of the reliability of the influence of the respective values on the mean.

Truncated mean
Sometimes a set of numbers might contain outliers, i.e. data values which are much lower or much higher than the others. Often, outliers are erroneous data caused by artifacts. In this case one can use a truncated mean. It involves discarding given parts of the data at the top or the bottom end, typically an equal amount at each end, and then taking the arithmetic mean of the remaining data. The number of values removed is indicated as a percentage of the total number of values.

Interquartile mean
The interquartile mean is a specific example of a truncated mean. It is simply the arithmetic mean after removing the lowest and the highest quarter of values:
$\bar{x} = \frac{2}{n} \sum_{i = n/4 + 1}^{3n/4} x_i,$
assuming the values have been ordered. It is thus a specific example of a weighted mean with a particular set of weights.

Mean of a function
In calculus, and especially multivariable calculus, the mean of a function is loosely defined as the average value of the function over its domain. In one variable, the mean of a function f(x) over the interval (a,b) is defined by
$\bar{f} = \frac{1}{b-a} \int_a^b f(x)\,dx.$
(See also the mean value theorem.) In several variables, the mean over a relatively compact domain U in a Euclidean space is defined by
$\bar{f} = \frac{1}{\operatorname{vol}(U)} \int_U f.$
This generalizes the arithmetic mean. On the other hand, it is also possible to generalize the geometric mean to functions by defining the geometric mean of f to be
$\exp\!\left( \frac{1}{\operatorname{vol}(U)} \int_U \log f \right).$
More generally, in measure theory and probability theory, either sort of mean plays an important role. In this context, Jensen's inequality places sharp estimates on the relationship between these two different notions of the mean of a function. There is also a harmonic average of functions and a quadratic average (or root mean square) of functions.
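As a worked example of the one-variable definition, the mean of f(x) = x² over the interval (0, 1) is
$\bar{f} = \frac{1}{1-0} \int_0^1 x^2\,dx = \frac{1}{3}.$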

Mean of a probability distribution
See expected value.

Mean of angles
Most of the usual means fail on circular quantities, like angles, daytimes, and fractional parts of real numbers. For those quantities you need a mean of circular quantities.

Fréchet mean
The Fréchet mean gives a manner for determining the "center" of a mass distribution on a surface or, more generally, a Riemannian manifold. Unlike many other means, the Fréchet mean is defined on a space whose elements cannot necessarily be added together or multiplied by scalars. It is sometimes also known as the Karcher mean.


Other means
• Arithmetic-geometric mean
• Arithmetic-harmonic mean
• Cesàro mean
• Chisini mean
• Contraharmonic mean
• Elementary symmetric mean
• Geometric-harmonic mean
• Heinz mean
• Heronian mean
• Identric mean
• Lehmer mean
• Logarithmic mean
• Median
• Moving average
• Root mean square
• Stolarsky mean
• Weighted geometric mean
• Weighted harmonic mean
• Rényi's entropy (a generalized f-mean)

Properties
All means share some properties, and additional properties are shared by the most common means. Some of these properties are collected here.

Weighted mean
A weighted mean M is a function that maps tuples of positive numbers to a positive number,
$M \colon (\mathbb{R}^{+})^{n} \to \mathbb{R}^{+},$
such that the following properties hold:
• "Fixed point": M(1, 1, ..., 1) = 1
• Homogeneity: M(λx1, ..., λxn) = λ M(x1, ..., xn) for all λ and xi. In vector notation: M(λx) = λ Mx for all n-vectors x.
• Monotonicity: If xi ≤ yi for each i, then Mx ≤ My
It follows that a weighted mean also satisfies
• Boundedness: min x ≤ Mx ≤ max x
• Continuity: $\lim_{x \to y} M(x) = M(y)$
Further remarks:
• There are means which are not differentiable. For instance, the maximum of a tuple is considered a mean (as an extreme case of the power mean, or as a special case of a median), but is not differentiable.
• All means listed above, with the exception of most of the generalized f-means, satisfy the presented properties.
• If f is bijective, then the generalized f-mean satisfies the fixed point property.
• If f is strictly monotonic, then the generalized f-mean also satisfies the monotonicity property.
• In general, a generalized f-mean will miss homogeneity.
The above properties imply techniques to construct more complex means: if C, M1, ..., Mm are weighted means and p is a positive real number, then A and B defined by
$A x = C(M_1 x, \dots, M_m x), \qquad B x = \sqrt[p]{C(x_1^p, \dots, x_n^p)},$
are also weighted means.

Unweighted mean
Intuitively spoken, an unweighted mean is a weighted mean with equal weights. Since our definition of weighted mean above does not expose particular weights, equal weights must be asserted in a different way. A different view on homogeneous weighting is that the inputs can be swapped without altering the result. Thus we define M to be an unweighted mean if it is a weighted mean and, for each permutation π of the inputs, the result is the same:
Symmetry: Mx = M(πx) for all n-tuples x and permutations π on n-tuples.
Analogously to the weighted means, if C is a weighted mean and M1, ..., Mm are unweighted means and p is a positive real number, then A and B defined by
$A x = C(M_1 x, \dots, M_m x), \qquad B x = \sqrt[p]{M_1(x_1^p, \dots, x_n^p)},$
are also unweighted means.

Convert unweighted mean to weighted mean
An unweighted mean can be turned into a weighted mean by repeating elements. This connection can also be used to state that a mean is the weighted version of an unweighted mean. Say you have the unweighted mean M and weight the numbers by natural numbers $w_1, \dots, w_n$. (If the numbers are rational, then multiply them with the least common denominator.) Then the corresponding weighted mean A is obtained by
$A(x_1, \dots, x_n) = M(\underbrace{x_1, \dots, x_1}_{w_1 \text{ times}}, \underbrace{x_2, \dots, x_2}_{w_2 \text{ times}}, \dots, \underbrace{x_n, \dots, x_n}_{w_n \text{ times}}).$

Means of tuples of different sizes
If a mean M is defined for tuples of several sizes, then one also expects that the mean of a tuple is bounded by the means of partitions. More precisely:
• Given an arbitrary tuple x, which is partitioned into y1, ..., yk, then
$M x \in \operatorname{conv}\{ M y_1, \dots, M y_k \},$
that is, the mean of the whole tuple lies in the convex hull of the means of the parts (see convex hull).

Population and sample means
The mean of a population has an expected value of μ, known as the population mean. The sample mean makes a good estimator of the population mean, as its expected value is equal to the population mean. The sample mean of a population is a random variable, not a constant, and consequently it has its own distribution. For a random sample of n observations from a normally distributed population, the distribution of the sample mean is
$\bar{x} \sim N\!\left( \mu, \frac{\sigma^2}{n} \right).$
Often, since the population variance σ² is an unknown parameter, it is estimated by the mean sum of squares, which changes the distribution of the sample mean from a normal distribution to a Student's t distribution with n − 1 degrees of freedom.
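A small simulation illustrating this result (the parameter values are arbitrary): for n = 25 draws from N(10, 2²), the sample mean should be centred on μ = 10 with standard deviation σ/√n = 0.4.

    import random
    from statistics import mean, stdev

    mu, sigma, n, trials = 10.0, 2.0, 25, 20000
    sample_means = [mean(random.gauss(mu, sigma) for _ in range(n))
                    for _ in range(trials)]

    print(round(mean(sample_means), 2))   # close to mu = 10.0
    print(round(stdev(sample_means), 2))  # close to sigma / sqrt(n) = 0.4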


See also
• Average, same as central tendency
• Weighted mean
• Descriptive statistics
• Kurtosis
• Median
• Mode (statistics)
• Summary statistics
• Law of averages
• Spherical mean
• Algorithms for calculating mean and variance
• For an independent identical distribution from the reals, the mean of a sample is an unbiased estimator for the mean of the population.

References
[1] IntroSTAT by L. G. Underhill and Dave Bradfield, p. 181 (http://books.google.com/books?id=f6TlVjrSAsgC&lpg=PP1&pg=PA181#v=onepage&q&f=false)
[2] Elementary Statistics by Robert R. Johnson and Patricia J. Kuby, p. 279 (http://books.google.com/books?id=DWCAh7jWO98C&lpg=PP1&pg=PA279#v=onepage&q&f=false)
[3] Schaum's Outline of Theory and Problems of Probability by Seymour Lipschutz and Marc Lipson, p. 141 (http://books.google.com/books?id=ZKdqlw2ZnAMC&lpg=PP1&pg=PA141#v=onepage&q&f=false)
• Hardy, G.H.; Littlewood, J.E.; Pólya, G. (1988), Inequalities (2nd ed.), Cambridge University Press, ISBN 978-0521358804

External links
• Comparison between arithmetic and geometric mean of two numbers (http://www.sengpielaudio.com/calculator-geommean.htm)

Statistical population

In statistics, a statistical population is a set of entities concerning which statistical inferences are to be drawn, often based on a random sample taken from the population. For example, if we were interested in generalizations about crows, then we would describe the set of crows that is of interest. Notice that if we choose a population like all crows, we will be limited to observing crows that exist now or will exist in the future. Geography will probably also constitute a limitation, in that our resources for studying crows are limited. Population is also used to refer to a set of potential measurements or values, including not only cases actually observed but those that are potentially observable. Suppose, for example, we are interested in the set of all adult crows now alive in the county of Cambridgeshire, and we want to know the mean weight of these birds. For each bird in the population of crows there is a weight, and the set of these weights is called the population of weights.

Subpopulation
A subset of a population is called a subpopulation. If different subpopulations have different properties, they can often be better understood if they are first separated into distinct subpopulations. For instance, a particular medicine may have different effects on different subpopulations, and its effects may be obscured or dismissed if the subpopulation is not identified and examined in isolation. Similarly, one can often estimate parameters more accurately if one separates out subpopulations: the distribution of heights among people is better modeled by considering men and women as separate subpopulations, for instance. Populations consisting of subpopulations can be modeled by mixture models, which combine the distributions within subpopulations into an overall population distribution.

See also
• Population
• Sample (statistics)
• Sampling (statistics)

External links
• Statistical Terms Made Simple [1]

References
[1] http://www.socialresearchmethods.net/kb/sampstat.htm


Sampling (statistics)

Sampling is that part of statistical practice concerned with the selection of an unbiased or random subset of individual observations within a population of individuals, intended to yield some knowledge about the population of concern, especially for the purposes of making predictions based on statistical inference. Sampling is an important aspect of data collection. Researchers rarely survey the entire population for two reasons (Adèr, Mellenbergh, & Hand, 2008): the cost is too high, and the population is dynamic in that the individuals making up the population may change over time. The three main advantages of sampling are that the cost is lower, data collection is faster, and since the data set is smaller it is possible to ensure homogeneity and to improve the accuracy and quality of the data. Each observation measures one or more properties (such as weight, location, color) of observable bodies distinguished as independent objects or individuals. In survey sampling, survey weights can be applied to the data to adjust for the sample design. Results from probability theory and statistical theory are employed to guide practice. In business and medical research, sampling is widely used for gathering information about a population.[1]

Process
The sampling process comprises several stages:
• Defining the population of concern
• Specifying a sampling frame, a set of items or events possible to measure
• Specifying a sampling method for selecting items or events from the frame
• Determining the sample size
• Implementing the sampling plan
• Sampling and data collecting

Population definition
Successful statistical practice is based on focused problem definition. In sampling, this includes defining the population from which our sample is drawn. A population can be defined as including all people or items with the characteristic one wishes to understand. Because there is very rarely enough time or money to gather information from everyone or everything in a population, the goal becomes finding a representative sample (or subset) of that population. Sometimes that which defines a population is obvious. For example, a manufacturer needs to decide whether a batch of material from production is of high enough quality to be released to the customer, or should be sentenced for scrap or rework due to poor quality. In this case, the batch is the population. Although the population of interest often consists of physical objects, sometimes we need to sample over time, space, or some combination of these dimensions. For instance, an investigation of supermarket staffing could examine checkout line length at various times, or a study on endangered penguins might aim to understand their usage of various hunting grounds over time. For the time dimension, the focus may be on periods or discrete occasions. In other cases, our 'population' may be even less tangible. For example, Joseph Jagger studied the behaviour of roulette wheels at a casino in Monte Carlo, and used this to identify a biased wheel. In this case, the 'population' Jagger wanted to investigate was the overall behaviour of the wheel (i.e. the probability distribution of its results over infinitely many trials), while his 'sample' was formed from observed results from that wheel. Similar considerations arise when taking repeated measurements of some physical characteristic such as the electrical conductivity of copper.


This situation often arises when we seek knowledge about the cause system of which the observed population is an outcome. In such cases, sampling theory may treat the observed population as a sample from a larger 'superpopulation'. For example, a researcher might study the success rate of a new 'quit smoking' program on a test group of 100 patients, in order to predict the effects of the program if it were made available nationwide. Here the superpopulation is "everybody in the country, given access to this treatment" – a group which does not yet exist, since the program isn't yet available to all. Note also that the population from which the sample is drawn may not be the same as the population about which we actually want information. Often there is large but not complete overlap between these two groups due to frame issues etc. (see below). Sometimes they may be entirely separate – for instance, we might study rats in order to get a better understanding of human health, or we might study records from people born in 2008 in order to make predictions about people born in 2009. Time spent in making the sampled population and population of concern precise is often well spent, because it raises many issues, ambiguities and questions that would otherwise have been overlooked at this stage.

Sampling frame
In the most straightforward case, such as the sentencing of a batch of material from production (acceptance sampling by lots), it is possible to identify and measure every single item in the population and to include any one of them in our sample. However, in the more general case this is not possible. There is no way to identify all rats in the set of all rats. Where voting is not compulsory, there is no way to identify which people will actually vote at a forthcoming election (in advance of the election). These imprecise populations are not amenable to sampling in any of the ways below, to which we could apply statistical theory. As a remedy, we seek a sampling frame which has the property that we can identify every single element and include any in our sample.[1] The most straightforward type of frame is a list of elements of the population (preferably the entire population) with appropriate contact information. For example, in an opinion poll, possible sampling frames include:
• Electoral register
• Telephone directory
Not all frames explicitly list population elements. For example, a street map can be used as a frame for a door-to-door survey; although it doesn't show individual houses, we can select streets from the map and then visit all houses on those streets. (One advantage of such a frame is that it would include people who have recently moved and are not yet on the list frames discussed above.) The sampling frame must be representative of the population, and this is a question outside the scope of statistical theory, demanding the judgment of experts in the particular subject matter being studied. All the above frames omit some people who will vote at the next election and contain some people who will not; some frames will contain multiple records for the same person. People not in the frame have no prospect of being sampled. Statistical theory tells us about the uncertainties in extrapolating from a sample to the frame. In extrapolating from frame to population, its role is motivational and suggestive.
To the scientist, however, representative sampling is the only justified procedure for choosing individual objects for use as the basis of generalization, and is therefore usually the only acceptable basis for ascertaining truth. —Andrew A. Marino[2]
It is important to understand this difference, to steer clear of confusing prescriptions found in many web pages. In defining the frame, practical, economic, ethical, and technical issues need to be addressed. The need to obtain timely results may prevent extending the frame far into the future.

38

Sampling (statistics) The difficulties can be extreme when the population and frame are disjoint. This is a particular problem in forecasting where inferences about the future are made from historical data. In fact, in 1703, when Jacob Bernoulli proposed to Gottfried Leibniz the possibility of using historical mortality data to predict the probability of early death of a living man, Gottfried Leibniz recognized the problem in replying: Nature has established patterns originating in the return of events but only for the most part. New illnesses flood the human race, so that no matter how many experiments you have done on corpses, you have not thereby imposed a limit on the nature of events so that in the future they could not vary. —Gottfried Leibniz Kish posited four basic problems of sampling frames: 1. 2. 3. 4.

Missing elements: Some members of the population are not included in the frame. Foreign elements: The non-members of the population are included in the frame. Duplicate entries: A member of the population is surveyed more than once. Groups or clusters: The frame lists clusters instead of individuals.

A frame may also provide additional 'auxiliary information' about its elements; when this information is related to variables or groups of interest, it may be used to improve survey design. For instance, an electoral register might include name and sex; this information can be used to ensure that a sample taken from that frame covers all demographic categories of interest. (Sometimes the auxiliary information is less explicit; for instance, a telephone number may provide some information about location.) Having established the frame, there are a number of ways for organizing it to improve efficiency and effectiveness. It's at this stage that the researcher should decide whether the sample is in fact to be the whole population and would therefore be a census.

Probability and nonprobability sampling A probability sampling scheme is one in which every unit in the population has a chance (greater than zero) of being selected in the sample, and this probability can be accurately determined. The combination of these traits makes it possible to produce unbiased estimates of population totals, by weighting sampled units according to their probability of selection. Example: We want to estimate the total income of adults living in a given street. We visit each household in that street, identify all adults living there, and randomly select one adult from each household. (For example, we can allocate each person a random number, generated from a uniform distribution between 0 and 1, and select the person with the highest number in each household). We then interview the selected person and find their income. People living on their own are certain to be selected, so we simply add their income to our estimate of the total. But a person living in a household of two adults has only a one-in-two chance of selection. To reflect this, when we come to such a household, we would count the selected person's income twice towards the total. (In effect, the person who is selected from that household is taken as representing the person who isn't selected.) In the above example, not everybody has the same probability of selection; what makes it a probability sample is the fact that each person's probability is known. When every element in the population does have the same probability of selection, this is known as an 'equal probability of selection' (EPS) design. Such designs are also referred to as 'self-weighting' because all sampled units are given the same weight. Probability sampling includes: Simple Random Sampling, Systematic Sampling, Stratified Sampling, Probability Proportional to Size Sampling, and Cluster or Multistage Sampling. These various ways of probability sampling have two things in common:

39

Sampling (statistics) 1. Every element has a known nonzero probability of being sampled and 2. involves random selection at some point. Nonprobability sampling is any sampling method where some elements of the population have no chance of selection (these are sometimes referred to as 'out of coverage'/'undercovered'), or where the probability of selection can't be accurately determined. It involves the selection of elements based on assumptions regarding the population of interest, which forms the criteria for selection. Hence, because the selection of elements is nonrandom, nonprobability sampling does not allow the estimation of sampling errors. These conditions give rise to exclusion bias, placing limits on how much information a sample can provide about the population. Information about the relationship between sample and population is limited, making it difficult to extrapolate from the sample to the population. Example: We visit every household in a given street, and interview the first person to answer the door. In any household with more than one occupant, this is a nonprobability sample, because some people are more likely to answer the door (e.g. an unemployed person who spends most of their time at home is more likely to answer than an employed housemate who might be at work when the interviewer calls) and it's not practical to calculate these probabilities. Nonprobability Sampling includes: Accidental Sampling, Quota Sampling and Purposive Sampling. In addition, nonresponse effects may turn any probability design into a nonprobability design if the characteristics of nonresponse are not well understood, since nonresponse effectively modifies each element's probability of being sampled.

Sampling methods

Within any of the types of frame identified above, a variety of sampling methods can be employed, individually or in combination. Factors commonly influencing the choice between these designs include:
• Nature and quality of the frame
• Availability of auxiliary information about units on the frame
• Accuracy requirements, and the need to measure accuracy
• Whether detailed analysis of the sample is expected
• Cost/operational concerns

Simple random sampling

In a simple random sample ('SRS') of a given size, all subsets of the frame of that size are given an equal probability of selection. Each element of the frame thus has an equal probability of selection: the frame is not subdivided or partitioned. Furthermore, any given pair of elements has the same chance of selection as any other such pair (and similarly for triples, and so on). This minimises bias and simplifies analysis of results. In particular, the variance between individual results within the sample is a good indicator of variance in the overall population, which makes it relatively easy to estimate the accuracy of results.

However, SRS can be vulnerable to sampling error because the randomness of the selection may result in a sample that doesn't reflect the makeup of the population. For instance, a simple random sample of ten people from a given country will on average produce five men and five women, but any given trial is likely to overrepresent one sex and underrepresent the other. Systematic and stratified techniques, discussed below, attempt to overcome this problem by using information about the population to choose a more representative sample. SRS may also be cumbersome and tedious when sampling from an unusually large target population. In some cases, investigators are interested in research questions specific to subgroups of the population. For example, researchers might be interested in examining whether cognitive ability as a predictor of job performance is equally applicable across racial groups. SRS cannot accommodate the needs of researchers in this situation because it does not provide subsamples of the population. Stratified sampling, which is discussed below, addresses this weakness of SRS. Simple random sampling is always an EPS design, but not all EPS designs are simple random sampling.
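
Drawing an SRS is straightforward once a frame exists; here is a minimal sketch using Python's standard library (the frame contents are hypothetical):

import random

frame = list(range(1, 1001))       # a hypothetical frame of 1000 unit labels
sample = random.sample(frame, 10)  # SRS without replacement: every subset
                                   # of size 10 is equally likely
print(sample)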

Systematic sampling

Systematic sampling relies on arranging the target population according to some ordering scheme and then selecting elements at regular intervals through that ordered list. Systematic sampling involves a random start and then proceeds with the selection of every kth element from then onwards. In this case, k = (population size / sample size). It is important that the starting point is not automatically the first in the list, but is instead randomly chosen from within the first to the kth element in the list. A simple example would be to select every 10th name from the telephone directory (an 'every 10th' sample, also referred to as 'sampling with a skip of 10'). As long as the starting point is randomized, systematic sampling is a type of probability sampling. It is easy to implement, and the stratification induced can make it efficient if the variable by which the list is ordered is correlated with the variable of interest. 'Every 10th' sampling is especially useful for efficient sampling from databases.

Example: Suppose we wish to sample people from a long street that starts in a poor district (house #1) and ends in an expensive district (house #1000). A simple random selection of addresses from this street could easily end up with too many from the high end and too few from the low end (or vice versa), leading to an unrepresentative sample. Selecting (e.g.) every 10th street number along the street ensures that the sample is spread evenly along the length of the street, representing all of these districts. (Note that if we always start at house #1 and end at #991, the sample is slightly biased towards the low end; by randomly selecting the start between #1 and #10, this bias is eliminated.)

However, systematic sampling is especially vulnerable to periodicities in the list. If periodicity is present and the period is a multiple or factor of the interval used, the sample is especially likely to be unrepresentative of the overall population, making the scheme less accurate than simple random sampling.

Example: Consider a street where the odd-numbered houses are all on the north (expensive) side of the road, and the even-numbered houses are all on the south (cheap) side. Under the sampling scheme given above, it is impossible to get a representative sample; either the houses sampled will all be from the odd-numbered, expensive side, or they will all be from the even-numbered, cheap side.

Another drawback of systematic sampling is that even in scenarios where it is more accurate than SRS, its theoretical properties make it difficult to quantify that accuracy. (In the two examples of systematic sampling that are given above, much of the potential sampling error is due to variation between neighbouring houses - but because this method never selects two neighbouring houses, the sample will not give us any information on that variation.)

As described above, systematic sampling is an EPS method, because all elements have the same probability of selection (in the example given, one in ten). It is not 'simple random sampling' because different subsets of the same size have different selection probabilities - e.g. the set {4,14,24,...,994} has a one-in-ten probability of selection, but the set {4,13,24,34,...} has zero probability of selection. Systematic sampling can also be adapted to a non-EPS approach; for an example, see the discussion of PPS samples below.
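
A minimal sketch of systematic selection with a random start, assuming a hypothetical frame of 1000 units and a sample of 100 (so k = 10):

import random

frame = list(range(1, 1001))   # e.g. house numbers 1..1000
sample_size = 100
k = len(frame) // sample_size  # the sampling interval (here 10)

start = random.randint(0, k - 1)  # random start within the first k elements
sample = frame[start::k]          # every kth element from then onwards
print(sample[:5], "...", len(sample), "units")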

Stratified sampling

Where the population embraces a number of distinct categories, the frame can be organized by these categories into separate "strata." Each stratum is then sampled as an independent sub-population, out of which individual elements can be randomly selected.[3] There are several potential benefits to stratified sampling. First, dividing the population into distinct, independent strata can enable researchers to draw inferences about specific subgroups that may be lost in a more generalized random sample. Second, utilizing a stratified sampling method can lead to more efficient statistical estimates (provided that strata are selected based upon relevance to the criterion in question, instead of availability of the samples). Even if a stratified sampling approach does not lead to increased statistical efficiency, such a tactic will not result in less efficiency than would simple random sampling, provided that each stratum is proportional to the group's size in the population. Third, it is sometimes the case that data are more readily available for individual, pre-existing strata within a population than for the overall population; in such cases, using a stratified sampling approach may be more convenient than aggregating data across groups (though this may potentially be at odds with the previously noted importance of utilizing criterion-relevant strata). Finally, since each stratum is treated as an independent population, different sampling approaches can be applied to different strata, potentially enabling researchers to use the approach best suited (or most cost-effective) for each identified subgroup within the population.

There are, however, some potential drawbacks to using stratified sampling. First, identifying strata and implementing such an approach can increase the cost and complexity of sample selection, as well as leading to increased complexity of population estimates. Second, when examining multiple criteria, stratifying variables may be related to some, but not to others, further complicating the design, and potentially reducing the utility of the strata. Finally, in some cases (such as designs with a large number of strata, or those with a specified minimum sample size per group), stratified sampling can potentially require a larger sample than would other methods (although in most cases, the required sample size would be no larger than would be required for simple random sampling).

A stratified sampling approach is most effective when three conditions are met:
1. Variability within strata is minimized;
2. Variability between strata is maximized;
3. The variables upon which the population is stratified are strongly correlated with the desired dependent variable.

Advantages over other sampling methods
1. Focuses on important subpopulations and ignores irrelevant ones.
2. Allows use of different sampling techniques for different subpopulations.
3. Improves the accuracy/efficiency of estimation.
4. Permits greater balancing of statistical power of tests of differences between strata by sampling equal numbers from strata varying widely in size.
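
The sketch below illustrates one common way to carry out stratified selection, proportional allocation, in which the fixed total sample size is split across strata in proportion to their sizes (the strata and their sizes here are hypothetical):

import random

# Hypothetical frame already organized into two strata.
strata = {"urban": list(range(8000)), "rural": list(range(2000))}
n_total = 100
N = sum(len(units) for units in strata.values())

sample = {}
for name, units in strata.items():
    n_h = round(n_total * len(units) / N)     # allocate in proportion to stratum size
    sample[name] = random.sample(units, n_h)  # independent SRS within each stratum

print({name: len(s) for name, s in sample.items()})  # {'urban': 80, 'rural': 20}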

Disadvantages
1. Requires selection of relevant stratification variables, which can be difficult.
2. Is not useful when there are no homogeneous subgroups.
3. Can be expensive to implement.

Poststratification

Stratification is sometimes introduced after the sampling phase in a process called "poststratification".[3] This approach is typically implemented due to a lack of prior knowledge of an appropriate stratifying variable or when the experimenter lacks the necessary information to create a stratifying variable during the sampling phase. Although the method is susceptible to the pitfalls of post hoc approaches, it can provide several benefits in the right situation. Implementation usually follows a simple random sample. In addition to allowing for stratification on an ancillary variable, poststratification can be used to implement weighting, which can improve the precision of a sample's estimates.[3]

Oversampling

Choice-based sampling is one of the stratified sampling strategies. In choice-based sampling,[4] the data are stratified on the target and a sample is taken from each stratum so that the rare target class will be more represented in the sample. The model is then built on this biased sample. The effects of the input variables on the target are often estimated with more precision with the choice-based sample even when a smaller overall sample size is taken, compared to a random sample. The results usually must be adjusted to correct for the oversampling.

Probability proportional to size sampling

In some cases the sample designer has access to an "auxiliary variable" or "size measure", believed to be correlated to the variable of interest, for each element in the population. These data can be used to improve accuracy in sample design. One option is to use the auxiliary variable as a basis for stratification, as discussed above. Another option is probability-proportional-to-size ('PPS') sampling, in which the selection probability for each element is set to be proportional to its size measure, up to a maximum of 1. In a simple PPS design, these selection probabilities can then be used as the basis for Poisson sampling. However, this has the drawbacks of variable sample size, and different portions of the population may still be over- or under-represented due to chance variation in selections. To address this problem, PPS may be combined with a systematic approach.

Example: Suppose we have six schools with populations of 150, 180, 200, 220, 260, and 490 students respectively (total 1500 students), and we want to use student population as the basis for a PPS sample of size three. To do this, we could allocate the first school numbers 1 to 150, the second school 151 to 330 (= 150 + 180), the third school 331 to 530, and so on to the last school (1011 to 1500). We then generate a random start between 1 and 500 (equal to 1500/3) and count through the school populations by multiples of 500. If our random start was 137, we would select the schools which have been allocated numbers 137, 637, and 1137, i.e. the first, fourth, and sixth schools.

The PPS approach can improve accuracy for a given sample size by concentrating sample on large elements that have the greatest impact on population estimates. PPS sampling is commonly used for surveys of businesses, where element size varies greatly and auxiliary information is often available; for instance, a survey attempting to measure the number of guest-nights spent in hotels might use each hotel's number of rooms as an auxiliary variable. In some cases, an older measurement of the variable of interest can be used as an auxiliary variable when attempting to produce more current estimates.
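
The school example translates directly into code; the following sketch implements the cumulative-total, systematic PPS selection just described:

import random

school_sizes = [150, 180, 200, 220, 260, 490]  # 1500 students in total
n = 3
interval = sum(school_sizes) // n              # 1500 / 3 = 500

first = random.randint(1, interval)            # random start in 1..500
targets = [first + i * interval for i in range(n)]

selected, cumulative = [], 0
for school, size in enumerate(school_sizes, 1):
    cumulative += size                           # running total of students
    while targets and targets[0] <= cumulative:  # target falls in this school's range
        selected.append(school)
        targets.pop(0)

print(selected)  # a start of 137 would select schools 1, 4 and 6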

Cluster sampling

Sometimes it is cheaper to 'cluster' the sample in some way, e.g. by selecting respondents from certain areas only, or certain time-periods only. (Nearly all samples are in some sense 'clustered' in time, although this is rarely taken into account in the analysis.) Cluster sampling is an example of 'two-stage sampling' or 'multistage sampling': in the first stage a sample of areas is chosen; in the second stage a sample of respondents within those areas is selected. This can reduce travel and other administrative costs. It also means that one does not need a sampling frame listing all elements in the target population. Instead, clusters can be chosen from a cluster-level frame, with an element-level frame created only for the selected clusters. Cluster sampling generally increases the variability of sample estimates above that of simple random sampling, depending on how the clusters differ between themselves, as compared with the within-cluster variation. A further disadvantage of cluster sampling is that the precision of sample estimates depends on the actual clusters chosen: if the chosen clusters are biased in a certain way, inferences drawn about population parameters from these sample estimates will be far from accurate.

Multistage sampling

Multistage sampling is a complex form of cluster sampling in which two or more levels of units are embedded one in the other. The first stage consists of constructing the clusters that will be used to sample from. In the second stage, a sample of primary units is randomly selected from each cluster (rather than using all units contained in all selected clusters). In following stages, in each of those selected clusters, additional samples of units are selected, and so on. All ultimate units (individuals, for instance) selected at the last step of this procedure are then surveyed. This technique is essentially the process of taking random samples of preceding random samples. It is not as effective as true random sampling, but it solves more of the practical problems inherent in random sampling, and it is an effective strategy because it banks on multiple randomizations. Multistage sampling is used frequently when a complete list of all members of the population does not exist or is inappropriate. Moreover, by avoiding the use of all sample units in all selected clusters, multistage sampling avoids the large, and perhaps unnecessary, costs associated with traditional cluster sampling.
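
A minimal sketch of the two-stage design described above, with hypothetical areas and respondents: areas are sampled first, then respondents within each selected area.

import random

# Hypothetical cluster-level frame: area -> list of respondents in that area.
areas = {f"area{i}": [f"person{i}.{j}" for j in range(50)] for i in range(20)}

stage1 = random.sample(list(areas), 4)  # first stage: a sample of areas
stage2 = {a: random.sample(areas[a], 10) for a in stage1}  # second stage: SRS within each area

for area, people in stage2.items():
    print(area, people[:3], "...")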

Matched random sampling

A method of assigning participants to groups in which pairs of participants are first matched on some characteristic and then individually assigned randomly to groups.[5] The procedure for matched random sampling can be outlined in two contexts:
1. Two samples in which the members are clearly paired, or are matched explicitly by the researcher. For example, IQ measurements or pairs of identical twins.
2. Those samples in which the same attribute, or variable, is measured twice on each subject, under different circumstances. Commonly called repeated measures. Examples include the times of a group of athletes for 1500m before and after a week of special training, or the milk yields of cows before and after being fed a particular diet.

Quota sampling

In quota sampling, the population is first segmented into mutually exclusive sub-groups, just as in stratified sampling. Then judgment is used to select the subjects or units from each segment based on a specified proportion. For example, an interviewer may be told to sample 200 females and 300 males between the ages of 45 and 60. It is this second step which makes the technique one of non-probability sampling. In quota sampling the selection of the sample is non-random. For example, interviewers might be tempted to interview those who look most helpful. The problem is that these samples may be biased because not everyone gets a chance of selection. This non-random element is its greatest weakness, and quota versus probability sampling has been a matter of controversy for many years.

Convenience sampling

Convenience sampling (sometimes known as grab or opportunity sampling) is a type of nonprobability sampling which involves the sample being drawn from that part of the population which is close to hand; that is, a sample population selected because it is readily available and convenient. The researcher using such a sample cannot scientifically make generalizations about the total population from this sample because it would not be representative enough. For example, if an interviewer were to conduct such a survey at a shopping center early in the morning on a given day, the people that he/she could interview would be limited to those present there at that given time, and the results would not represent the views of other members of society in such an area, as they would if the survey were conducted at different times of day and several times per week. This type of sampling is most useful for pilot testing. Several important considerations for researchers using convenience samples include:
1. Are there controls within the research design or experiment which can serve to lessen the impact of a non-random, convenience sample, thereby ensuring the results will be more representative of the population?
2. Is there good reason to believe that a particular convenience sample would or should respond or behave differently than a random sample from the same population?
3. Is the question being asked by the research one that can adequately be answered using a convenience sample?

In social science research, snowball sampling is a similar technique, where existing study subjects are used to recruit more subjects into the sample.

Line-intercept sampling

Line-intercept sampling is a method of sampling elements in a region whereby an element is sampled if a chosen line segment, called a “transect”, intersects the element.

Panel sampling

Panel sampling is the method of first selecting a group of participants through a random sampling method and then asking that group for the same information again several times over a period of time. Therefore, each participant is given the same survey or interview at two or more time points; each period of data collection is called a "wave". This sampling methodology is often chosen for large scale or nation-wide studies in order to gauge changes in the population with regard to any number of variables from chronic illness to job stress to weekly food expenditures. Panel sampling can also be used to inform researchers about within-person health changes due to age or help explain changes in continuous dependent variables such as spousal interaction. There have been several proposed methods of analyzing panel sample data, including MANOVA, growth curves, and structural equation modeling with lagged effects. For a more thorough look at analytical techniques for panel data, see Johnson (1995).

Event sampling methodology

Event sampling methodology (ESM) is a new form of sampling method that allows researchers to study ongoing experiences and events that vary across and within days in their naturally occurring environment. Because of the frequent sampling of events inherent in ESM, it enables researchers to measure the typology of activity and detect the temporal and dynamic fluctuations of work experiences. The popularity of ESM as a new form of research design has increased over recent years because it addresses the shortcomings of cross-sectional research: where once they were unable to, researchers can now detect intra-individual variances across time. In ESM, participants are asked to record their experiences and perceptions in a paper or electronic diary. There are three types of ESM:
1. Signal contingent – random beeping notifies participants to record data. The advantage of this type of ESM is minimization of recall bias.
2. Event contingent – records data when certain events occur.
3. Interval contingent – records data according to the passing of a certain period of time.

ESM has several disadvantages. One of the disadvantages of ESM is that it can sometimes be perceived as invasive and intrusive by participants. ESM also leads to possible self-selection bias: it may be that only certain types of individuals are willing to participate in this type of study, creating a non-random sample. Another concern is related to participant cooperation: participants may not actually fill out their diaries at the specified times. Furthermore, ESM may substantively change the phenomenon being studied. Reactivity or priming effects may occur, such that repeated measurement may cause changes in the participants' experiences. This method of sampling data is also highly vulnerable to common method variance.[6]

Further, it is important to think about whether or not an appropriate dependent variable is being used in an ESM design. For example, it might be logical to use ESM in order to answer research questions which involve dependent variables with a great deal of variation throughout the day. Thus, variables such as change in mood, change in stress level, or the immediate impact of particular events may be best studied using ESM methodology. However, it is not likely that utilizing ESM will yield meaningful predictions when measuring someone performing a repetitive task throughout the day or when dependent variables are long-term in nature (e.g. coronary heart problems).

Replacement of selected units

Sampling schemes may be without replacement ('WOR' - no element can be selected more than once in the same sample) or with replacement ('WR' - an element may appear multiple times in the one sample). For example, if we catch fish, measure them, and immediately return them to the water before continuing with the sample, this is a WR design, because we might end up catching and measuring the same fish more than once. However, if we do not return the fish to the water (e.g. if we eat the fish), this becomes a WOR design.
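
In Python's standard library the two schemes correspond to two different functions; a minimal sketch (the 'pond' is hypothetical):

import random

fish = ["fish%d" % i for i in range(100)]  # a hypothetical pond

wor = random.sample(fish, 10)    # without replacement: no fish measured twice
wr = random.choices(fish, k=10)  # with replacement: the same fish may recur

print(len(set(wor)), "distinct in the WOR sample")  # always 10
print(len(set(wr)), "distinct in the WR sample")    # may be fewer than 10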

Sample size

Formulas, tables, and power function charts are well-known approaches to determining sample size.

Formulas

Where the frame and population are identical, statistical theory yields exact recommendations on sample size.[7] However, where it is not straightforward to define a frame representative of the population, it is more important to understand the cause system of which the population are outcomes and to ensure that all sources of variation are embraced in the frame. A large number of observations is of no value if major sources of variation are neglected in the study. In other words, what matters is taking a sample group that matches the survey category and is easy to survey. Bartlett, Kotrlik, and Higgins (2001) published a paper, "Organizational Research: Determining Appropriate Sample Size in Survey Research", in the Information Technology, Learning, and Performance Journal,[8] that provides an explanation of Cochran's (1977) formulas. A discussion and illustration of sample size formulas, including the formula for adjusting the sample size for smaller populations, is included. A table is provided that can be used to select the sample size for a research problem based on three alpha levels and a set error rate.
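
As a sketch of the kind of calculation such formulas perform, the following implements the well-known Cochran sample-size formula for a proportion, together with the correction for smaller populations (the inputs shown are illustrative, not recommendations):

import math

def cochran_n(z, p, e, N=None):
    """Cochran's sample size for estimating a proportion.

    z: z-score for the desired confidence level (1.96 for 95%)
    p: anticipated proportion (0.5 is the most conservative choice)
    e: acceptable margin of error
    N: population size; if given, apply the finite-population correction
    """
    n0 = (z ** 2) * p * (1 - p) / (e ** 2)
    if N is not None:
        n0 = n0 / (1 + (n0 - 1) / N)  # adjustment for smaller populations
    return math.ceil(n0)

print(cochran_n(1.96, 0.5, 0.05))          # about 385 for a large population
print(cochran_n(1.96, 0.5, 0.05, N=1000))  # about 278 with the correction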

Steps for using sample size tables
1. Postulate the effect size of interest, α, and β.
2. Check the sample size table:[9]
   1. Select the table corresponding to the selected α.
   2. Locate the row corresponding to the desired power.
   3. Locate the column corresponding to the estimated effect size.
   4. The intersection of the column and row is the minimum sample size required.

Sampling and data collection

Good data collection involves:
• Following the defined sampling process
• Keeping the data in time order
• Noting comments and other contextual events
• Recording non-responses

Most sampling books and papers written by non-statisticians focus only on the data collection aspect, which is just a small though important part of the sampling process.

Errors in sample surveys

Survey results are typically subject to some error. Total errors can be classified into sampling errors and non-sampling errors. The term "error" here includes systematic biases as well as random errors.

Sampling errors and biases

Sampling errors and biases are induced by the sample design. They include:
1. Selection bias: When the true selection probabilities differ from those assumed in calculating the results.
2. Random sampling error: Random variation in the results due to the elements in the sample being selected at random.

Non-sampling error

Non-sampling errors are caused by other problems in data collection and processing. They include:
1. Overcoverage: Inclusion of data from outside of the population.
2. Undercoverage: The sampling frame does not include elements in the population.
3. Measurement error: E.g. when respondents misunderstand a question, or find it difficult to answer.
4. Processing error: Mistakes in data coding.
5. Non-response: Failure to obtain complete data from all selected individuals.

After sampling, a review should be held of the exact process followed in sampling, rather than that intended, in order to study any effects that any divergences might have on subsequent analysis.

A particular problem is that of non-response. Two major types of nonresponse exist: unit nonresponse (referring to lack of completion of any part of the survey) and item nonresponse (submission or participation in the survey but failing to complete one or more components/questions of the survey).[10] [11] In survey sampling, many of the individuals identified as part of the sample may be unwilling to participate, not have the time to participate (opportunity cost),[12] or survey administrators may not have been able to contact them. In this case, there is a risk of differences between respondents and nonrespondents, leading to biased estimates of population parameters. This is often addressed by improving survey design, offering incentives, and conducting follow-up studies which make a repeated attempt to contact the unresponsive and to characterize their similarities and differences with the rest of the frame.[13] The effects can also be mitigated by weighting the data when population benchmarks are available or by imputing data based on answers to other questions.

Nonresponse is a particular problem in internet sampling. Reasons for this problem include improperly designed surveys,[11] over-surveying (or survey fatigue),[14] [15] and the fact that potential participants hold multiple e-mail addresses, which they don't use anymore or don't check regularly. Web-based surveys also tend to demonstrate nonresponse bias; for example, studies have shown that females and those from a white/Caucasian background are more likely to respond than their counterparts.[16]

Survey weights

In many situations the sample fraction may be varied by stratum, and data will have to be weighted to correctly represent the population. Thus, for example, a simple random sample of individuals in the United Kingdom might include some in remote Scottish islands who would be inordinately expensive to sample. A cheaper method would be to use a stratified sample with urban and rural strata. The rural sample could be under-represented in the sample, but weighted up appropriately in the analysis to compensate. More generally, data should usually be weighted if the sample design does not give each individual an equal chance of being selected. For instance, when households have equal selection probabilities but one person is interviewed from within each household, this gives people from large households a smaller chance of being interviewed. This can be accounted for using survey weights. Similarly, households with more than one telephone line have a greater chance of being selected in a random digit dialing sample, and weights can adjust for this. Weights can also serve other purposes, such as helping to correct for non-response.
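
A minimal sketch of the household-size weighting just described (the incomes are invented): each respondent is weighted by the number of adults in his or her household, since that is the inverse of the respondent's selection probability.

# Respondents as (income, household_size); one adult interviewed per household,
# so each respondent stands for household_size adults.
respondents = [(30000, 1), (45000, 2), (22000, 3), (51000, 2)]

weighted_sum = sum(income * size for income, size in respondents)
total_weight = sum(size for _, size in respondents)

print("Weighted mean income:", weighted_sum / total_weight)
# The unweighted mean would over-represent people from small households.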

History

Random sampling by using lots is an old idea, mentioned several times in the Bible. In 1786 Pierre Simon Laplace estimated the population of France by using a sample, along with a ratio estimator. He also computed probabilistic estimates of the error. These were not expressed as modern confidence intervals but as the sample size that would be needed to achieve a particular upper bound on the sampling error with probability 1000/1001. His estimates used Bayes' theorem with a uniform prior probability, and assumed that his sample was random. The theory of small-sample statistics developed by William Sealy Gosset put the subject on a more rigorous basis in the 20th century. However, the importance of random sampling was not universally appreciated, and in the USA the 1936 Literary Digest prediction of a Republican win in the presidential election went badly awry, due to severe bias.[17] More than two million people responded to the study, with their names obtained through magazine subscription lists and telephone directories. It was not appreciated that these lists were heavily biased towards Republicans, and the resulting sample, though very large, was deeply flawed.

See also
• Acceptance sampling
• Data collection
• Official statistics
• Replication (statistics)
• Sample (statistics)
• Sample size rule of thumb for estimate of population mean
• Sampling (case studies)
• Sampling error
• Gy's sampling theory
• Horvitz–Thompson estimator

References
• Adèr, H. J., Mellenbergh, G. J., & Hand, D. J. (2008). Advising on research methods: A consultant's companion. Huizen, The Netherlands: Johannes van Kessel Publishing.
• Bartlett, J. E., II, Kotrlik, J. W., & Higgins, C. (2001). Organizational research: Determining appropriate sample size for survey research. Information Technology, Learning, and Performance Journal, 19(1), 43–50. [18]
• Chambers, R. L., and Skinner, C. J. (editors) (2003). Analysis of Survey Data. Wiley. ISBN 0-471-89987-9.
• Cochran, William G. (1977). Sampling Techniques (Third ed.). Wiley. ISBN 0-471-16240-X.
• Deming, W. Edwards (1975). On probability as a basis for action. The American Statistician, 29(4), pp. 146–152.
• Deming, W. Edwards (1966). Some Theory of Sampling. Dover Publications. ISBN 0-486-64684-X. OCLC 166526.
• Gy, P. (1992). Sampling of Heterogeneous and Dynamic Material Systems: Theories of Heterogeneity, Sampling and Homogenizing.
• Kish, Leslie (1995). Survey Sampling. Wiley. ISBN 0-471-10949-5.
• Korn, E. L., and Graubard, B. I. (1999). Analysis of Health Surveys. Wiley. ISBN 0-471-13773-1.
• Lohr, Sharon L. (1999). Sampling: Design and Analysis. Duxbury. ISBN 0-534-35361-4.

• Pedhazur, E., & Schmelkin, L. (1991). Measurement design and analysis: An integrated approach. New York: Psychology Press.
• Särndal, Carl-Erik, and Swensson, Bengt, and Wretman, Jan (1992). Model Assisted Survey Sampling. Springer-Verlag. ISBN 0-387-40620-4.
• Stuart, Alan (1962). Basic Ideas of Scientific Sampling. Hafner Publishing Company, New York.
• Smith, T. M. F. (1984). "Present Position and Potential Developments: Some Personal Views: Sample surveys" [19]. Journal of the Royal Statistical Society. Series A (General) 147 (The 150th Anniversary of the Royal Statistical Society): 208–221. doi:10.2307/2981677. JSTOR 2981677.
• Smith, T. M. F. (1993). "Populations and Selection: Limitations of Statistics (Presidential address)" [20]. Journal of the Royal Statistical Society. Series A (Statistics in Society) 156 (2): 144–166. doi:10.2307/2982726. JSTOR 2982726. (Portrait of T. M. F. Smith on page 144.)
• Smith, T. M. F. (2001). "Biometrika centenary: Sample surveys" [21]. Biometrika 88 (1): 167–243. doi:10.1093/biomet/88.1.167.
• Smith, T. M. F. (2001). "Biometrika centenary: Sample surveys". In D. M. Titterington and D. R. Cox. Biometrika: One Hundred Years. Oxford University Press. pp. 165–194. ISBN 0-19-850993-6.
• Whittle, P. (May 1954). "Optimum preventative sampling" [22]. Journal of the Operations Research Society of America 2 (2): 197–203.
• ASTM E105 Standard Practice for Probability Sampling Of Materials
• ASTM E122 Standard Practice for Calculating Sample Size to Estimate, With a Specified Tolerable Error, the Average for Characteristic of a Lot or Process
• ASTM E141 Standard Practice for Acceptance of Evidence Based on the Results of Probability Sampling
• ASTM E1402 Standard Terminology Relating to Sampling
• ASTM E1994 Standard Practice for Use of Process Oriented AOQL and LTPD Sampling Plans
• ASTM E2234 Standard Practice for Sampling a Stream of Product by Attributes Indexed by AQL

External links
• Chapter on Sampling at the Research Methods Knowledge Base [23]
• Survey Sampling Methods at the StatPac survey software site [24]
• TRSL – Template Range Sampling Library [25] is a free-software and open-source C++ library that implements several sampling schemes behind an (STL-like) iterator interface.
• Continuous Sampling vs. Costs - Electronics Industry Example [26]

References
[1] Ken Black (2004). Business Statistics for Contemporary Decision Making (Fourth (Wiley Student Edition for India) ed.). Wiley-India. ISBN 9788126508099.
[2] Andrew A. Marino, Representative Sampling (http://www.ortho.lsuhsc.edu/Faculty/Marino/Point1/Representative.html)
[3] Pedhazur & Schmelkin, 1991
[4] Scott, A.J., and Wild, C.J. (1986). Fitting logistic models under case-control or choice-based sampling. J. Roy. Statist. Soc. B, 48, 170–182.
[5] Brown, Cozby, Kee, & Worden, 1999, p. 371.
[6] Alliger & Williams, 1993
[7] Mathematical details are displayed in the Sample size article.
[8] http://www.osra.org/itlpj/bartlettkotrlikhiggins.pdf
[9] Cohen, 1988
[10] Berinsky, A. J. (2008). Survey non-response. In W. Donsbach & M. W. Traugott (Eds.), The SAGE handbook of public opinion research (pp. 309-321). Thousand Oaks, CA: Sage Publications.
[11] Dillman, D. A., Eltinge, J. L., Groves, R. M., & Little, R. J. A. (2002). Survey nonresponse in design, data collection, and analysis. In R. M. Groves, D. A. Dillman, J. L. Eltinge, & R. J. A. Little (Eds.), Survey nonresponse (pp. 3-26). New York: John Wiley & Sons.
[12] Dillman, D.A., Smyth, J.D., & Christian, L. M. (2009). Internet, mail, and mixed-mode surveys: The tailored design method. San Francisco: Jossey-Bass.

[13] Vehovar, V., Batagelj, Z., Manfreda, K.L., & Zaletel, M. (2002). Nonresponse in web surveys. In R. M. Groves, D. A. Dillman, J. L. Eltinge, & R. J. A. Little (Eds.), Survey nonresponse (pp. 229-242). New York: John Wiley & Sons.
[14] Porter, Whitcomb, Weitzer (2004). Multiple surveys of students and survey fatigue. In S. R. Porter (Ed.), Overcoming survey research problems: Vol. 121. New directions for institutional research (pp. 63-74). San Francisco, CA: Jossey-Bass.
[15] Groves et al., Survey Methodology (2004) book
[16] Sax, L. J., Gilmartin, S. K., & Bryant, A. N. (2003). Assessing response rates and nonresponse bias in web and paper surveys. Research in Higher Education, 44(4), 409-432.
[17] http://online.wsj.com/public/article/SB115974322285279370-_rk13XDUHmIcnA8DYs5VUscZG94_20071001.html?mod=rss_free
[18] http://www.osra.org/itlpj/bartlettkotrlikhiggins.pdf
[19] http://www.jstor.org/stable/2981677
[20] http://www.jstor.org/stable/2982726
[21] http://biomet.oxfordjournals.org/cgi/content/abstract/88/1/167
[22] http://www.jstor.org/stable/166605
[23] http://www.socialresearchmethods.net/kb/sampling.php
[24] http://www.statpac.com/surveys/sampling.htm
[25] http://trsl.sourceforge.net/
[26] http://inderscience.metapress.com/openurl.asp?genre=article&eissn=1740-8857&volume=4&issue=4&spage=393

Probability theory

Probability theory is the branch of mathematics concerned with analysis of random phenomena.[1] The central objects of probability theory are random variables, stochastic processes, and events: mathematical abstractions of non-deterministic events or measured quantities that may either be single occurrences or evolve over time in an apparently random fashion. Although an individual coin toss or the roll of a die is a random event, if repeated many times the sequence of random events will exhibit certain statistical patterns, which can be studied and predicted. Two representative mathematical results describing such patterns are the law of large numbers and the central limit theorem.

As a mathematical foundation for statistics, probability theory is essential to many human activities that involve quantitative analysis of large sets of data. Methods of probability theory also apply to descriptions of complex systems given only partial knowledge of their state, as in statistical mechanics. A great discovery of twentieth century physics was the probabilistic nature of physical phenomena at atomic scales, described in quantum mechanics.

History

The mathematical theory of probability has its roots in attempts to analyze games of chance by Gerolamo Cardano in the sixteenth century, and by Pierre de Fermat and Blaise Pascal in the seventeenth century (for example the "problem of points"). Christiaan Huygens published a book on the subject in 1657.[2]

Initially, probability theory mainly considered discrete events, and its methods were mainly combinatorial. Eventually, analytical considerations compelled the incorporation of continuous variables into the theory. This culminated in modern probability theory, the foundations of which were laid by Andrey Nikolaevich Kolmogorov. Kolmogorov combined the notion of sample space, introduced by Richard von Mises, with measure theory and presented his axiom system for probability theory in 1933. Fairly quickly this became the mostly undisputed axiomatic basis for modern probability theory, but alternatives exist, in particular the adoption of finite rather than countable additivity by Bruno de Finetti.[3]


Treatment

Most introductions to probability theory treat discrete probability distributions and continuous probability distributions separately. The more mathematically advanced, measure-theory-based treatment of probability covers the discrete, the continuous, any mix of these two, and more.

Discrete probability distributions

Discrete probability theory deals with events that occur in countable sample spaces. Examples: throwing dice, experiments with decks of cards, and random walk.

Classical definition: Initially the probability of an event to occur was defined as the number of cases favorable for the event, over the number of total outcomes possible in an equiprobable sample space: see Classical definition of probability. For example, if the event is "occurrence of an even number when a die is rolled", the probability is given by

P(even) = 3/6 = 1/2,

since 3 faces out of the 6 have even numbers and each face has the same probability of appearing.

Modern definition: The modern definition starts with a finite or countable set called the sample space, which relates to the set of all possible outcomes in the classical sense, denoted by Ω. It is then assumed that for each element x ∈ Ω, an intrinsic "probability" value f(x) is attached, which satisfies the following properties:
1. f(x) ∈ [0, 1] for all x ∈ Ω;
2. Σ_{x ∈ Ω} f(x) = 1.

That is, the probability function f(x) lies between zero and one for every value of x in the sample space Ω, and the sum of f(x) over all values x in the sample space Ω is equal to 1. An event is defined as any subset E of the sample space Ω. The probability of the event E is defined as

P(E) = Σ_{x ∈ E} f(x).

So, the probability of the entire sample space is 1, and the probability of the null event is 0. The function f(x) mapping a point in the sample space to the "probability" value is called a probability mass function, abbreviated as pmf. The modern definition does not try to answer how probability mass functions are obtained; instead it builds a theory that assumes their existence.
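
In code, a pmf on a countable sample space is just such a function; a minimal sketch of the fair-die example above:

from fractions import Fraction

omega = [1, 2, 3, 4, 5, 6]              # sample space of a fair die
f = {x: Fraction(1, 6) for x in omega}  # probability mass function

def prob(event):
    """Probability of an event, i.e. a subset of the sample space."""
    return sum(f[x] for x in event)

assert sum(f.values()) == 1                    # total probability is 1
print(prob({x for x in omega if x % 2 == 0}))  # 1/2, as computed above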

Continuous probability distributions

Continuous probability theory deals with events that occur in a continuous sample space.

Classical definition: The classical definition breaks down when confronted with the continuous case. See Bertrand's paradox.

Modern definition: If the outcome space of a random variable X is the set of real numbers R or a subset thereof, then a function called the cumulative distribution function (or cdf) F exists, defined by

F(x) = P(X ≤ x).

That is, F(x) returns the probability that X will be less than or equal to x. The cdf necessarily satisfies the following properties:
1. F is a monotonically non-decreasing, right-continuous function;
2. lim_{x → −∞} F(x) = 0;
3. lim_{x → ∞} F(x) = 1.

If F is absolutely continuous, i.e., its derivative exists and integrating the derivative gives us the cdf back again, then the random variable X is said to have a probability density function or pdf or simply density f(x) = dF(x)/dx.

For a set E ⊆ R, the probability of the random variable X being in E is

P(X ∈ E) = ∫_E dF(x).

In case the probability density function exists, this can be written as

P(X ∈ E) = ∫_E f(x) dx.

Whereas the pdf exists only for continuous random variables, the cdf exists for all random variables (including discrete random variables) that take values in R. These concepts can be generalized for multidimensional cases on Rⁿ and other continuous sample spaces.

Measure-theoretic probability theory

The raison d'être of the measure-theoretic treatment of probability is that it unifies the discrete and the continuous cases, and makes the difference a question of which measure is used. Furthermore, it covers distributions that are neither discrete nor continuous nor mixtures of the two. An example of such distributions could be a mix of discrete and continuous distributions, for example a random variable which is 0 with probability 1/2, and takes a random value from a normal distribution with probability 1/2. It can still be studied to some extent by considering it to have a pdf of (δ(x) + ϕ(x))/2, where δ(x) is the Dirac delta function. Other distributions may not even be a mix: for example, the Cantor distribution has no positive probability for any single point, neither does it have a density.

The modern approach to probability theory solves these problems using measure theory to define the probability space: Given any set Ω (also called the sample space) and a σ-algebra F on it, a measure P defined on F is called a probability measure if P(Ω) = 1.

If F is the Borel σ-algebra on the set of real numbers, then there is a unique probability measure on F for any cdf, and vice versa. The measure corresponding to a cdf is said to be induced by the cdf. This measure coincides with the pmf for discrete variables, and the pdf for continuous variables, making the measure-theoretic approach free of fallacies.

The probability of a set E in the σ-algebra F is defined as

P(E) = ∫_E dμ_F,

where the integration is with respect to the measure μ_F induced by F.

Along with providing better understanding and unification of discrete and continuous probabilities, the measure-theoretic treatment also allows us to work on probabilities outside Rⁿ, as in the theory of stochastic processes. For example, to study Brownian motion, probability is defined on a space of functions.

Probability distributions

Certain random variables occur very often in probability theory because they well describe many natural or physical processes. Their distributions therefore have gained special importance in probability theory. Some fundamental discrete distributions are the discrete uniform, Bernoulli, binomial, negative binomial, Poisson and geometric distributions. Important continuous distributions include the continuous uniform, normal, exponential, gamma and beta distributions.

Convergence of random variables

In probability theory, there are several notions of convergence for random variables. They are listed below in the order of strength, i.e., any subsequent notion of convergence in the list implies convergence according to all of the preceding notions.

Weak convergence: A sequence of random variables X₁, X₂, … converges weakly to the random variable X if their respective cumulative distribution functions F₁, F₂, … converge to the cumulative distribution function F of X, wherever F is continuous. Weak convergence is also called convergence in distribution. Most common shorthand notation: Xₙ →d X.

Convergence in probability: The sequence of random variables X₁, X₂, … is said to converge towards the random variable X in probability if lim_{n→∞} P(|Xₙ − X| ≥ ε) = 0 for every ε > 0. Most common shorthand notation: Xₙ →p X.

Strong convergence: The sequence of random variables X₁, X₂, … is said to converge towards the random variable X strongly if P(lim_{n→∞} Xₙ = X) = 1. Strong convergence is also known as almost sure convergence. Most common shorthand notation: Xₙ →a.s. X.

As the names indicate, weak convergence is weaker than strong convergence. In fact, strong convergence implies convergence in probability, and convergence in probability implies weak convergence. The reverse statements are not always true.

Law of large numbers

Common intuition suggests that if a fair coin is tossed many times, then roughly half of the time it will turn up heads, and the other half it will turn up tails. Furthermore, the more often the coin is tossed, the more likely it should be that the ratio of the number of heads to the number of tails will approach unity. Modern probability provides a formal version of this intuitive idea, known as the law of large numbers. This law is remarkable because it is nowhere assumed in the foundations of probability theory, but instead emerges out of these foundations as a theorem. Since it links theoretically derived probabilities to their actual frequency of occurrence in the real world, the law of large numbers is considered as a pillar in the history of statistical theory.[4]

The law of large numbers (LLN) states that the sample average

X̄ₙ = (1/n)(X₁ + … + Xₙ)

of a sequence of independent and identically distributed random variables Xᵢ with finite expectation μ = E(X₁) converges towards the theoretical expectation μ. It is the form of convergence of random variables involved that separates the weak and the strong law of large numbers: the weak law asserts convergence in probability, while the strong law asserts almost sure convergence.

It follows from the LLN that if an event of probability p is observed repeatedly during independent experiments, the ratio of the observed frequency of that event to the total number of repetitions converges towards p. Putting this in terms of random variables and the LLN: if X₁, X₂, … are independent Bernoulli random variables taking the value 1 with probability p and 0 with probability 1 − p, then E(Xᵢ) = p for all i, and it follows from the LLN that X̄ₙ converges to p almost surely.
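
A quick simulation makes the Bernoulli form of the LLN concrete (a sketch; the printed frequencies will vary from run to run but approach p):

import random

p = 0.3  # probability of the event on each independent trial
for n in (10, 100, 10_000, 1_000_000):
    hits = sum(random.random() < p for _ in range(n))
    print(n, hits / n)  # the observed frequency approaches p as n grows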

Central limit theorem

"The central limit theorem (CLT) is one of the great results of mathematics." (Chapter 18 in [5].) It explains the ubiquitous occurrence of the normal distribution in nature.

The theorem states that the average of many independent and identically distributed random variables with finite variance tends towards a normal distribution irrespective of the distribution followed by the original random variables. Formally, let X₁, X₂, … be independent random variables with mean μ and variance σ² > 0. Then the sequence of random variables

Zₙ = (X₁ + … + Xₙ − nμ) / (σ√n)

converges in distribution to a standard normal random variable.
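
The statement can be checked numerically; the sketch below standardizes sums of uniform random variables (mean 1/2, variance 1/12) and compares a tail frequency with the standard normal value Φ(−1) ≈ 0.1587:

import math
import random

n, trials = 1000, 20000
mu, sigma = 0.5, math.sqrt(1 / 12)  # mean and sd of Uniform(0, 1)

count = 0
for _ in range(trials):
    s = sum(random.random() for _ in range(n))
    z = (s - n * mu) / (sigma * math.sqrt(n))  # the standardized sum Z_n
    if z < -1:
        count += 1

print(count / trials)  # close to 0.1587, the standard normal tail probability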

See also
• Expected value and Variance
• Fuzzy logic and Fuzzy measure theory
• Glossary of probability and statistics
• Likelihood function
• List of probability topics
• Catalog of articles in probability theory
• List of publications in statistics
• List of statistical topics
• Probabilistic proofs of non-probabilistic theorems
• Notation in probability
• Predictive modelling
• Probabilistic logic - A combination of probability theory and logic
• Probability
• Probability axioms
• Probability interpretations
• Statistical independence
• Subjective logic

References
• Pierre Simon de Laplace (1812). Analytical Theory of Probability. The first major treatise blending calculus with probability theory, originally in French: Théorie Analytique des Probabilités.
• Andrei Nikolajevich Kolmogorov (1950). Foundations of the Theory of Probability. The modern measure-theoretic foundation of probability theory; the original German version (Grundbegriffe der Wahrscheinlichkeitrechnung) appeared in 1933.
• Patrick Billingsley (1979). Probability and Measure. New York, Toronto, London: John Wiley and Sons.
• Olav Kallenberg (2002). Foundations of Modern Probability, 2nd ed. Springer Series in Statistics. 650 pp. ISBN 0-387-95313-2.
• Henk Tijms (2004). Understanding Probability. Cambridge Univ. Press. A lively introduction to probability theory for the beginner.
• Olav Kallenberg (2005). Probabilistic Symmetries and Invariance Principles. Springer-Verlag, New York. 510 pp. ISBN 0-387-25115-4.
• Gut, Allan (2005). Probability: A Graduate Course. Springer-Verlag. ISBN 0387228330.

References
[1] Probability theory, Encyclopaedia Britannica (http://www.britannica.com/ebc/article-9375936)
[2] Grinstead, Charles Miller; James Laurie Snell. "Introduction". Introduction to Probability. pp. vii.
[3] "The origins and legacy of Kolmogorov's Grundbegriffe", by Glenn Shafer and Vladimir Vovk (http://www.probabilityandfinance.com/articles/04.pdf)
[4] http://www.leithner.com.au/circulars/circular17.htm
[5] David Williams, "Probability with martingales", Cambridge 1991/2008

Normal distribution

[Infobox figures: the probability density function (the red line is the standard normal distribution) and the cumulative distribution function (colors match the density plot).]

notation: N(μ, σ²)
parameters: μ ∈ R (mean, location); σ² ≥ 0 (variance, squared scale)
support: x ∈ R if σ² > 0; x = μ if σ² = 0
pdf: (1/√(2πσ²)) e^(−(x−μ)²/(2σ²))
cdf: Φ((x−μ)/σ)
mean: μ
median: μ
mode: μ
variance: σ²
skewness: 0
ex. kurtosis: 0
entropy: (1/2) ln(2πeσ²)
mgf: exp(μt + σ²t²/2)
cf: exp(iμt − σ²t²/2)
Fisher information: diag(1/σ², 1/(2σ⁴))

In probability theory and statistics, the normal distribution, or Gaussian distribution, is an absolutely continuous probability distribution whose cumulants of all orders above two are zero. The graph of the associated probability density function is  “bell”-shaped, with peak at the mean, and is known as the Gaussian function or bell curve:[1]

f(x) = (1/√(2πσ²)) e^(−(x−μ)²/(2σ²)),

where parameters μ and σ² are the mean and the variance. The distribution with μ = 0 and σ² = 1 is called standard normal. The normal distribution is often used to describe, at least approximately, any variable that tends to cluster around the mean. For example, the heights of adult males in the United States are roughly normally distributed, with a mean of about 70 inches (1.8 m). Most men have a height close to the mean, though a small number of outliers have a height significantly above or below the mean. A histogram of male heights will appear similar to a bell curve, with the correspondence becoming closer if more data are used.

By the central limit theorem, under certain conditions the sum of a number of random variables with finite means and variances approaches a normal distribution as the number of variables increases. For this reason, the normal distribution is commonly encountered in practice, and is used throughout statistics, natural sciences, and social sciences[2] as a simple model for complex phenomena. For example, the observational error in an experiment is usually assumed to follow a normal distribution, and the propagation of uncertainty is computed using this assumption.

The Gaussian distribution was named after Carl Friedrich Gauss, who introduced it in 1809 as a way of rationalizing the method of least squares. One year later Laplace proved the first version of the central limit theorem, demonstrating that the normal distribution occurs as a limiting distribution of arithmetic means of independent, identically distributed random variables with finite second moment. For this reason the normal distribution is sometimes called Laplacian, especially in French-speaking countries.

Definition

The simplest case of a normal distribution is known as the standard normal distribution, described by the probability density function

ϕ(x) = (1/√(2π)) e^(−x²/2).

The constant 1/√(2π) in this expression ensures that the total area under the curve ϕ(x) is equal to one,[proof] and the 1⁄2 in the exponent makes the  “width” of the curve (measured as half of the distance between the inflection points of the curve) also equal to one. It is traditional[3] in statistics to denote this function with the Greek letter ϕ (phi), whereas density functions for all other distributions are usually denoted with letters ƒ or p. The alternative glyph φ is also used quite often, however within this article we reserve  “φ” to denote characteristic functions. More generally, a normal distribution results from exponentiating a quadratic function (just as an exponential distribution results from exponentiating a linear function):

f(x) = e^(ax² + bx + c).

This yields the classic  “bell curve” shape (provided that a < 0 so that the quadratic function is concave). Notice that f(x) > 0 everywhere. One can adjust a to control the  “width” of the bell, then adjust b to move the central peak of the bell along the x-axis, and finally adjust c to control the  “height” of the bell. For f(x) to be a true probability density function over R, one must choose c such that ∫ f(x) dx = 1 over the whole real line (which is only possible when a < 0). Rather than using a, b, and c, it is far more common to describe a normal distribution by its mean μ = −b/(2a) and variance σ² = −1/(2a). Changing to these new parameters allows us to rewrite the probability density function in a convenient standard form,

f(x) = (1/√(2πσ²)) e^(−(x−μ)²/(2σ²)) = (1/σ) ϕ((x−μ)/σ).

Notice that for a standard normal distribution, μ = 0 and σ² = 1. The last part of the equation above shows that any other normal distribution can be regarded as a version of the standard normal distribution that has been stretched horizontally by a factor σ and then translated rightward by a distance μ. Thus, μ specifies the position of the bell curve’s central peak, and σ specifies the  “width” of the bell curve.

The parameter μ is at the same time the mean, the median and the mode of the normal distribution. The parameter σ² is called the variance; as for any random variable, it describes how concentrated the distribution is around its mean. The square root of σ² is called the standard deviation and is the width of the density function.

The normal distribution is usually denoted by N(μ, σ²).[4] Commonly the letter N is written in calligraphic font (typed as \mathcal{N} in LaTeX). Thus when a random variable X is distributed normally with mean μ and variance σ², we write

X ~ N(μ, σ²).

Alternative formulations

Some authors[5] instead of σ² use its reciprocal τ = σ⁻², which is called the precision. This parameterization has an advantage in numerical applications where σ² is very close to zero, and is more convenient to work with in analysis as τ is a natural parameter of the normal distribution. Another advantage of using this parameterization is in the study of conditional distributions in the multivariate normal case. The question of which normal distribution should be called the  “standard” one is also answered differently by various authors. Starting from the works of Gauss the standard normal was considered to be the one with variance σ² = 1/2:

f(x) = (1/√π) e^(−x²).

Stigler (1982) goes even further and suggests the standard normal with variance σ² = 1/(2π):

f(x) = e^(−πx²).

According to the author, this formulation is advantageous because of a much simpler and easier-to-remember formula, the fact that the pdf has unit height at zero, and simple approximate formulas for the quantiles of the distribution.

Characterization

In the previous section the normal distribution was defined by specifying its probability density function. However there are other ways to characterize a probability distribution. They include: the cumulative distribution function, the moments, the cumulants, the characteristic function, the moment-generating function, etc.

Probability density function

The probability density function (pdf) of a random variable describes the relative frequencies of different values for that random variable. The pdf of the normal distribution is given by the formula explained in detail in the previous section:

f(x) = (1/√(2πσ²)) e^(−(x−μ)²/(2σ²)),   x ∈ R.

This is a proper function only when the variance σ² is not equal to zero. In that case this is a continuous smooth function, defined on the entire real line, and which is called the  “Gaussian function”. When σ² = 0, the density function doesn’t exist. However we can consider a generalized function that would behave in a manner similar to the regular density function (in the sense that it defines a measure on the real line, and it can be plugged in into an integral in order to calculate expected values of different quantities):

f(x) = δ(x − μ).

This is the Dirac delta function: it is equal to infinity at x = μ and is zero elsewhere. Properties:
• The function ƒ(x) is symmetric around the point x = μ, which is at the same time the mode, the median and the mean of the distribution.


• The inflection points of the curve occur one standard deviation away from the mean (i.e., at x = μ − σ and x = μ + σ).
• The standard normal density ϕ(x) is an eigenfunction of the Fourier transform.
• The function is supersmooth of order 2, implying that it is infinitely differentiable.
• The first derivative of ϕ(x) is ϕ′(x) = −x·ϕ(x); the second derivative is ϕ′′(x) = (x² − 1)ϕ(x). More generally, the n-th derivative is given by ϕ⁽ⁿ⁾(x) = (−1)ⁿHₙ(x)ϕ(x), where Hₙ is the Hermite polynomial of order n.[6]

Cumulative distribution function

The cumulative distribution function (cdf) describes probabilities for a random variable to fall in intervals of the form (−∞, x]. The cdf of the standard normal distribution is denoted with the capital Greek letter Φ (phi), and can be computed as an integral of the probability density function:

Φ(x) = (1/√(2π)) ∫_{−∞}^{x} e^{−t²/2} dt.
This integral cannot be expressed in terms of elementary functions; it is written in terms of the special function erf, the error function. The numerical methods for calculation of the standard normal cdf are discussed below. For a generic normal random variable with mean μ and variance σ² > 0 the cdf is equal to

F(x; μ, σ²) = Φ((x − μ)/σ) = ½ [1 + erf((x − μ)/(σ√2))].
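Since Φ has no elementary closed form, in practice it is computed through erf. A minimal Python sketch of the identity above, using only the standard library:

    import math

    def normal_cdf(x, mu=0.0, sigma=1.0):
        """cdf of N(mu, sigma^2), via the error function identity above."""
        return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

    print(normal_cdf(1.96))   # approximately 0.975
    print(normal_cdf(0.0))    # exactly 0.5, by symmetry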
For a normal distribution with zero variance, the cdf is the Heaviside step function:

F(x) = 1{x ≥ μ}  (that is, F(x) = 0 for x < μ and F(x) = 1 for x ≥ μ).
The complement of the standard normal cdf, Q(x) = 1 − Φ(x), is referred to as the Q-function, especially in engineering texts.[7] [8] This represents the tail probability of the Gaussian distribution, that is, the probability that a standard normal random variable X is greater than the number x. Other definitions of the Q-function, all of which are simple transformations of Φ, are also used occasionally.[9] Properties:
• The standard normal cdf is 2-fold rotationally symmetric around the point (0, ½): Φ(−x) = 1 − Φ(x).
• The derivative of Φ(x) is equal to the standard normal pdf ϕ(x): Φ′(x) = ϕ(x).
• The antiderivative of Φ(x) is: ∫ Φ(x) dx = x Φ(x) + ϕ(x).

Quantile function

The inverse of the standard normal cdf, called the quantile function or probit function, is expressed in terms of the inverse error function:

Φ⁻¹(p) = √2 erf⁻¹(2p − 1),  p ∈ (0, 1).
Quantiles of the standard normal distribution are commonly denoted as z_p. The quantile z_p is the value such that a standard normal random variable X falls in the interval (−∞, z_p] with probability exactly p. The quantiles are used in hypothesis testing, in the construction of confidence intervals, and in Q-Q plots. The most “famous” normal quantile is 1.96 = z_0.975: a standard normal random variable exceeds 1.96 in absolute value in only 5% of cases. For a normal random variable with mean μ and variance σ², the quantile function is

F⁻¹(p; μ, σ²) = μ + σ Φ⁻¹(p) = μ + σ√2 erf⁻¹(2p − 1).
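In code, the quantile function is usually obtained from a library rather than from erf⁻¹ directly. A minimal sketch using Python’s standard library (statistics.NormalDist, available since Python 3.8); the μ and σ values in the second call are hypothetical:

    from statistics import NormalDist

    # z_0.975, approximately 1.959963984540:
    print(NormalDist().inv_cdf(0.975))
    # Quantile of a general N(mu, sigma^2), via F^-1(p) = mu + sigma * Phi^-1(p):
    print(NormalDist(mu=100, sigma=15).inv_cdf(0.95))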

Characteristic function and moment generating function

The characteristic function φ_X(t) of a random variable X is defined as the expected value of e^{itX}, where i is the imaginary unit and t ∈ R is the argument of the characteristic function. Thus the characteristic function is the Fourier transform of the density f(x). For a normally distributed X with mean μ and variance σ², the characteristic function is[10]

φ_X(t) = e^{iμt − σ²t²/2}.
The moment generating function is defined as the expected value of e^{tX}. For a normal distribution, the moment generating function exists and is equal to

M(t) = E[e^{tX}] = e^{μt + σ²t²/2}.
The cumulant generating function is the logarithm of the moment generating function:

g(t) = ln M(t) = μt + ½σ²t².
Since this is a quadratic polynomial in t, only the first two cumulants are nonzero.

Moments

The normal distribution has moments of all orders. That is, for a normally distributed X with mean μ and variance σ², the expectation E|X|^p exists and is finite for all p such that Re[p] > −1. Usually we are interested only in moments of integer orders: p = 1, 2, 3, ….
• Central moments are the moments of X around its mean μ. Thus, a central moment of order p is the expected value of (X − μ)^p. Using standardization of normal random variables, this expectation is equal to σ^p · E[Z^p], where Z is standard normal:

E[(X − μ)^p] = 0 if p is odd,  and  E[(X − μ)^p] = σ^p (p − 1)!! if p is even.
Here n!! denotes the double factorial, that is, the product of every other number from n down to 1.
• Central absolute moments are the moments of |X − μ|. They coincide with regular central moments for all even orders, but are nonzero for odd p:

E|X − μ|^p = σ^p (p − 1)!! · √(2/π) for odd p (and σ^p (p − 1)!! for even p).
• Raw moments and raw absolute moments are the moments of X and |X| respectively. The formulas for these moments are much more complicated, and are given in terms of the confluent hypergeometric functions ₁F₁ and U.

These expressions remain valid even if p is not an integer. See also generalized Hermite polynomials.
• The first two cumulants are equal to μ and σ² respectively, whereas all higher-order cumulants are equal to zero.


Order | Raw moment | Central moment | Cumulant
1 | μ | 0 | μ
2 | μ² + σ² | σ² | σ²
3 | μ³ + 3μσ² | 0 | 0
4 | μ⁴ + 6μ²σ² + 3σ⁴ | 3σ⁴ | 0
5 | μ⁵ + 10μ³σ² + 15μσ⁴ | 0 | 0
6 | μ⁶ + 15μ⁴σ² + 45μ²σ⁴ + 15σ⁶ | 15σ⁶ | 0
7 | μ⁷ + 21μ⁵σ² + 105μ³σ⁴ + 105μσ⁶ | 0 | 0
8 | μ⁸ + 28μ⁶σ² + 210μ⁴σ⁴ + 420μ²σ⁶ + 105σ⁸ | 105σ⁸ | 0

Properties

Standardizing normal random variables

As a consequence of property 1 (listed in the Miscellaneous subsection below), it is possible to relate all normal random variables to the standard normal. For example, if X is normal with mean μ and variance σ², then

Z = (X − μ)/σ
has mean zero and unit variance; that is, Z has the standard normal distribution. Conversely, given a standard normal random variable Z, we can always construct another normal random variable with specific mean μ and variance σ²:

X = μ + σZ.
This “standardizing” transformation is convenient as it allows one to compute the pdf and especially the cdf of a normal distribution from a table of pdf and cdf values for the standard normal. They will be related via

F(x) = Φ((x − μ)/σ),  f(x) = (1/σ) ϕ((x − μ)/σ).
Standard deviation and confidence intervals


About 68% of values drawn from a normal distribution are within one standard deviation σ of the mean μ; about 95% of the values are within two standard deviations; and about 99.7% lie within three standard deviations. This is known as the 68-95-99.7 rule, or the empirical rule, or the 3-sigma rule. To be more precise, the area under the bell curve between μ − nσ and μ + nσ in terms of the cumulative normal distribution function is given by

F(μ + nσ) − F(μ − nσ) = Φ(n) − Φ(−n) = erf(n/√2),
Dark blue is less than one standard deviation from the mean. For the normal distribution, this accounts for about 68% of the set (dark blue), while two standard deviations from the mean (medium and dark blue) account for about 95%, and three standard deviations (light, medium, and dark blue) account for about 99.7%.

where erf is the error function. To 12 decimal places, the values for the 1- through 6-sigma points are:

n | F(μ + nσ) − F(μ − nσ) | i.e., 1 minus … | or 1 in …
1 | 0.682689492137 | 0.317310507863 | 3.15148718753
2 | 0.954499736104 | 0.045500263896 | 21.9778945081
3 | 0.997300203937 | 0.002699796063 | 370.398347380
4 | 0.999936657516 | 0.000063342484 | 15,787.192684
5 | 0.999999426697 | 0.000000573303 | 1,744,278.331
6 | 0.999999998027 | 0.000000001973 | 506,842,375.7
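The entries of this table follow directly from the erf(n/√2) expression above; a short Python sketch that regenerates them:

    import math

    for n in range(1, 7):
        p = math.erf(n / math.sqrt(2))    # P(|X - mu| <= n*sigma)
        print(f"{n}-sigma: {p:.12f}  tail: {1 - p:.12f}  or 1 in {1 / (1 - p):,.1f}")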

The next table gives the reverse relation of sigma multiples corresponding to a few often used values for the area under the bell curve. These values are useful to determine (asymptotic) confidence intervals of the specified levels based on normally distributed (or asymptotically normal) estimators:

Proportion | n
0.80 | 1.281551565545
0.90 | 1.644853626951
0.95 | 1.959963984540
0.98 | 2.326347874041
0.99 | 2.575829303549
0.995 | 2.807033768344
0.998 | 3.090232306168
0.999 | 3.290526731492
0.9999 | 3.890591886413
0.99999 | 4.417173413469


where the value on the left of the table is the proportion of values that will fall within a given interval and n is a multiple of the standard deviation that specifies the width of the interval.

Central limit theorem

The theorem states that under certain fairly common conditions, the sum of a large number of random variables will have an approximately normal distribution. For example, if (x₁, …, xₙ) is a sequence of iid random variables, each having mean μ and variance σ² (but with otherwise arbitrary distribution), then the central limit theorem states that

√n (x̄ − μ) → N(0, σ²) in distribution,  where x̄ = (x₁ + ⋯ + xₙ)/n.
The theorem will hold even if the summands xᵢ are not iid, although some constraints on the degree of dependence and the growth rate of moments have to be imposed. The importance of the central limit theorem cannot be overemphasized. A great number of test statistics, scores, and estimators encountered in practice contain sums of certain random variables in them; even more estimators can be represented as sums of random variables through the use of influence functions. All of these quantities are governed by the central limit theorem and will have an asymptotically normal distribution as a result. Another practical consequence of the central limit theorem is that certain other distributions can be approximated by the normal distribution, for example:
• The binomial distribution B(n, p) is approximately normal N(np, np(1 − p)) for large n and for p not too close to zero or one.
• The Poisson(λ) distribution is approximately normal N(λ, λ) for large values of λ.
• The chi-squared distribution χ²(k) is approximately normal N(k, 2k) for large k.
• The Student’s t-distribution t(ν) is approximately normal N(0, 1) when ν is large.

As the number of discrete events increases, the function begins to resemble a normal distribution

Whether these approximations are sufficiently accurate depends on the purpose for which they are needed, and on the rate of convergence to the normal distribution. It is typically the case that such approximations are less accurate in the tails of the distribution. A general upper bound for the approximation error in the central limit theorem is given by the Berry–Esseen theorem; improvements of the approximation are given by the Edgeworth expansions.
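To see the central limit theorem in action numerically, one can standardize sums of iid non-normal draws and check that they behave like standard normal variables. A minimal simulation sketch (the uniform summands and sample sizes are arbitrary choices, not from the text):

    import random

    n, trials = 30, 100_000
    mu, sigma = 0.5, (1 / 12) ** 0.5          # mean and sd of Uniform(0, 1)
    zs = []
    for _ in range(trials):
        s = sum(random.random() for _ in range(n))
        zs.append((s - n * mu) / (sigma * n ** 0.5))   # standardized sum
    # Fraction inside +-1.96 should be close to 0.95 if the approximation holds:
    print(sum(abs(z) <= 1.96 for z in zs) / trials)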

Miscellaneous

1. The family of normal distributions is closed under linear transformations. That is, if X is normally distributed with mean μ and variance σ², then a linear transform aX + b (for some real numbers a and b) is also normally distributed:

aX + b ∼ N(aμ + b, a²σ²).
Also, if X₁, X₂ are two independent normal random variables, with means μ₁, μ₂ and standard deviations σ₁, σ₂, then their linear combination will also be normally distributed:

aX₁ + bX₂ ∼ N(aμ₁ + bμ₂, a²σ₁² + b²σ₂²). [proof]

2. The converse of (1) is also true: if X₁ and X₂ are independent and their sum X₁ + X₂ is distributed normally, then both X₁ and X₂ must also be normal. This is known as Cramér’s theorem. The interpretation of this property is that


a normal distribution is only divisible by other normal distributions.
3. It is a common fallacy that if two normal random variables are uncorrelated then they are also independent. This is false.[proof] The correct statement is that if the two random variables are jointly normal and uncorrelated, only then are they independent.
4. The normal distribution is infinitely divisible: for a normally distributed X with mean μ and variance σ² we can find n independent random variables {X₁, …, Xₙ}, each distributed normally with mean μ/n and variance σ²/n, such that X₁ + X₂ + ⋯ + Xₙ ∼ N(μ, σ²).
5. The normal distribution is stable (with exponent α = 2): if X₁, X₂ are two independent N(μ, σ²) random variables and a, b are arbitrary real numbers, then aX₁ + bX₂ ∼ cX₃ + d, where X₃ is also N(μ, σ²), c² = a² + b², and d = (a + b − c)μ. This relationship directly follows from property (1).
6. The Kullback–Leibler divergence between two normal distributions X₁ ∼ N(μ₁, σ₁²) and X₂ ∼ N(μ₂, σ₂²) is given by:[11]

D_KL(X₁ ‖ X₂) = ln(σ₂/σ₁) + (σ₁² + (μ₁ − μ₂)²)/(2σ₂²) − ½.
The Hellinger distance between the same distributions is equal to

H²(X₁, X₂) = 1 − √(2σ₁σ₂/(σ₁² + σ₂²)) · e^{−(μ₁ − μ₂)²/(4(σ₁² + σ₂²))}.
7. The Fisher information matrix for the normal distribution is diagonal and takes the form

I(μ, σ²) = diag(1/σ², 1/(2σ⁴)).
8. The normal distribution belongs to an exponential family with natural parameters θ₁ = μ/σ² and θ₂ = −1/(2σ²), and natural statistics x and x². The dual, expectation parameters for the normal distribution are η₁ = μ and η₂ = μ² + σ².
9. Of all probability distributions over the reals with mean μ and variance σ², the normal distribution N(μ, σ²) is the one with the maximum entropy.
10. The family of normal distributions forms a manifold with constant curvature −1. The same family is flat with respect to the (±1)-connections ∇⁽ᵉ⁾ and ∇⁽ᵐ⁾.[12]

Related distributions

• If X is distributed normally with mean μ and variance σ², then:
  • The exponent of X is distributed log-normally: e^X ∼ ln N(μ, σ²).
  • The absolute value of X has a folded normal distribution: |X| ∼ N_f(μ, σ²). If μ = 0 this is known as the half-normal distribution.
  • The square of X/σ has the noncentral chi-square distribution with one degree of freedom: X²/σ² ∼ χ²₁(μ²/σ²). If μ = 0, the distribution is called simply chi-square.
  • The variable X restricted to an interval [a, b] is called the truncated normal distribution.
  • (X − μ)⁻² has a Lévy distribution with location 0 and scale σ⁻².
• If X₁ and X₂ are two independent standard normal random variables, then:
  • Their sum and difference are distributed normally with mean zero and variance two: X₁ ± X₂ ∼ N(0, 2).
  • Their product Z = X₁·X₂ follows the “product-normal” distribution[13] with density function f_Z(z) = π⁻¹K₀(|z|), where K₀ is the modified Bessel function of the second kind. This distribution is symmetric around zero, unbounded at z = 0, and has the characteristic function φ_Z(t) = (1 + t²)^{−1/2}.
  • Their ratio follows the standard Cauchy distribution: X₁ ÷ X₂ ∼ Cauchy(0, 1).

  • Their Euclidean norm √(X₁² + X₂²) has the Rayleigh distribution, also known as the chi distribution with 2 degrees of freedom.
• If X₁, X₂, …, Xₙ are independent standard normal random variables, then the sum of their squares has the chi-square distribution with n degrees of freedom: X₁² + ⋯ + Xₙ² ∼ χ²(n).
• If X₁, X₂, …, Xₙ are independent normally distributed random variables with mean μ and variance σ², then their sample mean is independent of the sample standard deviation, which can be demonstrated using Basu’s theorem or Cochran’s theorem. The ratio of these two quantities will have the Student’s t-distribution with n − 1 degrees of freedom: t = (x̄ − μ)/(s/√n) ∼ t(n − 1).
• If X₁, …, Xₙ, Y₁, …, Yₘ are independent standard normal random variables, then the ratio of their normalized sums of squares will have the F-distribution with (n, m) degrees of freedom: [(X₁² + ⋯ + Xₙ²)/n] ÷ [(Y₁² + ⋯ + Yₘ²)/m] ∼ F(n, m).

Extensions

The notion of normal distribution, being one of the most important distributions in probability theory, has been extended far beyond the standard framework of the univariate (that is, one-dimensional) case. All these extensions are also called normal or Gaussian laws, so a certain ambiguity in names exists.
• The multivariate normal distribution describes the Gaussian law in the k-dimensional Euclidean space. A vector X ∈ Rᵏ is multivariate-normally distributed if any linear combination of its components has a (univariate) normal distribution. The variance of X is a k×k symmetric positive-definite matrix V.
• The complex normal distribution deals with complex normal vectors. A complex vector X ∈ Cᵏ is said to be normal if both its real and imaginary components jointly possess a 2k-dimensional multivariate normal distribution. The variance-covariance structure of X is described by two matrices: the variance matrix Γ and the relation matrix C.
• The matrix normal distribution describes the case of normally distributed matrices.
• Gaussian processes are the normally distributed stochastic processes. These can be viewed as elements of some infinite-dimensional Hilbert space H, and thus are the analogues of multivariate normal vectors for the case k = ∞. A random element h ∈ H is said to be normal if for any constant a ∈ H the scalar product (a, h) has a (univariate) normal distribution. The variance structure of such a Gaussian random element can be described in terms of the linear covariance operator K: H → H. Several Gaussian processes became popular enough to have their own names:
  • Brownian motion,
  • Brownian bridge,
  • Ornstein–Uhlenbeck process.
• The Gaussian q-distribution is an abstract mathematical construction which represents a “q-analogue” of the normal distribution.
One of the main practical uses of the Gaussian law is to model the empirical distributions of many different random variables encountered in practice. In such a case a possible extension would be a richer family of distributions, having more than two parameters and therefore being able to fit the empirical distribution more accurately. Examples of such extensions are:
• Pearson distribution — a four-parameter family of probability distributions that extend the normal law to include different skewness and kurtosis values.


Normality tests

Normality tests assess the likelihood that the given data set {x₁, …, xₙ} comes from a normal distribution. Typically the null hypothesis H₀ is that the observations are distributed normally with unspecified mean μ and variance σ², versus the alternative Hₐ that the distribution is arbitrary. A great number of tests (over 40) have been devised for this problem; the more prominent of them are outlined below.
• “Visual” tests are intuitively appealing but subjective, as they rely on informal human judgement to accept or reject the null hypothesis.
  • Q-Q plot — a plot of the sorted values from the data set against the expected values of the corresponding quantiles from the standard normal distribution. That is, it is a plot of points of the form (Φ⁻¹(p_k), x₍k₎), where the plotting points p_k are equal to p_k = (k − α)/(n + 1 − 2α) and α is an adjustment constant which can be anything between 0 and 1. If the null hypothesis is true, the plotted points should approximately lie on a straight line (see the sketch after this list).
  • P-P plot — similar to the Q-Q plot, but used much less frequently. This method consists of plotting the points (Φ(z₍k₎), p_k), where z₍k₎ = (x₍k₎ − x̄)/s. For normally distributed data this plot should lie on a 45° line between (0, 0) and (1, 1).
  • The Shapiro–Wilk test employs the fact that the line in the Q-Q plot has slope σ. The test compares the least squares estimate of that slope with the value of the sample variance, and rejects the null hypothesis if these two quantities differ significantly.
  • Normal probability plot (rankit plot).
• Moment tests:
  • D’Agostino’s K-squared test
  • Jarque–Bera test
• Empirical distribution function tests:
  • Kolmogorov–Smirnov test
  • Lilliefors test
  • Anderson–Darling test
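As an illustration of the Q-Q construction described above, the plotting points can be computed with Python’s standard library; the sample values and the choice α = 0.375 (one common plotting constant) are hypothetical, not from the text:

    from statistics import NormalDist

    data = sorted([2.1, 1.3, 3.7, 2.9, 0.8, 2.4])   # hypothetical observations
    n, alpha = len(data), 0.375                      # assumed plotting constant
    for k, x in enumerate(data, start=1):
        p = (k - alpha) / (n + 1 - 2 * alpha)        # p_k from the text
        q = NormalDist().inv_cdf(p)                  # theoretical quantile
        print(f"({q:+.3f}, {x})")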

Estimation of parameters

It is often the case that we do not know the parameters of the normal distribution, but instead want to estimate them. That is, having a sample (x₁, …, xₙ) from a normal N(μ, σ²) population, we would like to learn the approximate values of the parameters μ and σ². The standard approach to this problem is the maximum likelihood method, which requires maximization of the log-likelihood function

ln L(μ, σ²) = −(n/2) ln(2πσ²) − (1/(2σ²)) Σᵢ (xᵢ − μ)².

Taking derivatives with respect to μ and σ² and solving the resulting system of first-order conditions yields the maximum likelihood estimates:

μ̂ = x̄ = (1/n) Σᵢ xᵢ,  σ̂² = (1/n) Σᵢ (xᵢ − x̄)².
The estimator μ̂ is called the sample mean, since it is the arithmetic mean of all observations. The statistic x̄ is complete and sufficient for μ, and therefore by the Lehmann–Scheffé theorem, μ̂ is the uniformly minimum variance unbiased (UMVU) estimator. In finite samples it is distributed normally:

μ̂ ∼ N(μ, σ²/n).
The variance of this estimator is equal to σ²/n, the μμ-element of the inverse Fisher information matrix. This implies that the estimator is finite-sample efficient. Of practical importance is the fact that the standard error of μ̂ is proportional to 1/√n; that is, if one wishes to decrease the standard error by a factor of 10, one must increase the number of points in the sample by a factor of 100. This fact is widely used in determining sample sizes for opinion polls and the number of trials in Monte Carlo simulations.


From the standpoint of the asymptotic theory, μ̂ is consistent; that is, it converges in probability to μ as n → ∞. The estimator is also asymptotically normal, which is a simple corollary of the fact that it is normal in finite samples:

√n (μ̂ − μ) → N(0, σ²) in distribution.
The estimator σ̂² is called the sample variance, since it is the variance of the sample (x₁, …, xₙ). In practice, another estimator is often used instead of σ̂². This other estimator is denoted s², and is also called the sample variance, which represents a certain ambiguity in terminology; its square root s is called the sample standard deviation. The estimator s² differs from σ̂² by having (n − 1) instead of n in the denominator (the so-called Bessel’s correction):

s² = (1/(n − 1)) Σᵢ (xᵢ − x̄)².
The difference between s² and σ̂² becomes negligibly small for large n. In finite samples, however, the motivation behind the use of s² is that it is an unbiased estimator of the underlying parameter σ², whereas σ̂² is biased. Also, by the Lehmann–Scheffé theorem the estimator s² is uniformly minimum variance unbiased (UMVU), which makes it the “best” estimator among all unbiased ones. However, it can be shown that the biased estimator σ̂² is “better” than s² in terms of the mean squared error (MSE) criterion. In finite samples both s² and σ̂² have scaled chi-squared distributions with (n − 1) degrees of freedom:

s² ∼ (σ²/(n − 1)) · χ²(n − 1),  σ̂² ∼ (σ²/n) · χ²(n − 1).
The first of these expressions shows that the variance of s² is equal to 2σ⁴/(n − 1), which is slightly greater than 2σ⁴/n, the σσ-element of the inverse Fisher information matrix. Thus, s² is not an efficient estimator for σ², and moreover, since s² is UMVU, we can conclude that the finite-sample efficient estimator for σ² does not exist. Applying the asymptotic theory, both estimators s² and σ̂² are consistent; that is, they converge in probability to σ² as the sample size n → ∞. The two estimators are also both asymptotically normal; in particular, both are asymptotically efficient for σ². By Cochran’s theorem, for the normal distribution the sample mean x̄ and the sample variance s² are independent, which means there can be no gain in considering their joint distribution. There is also a converse theorem: if in a sample the sample mean and sample variance are independent, then the sample must have come from a normal distribution. The independence between x̄ and s can be employed to construct the so-called t-statistic:

t = (x̄ − μ)/(s/√n) ∼ t(n − 1).
This quantity t has the Student’s t-distribution with (n − 1) degrees of freedom, and it is an ancillary statistic (independent of the value of the parameters). Inverting the distribution of this t-statistic allows us to construct the confidence interval for μ; similarly, inverting the χ² distribution of the statistic s² gives the confidence interval for σ²:

μ ∈ [x̄ − t_{n−1,1−α/2} s/√n,  x̄ + t_{n−1,1−α/2} s/√n],
σ² ∈ [(n − 1)s²/χ²_{n−1,1−α/2},  (n − 1)s²/χ²_{n−1,α/2}],

where t_{k,p} and χ²_{k,p} are the pth quantiles of the t- and χ²-distributions respectively. These confidence intervals are of level 1 − α, meaning that the true values μ and σ² fall outside of these intervals with probability α. In practice people usually take α = 5%, resulting in 95% confidence intervals. Approximate formulas can be derived from the asymptotic distributions of x̄ and s²; they become valid for large values of n, and are more convenient for manual calculation since the standard normal quantiles z_{α/2} do not depend on n. In particular, the most popular value of α = 5% results in |z_{0.025}| = 1.96.
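A sketch of such an interval in Python. Exact intervals need t- and χ²-quantiles (available in, e.g., scipy.stats); the version below uses the large-n approximation with z_{0.025} = 1.96 described above, on hypothetical data:

    import math
    import random

    random.seed(0)
    data = [random.gauss(10, 2) for _ in range(100)]   # hypothetical sample
    n = len(data)
    xbar = sum(data) / n
    s2 = sum((x - xbar) ** 2 for x in data) / (n - 1)  # Bessel-corrected variance
    half = 1.96 * math.sqrt(s2 / n)                    # approximate 95% CI half-width
    print(xbar - half, xbar + half)                    # interval for mu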


Occurrence

The occurrence of the normal distribution in practical problems can be loosely classified into three categories:
1. Exactly normal distributions;
2. Approximately normal laws, for example when such approximation is justified by the central limit theorem; and
3. Distributions modeled as normal — the normal distribution being one of the simplest and most convenient to use, researchers are frequently tempted to assume that a certain quantity is distributed normally, without justifying such an assumption rigorously. In fact, the maturity of a scientific field can be judged by the prevalence of the normality assumption in its methods.

Exact normality

Certain quantities in physics are distributed normally, as was first demonstrated by James Clerk Maxwell. Examples of such quantities are:
• Velocities of the molecules in an ideal gas. More generally, velocities of the particles in any system in thermodynamic equilibrium will have a normal distribution, due to the maximum entropy principle.
• The probability density function of the ground state in a quantum harmonic oscillator.
• The density of an electron cloud in the 1s state.

The ground state of a quantum harmonic oscillator has the Gaussian distribution.

• The position of a particle which experiences diffusion. If initially the particle is located at a specific point (that is, its probability distribution is a Dirac delta function), then after time t its location is described by a normal distribution with variance t, which satisfies the diffusion equation ∂f/∂t = ½ ∂²f/∂x². If the initial location is given by a certain density function g(x), then the density at time t is the convolution of g and the normal pdf.

Approximate normality

Approximately normal distributions occur in many situations, as explained by the central limit theorem. When the outcome is produced by a large number of small effects acting additively and independently, its distribution will be close to normal. The normal approximation will not be valid if the effects act multiplicatively (instead of additively), or if there is a single external influence which has a considerably larger magnitude than the rest of the effects.
• In counting problems, where the central limit theorem includes a discrete-to-continuum approximation and where infinitely divisible and decomposable distributions are involved, such as:
  • Binomial random variables, associated with binary response variables;
  • Poisson random variables, associated with rare events.
• Thermal light has a Bose–Einstein distribution on very short time scales, and a normal distribution on longer timescales due to the central limit theorem.


Assumed normality



I can only recognize the occurrence of the normal curve — the Laplacian curve of errors — as a very abnormal phenomenon. It is roughly approximated to in certain distributions; for this reason, and on account of its beautiful simplicity, we may, perhaps, use it as a first approximation, particularly in theoretical investigations. — Pearson (1901)



There are statistical methods to empirically test that assumption; see the Normality tests section above.
• In biology:
  • The logarithm of measures of size of living tissue (length, height, skin area, weight);[14]
  • The length of inert appendages (hair, claws, nails, teeth) of biological specimens, in the direction of growth; presumably the thickness of tree bark also falls under this category;
  • Certain physiological measurements, such as blood pressure of adult humans (after separation into male/female subpopulations).
• In finance, in particular the Black–Scholes model, changes in the logarithm of exchange rates, price indices, and stock market indices are assumed normal (these variables behave like compound interest, not like simple interest, and so are multiplicative). Some mathematicians such as Benoît Mandelbrot argue that log-Lévy distributions, which possess heavy tails, would be a more appropriate model, in particular for the analysis of stock market crashes.
• Measurement errors in physical experiments are often assumed to be normally distributed. This assumption allows for particularly simple practical rules for how to combine errors in measurements of different quantities. However, whether this assumption is valid in practice is debatable. A famous remark of Lippmann says: “Everyone believes in the [normal] law of errors: the mathematicians, because they think it is an experimental fact; and the experimenters, because they suppose it is a theorem of mathematics.”[15]
• In standardized testing, results can be made to have a normal distribution. This is done by either selecting the number and difficulty of questions (as in the IQ test), or by transforming the raw test scores into “output” scores by fitting them to the normal distribution. For example, the SAT’s traditional range of 200–800 is based on a normal distribution with a mean of 500 and a standard deviation of 100.
• Many scores are derived from the normal distribution, including percentile ranks (“percentiles” or “quantiles”), normal curve equivalents, stanines, z-scores, and T-scores. Additionally, a number of behavioral statistical procedures are based on the assumption that scores are normally distributed; for example, t-tests and ANOVAs. Bell curve grading assigns relative grades based on a normal distribution of scores.


Generating values from the normal distribution

For computer simulations, especially in applications of the Monte Carlo method, it is often useful to generate values that have a normal distribution. All algorithms described here are concerned with generating the standard normal, since an N(μ, σ²) variate can be generated as X = μ + σZ, where Z is standard normal. The algorithms rely on the availability of a random number generator capable of producing values distributed uniformly.
• The most straightforward method is based on the probability integral transform property: if U is distributed uniformly on (0,1), then Φ⁻¹(U) will have the standard normal distribution. The drawback of this method is that it relies on calculation of the probit function Φ⁻¹, which cannot be done analytically. Some approximate methods are described in Hart (1968) and in the erf article.

The bean machine, a device invented by Sir Francis Galton, can be called the first generator of normal random variables. This machine consists of a vertical board with interleaved rows of pins. Small balls are dropped from the top and then bounce randomly left or right as they hit the pins. The balls are collected into bins at the bottom and settle down into a pattern resembling the Gaussian curve.

• A simple approximate approach that is easy to program is as follows: simply sum 12 uniform (0,1) deviates and subtract 6 — the resulting random variable will have approximately standard normal distribution. In truth, the distribution will be Irwin–Hall, which is a 12-section eleventh-order polynomial approximation to the normal distribution. This random deviate will have a limited range of (−6, 6).[16]
• The Box–Muller method uses two independent random numbers U and V distributed uniformly on (0,1]. Then the two random variables

X = √(−2 ln U) cos(2πV),  Y = √(−2 ln U) sin(2πV)
will both have the standard normal distribution and will be independent. This formulation arises because for a bivariate normal random vector (X, Y) the squared norm X² + Y² will have the chi-square distribution with two degrees of freedom, which is an easily generated exponential random variable corresponding to the quantity −2 ln(U) in these equations; and the angle is distributed uniformly around the circle, chosen by the random variable V.
• The Marsaglia polar method is a modification of the Box–Muller algorithm which does not require computation of the functions sin() and cos(). In this method U and V are drawn from the uniform (−1,1) distribution, and then S = U² + V² is computed. If S is greater than or equal to one, the method starts over; otherwise the two quantities

X = U√(−2 ln S / S),  Y = V√(−2 ln S / S)
are returned. Again, X and Y here will be independent and standard normally distributed (an implementation sketch of the Box–Muller and polar methods is given after this list).
• The ratio method[17] starts with generating two independent uniform deviates U and V. The algorithm proceeds as follows:
  1. Compute X = √(8/e) (V − 0.5)/U;
  2. If X² ≤ 5 − 4e^{1/4}U, then accept X and terminate the algorithm;
  3. If X² ≥ 4e^{−1.35}/U + 1.4, then reject X and start over from step 1;
  4. If X² ≤ −4 ln U, then accept X; otherwise start over the algorithm.


• The ziggurat algorithm (Marsaglia & Tsang 2000) is faster than the Box–Muller transform and still exact. In about 97% of all cases it uses only two random numbers, one random integer and one random uniform, one multiplication and an if-test. Only in the 3% of cases where the combination of those two falls outside the “core of the ziggurat” does a kind of rejection sampling using logarithms, exponentials and more uniform random numbers have to be employed.
• There is also some investigation into the connection between the fast Hadamard transform and the normal distribution, since the transform employs just addition and subtraction and, by the central limit theorem, random numbers from almost any distribution will be transformed into the normal distribution. In this regard a series of Hadamard transforms can be combined with random permutations to turn arbitrary data sets into normally distributed data.
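The Box–Muller transform and the Marsaglia polar method described above are short enough to implement directly. A minimal Python sketch of both (in practice the library function random.gauss would normally be used instead):

    import math
    import random

    def box_muller():
        """One pair of independent standard normal deviates (Box-Muller)."""
        u = 1.0 - random.random()          # uniform on (0, 1], so log(u) is finite
        v = random.random()
        r = math.sqrt(-2.0 * math.log(u))
        return r * math.cos(2 * math.pi * v), r * math.sin(2 * math.pi * v)

    def marsaglia_polar():
        """Polar variant: a rejection step replaces the sin/cos calls."""
        while True:
            u = random.uniform(-1.0, 1.0)
            v = random.uniform(-1.0, 1.0)
            s = u * u + v * v
            if 0.0 < s < 1.0:              # reject points outside the unit disk
                f = math.sqrt(-2.0 * math.log(s) / s)
                return u * f, v * f

    print(box_muller())
    print(marsaglia_polar())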

Numerical approximations for the normal cdf

The standard normal cdf is widely used in scientific and statistical computing. The values Φ(x) may be approximated very accurately by a variety of methods, such as numerical integration, Taylor series, asymptotic series and continued fractions. Different approximations are used depending on the desired level of accuracy.
• Abramowitz & Stegun (1964) give the approximation for Φ(x) for x > 0 with the absolute error |ε(x)| < 7.5·10⁻⁸ (algorithm 26.2.17 [18]):

Φ(x) ≈ 1 − ϕ(x)(b₁t + b₂t² + b₃t³ + b₄t⁴ + b₅t⁵),  t = 1/(1 + b₀x),

where ϕ(x) is the standard normal pdf, and b₀ = 0.2316419, b₁ = 0.319381530, b₂ = −0.356563782, b₃ = 1.781477937, b₄ = −1.821255978, b₅ = 1.330274429. A short implementation sketch is given after this list.
• Hart (1968) lists almost a hundred rational function approximations for the erfc() function. His algorithms vary in the degree of complexity and the resulting precision, with maximum absolute precision of 24 digits. An algorithm by West (2009) combines Hart’s algorithm 5666 with a continued fraction approximation in the tail to provide a fast computation algorithm with 16-digit precision.
• Marsaglia (2004) suggested a simple algorithm[19] based on the Taylor series expansion for calculating Φ(x) with arbitrary precision. The drawback of this algorithm is its comparatively slow calculation time (for example, it takes over 300 iterations to calculate the function with 16 digits of precision when x = 10).
• The GNU Scientific Library calculates values of the standard normal cdf using Hart’s algorithms and approximations with Chebyshev polynomials.
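The Abramowitz & Stegun approximation quoted above translates directly into code. A minimal Python sketch using exactly the coefficients listed in the text (the extension to x < 0 by symmetry is my addition):

    import math

    B0 = 0.2316419
    B = (0.319381530, -0.356563782, 1.781477937, -1.821255978, 1.330274429)

    def std_normal_pdf(x):
        return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

    def std_normal_cdf(x):
        """Approximate Phi(x); absolute error below 7.5e-8 (A&S 26.2.17)."""
        if x < 0.0:
            return 1.0 - std_normal_cdf(-x)      # symmetry handles x < 0
        t = 1.0 / (1.0 + B0 * x)
        poly = sum(b * t ** (i + 1) for i, b in enumerate(B))
        return 1.0 - std_normal_pdf(x) * poly

    print(std_normal_cdf(1.96))   # approximately 0.9750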

History

Some authors[20] [21] attribute at least part of the credit for the discovery of the normal distribution to de Moivre, who in 1738 published in the second edition of his “The Doctrine of Chances”[22] [23] the study of the coefficients in the binomial expansion of (a + b)ⁿ. De Moivre proved that the middle term in this expansion has the approximate magnitude of 2/√(2πn), and that “If m or ½n be a Quantity infinitely great, then the Logarithm of the Ratio, which a Term diſtant from the middle by the Interval ℓ, has to the middle Term, is

−2ℓℓ/n
.” Although this theorem can be

interpreted as the first obscure expression for the normal probability law, Stigler points out that de Moivre himself did not interpret his results as anything more than the approximate rule for the binomial coefficients, and in particular de Moivre lacked the concept of the probability density function.[24]


Carl Friedrich Gauss invented the normal distribution in 1809 as a way to rationalize the method of least squares.

In 1809 Gauss published the monograph “Theoria motus corporum coelestium in sectionibus conicis solem ambientium” where, among other things, he introduces and describes several important statistical concepts, such as the method of least squares, the method of maximum likelihood, and the normal distribution. Gauss used M, M′, M′′, … to denote the measurements of some unknown quantity V, and sought the “most probable” estimator: the one which maximizes the probability φ(M−V) · φ(M′−V) · φ(M′′−V) · … of obtaining the observed experimental results. In his notation φΔ is the probability law of the measurement errors of magnitude Δ. Not knowing what the function φ is, Gauss requires that his method should reduce to the well-known answer: the arithmetic mean of the measured values.[25] Starting from these principles, Gauss demonstrates that the only law which rationalizes the choice of arithmetic mean as an estimator of the location parameter is the normal law of errors:[26]

φΔ = (h/√π) e^{−h²Δ²},
where h is “the measure of the precision of the observations”. Using this normal law as a generic model for errors in experiments, Gauss formulates what is now known as the non-linear weighted least squares (NWLS) method.[27] Although Gauss was the first to suggest the normal distribution law, the merit of Laplace’s contributions cannot be overstated.[28] It was Laplace who first posed the problem of aggregating several observations in 1774,[29] although his own solution led to the Laplacian distribution. It was Laplace who first calculated the value of the integral ∫ e^{−t²} dt = √π in 1782, providing the normalization constant for the normal distribution.[30] Finally, it was Laplace who in 1810 proved and presented to the Academy the fundamental central limit theorem, which emphasized the theoretical importance of the normal distribution.[31] It is of interest to note that in 1809 the American mathematician Adrain published two derivations of the normal probability law, simultaneously and independently from Gauss.[32] His works remained unnoticed until 1871, when they were rediscovered by Abbe,[33] mainly because the scientific community was virtually non-existent in the United States at that time.

Marquis de Laplace proved the central limit theorem in 1810, consolidating the importance of the normal distribution in statistics.

In the middle of the 19th century Maxwell demonstrated that the normal distribution is not just a convenient mathematical tool, but may also occur in natural phenomena:[34] “The number of particles whose velocity, resolved in a certain direction, lies between x and x + dx is

N · (1/(α√π)) e^{−x²/α²} dx.”
Since its introduction, the normal distribution has been known by many different names: the law of error, the law of facility of errors, Laplace’s second law, Gaussian law, etc. By the end of the 19th century some authors[35] had started to


occasionally use the name normal distribution, where the word “normal” is used as an adjective — the term derives from the fact that this distribution was seen as typical, common, normal. Around the turn of the 20th century Pearson popularized the term normal as a designation for this distribution.[36]



Many years ago I called the Laplace–Gaussian curve the normal curve, which name, while it avoids an international question of priority, has the disadvantage of leading people to believe that all other distributions of frequency are in one sense or another ‘abnormal.’ — Pearson (1920)



Also, it was Pearson who first wrote the distribution in terms of the standard deviation σ, as in modern notation. Soon after this, in 1915, Fisher added the location parameter to the formula for the normal distribution, expressing it in the way it is written nowadays:

df = (1/(σ√(2π))) e^{−(x − m)²/(2σ²)} dx.
The term “standard normal”, which denotes the normal distribution with zero mean and unit variance, came into general use around the 1950s, appearing in the popular textbooks by P.G. Hoel (1947) “Introduction to mathematical statistics” and A.M. Mood (1950) “Introduction to the theory of statistics”.[37]

See also • Behrens–Fisher problem — the long-standing problem of testing whether two normal samples with different variances have same means; • Erdős-Kac theorem — on the occurrence of the normal distribution in number theory • Gaussian blur — convolution which uses the normal distribution as a kernel

Notes

[1] The designation “bell curve” is ambiguous: there are many other distributions in probability theory which can be recognized as “bell-shaped”: the Cauchy distribution, Student’s t-distribution, generalized normal, logistic, etc.
[2] Gale Encyclopedia of Psychology — Normal Distribution (http://findarticles.com/p/articles/mi_g2699/is_0002/ai_2699000241)
[3] Halperin et al. (1965, item 7)
[4] McPherson (1990), page 110
[5] Bernardo & Smith (2000)
[6] Patel & Read (1996, [2.1.8])
[7] Scott, Clayton; Robert Nowak (August 7, 2003). "The Q-function" (http://cnx.org/content/m11537/1.2/). Connexions.
[8] Barak, Ohad (April 6, 2006). "Q function and error function" (http://www.eng.tau.ac.il/~jo/academic/Q.pdf). Tel Aviv University.
[9] Weisstein, Eric W., "Normal Distribution Function" (http://mathworld.wolfram.com/NormalDistributionFunction.html) from MathWorld.
[10] Sanders, Mathijs A. "Characteristic function of the univariate normal distribution" (http://www.planetmathematics.com/CharNormal.pdf). Retrieved 2009-03-06.
[11] http://www.allisons.org/ll/MML/KL/Normal/
[12] Amari & Nagaoka (2000)
[13] MathWorld entry for Normal Product Distribution (http://mathworld.wolfram.com/NormalProductDistribution.html)
[14] Huxley (1932)
[15] Whittaker, E. T.; Robinson, G. (1967). The Calculus of Observations: A Treatise on Numerical Mathematics. New York: Dover. p. 179.
[16] Johnson et al. (1995, Equation (26.48))
[17] Kinderman & Monahan (1976)
[18] http://www.math.sfu.ca/~cbm/aands/page_932.htm
[19] For example, this algorithm is given in the article Bc programming language.
[20] Johnson et al. (1994, page 85)
[21] Le Cam (2000, p. 74)
[22] De Moivre (1738)
[23] De Moivre first published his findings in 1733, in a pamphlet “Approximatio ad Summam Terminorum Binomii (a + b)ⁿ in Seriem Expansi” that was designated for private circulation only. But it was not until the year 1738 that he made his results publicly available. The original pamphlet was reprinted several times; see for example Helen M. Walker (1985).
[24] Stigler (1986, p. 76)

[25] “It has been customary certainly to regard as an axiom the hypothesis that if any quantity has been determined by several direct observations, made under the same circumstances and with equal care, the arithmetical mean of the observed values affords the most probable value, if not rigorously, yet very nearly at least, so that it is always most safe to adhere to it.” — Gauss (1809, section 177)
[26] Gauss (1809, section 177)
[27] Gauss (1809, section 179)
[28] “My custom of terming the curve the Gauss–Laplacian or normal curve saves us from proportioning the merit of discovery between the two great astronomer mathematicians.” Quote from Pearson (1905, p. 189)
[29] Laplace (1774, Problem III)
[30] Pearson (1905, p. 189)
[31] Stigler (1986, p. 144)
[32] Stigler (1978, p. 243)
[33] Stigler (1978, p. 244)
[34] Maxwell (1860), p. 23
[35] Such use is encountered in the works of Peirce, Galton and Lexis approximately around 1875.
[36] Kruskal & Stigler (1997)
[37] "Earliest uses… (entry STANDARD NORMAL CURVE)" (http://jeff560.tripod.com/s.html).

Literature

• Aldrich, John; Miller, Jeff. "Earliest uses of symbols in probability and statistics" (http://jeff560.tripod.com/stat.html).
• Aldrich, John; Miller, Jeff. "Earliest known uses of some of the words of mathematics" (http://jeff560.tripod.com/mathword.html). In particular, the entries for “bell-shaped and bell curve” (http://jeff560.tripod.com/b.html), “normal (distribution)” (http://jeff560.tripod.com/n.html), “Gaussian” (http://jeff560.tripod.com/g.html), and “Error, law of error, theory of errors, etc.” (http://jeff560.tripod.com/e.html).
• Amari, Shun-ichi; Nagaoka, Hiroshi (2000). Methods of information geometry. Oxford University Press. ISBN 0-8218-0531-2.
• Bernardo, J. M.; Smith, A.F.M. (2000). Bayesian Theory. Wiley. ISBN 0-471-49464-X.
• de Moivre, Abraham (1738). The Doctrine of Chances. ISBN 0821821032.
• Gavss, Carolo Friderico (1809) (in Latin). Theoria motvs corporvm coelestivm in sectionibvs conicis Solem ambientivm [Theory of the motion of the heavenly bodies moving about the Sun in conic sections]. English translation (http://books.google.com/books?id=1TIAAAAAQAAJ).
• Gould, Stephen Jay (1981). The mismeasure of man (first ed.). W.W. Norton. ISBN 0-393-01489-4.
• Halperin, Max; Hartley, H. O.; Hoel, P. G. (1965). "Recommended standards for statistical symbols and notation. COPSS committee on symbols and notation" (http://jstor.org/stable/2681417). The American Statistician 19 (3): 12–14. doi:10.2307/2681417.
• Hart, John F.; et al. (1968). Computer approximations. New York: John Wiley & Sons, Inc. ISBN 0882756427.
• Herrnstein, Richard J.; Murray, Charles (1994). The bell curve: intelligence and class structure in American life. Free Press. ISBN 0-02-914673-9.
• Huxley, Julian S. (1932). Problems of relative growth. London. ISBN 0486611140. OCLC 476909537.
• Johnson, N.L.; Kotz, S.; Balakrishnan, N. (1994). Continuous univariate distributions, Volume 1. Wiley. ISBN 0-471-58495-9.
• Johnson, N.L.; Kotz, S.; Balakrishnan, N. (1995). Continuous univariate distributions, Volume 2. Wiley. ISBN 0-471-58494-0.
• Kruskal, William H.; Stigler, Stephen M. (1997). Normative terminology: ‘normal’ in statistics and elsewhere. In Statistics and Public Policy, edited by Bruce D. Spencer. Oxford University Press. ISBN 0-19-852341-6.
• la Place, M. de (1774). "Mémoire sur la probabilité des causes par les évènemens". Mémoires de Mathématique et de Physique, Presentés à l’Académie Royale des Sciences, par divers Savans & lûs dans ses Assemblées, Tome Sixième: 621–656. Translated by S.M. Stigler in Statistical Science 1 (3), 1986: JSTOR 2245476.
• Laplace, Pierre-Simon (1812). Analytical theory of probabilities.


• McPherson, G. (1990). Statistics in scientific investigation: its basis, application and interpretation. Springer-Verlag. ISBN 0-387-97137-8.
• Marsaglia, George; Tsang, Wai Wan (2000). "The ziggurat method for generating random variables" (http://www.jstatsoft.org/v05/i08/paper). Journal of Statistical Software 5 (8).
• Marsaglia, George (2004). "Evaluating the normal distribution" (http://www.jstatsoft.org/v11/i05/paper). Journal of Statistical Software 11 (4).
• Maxwell, James Clerk (1860). "V. Illustrations of the dynamical theory of gases. — Part I: On the motions and collisions of perfectly elastic spheres". Philosophical Magazine, series 4 19 (124): 19–32. doi:10.1080/14786446008642818 (inactive 2010-09-14).
• Patel, Jagdish K.; Read, Campbell B. (1996). Handbook of the normal distribution. ISBN 0824715411.
• Pearson, Karl (1905). "‘Das Fehlergesetz und seine Verallgemeinerungen durch Fechner und Pearson’. A rejoinder". Biometrika 4: 169–212. JSTOR 2331536.
• Pearson, Karl (1920). "Notes on the history of correlation". Biometrika 13 (1): 25–45. doi:10.1093/biomet/13.1.25. JSTOR 2331722.
• Stigler, Stephen M. (1978). "Mathematical statistics in the early states". The Annals of Statistics 6 (2): 239–265. doi:10.1214/aos/1176344123. JSTOR 2958876.
• Stigler, Stephen M. (1982). "A modest proposal: a new standard for the normal". The American Statistician 36 (2). JSTOR 2684031.
• Stigler, Stephen M. (1986). The history of statistics: the measurement of uncertainty before 1900. Harvard University Press. ISBN 0-674-40340-1.
• Stigler, Stephen M. (1999). Statistics on the table. Harvard University Press. ISBN 0674836014.
• Walker, Helen M. (editor) (1985). "De Moivre on the law of normal probability", in: Smith, David Eugene (1985), A Source Book in Mathematics, Dover. ISBN 0486646904, pages 566–575. (online pdf) (http://www.york.ac.uk/depts/maths/histstat/demoivre.pdf)
• Weisstein, Eric W. "Normal distribution" (http://mathworld.wolfram.com/NormalDistribution.html). MathWorld.
• West, Graeme (2009). "Better approximations to cumulative normal functions" (http://www.wilmott.com/pdfs/090721_west.pdf). Wilmott Magazine: 70–76.
• Zelen, Marvin; Severo, Norman C. (1964). Probability functions (chapter 26) (http://www.math.sfu.ca/~cbm/aands/page_931.htm). Handbook of mathematical functions with formulas, graphs, and mathematical tables, by Abramowitz and Stegun: National Bureau of Standards. New York: Dover. ISBN 0-486-61272-4.



Standard deviation

In probability theory and statistics, the standard deviation of a statistical population, a data set, or a probability distribution is the square root of its variance. Standard deviation is a widely used measure of variability or dispersion, being algebraically more tractable, though practically less robust, than the expected deviation or average absolute deviation. It shows how much variation there is from the “average” (mean, or expected/budgeted value). A low standard deviation indicates that the data points tend to be very close to the mean, whereas a high standard deviation indicates that the data are spread out over a large range of values.

A plot of a normal distribution (or bell curve). Each colored band has a width of one standard deviation.

For example, the average height for adult men in the United States is about 70 inches (178 cm), with a standard deviation of around 3 in (8 cm). This means that most men (about 68 percent, assuming a normal distribution) have a height within 3 in (8 cm) of the mean (67–73 in/170–185 cm), one standard deviation, and almost all men (about 95%) have a height within 6 in (15 cm) of the mean (64–76 in/163–193 cm), two standard deviations. If the standard deviation were zero, then all men would be exactly 70 in (178 cm) tall. If the standard deviation were 20 in (51 cm), then men would have much more variable heights, with a typical range of about 50–90 in (127–229 cm). Three standard deviations account for 99.7% of the sample population being studied, assuming the distribution is normal (bell-shaped).

Cumulative probability of a normal distribution with expected value 0 and standard deviation 1.

A data set with a mean of 50 (shown in blue) and a standard deviation (σ) of 20.


In addition to expressing the variability of a population, standard deviation is commonly used to measure confidence in statistical conclusions. For example, the margin of error in polling data is determined by calculating the expected standard deviation in the results if the same poll were to be conducted multiple times. The reported margin of error is typically about twice the standard deviation, the radius of a 95% confidence interval. In science, researchers commonly report the standard deviation of experimental data, and only effects that fall far outside the range of standard deviation are considered statistically significant; normal random error or variation in the measurements is in this way distinguished from causal variation. Standard deviation is also important in finance, where the standard deviation on the rate of return on an investment is a measure of the volatility of the investment.


Example of two sample populations with the same mean and different standard deviations. Red population has mean 100 and SD 10; blue population has mean 100 and SD 50.

The term standard deviation was first used[1] in writing by Karl Pearson[2] in 1894, following his use of it in lectures. This was as a replacement for earlier alternative names for the same idea: for example, Gauss used mean error.[3] A useful property of standard deviation is that, unlike variance, it is expressed in the same units as the data. Note, however, that for measurements with percentage as unit, the standard deviation will have percentage points as unit. When only a sample of data from a population is available, the population standard deviation can be estimated by a modified quantity called the sample standard deviation, explained below.

Basic examples

Consider a population consisting of the following eight values:

2, 4, 4, 4, 5, 5, 7, 9.
These eight data points have the mean (average) of 5:

(2 + 4 + 4 + 4 + 5 + 5 + 7 + 9)/8 = 40/8 = 5.
To calculate the population standard deviation, first compute the difference of each data point from the mean, and square the result of each:

(2 − 5)² = 9, (4 − 5)² = 1, (4 − 5)² = 1, (4 − 5)² = 1, (5 − 5)² = 0, (5 − 5)² = 0, (7 − 5)² = 4, (9 − 5)² = 16.
Next compute the average of these values, and take the square root:

σ = √((9 + 1 + 1 + 1 + 0 + 0 + 4 + 16)/8) = √4 = 2.
This quantity is the population standard deviation; it is equal to the square root of the variance. The formula is valid only if the eight values we began with form the complete population. If they instead were a random sample drawn from some larger, “parent” population, then we should have used 7 instead of 8 in the denominator of the last formula, and the quantity thus obtained would have been called the sample standard deviation. See the section Estimation below for more details.
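The worked example above maps directly onto a few lines of code. A minimal Python sketch using the same eight values (and showing, for comparison, the sample version with 7 in the denominator):

    data = [2, 4, 4, 4, 5, 5, 7, 9]          # the eight population values
    n = len(data)
    mean = sum(data) / n                      # 5.0
    ss = sum((x - mean) ** 2 for x in data)   # sum of squared deviations: 32
    print((ss / n) ** 0.5)                    # population standard deviation: 2.0
    print((ss / (n - 1)) ** 0.5)              # sample standard deviation: ~2.14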


Definition of population values

Let X be a random variable with mean value μ:

E[X] = μ.
Here the operator E denotes the average or expected value of X. Then the standard deviation of X is the quantity

σ = √(E[(X − μ)²]).

That is, the standard deviation σ (sigma) is the square root of the average value of (X − μ)². The standard deviation of a (univariate) probability distribution is the same as that of a random variable having that distribution. Not all random variables have a standard deviation, since these expected values need not exist. For example, the standard deviation of a random variable which follows a Cauchy distribution is undefined because its expected value μ is undefined.

Discrete random variable

In the case where X takes random values from a finite data set x₁, x₂, …, x_N, with each value having the same probability, the standard deviation is

σ = √((1/N)[(x₁ − μ)² + (x₂ − μ)² + ⋯ + (x_N − μ)²]),  where μ = (x₁ + x₂ + ⋯ + x_N)/N,
or, using summation notation,

σ = √((1/N) Σ_{i=1}^{N} (xᵢ − μ)²).
Continuous random variable

The standard deviation of a continuous real-valued random variable X with probability density function p(x) is

σ = √( ∫ (x − μ)² p(x) dx ),
where

μ = ∫ x p(x) dx,
and where the integrals are definite integrals taken for x ranging over the sample space of X. In the case of a parametric family of distributions, the standard deviation can be expressed in terms of the parameters. For example, in the case of the log-normal distribution with parameters μ and σ², the standard deviation is [(e^{σ²} − 1) e^{2μ + σ²}]^{1/2}.


Estimation

One can find the standard deviation of an entire population in cases (such as standardized testing) where every member of a population is sampled. In cases where that cannot be done, the standard deviation σ is estimated by examining a random sample taken from the population. Some estimators are given below:

With standard deviation of the sample

An estimator for σ sometimes used is the standard deviation of the sample, denoted by s_N and defined as follows:

s_N = √((1/N) Σ_{i=1}^{N} (xᵢ − x̄)²),
This estimator has a uniformly smaller mean squared error than the sample standard deviation (see below), and is the maximum-likelihood estimate when the population is normally distributed. But this estimator, when applied to a small or moderately sized sample, tends to be too low: it is a biased estimator. The standard deviation of the sample is the same as the population standard deviation of a discrete random variable that can assume precisely the values from the data set, where the probability for each value is proportional to its multiplicity in the data set.

With sample standard deviation

The most commonly used estimator for σ is an adjusted version, the sample standard deviation, denoted by s and defined as follows:

s = √((1/(N − 1)) Σ_{i=1}^{N} (xᵢ − x̄)²),
where x₁, x₂, …, x_N are the observed values of the sample items and x̄ is the mean value of these observations.

This correction (the use of N − 1 instead of N) is known as Bessel’s correction. The reason for this correction is that s² is an unbiased estimator for the variance σ² of the underlying population, if that variance exists and the sample values are drawn independently with replacement. However, s is not an unbiased estimator for the standard deviation σ; it tends to underestimate the population standard deviation. The term standard deviation of the sample is used for the uncorrected estimator (using N), whilst the term sample standard deviation is used for the corrected estimator (using N − 1). The denominator N − 1 is the number of degrees of freedom in the vector of residuals (x₁ − x̄, …, x_N − x̄).
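The bias that Bessel’s correction removes is easy to exhibit by simulation: average each variance estimator over many small samples from a distribution whose true variance is known. A minimal Python sketch (the N(0, 1) source and the sample size are arbitrary choices, not from the text):

    import random

    random.seed(1)
    N, trials = 5, 100_000
    biased_total = unbiased_total = 0.0
    for _ in range(trials):
        xs = [random.gauss(0, 1) for _ in range(N)]   # true variance is 1
        m = sum(xs) / N
        ss = sum((x - m) ** 2 for x in xs)
        biased_total += ss / N                        # divide-by-N estimator
        unbiased_total += ss / (N - 1)                # Bessel-corrected estimator
    print(biased_total / trials)     # ~0.8 = (N-1)/N, systematically low
    print(unbiased_total / trials)   # ~1.0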

Other estimators

Although an unbiased estimator for σ is known when the random variable is normally distributed, the formula is complicated and amounts to a minor correction. Moreover, unbiasedness (in this sense of the word) is not always desirable.

Identities and mathematical properties

The standard deviation is invariant under changes in location, and scales directly with the scale of the random variable. Thus, for a constant c and random variables X and Y:

σ(X + c) = σ(X),  σ(cX) = |c| σ(X).
The standard deviation of the sum of two random variables can be related to their individual standard deviations and the covariance between them:

σ(X + Y) = √(var(X) + var(Y) + 2 cov(X, Y)),
where var and cov stand for variance and covariance, respectively.

The calculation of the sum of squared deviations can be related to moments calculated directly from the data. In general, we have

Σᵢ (xᵢ − x̄)² = Σᵢ xᵢ² − N x̄².
For a finite population with equal probabilities on all points, we have

σ = √((1/N) Σ_{i=1}^{N} xᵢ² − x̄²).
Thus, the standard deviation is equal to the square root of (the average of the squares less the square of the average). See computational formula for the variance for a proof of this fact, and for an analogous result for the sample standard deviation.
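This identity gives a one-pass way to compute the standard deviation from running sums of x and x². A minimal Python sketch using the four-point age data set from the next section (note that this form can lose precision in floating point when the mean is large relative to the spread):

    data = [0, 6, 8, 14]
    N = len(data)
    mean = sum(data) / N                   # 7.0
    mean_sq = sum(x * x for x in data) / N # average of the squares
    print((mean_sq - mean ** 2) ** 0.5)    # 5.0, matching the direct formula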

Interpretation and application

A large standard deviation indicates that the data points are far from the mean, and a small standard deviation indicates that they are clustered closely around the mean. For example, each of the three populations {0, 0, 14, 14}, {0, 6, 8, 14} and {6, 6, 8, 8} has a mean of 7. Their standard deviations are 7, 5, and 1, respectively. The third population has a much smaller standard deviation than the other two because its values are all close to 7. In a loose sense, the standard deviation tells us how far from the mean the data points tend to be. It will have the same units as the data points themselves. If, for instance, the data set {0, 6, 8, 14} represents the ages of a population of four siblings in years, the standard deviation is 5 years. As another example, the population {1000, 1006, 1008, 1014} may represent the distances traveled by four athletes, measured in meters. It has a mean of 1007 meters, and a standard deviation of 5 meters. Standard deviation may serve as a measure of uncertainty. In physical science, for example, the reported standard deviation of a group of repeated measurements should give the precision of those measurements. When deciding whether measurements agree with a theoretical prediction, the standard deviation of those measurements is of crucial importance: if the mean of the measurements is too far away from the prediction (with the distance measured in standard deviations), then the theory being tested probably needs to be revised. This makes sense because such measurements fall outside the range of values that could reasonably be expected to occur if the prediction were correct and the standard deviation appropriately quantified. See prediction interval.

Application examples The practical value of understanding the standard deviation of a set of values is in appreciating how much variation there is from the "average" (mean).

Climate As a simple example, consider the average daily maximum temperatures for two cities, one inland and one on the coast. It is helpful to understand that the range of daily maximum temperatures for cities near the coast is smaller than for cities inland. Thus, while these two cities may each have the same average maximum temperature, the standard deviation of the daily maximum temperature for the coastal city will be less than that of the inland city, as on any particular day the actual maximum temperature is more likely to be farther from the average maximum temperature for the inland city than for the coastal one.

Sports Another way of seeing it is to consider sports teams. In any set of categories, there will be teams that rate highly at some things and poorly at others. Chances are, the teams that lead in the standings will not show such disparity, but will perform well in most categories. The lower the standard deviation of their ratings in each category, the more balanced and consistent they will tend to be, whereas teams with a higher standard deviation will be more unpredictable. For example, a team that is consistently bad in most categories will have a low standard deviation. A team that is consistently good in most categories will also have a low standard deviation. However, a team with a high standard deviation might be the type of team that scores a lot (strong offense) but also concedes a lot (weak defense), or, vice versa, that might have a poor offense but compensates by being difficult to score on. Trying to predict which teams, on any given day, will win may include looking at the standard deviations of the various team "stats" ratings, in which anomalies can match strengths vs. weaknesses to attempt to understand what factors may prevail as stronger indicators of eventual scoring outcomes.

In racing, a driver is timed on successive laps. A driver with a low standard deviation of lap times is more consistent than a driver with a higher standard deviation. This information can be used to help understand where opportunities might be found to reduce lap times.

Finance In finance, standard deviation is a representation of the risk associated with a given security (stocks, bonds, property, etc.), or the risk of a portfolio of securities (actively managed mutual funds, index mutual funds, or ETFs). Risk is an important factor in determining how to efficiently manage a portfolio of investments because it determines the variation in returns on the asset and/or portfolio and gives investors a mathematical basis for investment decisions (known as mean-variance optimization). The overall concept of risk is that as it increases, the expected return on the asset will increase as a result of the risk premium earned – in other words, investors should expect a higher return on an investment when said investment carries a higher level of risk, or uncertainty of that return. When evaluating investments, investors should estimate both the expected return and the uncertainty of future returns. Standard deviation provides a quantified estimate of the uncertainty of future returns.

For example, let's assume an investor had to choose between two stocks. Stock A over the last 20 years had an average return of 10%, with a standard deviation of 20 percentage points (pp), and Stock B, over the same period, had average returns of 12%, but a higher standard deviation of 30 pp. On the basis of risk and return, an investor may decide that Stock A is the safer choice, because Stock B's additional two percentage points of return are not worth the additional 10 pp standard deviation (greater risk or uncertainty of the expected return). Stock B is likely to fall short of the initial investment (but also to exceed the initial investment) more often than Stock A under the same circumstances, and is estimated to return only 2% more on average. In this example, Stock A is expected to earn about 10%, plus or minus 20 pp (a range of 30% to −10%), in about two-thirds of the future year returns. When considering more extreme possible returns or outcomes in future, an investor should expect results of up to 10% plus or minus 60 pp, or a range from 70% to −50%, which includes outcomes for three standard deviations from the average return (about 99.7% of probable returns).

Calculating the average return (or arithmetic mean) of a security over a given period will generate an expected return on the asset. For each period, subtracting the expected return from the actual return gives that period's deviation from the mean. Squaring the deviation in each period and taking the average of the squared deviations gives the variance of the asset's returns; the larger the deviations, the greater the risk the security carries. Taking the square root of this variance gives the standard deviation of the investment tool in question.

Population standard deviation is used to set the width of Bollinger Bands, a widely adopted technical analysis tool. For example, the upper Bollinger Band is given as x + nσx. The most commonly used value for n is 2; there is about a 5% chance of going outside, assuming the normal distribution is right.
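A minimal sketch of this risk calculation in Python (the yearly returns are invented for illustration):

import math

returns = [0.12, 0.08, 0.15, -0.03, 0.10, 0.07, 0.18, 0.01]  # hypothetical yearly returns

expected_return = sum(returns) / len(returns)

# Deviation of each period's return from the expected return, squared and averaged
variance = sum((r - expected_return) ** 2 for r in returns) / len(returns)
risk = math.sqrt(variance)  # standard deviation of the returns

print(expected_return, risk)

# Upper Bollinger-style band at n = 2 standard deviations
print(expected_return + 2 * risk)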


Geometric interpretation To gain some geometric insights, we will start with a population of three values, x1, x2, x3. This defines a point P = (x1, x2, x3) in R3. Consider the line L = {(r, r, r) : r in R}. This is the "main diagonal" going through the origin. If our three given values were all equal, then the standard deviation would be zero and P would lie on L. So it is not unreasonable to assume that the standard deviation is related to the distance of P to L. And that is indeed the case. To move orthogonally from L to the point P, one begins at the point

M = (x̄, x̄, x̄)

whose coordinates are the mean of the values we started out with. A little algebra shows that the distance between P and M (which is the same as the orthogonal distance between P and the line L) is equal to the standard deviation of the vector x1, x2, x3, multiplied by the square root of the number of dimensions of the vector (3 in this case).
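A small numeric check of this geometric fact (a sketch; the three values are arbitrary):

import math

x1, x2, x3 = 3.0, 7.0, 14.0
mean = (x1 + x2 + x3) / 3

# Orthogonal distance from P = (x1, x2, x3) to the diagonal line L,
# i.e. the distance from P to M = (mean, mean, mean)
dist = math.sqrt((x1 - mean) ** 2 + (x2 - mean) ** 2 + (x3 - mean) ** 2)

# Population standard deviation of the three values
sd = math.sqrt(((x1 - mean) ** 2 + (x2 - mean) ** 2 + (x3 - mean) ** 2) / 3)

print(dist, sd * math.sqrt(3))  # the two numbers agree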

Chebyshev's inequality An observation is rarely more than a few standard deviations away from the mean. Chebyshev's inequality guarantees, for all distributions for which the standard deviation is defined, that the fraction of the data within k standard deviations of the mean is at least 1 − 1/k². The following table gives some example values of the minimum population within a number of standard deviations of the mean.

Min. population    Distance from mean
50%                √2
75%                2
89%                3
94%                4
96%                5
97%                6  [4]
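The bound is easy to tabulate (a minimal sketch of the 1 − 1/k² rule):

for k in (2 ** 0.5, 2, 3, 4, 5, 6):
    # Chebyshev: at least this fraction lies within k standard deviations
    print(f"k = {k:.3f}: at least {100 * (1 - 1 / k**2):.1f}%")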

Rules for normally distributed data

The central limit theorem says that the distribution of a sum of many independent, identically distributed random variables tends towards the famous bell-shaped normal distribution with a probability density function of

f(x) = (1 / (σ√(2π))) e^( −(x − μ)² / (2σ²) )

where μ is the arithmetic mean of the sample. The standard deviation therefore is simply a scaling variable that adjusts how broad the curve will be, though it also appears in the normalizing constant to keep the distribution normalized for different widths.

[Figure: Dark blue is less than one standard deviation from the mean. For the normal distribution, this accounts for 68.27% of the set; two standard deviations from the mean (medium and dark blue) account for 95.45%; three standard deviations (light, medium, and dark blue) account for 99.73%; and four standard deviations account for 99.994%. The two points of the curve which are one standard deviation from the mean are also the inflection points.]

If a data distribution is approximately normal, then the proportion of data values within z standard deviations of the mean is defined by erf(z/√2), where erf is the error function. If a data distribution is approximately normal then about 68% of the data values are within 1 standard deviation of the mean (mathematically, μ ± σ, where μ is the arithmetic mean), about 95% are within two standard deviations (μ ± 2σ), and about 99.7% lie within 3 standard deviations (μ ± 3σ). This is known as the 68-95-99.7 rule, or the empirical rule. For various values of z, the percentage of values expected to lie in and outside the symmetric confidence interval CI = (−zσ, zσ) is as follows:

zσ         Percentage within CI    Percentage outside CI    Ratio outside CI
1σ         68.2689492%             31.7310508%              1 / 3.1514871
1.645σ     90%                     10%                      1 / 10
1.960σ     95%                     5%                       1 / 20
2σ         95.4499736%             4.5500264%               1 / 21.977894
2.576σ     99%                     1%                       1 / 100
3σ         99.7300204%             0.2699796%               1 / 370.398
3.2906σ    99.9%                   0.1%                     1 / 1000
4σ         99.993666%              0.006334%                1 / 15,788
5σ         99.9999426697%          0.0000573303%            1 / 1,744,278
6σ         99.9999998027%          0.0000001973%            1 / 506,800,000
7σ         99.9999999997440%       0.0000000002560%         1 / 390,600,000,000
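These coverage figures can be reproduced directly from the error function (a minimal sketch):

import math

for z in (1, 1.645, 1.960, 2, 2.576, 3, 3.2906, 4):
    inside = math.erf(z / math.sqrt(2))  # fraction within z standard deviations
    print(f"{z} sigma: {100 * inside:.7f}% within, {100 * (1 - inside):.7f}% outside")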

Relationship between standard deviation and mean The mean and the standard deviation of a set of data are usually reported together. In a certain sense, the standard deviation is a "natural" measure of statistical dispersion if the center of the data is measured about the mean. This is because the standard deviation from the mean is smaller than from any other point. The precise statement is the following: suppose x1, ..., xn are real numbers and define the function

σ(r) = √( (1/n) Σ (xi − r)² ).

Using calculus, or by completing the square, it is possible to show that σ(r) has a unique minimum at the mean

r = x̄.
The coefficient of variation of a sample is the ratio of the standard deviation to the mean. It is a dimensionless number that can be used to compare the amount of variation between populations with means that are close together. The reason is that if you compare populations with the same standard deviations but different means, the coefficient of variation will be bigger for the population with the smaller mean. Thus, in comparing the variability of data, the coefficient of variation should be used with care, and is better replaced with another method.

Often we want some information about the accuracy of the mean we obtained. We can obtain this by determining the standard deviation of the sampled mean. The standard deviation of the mean is related to the standard deviation of the distribution by

σ_mean = σ / √N

where N is the number of observations in the sample used to estimate the mean. This can easily be proven with

var(X̄) = var( (1/N) Σ Xi ) = (1/N²) Σ var(Xi)

hence (using the independence of the observations)

σ_mean² = σ² / N,

resulting in

σ_mean = σ / √N.
Worked example The standard deviation of a discrete random variable is the root-mean-square (RMS) deviation of its values from the mean. If the random variable X takes on N values x1, ..., xN (which are real numbers) with equal probability, then its standard deviation σ can be calculated as follows:

1. Find the mean, x̄, of the values.
2. For each value xi, calculate its deviation xi − x̄ from the mean.
3. Calculate the squares of these deviations.
4. Find the mean of the squared deviations. This quantity is the variance σ².
5. Take the square root of the variance.

This calculation is described by the following formula:

σ = √( (1/N) Σ (xi − x̄)² )

where x̄ is the arithmetic mean of the values xi, defined as:

x̄ = (x1 + x2 + ... + xN) / N.

If not all values have equal probability, but the probability of value xi equals pi, the standard deviation can be computed by:

σ = √( Σ pi (xi − x̄)² )

where

x̄ = Σ pi xi.

Suppose we wished to find the standard deviation of the distribution placing probabilities 1⁄4, 1⁄2, and 1⁄4 on the points 3, 7, and 19 of the sample space.

Step 1: find the probability-weighted mean,

x̄ = (1⁄4)(3) + (1⁄2)(7) + (1⁄4)(19) = 0.75 + 3.5 + 4.75 = 9.

Step 2: find the deviation of each value in the sample space from the mean,

3 − 9 = −6,  7 − 9 = −2,  19 − 9 = 10.

Step 3: square each of the deviations, which amplifies large deviations and makes negative values positive,

(−6)² = 36,  (−2)² = 4,  10² = 100.

Step 4: find the probability-weighted mean of the squared deviations,

(1⁄4)(36) + (1⁄2)(4) + (1⁄4)(100) = 9 + 2 + 25 = 36.

Step 5: take the positive square root of the quotient (converting squared units back to regular units),

σ = √36 = 6.

So, the standard deviation of the set is 6. This example also shows that, in general, the standard deviation is different from the mean absolute deviation (which is 5 in this example).
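The same worked example in a few lines of Python (a minimal sketch):

import math

points = [3, 7, 19]
probs = [0.25, 0.5, 0.25]

mean = sum(p * x for p, x in zip(probs, points))                    # 9.0

variance = sum(p * (x - mean) ** 2 for p, x in zip(probs, points))  # 36.0
sigma = math.sqrt(variance)                                         # 6.0

mad = sum(p * abs(x - mean) for p, x in zip(probs, points))         # 5.0, the mean absolute deviation

print(mean, sigma, mad)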

Rapid calculation methods The following two formulas can represent a running (continuous) standard deviation. A set of three power sums s0, s1, s2 are each computed over a set of N values of x, denoted as xk:

sj = Σk (xk)^j,  for j = 0, 1, 2.

Note that s0 raises x to the zero power, and since x⁰ is always 1, s0 evaluates to N. Given the results of these three running summations, the values s0, s1, s2 can be used at any time to compute the current value of the running (population) standard deviation:

σ = √( s0·s2 − s1² ) / s0.

This definition for sj can represent the two different phases (summation computation sj, and σ calculation). Similarly, for the sample standard deviation,

s = √( (s0·s2 − s1²) / (s0·(s0 − 1)) ).

In a computer implementation, as the three sj sums become large, we need to consider round-off error, arithmetic overflow, and arithmetic underflow. The method below calculates the running sums method with reduced rounding errors, where A is the mean value:

A1 = x1,  Ak = Ak−1 + (xk − Ak−1)/k

Q1 = 0,  Qk = Qk−1 + (k − 1)(xk − Ak−1)²/k

Sample variance: s²n = Qn / (n − 1)

Population variance: σ²n = Qn / n
Weighted calculation When the values xi are weighted with unequal weights wi, the power sums s0, s1, s2 are each computed as:

sj = Σi wi·(xi)^j,  for j = 0, 1, 2,

and the standard deviation equations remain unchanged. Note that s0 is now the sum of the weights and not the number of samples N.

The incremental method with reduced rounding errors can also be applied, with some additional complexity. A running sum of weights must be computed:

W0 = 0,  Wk = Wk−1 + wk,

and the places where 1/k is used above must be replaced by wk/Wk:

A1 = x1,  Ak = Ak−1 + (wk/Wk)(xk − Ak−1)

Q1 = 0,  Qk = Qk−1 + (wk·Wk−1/Wk)(xk − Ak−1)².

In the final division,

σ²n = Qn / Wn

and

s²n = (Qn / Wn) · ( n′ / (n′ − 1) ),

where n is the total number of elements, and n′ is the number of elements with non-zero weights. The above formulas become equal to the simpler formulas given above if the weights are taken as equal to 1.
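A sketch of the weighted incremental update (frequency-style weights and the example data are assumptions for illustration):

def weighted_running_stdev(pairs):
    """One-pass weighted mean and population SD; pairs are (value, weight)."""
    w_sum = 0.0   # W_k, running sum of weights
    a = 0.0       # A_k, running weighted mean
    q = 0.0       # Q_k, running weighted sum of squared deviations
    for x, w in pairs:
        w_old = w_sum
        w_sum += w
        delta = x - a
        a += (w / w_sum) * delta
        q += (w * w_old / w_sum) * delta * delta
    return a, (q / w_sum) ** 0.5

# With weights (2, 1, 1) this matches the plain calculation on [3, 3, 7, 19]
print(weighted_running_stdev([(3, 2), (7, 1), (19, 1)]))  # mean 8.0, SD ~6.557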

Combining standard deviations Population-based statistics The populations of sets, which may overlap, can be computed simply as follows:

N(X ∪ Y) = N(X) + N(Y) − N(X ∩ Y).

Standard deviations of non-overlapping (X ∩ Y = ∅) sub-populations can be aggregated as follows if the size (actual or relative to one another) and means of each are known:

μ(X∪Y) = ( NX·μX + NY·μY ) / ( NX + NY )

σ(X∪Y) = √( ( NX·σX² + NY·σY² ) / ( NX + NY ) + NX·NY·(μX − μY)² / (NX + NY)² )

For example, suppose it is known that the average American man has a mean height of 70 inches with a standard deviation of 3 inches and that the average American woman has a mean height of 65 inches with a standard deviation of 2 inches. Also assume that the number of men, N, is equal to the number of women. Then the mean and standard deviation of heights of American adults could be calculated as:

μ = (70 + 65) / 2 = 67.5 inches

σ = √( (3² + 2²)/2 + (70 − 65)²/4 ) = √12.75 ≈ 3.57 inches.
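A sketch verifying the aggregation formula on the heights example above (the population counts are placeholders; only their ratio matters):

import math

def combine(n_x, mu_x, sd_x, n_y, mu_y, sd_y):
    """Combine the means/SDs of two non-overlapping populations."""
    n = n_x + n_y
    mu = (n_x * mu_x + n_y * mu_y) / n
    var = (n_x * sd_x**2 + n_y * sd_y**2) / n + n_x * n_y * (mu_x - mu_y) ** 2 / n**2
    return mu, math.sqrt(var)

print(combine(1000, 70, 3, 1000, 65, 2))  # (67.5, ~3.57)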

For the more general M non-overlapping data sets X1 through XM:

μ = ( Σi Ni·μi ) / ( Σi Ni )

σ = √( ( Σi Ni·(σi² + μi²) ) / ( Σi Ni ) − μ² )

where the sums run over i = 1, ..., M.

If the size (actual or relative to one another), mean, and standard deviation of two overlapping populations are known for the populations as well as their intersection, then the standard deviation of the overall population can still be calculated by applying the same moment identities with the counts, sums, and sums of squares of the intersection subtracted out (inclusion–exclusion).

If two or more sets of data are being added together in a pairwise fashion, the standard deviation of the result can be calculated if the covariance between each pair of data sets is known:

var(X + Y) = var(X) + var(Y) + 2 cov(X, Y).

For the special case where no correlation exists between any pair of data sets, the relation reduces to the root mean square:

σ(X+Y) = √( σX² + σY² ),

and, for several mutually uncorrelated sets, σ = √( Σi σi² ).

Sample-based statistics Standard deviations of non-overlapping (X ∩ Y = ∅) sub-samples can be aggregated as follows if the actual size and means of each are known:

μ(X∪Y) = ( NX·μX + NY·μY ) / ( NX + NY )

s(X∪Y) = √( ( (NX − 1)·sX² + (NY − 1)·sY² + NX·μX² + NY·μY² − (NX + NY)·μ(X∪Y)² ) / ( NX + NY − 1 ) )

For the more general M non-overlapping data sets X1 through XM:

μ = ( Σi Ni·μi ) / ( Σi Ni )

s = √( ( Σi ( (Ni − 1)·si² + Ni·μi² ) − ( Σi Ni )·μ² ) / ( Σi Ni − 1 ) )

where the sums run over i = 1, ..., M. If the size, mean, and standard deviation of two overlapping samples are known for the samples as well as their intersection, then the standard deviation of the combined sample can still be calculated: in general, one recovers the counts, sums, and sums of squares of each piece, combines them by inclusion–exclusion, and recomputes the sample standard deviation from the combined moments.

See also
• Accuracy and precision
• Median
• An inequality on location and scale parameters
• Pooled standard deviation
• Cumulant
• Raw score
• Deviation (statistics)
• Root mean square
• Distance standard deviation
• Sample size
• Error bar
• Samuelson's inequality
• Geometric standard deviation
• Saturation (color theory)
• Kurtosis
• Skewness
• Mean absolute error
• Unbiased estimation of standard deviation
• Variance
• Volatility (finance)
• Yamartino method for calculating standard deviation of wind direction

External links
• A Guide to Understanding & Calculating Standard Deviation [5]
• C++ Source Code [6] (license free) C++ implementation of rapid mean, variance and standard deviation calculation
• Interactive Demonstration and Standard Deviation Calculator [7]
• Standard Deviation – an explanation without maths [8]
• Standard Deviation, an elementary introduction [9]
• Standard Deviation, a simpler explanation for writers and journalists [10]
• Standard Deviation Calculator [11]
• Texas A&M Standard Deviation and Confidence Interval Calculators [12]
• The concept of Standard Deviation is shown in this 8-foot-tall (2.4 m) Probability Machine (named Sir Francis) comparing stock market returns to the randomness of the beans dropping through the quincunx pattern [13], from Index Funds Advisors IFA.com [14]

References
[1] Dodge, Yadolah (2003). The Oxford Dictionary of Statistical Terms. Oxford University Press. ISBN 0-19-920613-9.
[2] Pearson, Karl (1894). "On the dissection of asymmetrical frequency curves". Phil. Trans. Roy. Soc. London, Series A 185: 719–810.
[3] Miller, Jeff. "Earliest Known Uses of Some of the Words of Mathematics" (http://jeff560.tripod.com/mathword.html).
[4] Ghahramani, Saeed (2000). Fundamentals of Probability (2nd Edition). Prentice Hall: New Jersey. p. 438.
[5] http://stats4students.com/measures-of-spread-3.php
[6] http://www.chrisevansdev.com/rapidlive-statistics/
[7] http://www.usablestats.com/tutorials/StandardDeviation
[8] http://www.techbookreport.com/tutorials/stddev-30-secs.html
[9] http://davidmlane.com/hyperstat/A16252.html
[10] http://www.robertniles.com/stats/stdev.shtml
[11] http://invsee.asu.edu/srinivas/stdev.html
[12] http://www.stat.tamu.edu/~jhardin/applets/
[13] http://www.youtube.com/watch?v=AUSKTk9ENzg
[14] http://www.ifa.com

Random variable In mathematics, a random variable (or stochastic variable) is (in general) a measurable function that maps a probability space into a measurable space. Random variables mapping all possible outcomes of an event into the real numbers are frequently studied in elementary statistics and used in the sciences to make predictions based on data obtained from scientific experiments. In addition to scientific applications, random variables were developed for the analysis of games of chance and stochastic events.

While the above definition of a random variable requires a familiarity with measure theory to appreciate, the language and structure of random variables can be grasped at various levels of mathematical fluency by limiting the variables one considers. Beyond the introductory level, however, set theory and calculus are fundamental to their study. The concept of a random variable is closely linked to the term "random variate": a random variate is a particular outcome (value) of a random variable.

There are two types of random variables: discrete and continuous.[1] A discrete random variable maps events to values of a countable set (e.g., the integers), with each value in the range having probability greater than or equal to zero. A continuous random variable maps events to values of an uncountable set (e.g., the real numbers). For a continuous random variable, the probability of any specific value is zero, whereas the probability of some infinite set of values (such as an interval of non-zero length) may be positive. A random variable can also be "mixed", with part of its probability spread out over an interval like a typical continuous variable, and part of it concentrated on particular values like a discrete variable. These classifications are equivalent to the categorisation of probability distributions.

A random variable has an associated probability distribution and frequently also a probability density function. Probability density functions are commonly used for continuous variables.

Intuitive description In the simplest case, a random variable maps events to real numbers. A random variable can be thought of as a function mapping the sample space of a random process to a set of numbers or quantifiable labels.

Examples For a coin toss, the possible events are heads or tails. The possible outcomes for one fair coin toss can be described using the following random variable:

X = heads if the coin lands heads, and X = tails if the coin lands tails,

and if the coin is equally likely to land on either side then it has a probability mass function given by:

P(X = heads) = 1⁄2,  P(X = tails) = 1⁄2.

It is sometimes convenient to model this situation using a random variable which takes numbers as its values, rather than the values head and tail. This can be done by using the real random variable defined as follows:

Y = 1 if the coin lands heads, and Y = 0 if the coin lands tails,

and if the coin is equally likely to land on either side then it has a probability mass function given by:

P(Y = 1) = 1⁄2,  P(Y = 0) = 1⁄2.

A random variable can also be used to describe the process of rolling a fair die and the possible outcomes. The most obvious representation is to take the set {1, 2, 3, 4, 5, 6} as the sample space, defining the random variable X as the number rolled. In this case,

P(X = k) = 1⁄6 for k = 1, ..., 6.
An example of a continuous random variable would be one based on a spinner that can choose a horizontal direction. Then the values taken by the random variable are directions. We could represent these directions by North West, East South East, etc. However, it is commonly more convenient to map the sample space to a random variable which takes values which are real numbers. This can be done, for example, by mapping a direction to a bearing in degrees clockwise from North. The random variable then takes values which are real numbers from the interval [0, 360), with all parts of the range being "equally likely". In this case, X = the angle spun. Any real number has probability zero of being selected, but a positive probability can be assigned to any range of values. For example, the probability of choosing a number in [0, 180] is ½. Instead of speaking of a probability mass function, we say that the probability density of X is 1/360. The probability of a subset of [0, 360) can be calculated by multiplying the measure of the set by 1/360. In general, the probability of a set for a given continuous random variable can be calculated by integrating the density over the given set.

An example of a random variable of mixed type would be based on an experiment where a coin is flipped and the spinner is spun only if the result of the coin toss is heads. If the result is tails, X = −1; otherwise X = the value of the spinner as in the preceding example. There is a probability of ½ that this random variable will have the value −1. Other ranges of values would have half the probability of the last example.
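These three examples are easy to simulate (a minimal sketch using Python's random module):

import random

def coin():
    return 1 if random.random() < 0.5 else 0   # the real-valued coin variable Y

def die():
    return random.randint(1, 6)                # fair die, P(X = k) = 1/6

def spinner():
    return random.uniform(0, 360)              # density 1/360 on [0, 360)

def mixed():
    # coin-then-spinner: X = -1 with probability 1/2, else the spinner value
    return -1 if coin() == 0 else spinner()

samples = [mixed() for _ in range(10000)]
print(sum(1 for s in samples if s == -1) / len(samples))  # close to 0.5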

Non-real-valued form Very commonly a random variable takes values which are numbers. This is by no means always so; one can consider random variables of any type. This often includes vector-valued random variables or complex-valued random variables, but in general can include arbitrary types such as sequences, sets, shapes, manifolds, matrices, and functions.

Formal definition Let (Ω, ℱ, P) be a probability space, and (E, ℰ) a measurable space. Then an (E, ℰ)-valued random variable is a function X: Ω→E which is (ℱ, ℰ)-measurable; that is, a function such that for every subset B ∈ ℰ, its preimage lies in ℱ:  X −1(B) ∈ ℱ, where X −1(B) = {ω: X(ω) ∈ B}.[2] When E is a topological space, then the most common choice for the σ-algebra ℰ is to take it equal to the Borel σ-algebra ℬ(E), which is the σ-algebra generated by the collection of all open sets in E. In such a case the (E, ℰ)-valued random variable is called an E-valued random variable. Moreover, when the space E is the real line ℝ, such a real-valued random variable is called simply a random variable.

The meaning of this definition is the following: suppose (Ω, ℱ, P) is the underlying probability space, whereas we want to consider a probability space based on the space E with σ-algebra ℰ. In order to turn the pair (E, ℰ) into a probability space, we need to equip it with some probability function, call it Q. This function would have to assign the probability to each set B in ℰ. If X is some function from Ω to E, then it is natural to postulate that the probability of B must be the same as the probability of its preimage in Ω: Q(B) = P(X −1(B)). In order for this formula to be meaningful, X −1(B) must lie in ℱ, since the probability function P is defined only on ℱ. And this is exactly what the definition of the random variable requires: that X −1(B) ∈ ℱ for every B ∈ ℰ.

Real-valued random variables In this case the observation space is the set of real numbers with a suitable measure. Recall that (Ω, ℱ, P) is the probability space. For a real observation space, the function X: Ω → ℝ is a real-valued random variable if

{ω : X(ω) ≤ r} ∈ ℱ for every r ∈ ℝ.

This definition is a special case of the above because the collection of intervals {(−∞, r] : r ∈ ℝ} generates the Borel σ-algebra on the real numbers, and it is enough to check measurability on a generating set. (Here we are using the fact that X −1((−∞, r]) = {ω : X(ω) ≤ r}.)

Distribution functions of random variables Associating a cumulative distribution function (CDF) with a random variable is a generalization of assigning a value to a variable. If the CDF is a (right continuous) Heaviside step function then the variable takes on the value at the jump with probability 1. In general, the CDF specifies the probability that the variable takes on particular values.

If a random variable X defined on the probability space (Ω, ℱ, P) is given, we can ask questions like "How likely is it that the value of X is bigger than 2?". This is the same as the probability of the event {ω : X(ω) > 2}, which is often written as P(X > 2) for short, and easily obtained since {ω : X(ω) > 2} = X −1((2, ∞)) ∈ ℱ.

Recording all these probabilities of output ranges of a real-valued random variable X yields the probability distribution of X. The probability distribution "forgets" about the particular probability space used to define X and only records the probabilities of various values of X. Such a probability distribution can always be captured by its cumulative distribution function

F_X(x) = P(X ≤ x)

and sometimes also using a probability density function. In measure-theoretic terms, we use the random variable X to "push-forward" the measure P on Ω to a measure dF on R. The underlying probability space Ω is a technical device used to guarantee the existence of random variables, and sometimes to construct them. In practice, one often disposes of the space Ω altogether and just puts a measure on R that assigns measure 1 to the whole real line, i.e., one works with probability distributions instead of random variables.

Moments The probability distribution of a random variable is often characterised by a small number of parameters, which also have a practical interpretation. For example, it is often enough to know what its "average value" is. This is captured by the mathematical concept of expected value of a random variable, denoted E[X], and also called the first moment. In general, E[f(X)] is not equal to f(E[X]). Once the "average value" is known, one could then ask how far from this average value the values of X typically are, a question that is answered by the variance and standard deviation of a random variable. E[X] can be viewed intuitively as an average obtained from an infinite population, the members of which are particular evaluations of X. Mathematically, this is known as the (generalised) problem of moments: for a given class of random variables X, find a collection {fi} of functions such that the expectation values E[fi(X)] fully characterise the distribution of the random variable X.
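The warning that E[f(X)] is in general not f(E[X]) is easy to see numerically (a minimal sketch with a fair die and f(x) = x²):

# Fair die: E[X] = 3.5
outcomes = [1, 2, 3, 4, 5, 6]

e_x = sum(outcomes) / 6
f_of_e = e_x ** 2                            # f(E[X]) = 12.25
e_of_f = sum(x ** 2 for x in outcomes) / 6   # E[f(X)] = 91/6, about 15.17

print(f_of_e, e_of_f)  # the gap is exactly the variance of X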

Functions of random variables If we have a random variable X on Ω and a Borel measurable function g: ℝ → ℝ, then Y = g(X) will also be a random variable on Ω, since the composition of measurable functions is also measurable. (However, this is not true if g is merely Lebesgue measurable.) The same procedure that allowed one to go from a probability space (Ω, P) to (ℝ, dF_X) can be used to obtain the distribution of Y = g(X). The cumulative distribution function of Y is

F_Y(y) = P(g(X) ≤ y).

If the function g is invertible, i.e. g⁻¹ exists, and increasing, then the previous relation can be extended to obtain

F_Y(y) = P(g(X) ≤ y) = P(X ≤ g⁻¹(y)) = F_X(g⁻¹(y))

and, again with the same hypotheses of invertibility of g, assuming also differentiability, we can find the relation between the probability density functions by differentiating both sides with respect to y, in order to obtain

f_Y(y) = f_X(g⁻¹(y)) · | d g⁻¹(y) / dy |.

If there is no invertibility of g but each y admits at most a countable number of roots (i.e. a finite, or countably infinite, number of xi such that y = g(xi)) then the previous relation between the probability density functions can be generalized to

f_Y(y) = Σi f_X(xi) · | d gi⁻¹(y) / dy |

where xi = gi⁻¹(y). The formulas for densities do not demand g to be increasing.

Example 1 Let X be a real-valued, continuous random variable and let Y = X². If y < 0, then P(X² ≤ y) = 0, so

F_Y(y) = 0 for y < 0.

If y ≥ 0, then P(X² ≤ y) = P(−√y ≤ X ≤ √y), so

F_Y(y) = F_X(√y) − F_X(−√y) for y ≥ 0.

Example 2 Suppose X is a random variable with a cumulative distribution

F_X(x) = 1 / (1 + e^(−x))^θ

where θ > 0 is a fixed parameter. Consider the random variable Y = log(1 + e^(−X)). Then,

F_Y(y) = P(Y ≤ y) = P(log(1 + e^(−X)) ≤ y) = P(X ≥ −log(e^y − 1)).

The last expression can be calculated in terms of the cumulative distribution of X, so

F_Y(y) = 1 − F_X(−log(e^y − 1)) = 1 − 1/(1 + e^y − 1)^θ = 1 − e^(−yθ),

which is the cumulative distribution function of an exponential distribution (for y > 0).

Equivalence of random variables There are several different senses in which random variables can be considered to be equivalent. Two random variables can be equal, equal almost surely, equal in mean, or equal in distribution. In increasing order of strength, the precise definition of these notions of equivalence is given below.

Equality in distribution If the sample space is a subset of the real line, a possible definition is that random variables X and Y are equal in distribution if they have the same distribution functions:

F_X(x) = F_Y(x) for all x.

Two random variables having equal moment generating functions have the same distribution. This provides, for example, a useful method of checking equality of certain functions of i.i.d. random variables. However, the moment generating function exists only for distributions whose tails are sufficiently well behaved.

Almost sure equality Two random variables X and Y are equal almost surely if, and only if, the probability that they are different is zero:

P(X ≠ Y) = 0.

For all practical purposes in probability theory, this notion of equivalence is as strong as actual equality. It is associated to the following distance:

d∞(X, Y) = ess sup |X(ω) − Y(ω)|,

where "ess sup" represents the essential supremum in the sense of measure theory.

Equality Finally, the two random variables X and Y are equal if they are equal as functions on their probability space, that is,

X(ω) = Y(ω) for all ω ∈ Ω.

Convergence Much of mathematical statistics consists in proving convergence results for certain sequences of random variables; see for instance the law of large numbers and the central limit theorem. There are various senses in which a sequence (Xn) of random variables can converge to a random variable X. These are explained in the article on convergence of random variables.

See also
• Observable variable
• Probability distribution
• Algebra of random variables
• Multivariate random variable
• Event (probability theory)
• Randomness
• Random element
• Random vector
• Random function
• Random measure
• Stochastic process

References
[1] Rice, John (1999). Mathematical Statistics and Data Analysis. Duxbury Press. ISBN 0534209343.
[2] Fristedt & Gray (1996, page 11).

Literature
• Fristedt, Bert; Gray, Lawrence (1996). A modern approach to probability theory. Boston: Birkhäuser. ISBN 3-7643-3807-5.
• Kallenberg, O., Random Measures, 4th edition. Academic Press, New York, London; Akademie-Verlag, Berlin (1986). MR0854102 ISBN 0-12-394960-2.
• Kallenberg, O., Foundations of Modern Probability, 2nd edition. Springer-Verlag, New York, Berlin, Heidelberg (2001). ISBN 0-387-95313-2.
• Papoulis, Athanasios (1965). Probability, Random Variables, and Stochastic Processes. McGraw–Hill Kogakusha, Tokyo, 9th edition. ISBN 0-07-119981-0.

This article incorporates material from Random variable on PlanetMath, which is licensed under the Creative Commons Attribution/Share-Alike License.

Probability distribution In probability theory and statistics, a probability distribution identifies either the probability of each value of a random variable (when the variable is discrete), or the probability of the value falling within a particular interval (when the variable is continuous).[1] The probability distribution describes the range of possible values that a random variable can attain and the probability that the value of the random variable is within any (measurable) subset of that range. When the random variable takes values in the set of real numbers, the probability distribution is completely described by the cumulative distribution function, whose value at each real x is the probability that the random variable is smaller than or equal to x.

[Figure: The Normal distribution, often called the "bell curve".]

The concept of the probability distribution and the random variables which they describe underlies the mathematical discipline of probability theory, and the science of statistics. There is spread or variability in almost any value that can be measured in a population (e.g. height of people, durability of a metal, sales growth, traffic flow, etc.); almost all measurements are made with some intrinsic error; in physics many processes are described probabilistically, from the kinetic properties of gases to the quantum mechanical description of fundamental particles. For these and many other reasons, simple numbers are often inadequate for describing a quantity, while probability distributions are often more appropriate.

There are various probability distributions that show up in various different applications. One of the more important ones is the normal distribution, which is also known as the Gaussian distribution or the bell curve and approximates many different naturally occurring distributions. The toss of a fair coin yields another familiar distribution, where the possible values are heads or tails, each with probability 1/2.

Formal definition In the measure-theoretic formalization of probability theory, a random variable is defined as a measurable function X from a probability space (Ω, ℱ, P) to a measurable space (𝒳, 𝒜). A probability distribution is the pushforward measure X∗P = P X −1 on (𝒳, 𝒜).

Probability distributions of real-valued random variables Because a probability distribution Pr on the real line is determined by the probability of a real-valued random variable X being in a half-open interval (−∞, x], the probability distribution is completely characterized by its cumulative distribution function:

F(x) = Pr[X ≤ x] for all x in R.

Discrete probability distribution A probability distribution is called discrete if its cumulative distribution function only increases in jumps. More precisely, a probability distribution is discrete if there is a finite or countable set whose probability is 1. For many familiar discrete distributions, the set of possible values is topologically discrete in the sense that all its points are isolated points. But there are discrete distributions for which this countable set is dense on the real line. Discrete distributions are characterized by a probability mass function, p, such that

Pr[X = x] = p(x), with Σ p(xi) = 1 over the possible values xi.

Continuous probability distribution By one convention, a probability distribution is called continuous if its cumulative distribution function F(x) is continuous and, therefore, the probability measure of singletons is zero:

Pr[X = x] = 0 for all x.

Another convention reserves the term continuous probability distribution for absolutely continuous distributions. These distributions can be characterized by a probability density function: a non-negative Lebesgue integrable function f defined on the real numbers such that

Pr[a ≤ X ≤ b] = ∫ from a to b of f(x) dx.

Discrete distributions and some continuous distributions (like the Cantor distribution) do not admit such a density.

Terminology The support of a distribution is the smallest closed interval/set whose complement has probability zero. It may be understood as the points or elements that are actual members of the distribution. A discrete random variable is a random variable whose probability distribution is discrete. Similarly, a continuous random variable is a random variable whose probability distribution is continuous.

Simulated sampling The following algorithm lets one sample from a probability distribution (either discrete or continuous). This algorithm assumes that one has access to the inverse of the cumulative distribution (easy to calculate with a discrete distribution, can be approximated for continuous distributions) and a computational primitive called "random()" which returns an arbitrary-precision floating-point value in the range of [0,1).

define function sampleFrom(cdfInverse (type="function")):
    // input:
    //   cdfInverse(x) - the inverse of the CDF of the probability distribution
    //     example: if distribution is Gaussian, one can use a Taylor approximation of the inverse of erf(x)
    //     example: if distribution is discrete, see explanation below pseudocode
    // output:
    //   type="real number" - a value sampled from the probability distribution represented by cdfInverse

    r = random()
    while(r == 0):    // make sure r is not equal to 0; discontinuity possible
        r = random()
    return cdfInverse(r)

For discrete distributions, the function cdfInverse (inverse of cumulative distribution function) can be calculated from samples as follows: for each element in the sample range (discrete values along the x-axis), calculate the total samples before it. Normalize this new discrete distribution. This new discrete distribution is the CDF, and can be turned into an object which acts like a function: calling cdfInverse(query) returns the smallest x-value such that the CDF is greater than or equal to the query.

define function dataToCdfInverse(discreteDistribution (type="dictionary")):
    // input:
    //   discreteDistribution - a mapping from possible values to frequencies/probabilities
    //     example: {0 -> 1-p, 1 -> p} would be a Bernoulli distribution with chance=p
    //     example: setting p=0.5 in the above example, this is a fair coin where P(X=1)->"heads" and P(X=0)->"tails"
    // output:
    //   type="function" - a function that represents (CDF^-1)(x)

    define function cdfInverse(x):
        integral = 0
        go through mapping (key->value) in sorted order, adding value to integral...
        stop when integral > x (or integral >= x, doesn't matter)
        return last key we added

    return cdfInverse

Note that often, mathematics environments and computer algebra systems will have some way to represent probability distributions and sample from them. This functionality might even have been developed in third-party libraries. Such packages greatly facilitate such sampling, most likely have optimizations for common distributions, and are likely to be more elegant than the above bare-bones solution.
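As a concrete companion to the pseudocode above, here is a minimal Python sketch of inverse transform sampling for both cases; the exponential inverse CDF and the discrete table are chosen just for illustration, and the function names mirror the pseudocode rather than any library API:

import math
import random

def sample_from(cdf_inverse):
    # feed a uniform (0, 1) draw through the inverse CDF
    r = random.random()
    while r == 0:
        r = random.random()
    return cdf_inverse(r)

# Continuous example: Exponential(lam); inverse CDF is -ln(1 - r) / lam
lam = 2.0
print(sample_from(lambda r: -math.log(1 - r) / lam))

def data_to_cdf_inverse(dist):
    # discrete case: walk the sorted outcomes, accumulating probability
    def cdf_inverse(x):
        total = 0.0
        for key in sorted(dist):
            total += dist[key]
            if total >= x:
                return key
        return key
    return cdf_inverse

coin = data_to_cdf_inverse({0: 0.5, 1: 0.5})  # fair coin
print(sample_from(coin))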

Some properties
• The probability density function of the sum of two independent random variables is the convolution of each of their density functions.
• The probability density function of the difference of two independent random variables is the cross-correlation of their density functions.
• Probability distributions are not a vector space – they are not closed under linear combinations, as these do not preserve non-negativity or total integral 1 – but they are closed under convex combination, thus forming a convex subset of the space of functions (or measures).

See also
• Copula (statistics)
• Cumulative distribution function
• Histogram
• Inverse transform sampling
• Likelihood function
• List of statistical topics
• Probability density function
• Random variable
• Riemann–Stieltjes integral application to probability theory

External links
• An 8-foot-tall (2.4 m) Probability Machine (named Sir Francis) comparing stock market returns to the randomness of the beans dropping through the quincunx pattern [13], from Index Funds Advisors IFA.com [14], youtube.com
• Interactive Discrete and Continuous Probability Distributions [2], socr.ucla.edu
• A Compendium of Common Probability Distributions [3]
• A Compendium of Distributions [4], vosesoftware.com
• Statistical Distributions - Overview [5], xycoon.com
• Probability Distributions [6] in Quant Equation Archive, sitmo.com
• A Probability Distribution Calculator [7], covariable.com
• Sourceforge.net [8], Distribution Explorer: a mixed C++ and C# Windows application that allows you to explore the properties of 20+ statistical distributions, and calculate CDF, PDF & quantiles. Written using open-source C++ from the Boost.org [9] Math Toolkit library.
• Explore different probability distributions and fit your own dataset online - interactive tool [10], xjtek.com

References
[1] Everitt, B.S. (2006). The Cambridge Dictionary of Statistics, Third Edition. pp. 313–314. Cambridge University Press, Cambridge. ISBN 0521690277.
[2] http://www.socr.ucla.edu/htmls/SOCR_Distributions.html
[3] http://www.causascientia.org/math_stat/Dists/Compendium.pdf
[4] http://www.vosesoftware.com/content/ebook.pdf
[5] http://www.xycoon.com/contdistroverview.htm
[6] http://www.sitmo.com/eqcat/8
[7] http://www.covariable.com/continuous.html
[8] http://sourceforge.net/projects/distexplorer/
[9] http://www.boost.org
[10] http://www.xjtek.com/anylogic/demo_models/111/

Real number

In computing, 'real number' often refers to non-complex floating-point numbers. In mathematics, the real numbers include both rational numbers, such as 42 and −23/129, and irrational numbers, such as pi and the square root of two. A real number can be given by an infinite decimal representation, such as 2.4871773339..., where the digits continue indefinitely. The real numbers are sometimes thought of as points on an infinitely long number line.

[Figure: Symbol often used to denote the set of real numbers.]

These descriptions of the real numbers, while intuitively accessible, are not sufficiently rigorous for the purposes of pure mathematics. The discovery of a suitably rigorous definition of the real numbers—indeed, the realization that a better definition was needed—was one of the most important developments of 19th century mathematics. Popular definitions in use today include equivalence classes of Cauchy sequences of rational numbers; Dedekind cuts; a more sophisticated version of "decimal representation"; and an axiomatic definition of the real numbers as the unique complete Archimedean ordered field. These definitions are all described in detail below.

Basic properties A real number may be either rational or irrational; either algebraic or transcendental; and either positive, negative, or zero. Real numbers are used to measure continuous quantities. They may in theory be expressed by decimal representations that have an infinite sequence of digits to the right of the decimal point; these are often represented in the same form as 324.823122147… The ellipsis (three dots) indicates that there would still be more digits to come.

[Figure: Real numbers can be thought of as points on an infinitely long number line.]

More formally, real numbers have the two basic properties of being an ordered field, and having the least upper bound property. The first says that real numbers comprise a field, with addition and multiplication as well as division by nonzero numbers, which can be totally ordered on a number line in a way compatible with addition and multiplication. The second says that if a nonempty set of real numbers has an upper bound, then it has a least upper bound. These two together define the real numbers completely, and allow their other properties to be deduced. For instance, we can prove from these properties that every polynomial of odd degree with real coefficients has a real root, and that if you add the square root of −1 to the real numbers, obtaining the complex numbers, the resulting field is algebraically closed.

Uses In the physical sciences, most physical constants such as the universal gravitational constant, and physical variables, such as position, mass, speed, and electric charge, are modeled using real numbers. Note importantly, however, that all actual measurements of physical quantities yield rational numbers because the precision of such measurements can only be finite.

Computers cannot directly operate on real numbers, but only on a finite subset of rational numbers, limited by the number of bits used to store them. However, computer algebra systems are able to treat some irrational numbers exactly by storing their algebraic description (such as "sqrt(2)") rather than their rational approximation.[1]

A real number is said to be computable if there exists an algorithm that yields its digits. Because there are only countably many algorithms, but an uncountable number of reals, "most" real numbers fail to be computable. Some constructivists accept the existence of only those reals that are computable. The set of definable numbers is broader, but still only countable. If computers could use unlimited precision real numbers (real computation), then one could solve NP-complete problems, and even #P-complete problems in polynomial time, answering affirmatively the P = NP problem. Unlimited precision real numbers in the physical universe are prohibited by the holographic principle and the Bekenstein bound.[2]

Mathematicians use the symbol R (or alternatively ℝ, the letter "R" in blackboard bold, Unicode ℝ) to represent the set of all real numbers. The notation Rn refers to an n-dimensional space with real coordinates; for example, a value from R3 consists of three real numbers and specifies a location in 3-dimensional space.

In mathematics, real is used as an adjective, meaning that the underlying field is the field of real numbers. For example: real matrix, real polynomial and real Lie algebra. As a substantive, the term is used almost strictly in reference to the real numbers themselves (e.g., the "set of all reals").

History Vulgar fractions had been used by the Egyptians around 1000 BC; the Vedic "Sulba Sutras" ("The rules of chords"), ca. 600 BC, include what may be the first 'use' of irrational numbers. The concept of irrationality was implicitly accepted by early Indian mathematicians since Manava (c. 750–690 BC), who was aware that the square roots of certain numbers such as 2 and 61 could not be exactly determined.[3] Around 500 BC, the Greek mathematicians led by Pythagoras realized the need for irrational numbers, in particular the irrationality of the square root of 2.

The Middle Ages saw the acceptance of zero, negative, integral and fractional numbers, first by Indian and Chinese mathematicians, and then by Arabic mathematicians, who were also the first to treat irrational numbers as algebraic objects,[4] which was made possible by the development of algebra. Arabic mathematicians merged the concepts of "number" and "magnitude" into a more general idea of real numbers.[5] The Egyptian mathematician Abū Kāmil Shujā ibn Aslam (c. 850–930) was the first to accept irrational numbers as solutions to quadratic equations or as coefficients in an equation, often in the form of square roots, cube roots and fourth roots.[6] In the 16th century, Simon Stevin created the basis for modern decimal notation, and insisted that there is no difference between rational and irrational numbers in this regard.

In the 18th and 19th centuries there was much work on irrational and transcendental numbers. Lambert (1761) gave the first flawed proof that π cannot be rational; Legendre (1794) completed the proof, and showed that π is not the square root of a rational number. Ruffini (1799) and Abel (1824) both constructed proofs of the Abel–Ruffini theorem: that the general quintic or higher equations cannot be solved by a general formula involving only arithmetical operations and roots. Évariste Galois (1832) developed techniques for determining whether a given equation could be solved by radicals, which gave rise to the field of Galois theory. Joseph Liouville (1840) showed that neither e nor e2 can be a root of an integer quadratic equation, and then established the existence of transcendental numbers; the proof was subsequently displaced by Georg Cantor (1873). Charles Hermite (1873) first proved that e is transcendental, and Ferdinand von Lindemann (1882) showed that π is transcendental. Lindemann's proof was much simplified by Weierstrass (1885), still further by David Hilbert (1893), and has finally been made elementary by Hurwitz and Paul Albert Gordan.

The development of calculus in the 18th century used the entire set of real numbers without having defined them cleanly. The first rigorous definition was given by Georg Cantor in 1871. In 1874 he showed that the set of all real numbers is uncountably infinite but the set of all algebraic numbers is countably infinite. Contrary to widely held beliefs, his first method was not his famous diagonal argument, which he published in 1891. See Cantor's first uncountability proof.

Definition Construction from the rational numbers The real numbers can be constructed as a completion of the rational numbers in such a way that a sequence defined by a decimal or binary expansion like {3, 3.1, 3.14, 3.141, 3.1415,...} converges to a unique real number. For details and other constructions of real numbers, see construction of the real numbers.

Axiomatic approach Let R denote the set of all real numbers. Then:
• The set R is a field, meaning that addition and multiplication are defined and have the usual properties.
• The field R is ordered, meaning that there is a total order ≥ such that, for all real numbers x, y and z:
  • if x ≥ y then x + z ≥ y + z;
  • if x ≥ 0 and y ≥ 0 then xy ≥ 0.
• The order is Dedekind-complete; that is, every non-empty subset S of R with an upper bound in R has a least upper bound (also called supremum) in R.

The last property is what differentiates the reals from the rationals. For example, the set of rationals with square less than 2 has a rational upper bound (e.g., 1.5) but no rational least upper bound, because the square root of 2 is not rational.

The real numbers are uniquely specified by the above properties. More precisely, given any two Dedekind-complete ordered fields R1 and R2, there exists a unique field isomorphism from R1 to R2, allowing us to think of them as essentially the same mathematical object. For another axiomatization of R, see Tarski's axiomatization of the reals.

Properties Completeness The main reason for introducing the reals is that the reals contain all limits. More technically, the reals are complete (in the sense of metric spaces or uniform spaces, which is a different sense than the Dedekind completeness of the order in the previous section). This means the following: A sequence (xn) of real numbers is called a Cauchy sequence if for any ε > 0 there exists an integer N (possibly depending on ε) such that the distance |xn − xm| is less than ε for all n and m that are both greater than N. In other words, a sequence is a Cauchy sequence if its elements xn eventually come and remain arbitrarily close to each other. A sequence (xn) converges to the limit x if for any ε > 0 there exists an integer N (possibly depending on ε) such that the distance |xn − x| is less than ε provided that n is greater than N. In other words, a sequence has limit x if its elements eventually come and remain arbitrarily close to x. It is easy to see that every convergent sequence is a Cauchy sequence. An important fact about the real numbers is that the converse is also true: Every Cauchy sequence of real numbers is convergent to a real number. That is, the reals are complete. Note that the rationals are not complete. For example, the sequence (1, 1.4, 1.41, 1.414, 1.4142, 1.41421, ...), where each term adds a digit of the decimal expansion of the positive square root of 2, is Cauchy but it does not converge to a rational number. (In the real numbers, in contrast, it converges to the positive square root of 2.) The existence of limits of Cauchy sequences is what makes calculus work and is of great practical use. The standard numerical test to determine if a sequence has a limit is to test if it is a Cauchy sequence, as the limit is typically not

101

Real number known in advance. For example, the standard series of the exponential function

converges to a real number because for every x the sums

can be made arbitrarily small by choosing N sufficiently large. This proves that the sequence is Cauchy, so we know that the sequence converges even if the limit is not known in advance.
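A tiny numeric illustration of this behaviour (a sketch; partial sums of the series at x = 1 settling down to e):

import math

x = 1.0
partial = 0.0
term = 1.0  # x^n / n! starts at n = 0

for n in range(15):
    partial += term
    term *= x / (n + 1)  # next term of the series

print(partial, math.e)  # the partial sums approach e = 2.718281828...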

"The complete ordered field" The real numbers are often described as "the complete ordered field", a phrase that can be interpreted in several ways. First, an order can be lattice-complete. It is easy to see that no ordered field can be lattice-complete, because it can have no largest element (given any element z, z + 1 is larger), so this is not the sense that is meant. Additionally, an order can be Dedekind-complete, as defined in the section Axioms. The uniqueness result at the end of that section justifies using the word "the" in the phrase "complete ordered field" when this is the sense of "complete" that is meant. This sense of completeness is most closely related to the construction of the reals from Dedekind cuts, since that construction starts from an ordered field (the rationals) and then forms the Dedekind-completion of it in a standard way. These two notions of completeness ignore the field structure. However, an ordered group (in this case, the additive group of the field) defines a uniform structure, and uniform structures have a notion of completeness (topology); the description in the section Completeness above is a special case. (We refer to the notion of completeness in uniform spaces rather than the related and better known notion for metric spaces, since the definition of metric space relies on already having a characterisation of the real numbers.) It is not true that R is the only uniformly complete ordered field, but it is the only uniformly complete Archimedean field, and indeed one often hears the phrase "complete Archimedean field" instead of "complete ordered field". Since it can be proved that any uniformly complete Archimedean field must also be Dedekind-complete (and vice versa, of course), this justifies using "the" in the phrase "the complete Archimedean field". This sense of completeness is most closely related to the construction of the reals from Cauchy sequences (the construction carried out in full in this article), since it starts with an Archimedean field (the rationals) and forms the uniform completion of it in a standard way. But the original use of the phrase "complete Archimedean field" was by David Hilbert, who meant still something else by it. He meant that the real numbers form the largest Archimedean field in the sense that every other Archimedean field is a subfield of R. Thus R is "complete" in the sense that nothing further can be added to it without making it no longer an Archimedean field. This sense of completeness is most closely related to the construction of the reals from surreal numbers, since that construction starts with a proper class that contains every ordered field (the surreals) and then selects from it the largest Archimedean subfield.

Advanced properties The reals are uncountable; that is, there are strictly more real numbers than natural numbers, even though both sets are infinite. In fact, the cardinality of the reals equals that of the set of subsets (i.e., the power set) of the natural numbers, and Cantor's diagonal argument states that the latter set's cardinality is strictly bigger than the cardinality of N. Since only a countable set of real numbers can be algebraic, almost all real numbers are transcendental. The non-existence of a subset of the reals with cardinality strictly between that of the integers and the reals is known as the continuum hypothesis. The continuum hypothesis can neither be proved nor be disproved; it is independent from the axioms of set theory.

The real numbers form a metric space: the distance between x and y is defined to be the absolute value |x − y|. By virtue of being a totally ordered set, they also carry an order topology; the topology arising from the metric and the one arising from the order are identical, but yield different presentations for the topology – in the order topology as intervals, in the metric topology as epsilon-balls. The Dedekind cuts construction uses the order topology presentation, while the Cauchy sequences construction uses the metric topology presentation. The reals are a contractible (hence connected and simply connected), separable metric space of dimension 1, and are everywhere dense. The real numbers are locally compact but not compact. There are various properties that uniquely specify them; for instance, all unbounded, connected, and separable order topologies are necessarily homeomorphic to the reals.

Every nonnegative real number has a square root in R, and no negative number does. This shows that the order on R is determined by its algebraic structure. Also, every polynomial of odd degree admits at least one real root: these two properties make R the premier example of a real closed field. Proving this is the first half of one proof of the fundamental theorem of algebra.

The reals carry a canonical measure, the Lebesgue measure, which is the Haar measure on their structure as a topological group normalised such that the unit interval [0,1] has measure 1.

The supremum axiom of the reals refers to subsets of the reals and is therefore a second-order logical statement. It is not possible to characterize the reals with first-order logic alone: the Löwenheim–Skolem theorem implies that there exists a countable dense subset of the real numbers satisfying exactly the same sentences in first-order logic as the real numbers themselves. The set of hyperreal numbers satisfies the same first-order sentences as R. Ordered fields that satisfy the same first-order sentences as R are called nonstandard models of R. This is what makes nonstandard analysis work; by proving a first-order statement in some nonstandard model (which may be easier than proving it in R), we know that the same statement must also be true of R.

Generalizations and extensions
The real numbers can be generalized and extended in several different directions:
• The complex numbers contain solutions to all polynomial equations and hence are an algebraically closed field, unlike the real numbers. However, the complex numbers are not an ordered field.
• The affinely extended real number system adds two elements +∞ and −∞. It is a compact space. It is no longer a field, not even an additive group; it still has a total order; moreover, it is a complete lattice.
• The real projective line adds only one value ∞. It is also a compact space. Again, it is no longer a field, not even an additive group. However, it allows division of a non-zero element by zero. It is not ordered anymore.
• The long real line pastes together ℵ1* + ℵ1 copies of the real line plus a single point (here ℵ1* denotes the reversed ordering of ℵ1) to create an ordered set that is "locally" identical to the real numbers, but somehow longer; for instance, there is an order-preserving embedding of ℵ1 in the long real line but not in the real numbers. The long real line is the largest ordered set that is complete and locally Archimedean. As with the previous two examples, this set is no longer a field or additive group.
• Ordered fields extending the reals are the hyperreal numbers and the surreal numbers; both of them contain infinitesimal and infinitely large numbers and thus are not Archimedean.
• Self-adjoint operators on a Hilbert space (for example, self-adjoint square complex matrices) generalize the reals in many respects: they can be ordered (though not totally ordered), they are complete, all their eigenvalues are real and they form a real associative algebra. Positive-definite operators correspond to the positive reals and normal operators correspond to the complex numbers.


"Reals" in set theory In set theory, specifically descriptive set theory, the Baire space is used as a surrogate for the real numbers since the latter have some topological properties (connectedness) that are a technical inconvenience. Elements of Baire space are referred to as "reals".

See also
• Completeness
• Continued fraction
• Limit of a sequence
• Real analysis
• Simon Stevin
• Imaginary number
• Complex number


External links
• The real numbers: Pythagoras to Stevin [7]
• The real numbers: Stevin to Hilbert [8]
• The real numbers: Attempts to understand [9]

References
[1] Cohen, Joel S. (2002). Computer algebra and symbolic computation: elementary algorithms. 1. A K Peters, Ltd. p. 32. ISBN 9781568811581.
[2] Scott Aaronson, NP-complete Problems and Physical Reality (http://arxiv.org/abs/quant-ph/0502072), ACM SIGACT News, Vol. 36, No. 1 (March 2005), pp. 30–52.
[3] T. K. Puttaswamy, "The Accomplishments of Ancient Indian Mathematicians", pp. 410–1, in Selin, Helaine; D'Ambrosio, Ubiratan (2000), Mathematics Across Cultures: The History of Non-western Mathematics, Springer, ISBN 1402002602
[4] O'Connor, John J.; Robertson, Edmund F., "Arabic mathematics: forgotten brilliance?" (http://www-history.mcs.st-andrews.ac.uk/HistTopics/Arabic_mathematics.html), MacTutor History of Mathematics archive, University of St Andrews.
[5] Matvievskaya, Galina (1987), "The Theory of Quadratic Irrationals in Medieval Oriental Mathematics", Annals of the New York Academy of Sciences 500: 253–277 [254], doi:10.1111/j.1749-6632.1987.tb37206.x
[6] Jacques Sesiano, "Islamic mathematics", p. 148, in Selin, Helaine; D'Ambrosio, Ubiratan (2000), Mathematics Across Cultures: The History of Non-western Mathematics, Springer, ISBN 1402002602
[7] http://www-groups.dcs.st-and.ac.uk/~history/HistTopics/Real_numbers_1.html
[8] http://www-groups.dcs.st-and.ac.uk/~history/HistTopics/Real_numbers_2.html
[9] http://www-groups.dcs.st-and.ac.uk/~history/HistTopics/Real_numbers_3.html


Variance
In probability theory and statistics, the variance is used as one of several descriptors of a distribution. It describes how far values lie from the mean. In particular, the variance is one of the moments of a distribution. In that context, it forms part of a systematic approach to distinguishing between probability distributions. While other such approaches have been developed, those based on moments are advantageous in terms of mathematical and computational simplicity. The variance is a parameter describing a theoretical probability distribution, while a sample of data from such a distribution can be used to construct an estimate of this variance: in the simplest cases this estimate can be the sample variance.

Background
The variance of a random variable or distribution is the expectation, or mean, of the squared deviation of that variable from its expected value or mean. Thus the variance is a measure of the amount of variation within the values of that variable, taking account of all possible values and their probabilities or weightings (not just the extremes which give the range).

For example, a perfect die, when thrown, has expected value (1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5, expected absolute deviation 1.5 (the mean of the equally likely absolute deviations 2.5, 1.5, 0.5, 0.5, 1.5, 2.5 of the outcomes from 3.5), but expected squared deviation, or variance, of 17.5/6 ≈ 2.9 (the mean of the equally likely squared deviations 2.5², 1.5², 0.5², 0.5², 1.5², 2.5²). As another example, if a coin is tossed twice, the number of heads is: 0 with probability 0.25, 1 with probability 0.5 and 2 with probability 0.25. Thus the variance is 0.25 × (0 − 1)² + 0.5 × (1 − 1)² + 0.25 × (2 − 1)² = 0.25 + 0 + 0.25 = 0.5. (Note that in this case, where tosses of coins are independent, the variance is additive, i.e., if the coin is tossed n times, the variance will be 0.25n.)

Unlike expected absolute deviation, the variance of a variable has units that are the square of the units of the variable itself. For example, a variable measured in inches will have a variance measured in square inches. For this reason, describing data sets via their standard deviation or root mean square deviation is often preferred over variance. In the dice example the standard deviation is √(17.5/6) ≈ 1.7, slightly larger than the expected absolute deviation of 1.5.

The standard deviation and the expected absolute deviation can both be used as an indicator of the "spread" of a distribution. The standard deviation is more amenable to algebraic manipulation, and, together with variance and its generalization covariance, is used frequently in theoretical statistics; however the expected absolute deviation tends to be more robust as it is less sensitive to outliers arising from measurement anomalies or an unduly heavy-tailed distribution.

Real-world distributions, such as the distribution of yesterday's rain throughout the day, are typically not fully known, unlike the behavior of perfect dice or an ideal distribution such as the normal distribution, because it is impractical to account for every raindrop. Instead one estimates the mean and variance of the whole distribution as the computed mean and variance of n samples drawn suitably randomly from the whole sample space, in this example yesterday's rainfall. This method of estimation is close to optimal, with the caveat that it underestimates the variance by a factor of (n − 1)/n (when n = 1 the variance of a single sample is obviously zero regardless of the true variance), a bias which should be corrected for when n is small. If the mean is determined in some other way than from the same samples used to estimate the variance then this bias does not arise and the variance can safely be estimated as that of the samples.

The variance of a real-valued random variable is its second central moment, and it also happens to be its second cumulant. Just as some distributions do not have a mean, some do not have a variance. The mean exists whenever the variance exists, but not vice versa.


Definition
If a random variable X has the expected value (mean) μ = E[X], then the variance of X is given by:

$$\operatorname{Var}(X) = \mathrm{E}\left[(X - \mu)^2\right].$$

This definition encompasses random variables that are discrete, continuous, or neither. It can be expanded as follows:

$$\operatorname{Var}(X) = \mathrm{E}\left[X^2 - 2\mu X + \mu^2\right] = \mathrm{E}\left[X^2\right] - 2\mu^2 + \mu^2 = \mathrm{E}\left[X^2\right] - \mu^2.$$

The variance of random variable X is typically designated as Var(X), $\sigma_X^2$, or simply σ² (pronounced "sigma squared"). If a distribution does not have an expected value, as is the case for the Cauchy distribution, it does not have a variance either. Many other distributions for which the expected value does exist do not have a finite variance because the relevant integral diverges. An example is a Pareto distribution whose index k satisfies 1 < k ≤ 2.

Continuous case
If the random variable X is continuous with probability density function f(x), then

$$\operatorname{Var}(X) = \int (x - \mu)^2 f(x)\,dx,$$

where

$$\mu = \int x f(x)\,dx,$$

and where the integrals are definite integrals taken for x ranging over the range of X.

Discrete case
If the random variable X is discrete with probability mass function x1 ↦ p1, ..., xn ↦ pn, then

$$\operatorname{Var}(X) = \sum_{i=1}^{n} p_i (x_i - \mu)^2,$$

where

$$\mu = \sum_{i=1}^{n} p_i x_i.$$

(When such a discrete weighted variance is specified by weights whose sum is not 1, then one divides by the sum of the weights.) That is, it is the expected value of the square of the deviation of X from its own mean. In plain language, it can be expressed as "The mean of the square of the deviation of each data point from the average". It is thus the mean squared deviation.


Examples

Exponential distribution
The exponential distribution with parameter λ is a continuous distribution whose support is the semi-infinite interval [0, ∞). Its probability density function is given by:

$$f(x) = \lambda e^{-\lambda x},$$

and it has expected value μ = λ⁻¹. Therefore the variance is equal to:

$$\operatorname{Var}(X) = \int_0^\infty (x - \lambda^{-1})^2 \lambda e^{-\lambda x}\,dx = \lambda^{-2}.$$

So for an exponentially distributed random variable, σ² = μ².

Fair die
A six-sided fair die can be modelled with a discrete random variable with outcomes 1 through 6, each with equal probability 1/6. The expected value is (1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5. Therefore the variance can be computed to be:

$$\operatorname{Var}(X) = \frac{1}{6}\sum_{i=1}^{6}(i - 3.5)^2 = \frac{35}{12} \approx 2.92.$$
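To make the computation concrete, here is a minimal Python sketch (an illustration added here, not part of the original article) that evaluates the mean and variance of a fair die directly from the definition; exact rational arithmetic confirms that 35/12 equals the 17.5/6 quoted in the Background section.

```python
from fractions import Fraction

# Outcomes of a fair six-sided die, each with probability 1/6.
outcomes = [1, 2, 3, 4, 5, 6]
p = Fraction(1, 6)

mean = sum(p * x for x in outcomes)                    # E[X]
variance = sum(p * (x - mean) ** 2 for x in outcomes)  # E[(X - mu)^2]

print(mean)      # 7/2   (= 3.5)
print(variance)  # 35/12 (~ 2.9167, the same as 17.5/6)
```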

Properties
Variance is non-negative because the squares are positive or zero. The variance of a constant random variable is zero, and the variance of a variable in a data set is 0 if and only if all entries have the same value. Variance is invariant with respect to changes in a location parameter. That is, if a constant is added to all values of the variable, the variance is unchanged. If all values are scaled by a constant, the variance is scaled by the square of that constant. These two properties can be expressed in the following formula:

$$\operatorname{Var}(aX + b) = a^2 \operatorname{Var}(X).$$

The variance of a finite sum of uncorrelated random variables is equal to the sum of their variances. This stems from the identity:

$$\operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) + 2\operatorname{Cov}(X, Y),$$

and the fact that for uncorrelated variables the covariance is zero. In general, for the sum of $N$ variables $\sum_{i=1}^{N} X_i$, we have:

$$\operatorname{Var}\!\left(\sum_{i=1}^{N} X_i\right) = \sum_{i=1}^{N} \operatorname{Var}(X_i) + 2\sum_{1 \le i < j \le N} \operatorname{Cov}(X_i, X_j),$$

or

$$\operatorname{Var}\!\left(\sum_{i=1}^{N} X_i\right) = \sum_{i=1}^{N} \sum_{j=1}^{N} \operatorname{Cov}(X_i, X_j).$$

Suppose that the observations can be partitioned into equal-sized subgroups according to some second variable. Then the variance of the total group is equal to the mean of the variances of the subgroups plus the variance of the means of the subgroups. This property is known as variance decomposition or the law of total variance and plays an important role in the analysis of variance. For example, suppose that a group consists of a subgroup of men and an equally large subgroup of women. Suppose that the men have a mean body length of 180 and that the variance of their lengths is 100. Suppose that the women have a mean length of 160 and that the variance of their lengths is 50. Then the mean of the variances is (100 + 50) / 2 = 75; the variance of the means is the variance of 180, 160 which is 100. Then, for the total group of men and women combined, the variance of the body lengths will be 75 + 100 = 175.

Note that this uses N for the denominator instead of N − 1. In a more general case, if the subgroups have unequal sizes, then they must be weighted proportionally to their size in the computations of the means and variances. The formula is also valid with more than two groups, and even if the grouping variable is continuous. This formula implies that the variance of the total group cannot be smaller than the mean of the variances of the subgroups. Note, however, that the total variance is not necessarily larger than the variances of the subgroups. In the above example, when the subgroups are analyzed separately, the variance is influenced only by the man-man differences and the woman-woman differences. If the two groups are combined, however, then the men-women differences enter into the variance also.

Many computational formulas for the variance are based on this equality: The variance is equal to the mean of the square minus the square of the mean:

$$\operatorname{Var}(X) = \mathrm{E}\left[X^2\right] - \left(\mathrm{E}[X]\right)^2.$$

For example, if we consider the numbers 1, 2, 3, 4 then the mean of the squares is (1 × 1 + 2 × 2 + 3 × 3 + 4 × 4) / 4 = 7.5. The regular mean of all four numbers is 2.5, so the square of the mean is 6.25. Therefore the variance is 7.5 − 6.25 = 1.25, which is indeed the same result obtained earlier with the definition formulas.

Many pocket calculators use an algorithm that is based on this formula and that allows them to compute the variance while the data are entered, without storing all values in memory. The algorithm is to adjust only three variables when a new data value is entered: The number of data entered so far (n), the sum of the values so far (S), and the sum of the squared values so far (SS). For example, if the data are 1, 2, 3, 4, then after entering the first value, the algorithm would have n = 1, S = 1 and SS = 1. After entering the second value (2), it would have n = 2, S = 3 and SS = 5. When all data are entered, it would have n = 4, S = 10 and SS = 30. Next, the mean is computed as M = S / n, and finally the variance is computed as SS / n − M × M. In this example the outcome would be 30 / 4 − 2.5 × 2.5 = 7.5 − 6.25 = 1.25. If the unbiased sample estimate is to be computed, the outcome will be multiplied by n / (n − 1), which yields 1.667 in this example.
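The three-register procedure just described translates directly into code. The following is a minimal Python sketch (ours, not from the original article); it tracks n, S, and SS exactly as described and recovers both the population and the unbiased sample variance.

```python
def running_variance(data):
    """Single-pass variance using the three running sums described above."""
    n, s, ss = 0, 0.0, 0.0
    for x in data:
        n += 1
        s += x       # running sum of the values (S)
        ss += x * x  # running sum of the squared values (SS)
    mean = s / n
    population_var = ss / n - mean * mean       # SS/n - M*M
    sample_var = population_var * n / (n - 1)   # unbiased estimate
    return population_var, sample_var

print(running_variance([1, 2, 3, 4]))  # (1.25, 1.666...)
```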

Properties, formal

Sum of uncorrelated variables (Bienaymé formula)
One reason for the use of the variance in preference to other measures of dispersion is that the variance of the sum (or the difference) of uncorrelated random variables is the sum of their variances:

$$\operatorname{Var}\!\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} \operatorname{Var}(X_i).$$

This statement is called the Bienaymé formula[1] and was discovered in 1853. It is often made with the stronger condition that the variables are independent, but uncorrelatedness suffices. So if all the variables have the same variance σ², then, since division by n is a linear transformation, this formula immediately implies that the variance of their mean is

$$\operatorname{Var}(\overline{X}) = \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{\sigma^2}{n}.$$

That is, the variance of the mean decreases when n increases. This formula for the variance of the mean is used in the definition of the standard error of the sample mean, which is used in the central limit theorem.
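A quick Monte Carlo check of this 1/n law (a sketch of ours, with the fair die of the earlier example as an arbitrary choice of distribution): the empirical variance of sample means of size n should be close to σ²/n.

```python
import random

def empirical_var(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

sigma2 = 35 / 12  # variance of a single fair-die roll
trials = 20_000
for n in (1, 4, 16):
    means = [sum(random.randint(1, 6) for _ in range(n)) / n
             for _ in range(trials)]
    # the two columns should roughly agree
    print(n, round(empirical_var(means), 3), round(sigma2 / n, 3))
```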


Sum of correlated variables
In general, if the variables are correlated, then the variance of their sum is the sum of their covariances:

$$\operatorname{Var}\!\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} \sum_{j=1}^{n} \operatorname{Cov}(X_i, X_j).$$

(Note: This by definition includes the variance of each variable, since Cov(X, X) = Var(X).) Here Cov is the covariance, which is zero for independent random variables (if it exists). The formula states that the variance of a sum is equal to the sum of all elements in the covariance matrix of the components. This formula is used in the theory of Cronbach's alpha in classical test theory.

So if the variables have equal variance σ² and the average correlation of distinct variables is ρ, then the variance of their mean is

$$\operatorname{Var}(\overline{X}) = \frac{\sigma^2}{n} + \frac{n-1}{n}\,\rho\sigma^2.$$

This implies that the variance of the mean increases with the average of the correlations. Moreover, if the variables have unit variance, for example if they are standardized, then this simplifies to

$$\operatorname{Var}(\overline{X}) = \frac{1}{n} + \frac{n-1}{n}\,\rho.$$

This formula is used in the Spearman-Brown prediction formula of classical test theory. This converges to ρ if n goes to infinity, provided that the average correlation remains constant or converges too. So for the variance of the mean of standardized variables with equal correlations or converging average correlation we have

$$\lim_{n \to \infty} \operatorname{Var}(\overline{X}) = \rho.$$

Therefore, the variance of the mean of a large number of standardized variables is approximately equal to their average correlation. This makes clear that the sample mean of correlated variables does not generally converge to the population mean, even though the law of large numbers states that the sample mean will converge for independent variables.

Weighted sum of variables
The scaling property and the Bienaymé formula, along with the property Cov(aX, bY) = ab Cov(X, Y) from the covariance page, jointly imply that

$$\operatorname{Var}(aX + bY) = a^2 \operatorname{Var}(X) + b^2 \operatorname{Var}(Y) + 2ab\,\operatorname{Cov}(X, Y).$$

This implies that in a weighted sum of variables, the variable with the largest weight will have a disproportionately large weight in the variance of the total. For example, if X and Y are uncorrelated and the weight of X is two times the weight of Y, then the weight of the variance of X will be four times the weight of the variance of Y.

Decomposition
The general formula for variance decomposition or the law of total variance is: If X and Y are two random variables and the variance of X exists, then

$$\operatorname{Var}(X) = \operatorname{Var}\!\left(\mathrm{E}(X \mid Y)\right) + \mathrm{E}\!\left(\operatorname{Var}(X \mid Y)\right).$$

Here, E(X|Y) is the conditional expectation of X given Y, and Var(X|Y) is the conditional variance of X given Y. (A more intuitive explanation is that given a particular value of Y, then X follows a distribution with mean E(X|Y) and variance Var(X|Y). The above formula tells how to find Var(X) based on the distributions of these two quantities when Y is allowed to vary.) This formula is often applied in analysis of variance, where the corresponding formula is

$$\mathit{MS}_{\text{total}} = \mathit{MS}_{\text{between}} + \mathit{MS}_{\text{within}}.$$

It is also used in linear regression analysis, where the corresponding formula is

$$\mathit{MS}_{\text{total}} = \mathit{MS}_{\text{regression}} + \mathit{MS}_{\text{residual}}.$$


This can also be derived from the additivity of variances, since the total (observed) score is the sum of the predicted score and the error score, where the latter two are uncorrelated.

Computational formula
The computational formula for the variance follows in a straightforward manner from the linearity of expected values and the above definition:

$$\operatorname{Var}(X) = \mathrm{E}\left[(X - \mathrm{E}[X])^2\right] = \mathrm{E}\left[X^2 - 2X\,\mathrm{E}[X] + (\mathrm{E}[X])^2\right] = \mathrm{E}\left[X^2\right] - \left(\mathrm{E}[X]\right)^2.$$

This is often used to calculate the variance in practice, although it suffers from catastrophic cancellation if the two components of the equation are similar in magnitude.
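A small illustration of that cancellation in double-precision arithmetic (a sketch of ours, not from the article): for data with a large common offset, E[X²] and (E[X])² agree in all their leading digits, so the naive one-pass formula loses essentially all accuracy, while the two-pass definition does not.

```python
data = [1e9, 1e9 + 1, 1e9 + 2]   # true population variance is 2/3

mean = sum(data) / len(data)

# Naive one-pass formula E[X^2] - (E[X])^2: the two 1e18-sized terms
# agree in their leading digits, so the subtraction cancels badly.
naive = sum(x * x for x in data) / len(data) - mean * mean

# Two-pass formula from the definition: E[(X - mu)^2].
two_pass = sum((x - mean) ** 2 for x in data) / len(data)

print(naive)     # wildly wrong (error on the order of the float rounding)
print(two_pass)  # 0.6666666666666666
```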

Characteristic property
The second moment of a random variable attains the minimum value when taken around the first moment (i.e., mean) of the random variable, i.e.

$$\operatorname*{arg\,min}_{m}\, \mathrm{E}\left[(X - m)^2\right] = \mathrm{E}[X].$$

Conversely, if a continuous function $\varphi$ satisfies

$$\operatorname*{arg\,min}_{m}\, \mathrm{E}\left[\varphi(X - m)\right] = \mathrm{E}[X]$$

for all random variables X, then it is necessarily of the form $\varphi(x) = a x^2 + b$, where a > 0. This also holds in the multidimensional case.[2]

Calculation from the CDF
The population variance for a non-negative random variable can be expressed in terms of the cumulative distribution function F using

$$\operatorname{Var}(X) = 2\int_0^\infty u\,H(u)\,du - \left(\int_0^\infty H(u)\,du\right)^2,$$

where H(u) = 1 − F(u) is the right tail function. This expression can be used to calculate the variance in situations where the CDF, but not the density, can be conveniently expressed.
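A numerical sanity check of the two tail integrals (a sketch of ours) for an Exponential(λ) variable, whose right tail H(u) = e^{−λu} is simple: the expression above should reproduce Var(X) = 1/λ².

```python
import math

lam = 2.0
h = lambda u: math.exp(-lam * u)   # right tail H(u) = 1 - F(u)

def integrate(f, a, b, steps=100_000):
    """Crude trapezoidal rule; adequate for this smooth, decaying integrand."""
    dx = (b - a) / steps
    return dx * (sum(f(a + i * dx) for i in range(1, steps))
                 + 0.5 * (f(a) + f(b)))

upper = 20 / lam                                     # tail beyond here is negligible
ex  = integrate(h, 0.0, upper)                       # E[X]   = int H(u) du
ex2 = 2 * integrate(lambda u: u * h(u), 0.0, upper)  # E[X^2] = 2 int u H(u) du
print(ex2 - ex * ex, 1 / lam**2)                     # both ~ 0.25
```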

Approximating the variance of a function
The delta method uses second-order Taylor expansions to approximate the variance of a function of one or more random variables: see Taylor expansions for the moments of functions of random variables. For example, the approximate variance of a function of one variable is given by

$$\operatorname{Var}\left[f(X)\right] \approx \left(f'\!\left(\mathrm{E}[X]\right)\right)^2 \operatorname{Var}(X),$$

provided that f is twice differentiable and that the mean and variance of X are finite.


Population variance and sample variance
In general, the population variance of a finite population of size N is given by

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2,$$

where

$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$$

is the population mean. In many practical situations, the true variance of a population is not known a priori and must be computed somehow. When dealing with extremely large populations, it is not possible to count every object in the population. A common task is to estimate the variance of a population from a sample.[3] We take a sample with replacement of n values y1, ..., yn from the population, where n < N, and estimate the variance on the basis of this sample. There are several good estimators. Two of them are well known:

$$s_n^2 = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \overline{y}\right)^2$$

and

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(y_i - \overline{y}\right)^2.$$[4]

Both are referred to as sample variance. Here, $\overline{y}$ denotes the sample mean:

$$\overline{y} = \frac{1}{n}\sum_{i=1}^{n} y_i.$$

The two estimators only differ slightly as can be seen, and for larger values of the sample size n the difference is negligible. While the first one may be seen as the variance of the sample considered as a population, the second one is the unbiased estimator of the population variance, meaning that its expected value E[s2] is equal to the true variance of the sampled random variable; the use of the term n − 1 is called Bessel's correction. The sample variance with n − 1 is a U-statistic for the function ƒ(x1, x2) = (x1 − x2)2/2, meaning that it is obtained by averaging a 2-sample statistic over 2-element subsets of the population.
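A simulation sketch of the bias being corrected (ours; the die-roll population and sample size n = 5 are arbitrary illustrative choices): with true variance 35/12 ≈ 2.917, the n-denominator estimator is too small on average by the factor (n − 1)/n, while the (n − 1)-denominator estimator is on target.

```python
import random

def both_estimates(sample):
    n = len(sample)
    m = sum(sample) / n
    ss = sum((x - m) ** 2 for x in sample)
    return ss / n, ss / (n - 1)   # biased, unbiased

n, trials = 5, 100_000
biased = unbiased = 0.0
for _ in range(trials):
    b, u = both_estimates([random.randint(1, 6) for _ in range(n)])
    biased += b
    unbiased += u

print(biased / trials)    # ~ (n-1)/n * 35/12 ~ 2.33
print(unbiased / trials)  # ~ 35/12 ~ 2.92
```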


Distribution of the sample variance
Being a function of random variables, the sample variance is itself a random variable, and it is natural to study its distribution. In the case that yi are independent observations from a normal distribution, Cochran's theorem shows that s² follows a scaled chi-square distribution:

$$(n-1)\,\frac{s^2}{\sigma^2} \sim \chi^2_{n-1}.$$

As a direct consequence, it follows that E(s²) = σ². If the yi are independent and identically distributed, but not necessarily normally distributed, then

$$\operatorname{Var}(s^2) = \sigma^4\left(\frac{2}{n-1} + \frac{\kappa}{n}\right),$$

where κ is the excess kurtosis of the distribution. If the conditions of the law of large numbers hold, s² is a consistent estimator of σ².

Generalizations
If X is a vector-valued random variable, with values in ℝⁿ, and thought of as a column vector, then the natural generalization of variance is

$$\mathrm{E}\left[(X - \mu)(X - \mu)^{\mathsf{T}}\right],$$

where μ = E(X) and Xᵀ is the transpose of X, and so is a row vector. This variance is a positive semi-definite square matrix, commonly referred to as the covariance matrix.

If X is a complex-valued random variable, with values in ℂ, then its variance is

$$\mathrm{E}\left[(X - \mu)(X - \mu)^{\dagger}\right],$$

where X† is the conjugate transpose of X. This variance is also a positive semi-definite square matrix.


History
The term variance was first introduced by Ronald Fisher in his 1918 paper The Correlation Between Relatives on the Supposition of Mendelian Inheritance:[5]

The great body of available statistics show us that the deviations of a human measurement from its mean follow very closely the Normal Law of Errors, and, therefore, that the variability may be uniformly measured by the standard deviation corresponding to the square root of the mean square error. When there are two independent causes of variability capable of producing in an otherwise uniform population distributions with standard deviations σ₁ and σ₂, it is found that the distribution, when both causes act together, has a standard deviation $\sqrt{\sigma_1^2 + \sigma_2^2}$. It is therefore desirable in analysing the causes of variability to deal with the square of the standard deviation as the measure of variability. We shall term this quantity the Variance...

Moment of inertia
The variance of a probability distribution is analogous to the moment of inertia in classical mechanics of a corresponding mass distribution along a line, with respect to rotation about its center of mass. It is because of this analogy that such things as the variance are called moments of probability distributions. The covariance matrix is related to the moment of inertia tensor for multivariate distributions. The moment of inertia of a cloud of n points with a covariance matrix Σ is given by

$$I = n\left(\mathbf{1}_{3\times 3}\operatorname{tr}(\Sigma) - \Sigma\right).$$

This difference between moment of inertia in physics and in statistics is clear for points that are gathered along a line. Suppose many points are close to the x axis and distributed along it. The covariance matrix might look like

$$\Sigma = \begin{pmatrix} 10 & 0 & 0 \\ 0 & 0.1 & 0 \\ 0 & 0 & 0.1 \end{pmatrix}.$$

That is, there is the most variance in the x direction. However, physicists would consider this to have a low moment about the x axis, so the moment-of-inertia tensor is

$$I = n\begin{pmatrix} 0.2 & 0 & 0 \\ 0 & 10.1 & 0 \\ 0 & 0 & 10.1 \end{pmatrix}.$$

See also
• Algorithms for calculating variance
• An inequality on location and scale parameters
• Average absolute deviation
• Bhatia–Davis inequality
• Covariance
• Chebyshev's inequality
• Distance variance
• Estimation of covariance matrices
• Explained variance & unexplained variance
• Kurtosis
• Mean absolute error
• Mean difference
• Popoviciu's inequality on variances
• Qualitative variation
• Sample mean and covariance
• Semivariance
• Skewness
• Standard deviation
• Weighted sample variance

External links
• A Guide to Understanding & Calculating Variance [6]
• Fisher's original paper [7] (pdf format)
• A tutorial on Analysis of Variance devised for first-year Oxford University students [8]

References
[1] Michel Loeve, "Probability Theory", Graduate Texts in Mathematics, Volume 45, 4th edition, Springer-Verlag, 1977, p. 12.
[2] A. Kagan and L. A. Shepp, "Why the variance?", Statistics and Probability Letters, Volume 38, Number 4, 1998, pp. 329–333. (online: http://dx.doi.org/10.1016/S0167-7152(98)00041-8)
[3] William Navidi, Statistics for Engineers and Scientists (2006), McGraw-Hill, p. 14.
[4] Montgomery, D.C. and Runger, G.C.: Applied Statistics and Probability for Engineers, p. 201. John Wiley & Sons, New York, 1994.
[5] Ronald Fisher (1918) The correlation between relatives on the supposition of Mendelian Inheritance (http://www.library.adelaide.edu.au/digitised/fisher/9.pdf)
[6] http://www.stats4students.com/Essentials/Measures-Of-Spread/Overview_3.php
[7] http://www.library.adelaide.edu.au/digitised/fisher/9.pdf
[8] http://www.celiagreen.com/charlesmccreery/statistics/anova.pdf


Probability density function
In probability theory, a probability density function (abbreviated as pdf, or just density) of an absolutely continuous random variable is a function that describes the relative likelihood for this random variable to occur at a given point in the observation space. The probability for a random variable to fall within a given set is given by the integral of its density over the set.

[Figure: Boxplot and probability density function (pdf) of a Gaussian probability distribution N(0, σ²).]

The terms "probability distribution function"[1] and "probability function"[2] have also been used to denote the probability density function. However, special care should be taken around this usage since it is not standard among probabilists and statisticians. In other sources, "probability distribution function" may be used when the probability distribution is defined as a function over general sets of values, or it may refer to the cumulative distribution function, or it may be a probability mass function rather than the density.

Absolutely continuous univariate distributions
A probability density function is most commonly associated with absolutely continuous univariate distributions. A random variable X has density ƒ, where ƒ is a non-negative Lebesgue-integrable function, if:

$$\Pr[a \le X \le b] = \int_a^b f(x)\,dx.$$

Hence, if F is the cumulative distribution function of X, then:

$$F(x) = \int_{-\infty}^{x} f(u)\,du,$$

and (if ƒ is continuous at x)

$$f(x) = \frac{d}{dx} F(x).$$

Intuitively, one can think of ƒ(x) dx as being the probability of X falling within the infinitesimal interval [x, x + dx].


Formal definition
This definition may be extended to any probability distribution using the measure-theoretic definition of probability. A random variable X has probability distribution X∗P: the density of X with respect to a reference measure μ is the Radon–Nikodym derivative:

$$f = \frac{d X_* P}{d\mu}.$$

That is, ƒ is any function with the property that:

$$\Pr[X \in A] = \int_A f\,d\mu$$

for any measurable set A.

Discussion
In the continuous univariate case above, the reference measure is the Lebesgue measure. The probability mass function of a discrete random variable is the density with respect to the counting measure over the sample space (usually the set of integers, or some subset thereof). Note that it is not possible to define a density with reference to an arbitrary measure (i.e. one can't choose the counting measure as a reference for a continuous random variable). Furthermore, when it does exist, the density is almost everywhere unique.

Further details
For example, the uniform distribution on the interval [0, 1] has probability density ƒ(x) = 1 for 0 ≤ x ≤ 1 and ƒ(x) = 0 elsewhere. The standard normal distribution has probability density

$$f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}.$$

If a random variable X is given and its distribution admits a probability density function ƒ, then the expected value of X (if it exists) can be calculated as

$$\mathrm{E}[X] = \int_{-\infty}^{\infty} x\, f(x)\,dx.$$

Not every probability distribution has a density function: the distributions of discrete random variables do not; nor does the Cantor distribution, even though it has no discrete component, i.e., does not assign positive probability to any individual point. A distribution has a density function if and only if its cumulative distribution function F(x) is absolutely continuous. In this case: F is almost everywhere differentiable, and its derivative can be used as probability density:

$$\frac{d}{dx} F(x) = f(x).$$

If a probability distribution admits a density, then the probability of every one-point set {a} is zero; the same holds for finite and countable sets. Two probability densities ƒ and g represent the same probability distribution precisely if they differ only on a set of Lebesgue measure zero. In the field of statistical physics, a non-formal reformulation of the relation above between the derivative of the cumulative distribution function and the probability density function is generally used as the definition of the probability density function. This alternate definition is the following:


If dt is an infinitely small number, the probability that X is included within the interval (t, t + dt) is equal to ƒ(t) dt, or:

$$\Pr(t < X < t + dt) = f(t)\,dt.$$

Link between discrete and continuous distributions
It is possible to represent certain discrete random variables as well as random variables involving both a continuous and a discrete part with a generalized probability density function, by using the Dirac delta function. For example, let us consider a binary discrete random variable taking −1 or 1 for values, with probability ½ each. The density of probability associated with this variable is:

$$f(t) = \frac{1}{2}\left(\delta(t + 1) + \delta(t - 1)\right).$$

More generally, if a discrete variable can take n different values among real numbers, then the associated probability density function is:

$$f(t) = \sum_{i=1}^{n} p_i\, \delta(t - x_i),$$

where x1, …, xn are the discrete values accessible to the variable and p1, …, pn are the probabilities associated with these values. This substantially unifies the treatment of discrete and continuous probability distributions. For instance, the above expression allows for determining statistical characteristics of such a discrete variable (such as its mean, its variance and its kurtosis), starting from the formulas given for a continuous distribution.

Densities associated with multiple variables
For continuous random variables X1, …, Xn, it is also possible to define a probability density function associated to the set as a whole, often called joint probability density function. This density function is defined as a function of the n variables, such that, for any domain D in the n-dimensional space of the values of the variables X1, …, Xn, the probability that a realisation of the set variables falls inside the domain D is

$$\Pr\left((X_1, \ldots, X_n) \in D\right) = \int_D f(x_1, \ldots, x_n)\,dx_1 \cdots dx_n.$$

If F(x1, …, xn) = Pr(X1 ≤ x1, …, Xn ≤ xn) is the cumulative distribution function of the vector (X1, …, Xn), then the joint probability density function can be computed as a partial derivative

$$f(x) = \frac{\partial^n F}{\partial x_1 \cdots \partial x_n}\bigg|_x.$$

Marginal densities
For i = 1, 2, …, n, let ƒXi(xi) be the probability density function associated to variable Xi alone. This is called the "marginal" density function, and can be deduced from the probability densities associated of the random variables X1, …, Xn by integrating on all values of the n − 1 other variables:

$$f_{X_i}(x_i) = \int f(x_1, \ldots, x_n)\,dx_1 \cdots dx_{i-1}\,dx_{i+1} \cdots dx_n.$$

Independence
Continuous random variables X1, …, Xn admitting a joint density are all independent from each other if and only if

$$f_{X_1, \ldots, X_n}(x_1, \ldots, x_n) = f_{X_1}(x_1) \cdots f_{X_n}(x_n).$$


Corollary
If the joint probability density function of a vector of n random variables can be factored into a product of n functions of one variable

$$f_{X_1, \ldots, X_n}(x_1, \ldots, x_n) = f_1(x_1) \cdots f_n(x_n)$$

(where each fi is not necessarily a density) then the n variables in the set are all independent from each other, and the marginal probability density function of each of them is given by

$$f_{X_i}(x_i) = \frac{f_i(x_i)}{\int f_i(x)\,dx}.$$

Example
This elementary example illustrates the above definition of multidimensional probability density functions in the simple case of a function of a set of two variables. Let us call $\vec{R}$ a 2-dimensional random vector of coordinates (X, Y): the probability to obtain $\vec{R}$ in the quarter plane of positive x and y is

$$\Pr(X > 0, Y > 0) = \int_0^\infty \int_0^\infty f(x, y)\,dx\,dy.$$

Sums of independent random variables
The probability density function of the sum of two independent random variables U and V, each of which has a probability density function, is the convolution of their separate density functions:

$$f_{U+V}(x) = \int_{-\infty}^{\infty} f_U(y)\, f_V(x - y)\,dy = \left(f_U * f_V\right)(x).$$

It is possible to generalize the previous relation to a sum of N independent random variables U1, …, UN, with densities ƒU1, …, ƒUN:

$$f_{U_1 + \cdots + U_N}(x) = \left(f_{U_1} * \cdots * f_{U_N}\right)(x).$$
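A discretized check of the convolution rule (a sketch of ours, using NumPy's `convolve`): the density of the sum of two independent Uniform(0, 1) variables should come out triangular on [0, 2], peaking at 1.

```python
import numpy as np

dx = 0.001
x = np.arange(0.0, 1.0, dx)
f = np.ones_like(x)          # density of Uniform(0, 1) on its support

# Discrete convolution approximates the convolution integral when scaled by dx.
g = np.convolve(f, f) * dx   # density of U + V on [0, 2)

print(g.sum() * dx)          # ~ 1.0, so g is still a density
print(g.max())               # ~ 1.0, the triangular peak near x = 1
```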

Dependent variables and change of variables
If the probability density function of a random variable X is given as ƒX(x), it is possible (but often not necessary; see below) to calculate the probability density function of some variable Y = g(X). This is also called a "change of variable" and is in practice used to generate a random variable of arbitrary shape ƒg(X) = ƒY using a known (for instance uniform) random number generator. If the function g is monotonic, then the resulting density function is

$$f_Y(y) = \left|\frac{d}{dy} g^{-1}(y)\right| f_X\!\left(g^{-1}(y)\right).$$

Here g⁻¹ denotes the inverse function and g′ denotes the derivative. This follows from the fact that the probability contained in a differential area must be invariant under change of variables. That is,

$$\left| f_Y(y)\,dy \right| = \left| f_X(x)\,dx \right|,$$

or

$$f_Y(y) = \left|\frac{dx}{dy}\right| f_X(x) = \frac{f_X(x)}{|g'(x)|}\bigg|_{x = g^{-1}(y)}.$$

For functions which are not monotonic the probability density function for y is

$$f_Y(y) = \sum_{k=1}^{n(y)} \left|\frac{d}{dy} g_k^{-1}(y)\right| f_X\!\left(g_k^{-1}(y)\right),$$

where n(y) is the number of solutions in x for the equation g(x) = y, and g_k⁻¹(y) are these solutions.
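As a brief worked instance of the monotonic case (an example added here, not from the original text): take X uniform on (0, 1) and Y = g(X) = −ln X, so that g⁻¹(y) = e^{−y}. Then

$$f_Y(y) = \left|\frac{d}{dy} e^{-y}\right| f_X\!\left(e^{-y}\right) = e^{-y} \cdot 1 = e^{-y}, \qquad y > 0,$$

which is the density of the exponential distribution with parameter 1 — exactly the recipe that inverse-transform random number generators exploit.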

It is tempting to think that in order to find the expected value E(g(X)) one must first find the probability density ƒg(X) of the new random variable Y = g(X). However, rather than computing

$$\mathrm{E}\left(g(X)\right) = \int_{-\infty}^{\infty} y\, f_{g(X)}(y)\,dy,$$

one may find instead

$$\mathrm{E}\left(g(X)\right) = \int_{-\infty}^{\infty} g(x)\, f_X(x)\,dx.$$

The values of the two integrals are the same in all cases in which both X and g(X) actually have probability density functions. It is not necessary that g be a one-to-one function. In some cases the latter integral is computed much more easily than the former.

Multiple variables
The above formulas can be generalized to variables (which we will again call y) depending on more than one other variable. ƒ(x0, x1, …, xm−1) shall denote the probability density function of the variables y depends on, and the dependence shall be y = g(x0, x1, …, xm−1). Then, the resulting density function is

$$f_Y(y) = \int_{g(x_0, \ldots, x_{m-1}) = y} \frac{f(x_0, \ldots, x_{m-1})}{\left\| \nabla g \right\|}\, dV,$$

where the integral is over the entire (m − 1)-dimensional solution of the subscripted equation and the symbolic dV must be replaced by a parametrization of this solution for a particular calculation; the variables x0, x1, …, xm−1 are then of course functions of this parametrization.

This derives from the following, perhaps more intuitive representation: Suppose x is an n-dimensional random variable with joint density f. If y = H(x), where H is a bijective, differentiable function, then y has density g:

$$g(\mathbf{y}) = f\!\left(H^{-1}(\mathbf{y})\right) \left| \det \frac{d H^{-1}(\mathbf{y})}{d \mathbf{y}} \right|,$$

with the differential regarded as the Jacobian of the inverse of H, evaluated at y.

Using the delta-function (and assuming independence) the same result is formulated as follows. If the probability density function of independent random variables Xi, i = 1, 2, …, n are given as ƒXi(xi), it is possible to calculate the probability density function of some variable Y = G(X1, X2, …, Xn). The following formula establishes a connection between the probability density function of Y denoted by ƒY(y) and ƒXi(xi) using the Dirac delta function:

$$f_Y(y) = \int \cdots \int f_{X_1}(x_1) \cdots f_{X_n}(x_n)\, \delta\!\left(y - G(x_1, \ldots, x_n)\right)\,dx_1 \cdots dx_n.$$

See also
• Likelihood function
• Density estimation
• Secondary measure
• Probability mass function

References
[1] PlanetMath (http://planetmath.org/?method=png&from=objects&id=2884&op=getobj)
[2] MathWorld (http://mathworld.wolfram.com/ProbabilityDistributionFunction.html)

• Ushakov, N.G. (2001), "Density of a probability distribution" (http://eom.springer.de/D/d031110.htm), in Hazewinkel, Michiel, Encyclopaedia of Mathematics, Springer, ISBN 978-1556080104



External links
• Weisstein, Eric W., "Probability density function" (http://mathworld.wolfram.com/ProbabilityDensityFunction.html) from MathWorld.

Cumulative distribution function
In probability theory and statistics, the cumulative distribution function (CDF), or just distribution function, describes the probability that a real-valued random variable X with a given probability distribution will be found at a value less than or equal to x. Intuitively, it is the "area so far" function of the probability distribution. Cumulative distribution functions are also used to specify the distribution of multivariate random variables.

Definition
For every real number x, the CDF of a real-valued random variable X is given by

$$F(x) = \Pr[X \le x],$$

where the right-hand side represents the probability that the random variable X takes on a value less than or equal to x. The probability that X lies in the interval (a, b] is therefore F(b) − F(a) if a < b.

If treating several random variables X, Y, ... etc. the corresponding letters are used as subscripts while, if treating only one, the subscript is omitted. It is conventional to use a capital F for a cumulative distribution function, in contrast to the lower-case f used for probability density functions and probability mass functions. This applies when discussing general distributions: some specific distributions have their own conventional notation, for example the normal distribution. The CDF of X can be defined in terms of the probability density function ƒ as follows:

$$F(x) = \int_{-\infty}^{x} f(t)\,dt.$$

Note that in the definition above, the "less than or equal to" sign, "≤", is a convention, not a universally used one (e.g. Hungarian literature uses "<"), but is important for discrete distributions. The proper use of tables of the binomial and Poisson distributions depends upon this convention. Moreover, important formulas like Levy's inversion formula for the characteristic function also rely on the "less than or equal" formulation.


Properties
Every cumulative distribution function F is (not necessarily strictly) monotone non-decreasing (see monotone increasing) and right-continuous. Furthermore, we have

$$\lim_{x \to -\infty} F(x) = 0, \qquad \lim_{x \to +\infty} F(x) = 1.$$

[Figure: From top to bottom, the cumulative distribution function of a discrete probability distribution, continuous probability distribution, and a distribution which has both a continuous part and a discrete part.]

Every function with these four properties is a CDF. The properties imply that all CDFs are càdlàg functions. If X is a discrete random variable, then it attains values x1, x2, ... with probability pi = P(xi), and the CDF of X will be discontinuous at the points xi and constant in between:

$$F(x) = \Pr[X \le x] = \sum_{x_i \le x} p_i.$$

If the CDF F of X is continuous, then X is a continuous random variable; if furthermore F is absolutely continuous, then there exists a Lebesgue-integrable function f(x) such that

$$F(b) - F(a) = \Pr[a < X \le b] = \int_a^b f(x)\,dx$$

for all real numbers a and b. (The first of the two equalities displayed above would not be correct in general if we had not said that the distribution is continuous. Continuity of the distribution implies that P (X = a) = P (X = b) = 0, so the difference between "<" and "≤" ceases to be important in this context.) The function f is equal to the derivative of F almost everywhere, and it is called the probability density function of the distribution of X.

Point probability
The "point probability" that X is exactly b can be found as

$$\Pr[X = b] = F(b) - \lim_{x \to b^-} F(x).$$

Kolmogorov–Smirnov and Kuiper's tests
The Kolmogorov–Smirnov test is based on cumulative distribution functions and can be used to test to see whether two empirical distributions are different or whether an empirical distribution is different from an ideal distribution. The closely related Kuiper's test (pronounced /ˈkaɪpərz/) is useful if the domain of the distribution is cyclic as in day of the week. For instance we might use Kuiper's test to see if the number of tornadoes varies during the year or if sales of a product vary by day of the week or day of the month.

Complementary cumulative distribution function
Sometimes, it is useful to study the opposite question and ask how often the random variable is above a particular level. This is called the complementary cumulative distribution function (ccdf) or exceedance, and is defined as

$$\bar{F}(x) = \Pr[X > x] = 1 - F(x).$$

This has applications in statistical hypothesis testing, for example, because the one-sided P-value is the probability of observing a test statistic at least as extreme as the one observed; hence, the one-sided P-value is simply given by the ccdf. In survival analysis, $\bar{F}(x)$ is called the survival function and denoted S(x), while the term reliability function is common in engineering.

Folded cumulative distribution
While the plot of a cumulative distribution often has an S-like shape, an alternative illustration is the folded cumulative distribution or mountain plot, which folds the top half of the graph over,[1] thus using two scales, one for the upslope and another for the downslope. This form of illustration emphasises the median and dispersion of the distribution or of the empirical results.

[Figure: Example of the folded cumulative distribution for a normal distribution function.]

Examples
As an example, suppose X is uniformly distributed on the unit interval [0, 1]. Then the CDF of X is given by

$$F(x) = \begin{cases} 0 & \text{if } x < 0, \\ x & \text{if } 0 \le x \le 1, \\ 1 & \text{if } x > 1. \end{cases}$$

As another example, suppose X takes only the discrete values 0 and 1, with equal probability. Then the CDF of X is given by

$$F(x) = \begin{cases} 0 & \text{if } x < 0, \\ 1/2 & \text{if } 0 \le x < 1, \\ 1 & \text{if } x \ge 1. \end{cases}$$


Inverse
If the CDF F is strictly increasing and continuous then $F^{-1}(y)$, for $y \in [0, 1]$, is the unique real number $x$ such that $F(x) = y$.

Unfortunately, the distribution does not, in general, have an inverse. One may define, for $y \in [0, 1]$,

$$F^{-1}(y) = \inf\{x \in \mathbb{R} : F(x) \ge y\}.$$

Example 1: The median is $F^{-1}(0.5)$.

Example 2: Put $\tau = F^{-1}(0.95)$. Then we call $\tau$ the 95th percentile.

The inverse of the cdf is called the quantile function. The inverse of the cdf can be used to translate results obtained for the uniform distribution to other distributions. Some useful properties of the inverse cdf are:
1. $F^{-1}$ is nondecreasing.
2. $F^{-1}(F(x)) \le x$.
3. $F(F^{-1}(y)) \ge y$.
4. $F^{-1}(y) \le x$ if and only if $y \le F(x)$.
5. If $Y$ has a $U[0, 1]$ distribution then $F^{-1}(Y)$ is distributed as $F$. This is used in random number generation using the inverse transform sampling method.
6. If $\{Y_\alpha\}$ is a collection of independent $F$-distributed random variables defined on the same sample space, then there exist random variables $X_\alpha$ such that $X_\alpha$ is distributed as $U[0, 1]$ and $F^{-1}(X_\alpha) = Y_\alpha$ with probability 1 for all $\alpha$.
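Property 5 is the basis of the standard sampling recipe. A minimal Python sketch (ours, with the Exponential(λ) distribution as an illustrative choice): F(x) = 1 − e^{−λx}, so F⁻¹(y) = −ln(1 − y)/λ, and feeding uniform draws through F⁻¹ produces exponential draws.

```python
import math
import random

def exponential_inverse_cdf(y, lam):
    """Quantile function of Exponential(lam): solves F(x) = y."""
    return -math.log(1.0 - y) / lam

lam = 2.0
samples = [exponential_inverse_cdf(random.random(), lam)
           for _ in range(100_000)]

print(sum(samples) / len(samples))  # ~ 1/lam = 0.5, the exponential mean
```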

Multivariate case
When dealing simultaneously with more than one random variable the joint cumulative distribution function can also be defined. For example, for a pair of random variables X, Y, the joint CDF is given by

$$F(x, y) = \Pr[X \le x, Y \le y],$$

where the right-hand side represents the probability that the random variable X takes on a value less than or equal to x and that Y takes on a value less than or equal to y.

Every multivariate CDF is:
1. Monotonically non-decreasing for each of its variables,
2. Right-continuous for each of its variables,
3. $0 \le F(x_1, \ldots, x_n) \le 1$,
4. $\lim_{x_1, \ldots, x_n \to +\infty} F(x_1, \ldots, x_n) = 1$ and $\lim_{x_i \to -\infty} F(x_1, \ldots, x_n) = 0$ for each i.

See also
• Descriptive statistics
• Empirical distribution function
• Cumulative frequency analysis
• Q-Q plot
• Ogive
• Single crossing condition


References
[1] Gentle, J.E. (2009). Computational Statistics (http://books.google.de/books?id=m4r-KVxpLsAC&pg=PA348). Springer. Retrieved 2010-08-06.

Expected value
In probability theory and statistics, the expected value (or expectation value, or mathematical expectation, or mean, or first moment) of a random variable is the integral of the random variable with respect to its probability measure.[1] [2] For discrete random variables this is equivalent to the probability-weighted sum of the possible values. For continuous random variables with a density function it is the probability density-weighted integral of the possible values.

The term "expected value" can be misleading. It must not be confused with the "most probable value." The expected value is in general not a typical value that the random variable can take on. It is often helpful to interpret the expected value of a random variable as the long-run average value of the variable over many independent repetitions of an experiment. The expected value may be intuitively understood by the law of large numbers: The expected value, when it exists, is almost surely the limit of the sample mean as sample size grows to infinity. The value may not be expected in the general sense — the "expected value" itself may be unlikely or even impossible (such as having 2.5 children), just like the sample mean. The expected value does not exist for some distributions with large "tails", such as the Cauchy distribution.[3]

It is possible to construct an expected value equal to the probability of an event by taking the expectation of an indicator function that is one if the event has occurred and zero otherwise. This relationship can be used to translate properties of expected values into properties of probabilities, e.g. using the law of large numbers to justify estimating probabilities by frequencies.

History
The idea of the expected value originated in the middle of the 17th century from the study of the so-called problem of points, posed by a French nobleman, the chevalier de Méré. The problem was that of two players who want to finish a game early and, given the current circumstances of the game, want to divide the stakes fairly, based on the chance each has of winning the game from that point. This problem was solved in 1654 by Blaise Pascal in his private correspondence with Pierre de Fermat; however, the idea was not communicated to the broad scientific community. Three years later, in 1657, the Dutch mathematician Christiaan Huygens published a treatise (see Huygens (1657)) "De ratiociniis in ludo aleæ" on probability theory, which not only laid down the foundations of the theory of probability, but also considered the problem of points, presenting a solution essentially the same as Pascal's.[4]

Neither Pascal nor Huygens used the term "expectation" in its modern sense. In particular, Huygens writes: "That my Chance or Expectation to win any thing is worth just such a Sum, as wou'd procure me in the same Chance and Expectation at a fair Lay. … If I expect a or b, and have an equal Chance of gaining them, my Expectation is worth (a + b)/2."

More than a hundred years later, in 1814, Pierre-Simon Laplace published his tract "Théorie analytique des probabilités", where the concept of expected value was defined explicitly:


… This advantage in the theory of chance is the product of the sum hoped for by the probability of obtaining it; it is the partial sum which ought to result when we do not wish to run the risks of the event in supposing that the division is made proportional to the probabilities. This division is the only equitable one when all strange circumstances are eliminated; because an equal degree of probability gives an equal right for the sum hoped for. We will call this advantage mathematical hope.





The use of letter E to denote expected value goes back to W.A. Whitworth (1901) “Choice and chance”. The symbol has become popular since for English writers it meant “Expectation”, for Germans “Erwartungswert”, and for French “Espérance mathématique”.[5]

Examples
The expected outcome from one roll of an ordinary (that is, fair) six-sided die is

$$\frac{1 + 2 + 3 + 4 + 5 + 6}{6} = 3.5,$$

which is not among the possible outcomes.[6]

A common application of expected value is gambling. For example, an American roulette wheel has 38 places where the ball may land, all equally likely. A winning bet on a single number pays 35-to-1, meaning that the original stake is not lost, and 35 times that amount is won, so you receive 36 times what you've bet. Considering all 38 possible outcomes, the expected value of the profit resulting from a dollar bet on a single number is the sum of potential net loss times the probability of losing and potential net gain times the probability of winning, that is,

$$\left(-\$1\right) \cdot \frac{37}{38} + \$35 \cdot \frac{1}{38} = -\frac{\$2}{38} \approx -\$0.0526.$$

The net change in your financial holdings is −$1 when you lose, and $35 when you win. Thus one may expect, on average, to lose about five cents for every dollar bet, and the expected value of a one-dollar bet is $0.947368421. In gambling, an event of which the expected value equals the stake (i.e. the bettor's expected profit, or net gain, is zero) is called a "fair game".
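The roulette figure is easy to reproduce (a sketch of ours, not from the article): the exact expectation of the net gain, plus a Monte Carlo estimate for comparison.

```python
from fractions import Fraction
import random

# Net gain of a $1 single-number bet: win $35 on 1 pocket, lose $1 on 37.
exact = Fraction(1, 38) * 35 + Fraction(37, 38) * (-1)
print(exact, float(exact))   # -1/19 ~ -0.0526 dollars per $1 bet

trials = 1_000_000
total = sum(35 if random.randrange(38) == 0 else -1 for _ in range(trials))
print(total / trials)        # ~ -0.053
```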

Mathematical definition
In general, if X is a random variable defined on a probability space (Ω, Σ, P), then the expected value of X, denoted by E(X), ⟨X⟩, or EX, is defined as

$$\mathrm{E}(X) = \int_\Omega X\,dP.$$

When this integral converges absolutely, it is called the expectation of X. The absolute convergence is necessary because conditional convergence means that a different order of addition gives a different result, which is contrary to the nature of the expected value. Here the Lebesgue integral is employed. Note that not all random variables have an expected value, since the integral may not converge absolutely (e.g., the Cauchy distribution). Two variables with the same probability distribution will have the same expected value, if it is defined.

If X is a discrete random variable with probability mass function p(x), then the expected value becomes

$$\mathrm{E}(X) = \sum_i x_i\, p(x_i),$$

as in the gambling example mentioned above.

If the probability distribution of X admits a probability density function f(x), then the expected value can be computed as

$$\mathrm{E}(X) = \int_{-\infty}^{\infty} x\, f(x)\,dx.$$

It follows directly from the discrete case definition that if X is a constant random variable, i.e. X = b for some fixed real number b, then the expected value of X is also b.

The expected value of an arbitrary function of X, g(X), with respect to the probability density function f(x) is given by the inner product of f and g:

$$\mathrm{E}\left(g(X)\right) = \int_{-\infty}^{\infty} g(x)\, f(x)\,dx.$$


This is sometimes called the law of the unconscious statistician. Using representations as a Riemann–Stieltjes integral and integration by parts the formula can be restated as:
• if $\Pr[X \ge a] = 1$, then $\mathrm{E}\left(g(X)\right) = g(a) + \int_a^\infty g'(x)\left(1 - F(x)\right)dx$;
• if $\Pr[X \le b] = 1$, then $\mathrm{E}\left(g(X)\right) = g(b) - \int_{-\infty}^{b} g'(x)\,F(x)\,dx$.

As a special case let $\alpha$ denote a positive real number, then

$$\mathrm{E}\left(|X|^\alpha\right) = \alpha \int_0^\infty t^{\alpha - 1} \Pr\left(|X| > t\right)dt.$$

In particular, for $\alpha = 1$, this reduces to:

$$\mathrm{E}(X) = \int_0^\infty \left(1 - F(t)\right)dt,$$

if $\Pr[X \ge 0] = 1$, where F is the cumulative distribution function of X.

Conventional terminology
• When one speaks of the "expected price", "expected height", etc. one means the expected value of a random variable that is a price, a height, etc.
• When one speaks of the "expected number of attempts needed to get one successful attempt," one might conservatively approximate it as the reciprocal of the probability of success for such an attempt. Cf. expected value of the geometric distribution.

Properties

Constants
The expected value of a constant is equal to the constant itself; i.e., if c is a constant, then E(c) = c.

Monotonicity
If X and Y are random variables so that X ≤ Y almost surely, then E(X) ≤ E(Y).

Linearity
The expected value operator (or expectation operator) E is linear in the sense that

$$\mathrm{E}(X + c) = \mathrm{E}(X) + c,$$
$$\mathrm{E}(X + Y) = \mathrm{E}(X) + \mathrm{E}(Y),$$
$$\mathrm{E}(aX) = a\,\mathrm{E}(X).$$

Note that the second result is valid even if X is not statistically independent of Y. Combining the results from the previous three equations, we can see that

$$\mathrm{E}(aX + bY) = a\,\mathrm{E}(X) + b\,\mathrm{E}(Y)$$

for any two random variables X and Y (which need to be defined on the same probability space) and any real numbers a and b.


Iterated expectation

Iterated expectation for discrete random variables
For any two discrete random variables X, Y one may define the conditional expectation:[7]

$$\mathrm{E}(X \mid Y = y) = \sum_x x \cdot \Pr(X = x \mid Y = y),$$

which means that E(X | Y = y) is a function of y. Then the expectation of X satisfies

$$\mathrm{E}(X) = \sum_y \mathrm{E}(X \mid Y = y) \cdot \Pr(Y = y).$$

Hence, the following equation holds:[8]

$$\mathrm{E}(X) = \mathrm{E}\left(\mathrm{E}(X \mid Y)\right).$$

The right hand side of this equation is referred to as the iterated expectation and is also sometimes called the tower rule. This proposition is treated in the law of total expectation.

Iterated expectation for continuous random variables
In the continuous case, the results are completely analogous. The definition of conditional expectation would use inequalities, density functions, and integrals to replace equalities, mass functions, and summations, respectively. However, the main result still holds:

$$\mathrm{E}(X) = \mathrm{E}\left(\mathrm{E}(X \mid Y)\right).$$

Inequality
If a random variable X is always less than or equal to another random variable Y, the expectation of X is less than or equal to that of Y: if X ≤ Y, then E(X) ≤ E(Y).

In particular, since X ≤ |X| and −X ≤ |X|, the absolute value of the expectation of a random variable is less than or equal to the expectation of its absolute value:

$$\left|\mathrm{E}(X)\right| \le \mathrm{E}\left(|X|\right).$$


Non-multiplicativity
In general, the expected value operator is not multiplicative, i.e. E(XY) is not necessarily equal to E(X) E(Y). If multiplicativity occurs, the X and Y variables are said to be uncorrelated (independent variables are a notable case of uncorrelated variables). The lack of multiplicativity gives rise to study of covariance and correlation.

If one considers the joint PDF of X and Y, say j(x,y), then the expectation of XY is

$$\mathrm{E}(XY) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} xy\, j(x, y)\,dx\,dy.$$

Now if X and Y are independent, then by definition j(x,y) = f(x)g(y) where f and g are the marginal PDFs for X and Y. Then

$$\mathrm{E}(XY) = \int\!\!\int xy\, f(x)g(y)\,dx\,dy = \left(\int x f(x)\,dx\right)\left(\int y g(y)\,dy\right) = \mathrm{E}(X)\,\mathrm{E}(Y).$$

Observe that independence of X and Y is required only to write j(x,y) = f(x)g(y), and this is required to establish the first equality in the chain above.

Functional non-invariance
In general, the expectation operator and functions of random variables do not commute; that is

$$\mathrm{E}\left(g(X)\right) \ne g\left(\mathrm{E}(X)\right) \quad \text{in general.}$$

A notable inequality concerning this topic is Jensen's inequality, involving expected values of convex (or concave) functions.

Uses and applications
The expected values of the powers of X are called the moments of X; the moments about the mean of X are expected values of powers of X − E(X). The moments of some random variables can be used to specify their distributions, via their moment generating functions.

To empirically estimate the expected value of a random variable, one repeatedly measures observations of the variable and computes the arithmetic mean of the results. If the expected value exists, this procedure estimates the true expected value in an unbiased manner and has the property of minimizing the sum of the squares of the residuals (the sum of the squared differences between the observations and the estimate). The law of large numbers demonstrates (under fairly mild conditions) that, as the size of the sample gets larger, the variance of this estimate gets smaller.

This property is often exploited in a wide variety of applications, including general problems of statistical estimation and machine learning, to estimate (probabilistic) quantities of interest via Monte Carlo methods, since most quantities of interest can be written in terms of expectation, e.g. $\Pr(X \in \mathcal{A}) = \mathrm{E}\left(\mathbf{1}_{\mathcal{A}}(X)\right)$, where $\mathbf{1}_{\mathcal{A}}(X)$ is the indicator function for set $\mathcal{A}$, i.e. it equals 1 if $X \in \mathcal{A}$ and 0 otherwise.

In classical mechanics, the center of mass is an analogous concept to expectation. For example, suppose X is a discrete random variable with values x_i and corresponding probabilities p_i. Now consider a weightless rod on which are placed weights, at locations x_i along the rod and having masses p_i (whose sum is one). The point at which the rod balances is E(X).

Expected values can also be used to compute the variance, by means of the computational formula for the variance

$$\operatorname{Var}(X) = \mathrm{E}\left(X^2\right) - \left(\mathrm{E}(X)\right)^2.$$

A very important application of the expectation value is in the field of quantum mechanics. The expectation value of a quantum mechanical operator $\hat{A}$ operating on a quantum state vector $|\psi\rangle$ is written as $\langle \hat{A} \rangle = \langle \psi | \hat{A} | \psi \rangle$. The uncertainty in $\hat{A}$ can be calculated using the formula $(\Delta A)^2 = \langle \hat{A}^2 \rangle - \langle \hat{A} \rangle^2$.


Expectation of matrices
If X is an m × n matrix, then the expected value of the matrix is defined as the matrix of expected values:

$$\mathrm{E}[X] = \begin{pmatrix} \mathrm{E}[X_{1,1}] & \cdots & \mathrm{E}[X_{1,n}] \\ \vdots & \ddots & \vdots \\ \mathrm{E}[X_{m,1}] & \cdots & \mathrm{E}[X_{m,n}] \end{pmatrix}.$$

This is utilized in covariance matrices.

Formulas for special cases

Discrete distribution taking only non-negative integer values

When a random variable takes only values in {0, 1, 2, 3, ...}, we can use the following formula for computing its expectation:

E[X] = ∑_{i=1}^∞ P(X ≥ i).

Proof: since P(X ≥ i) = ∑_{j=i}^∞ P(X = j),

∑_{i=1}^∞ P(X ≥ i) = ∑_{i=1}^∞ ∑_{j=i}^∞ P(X = j);

interchanging the order of summation, we have

∑_{j=1}^∞ ∑_{i=1}^{j} P(X = j) = ∑_{j=1}^∞ j P(X = j) = E[X],

as claimed.

This result can be a useful computational shortcut. For example, suppose we toss a coin where the probability of heads is p. How many tosses can we expect until the first heads (not including the heads itself)? Let X be this number. Note that we are counting only the tails and not the heads which ends the experiment; in particular, we can have X = 0. The expectation of X may be computed by

E[X] = ∑_{i=1}^∞ P(X ≥ i) = ∑_{i=1}^∞ (1 − p)^i = (1 − p)/p.

This is because the number of tosses is at least i exactly when the first i tosses yielded tails, so that P(X ≥ i) = (1 − p)^i. This matches the expectation of a random variable with a geometric distribution. We used the formula for a geometric progression:

∑_{i=1}^∞ r^i = r/(1 − r), for |r| < 1.
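A small simulation can confirm the computed expectation (1 − p)/p; the value p = 0.25 below is an arbitrary illustrative choice.

import random

# Simulation of the coin example: count tails before the first heads when
# Pr(heads) = p.  Theory gives E[X] = (1 - p) / p; with p = 0.25 that is 3.
random.seed(2)
p = 0.25

def tails_before_first_heads() -> int:
    count = 0
    while random.random() >= p:  # probability 1 - p of another tail
        count += 1
    return count

n = 100_000
avg = sum(tails_before_first_heads() for _ in range(n)) / n
print(f"simulated E[X] = {avg:.3f}, theory = {(1 - p) / p:.3f}")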

Continuous distribution taking non-negative values

Analogously with the discrete case above, when a continuous random variable X takes only non-negative values, we can use the following formula for computing its expectation:

E[X] = ∫_0^∞ P(X ≥ x) dx.

Proof: It is first assumed that X has a density f_X(x), so that

∫_0^∞ P(X ≥ x) dx = ∫_0^∞ ( ∫_x^∞ f_X(t) dt ) dx;

interchanging the order of integration, we have

∫_0^∞ ( ∫_0^t dx ) f_X(t) dt = ∫_0^∞ t f_X(t) dt = E[X],

as claimed. In case no density exists, it is seen that

E[X] = ∫_0^∞ P(X > x) dx

still holds, by applying the same interchange argument to the underlying probability measure.
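As a numeric sanity check of the formula E[X] = ∫_0^∞ P(X ≥ x) dx, the sketch below integrates the survival function of an exponential variable with a simple Riemann sum; the rate and cutoff are illustrative choices.

import math

# Numeric check of E[X] = integral of Pr(X >= x) over [0, inf) for
# X ~ Exponential(rate), where Pr(X >= x) = exp(-rate * x) and the exact
# mean is 1 / rate.
rate = 2.0
dx = 1e-4
upper = 20.0  # the integrand is negligible beyond this point for rate = 2

integral = sum(math.exp(-rate * (i * dx)) * dx for i in range(int(upper / dx)))
print(f"integral ~ {integral:.4f}, exact mean = {1 / rate:.4f}")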

See also
• Conditional expectation
• An inequality on location and scale parameters
• Expected value is also a key concept in economics, finance, and many other subjects
• The general term expectation
• Moment (mathematics)
• Expectation value (quantum mechanics)
• Wald's equation, for calculating the expected value of a random number of random variables

Historical background • Edwards, A.W.F (2002). Pascal’s arithmetical triangle: the story of a mathematical idea (2nd ed.). JHU Press. ISBN 0-8018-6946-3. • Huygens, Christiaan (1657). De ratiociniis in ludo aleæ (English translation, published in 1714: [9]).

External links
• An 8-foot-tall (2.4 m) Probability Machine (named Sir Francis) comparing stock market returns to the randomness of the beans dropping through the quincunx pattern [13], from Index Funds Advisors IFA.com [14], youtube.com
• Expectation [10] on PlanetMath

References
[1] Sheldon M Ross (2007). "§2.4 Expectation of a random variable" (http://books.google.com/books?id=12Pk5zZFirEC&pg=PA38). Introduction to probability models (9th ed.). Academic Press. p. 38 ff. ISBN 0125980620.
[2] Richard W Hamming (1991). "§2.5 Random variables, mean and the expected value" (http://books.google.com/books?id=jX_F-77TA3gC&pg=PA64). The art of probability for scientists and engineers. Addison-Wesley. p. 64 ff. ISBN 0201406861.
[3] For a discussion of the Cauchy distribution, see Richard W Hamming (1991). "Example 8.7–1 The Cauchy distribution" (http://books.google.com/books?id=jX_F-77TA3gC&printsec=frontcover&dq=isbn:0201406861&cd=1#v=onepage&q=Cauchy&f=false). The art of probability for scientists and engineers. Addison-Wesley. p. 290 ff. ISBN 0201406861. "Sampling from the Cauchy distribution and averaging gets you nowhere – one sample has the same distribution as the average of 1000 samples!"
[4] In the foreword to his book, Huygens writes: "It should be said, also, that for some time some of the best mathematicians of France have occupied themselves with this kind of calculus so that no one should attribute to me the honour of the first invention. This does not belong to me. But these savants, although they put each other to the test by proposing to each other many questions difficult to solve, have hidden their methods. I have had therefore to examine and go deeply for myself into this matter by beginning with the elements, and it is impossible for me for this reason to affirm that I have even started from the same principle. But finally I have found that my answers in many cases do not differ from theirs." (cited in Edwards (2002)). Thus, Huygens learned about de Méré's problem in 1655 during his visit to France; later on in 1656 from his correspondence with Carcavi he learned that his method was essentially the same as Pascal's; so that before his book went to press in 1657 he knew about Pascal's priority in this subject.
[5] "Earliest uses of symbols in probability and statistics" (http://jeff560.tripod.com/stat.html).
[6] Sheldon M Ross. "Example 2.15" (http://books.google.com/books?id=12Pk5zZFirEC&pg=PA39). cited work. p. 39. ISBN 0125980620.
[7] Sheldon M Ross. "Chapter 3: Conditional probability and conditional expectation" (http://books.google.com/books?id=12Pk5zZFirEC&pg=PA97). cited work. p. 97 ff. ISBN 0125980620.


[8] Sheldon M Ross. "§3.4: Computing expectations by conditioning" (http://books.google.com/books?id=12Pk5zZFirEC&pg=PA105). cited work. p. 105 ff. ISBN 0125980620.
[9] http://www.york.ac.uk/depts/maths/histstat/huygens.pdf
[10] http://planetmath.org/?op=getobj&from=objects&id=505

Discrete probability distribution

In probability theory and statistics, a discrete probability distribution is a probability distribution characterized by a probability mass function. Thus, the distribution of a random variable X is discrete, and X is then called a discrete random variable, if

∑_u Pr(X = u) = 1    (1)

as u runs through the set of all possible values of X. It follows that such a random variable can assume only a finite or countably infinite number of values. That is, the possible values might be listed, although the list might be infinite. For example, count observations such as the numbers of birds in flocks comprise only natural number values {0, 1, 2, ...}. By contrast, continuous observations such as the weights of birds comprise real number values and would typically be modeled by a continuous probability distribution such as the normal.

[Figure: The probability mass function of a discrete probability distribution. The probabilities of the singletons {1}, {3}, and {7} are respectively 0.2, 0.5, 0.3. A set not containing any of these points has probability zero.]

[Figures: the cdf of a discrete probability distribution; of a continuous probability distribution; and of a distribution which has both a continuous part and a discrete part.]


In cases more frequently considered, this set of possible values is a topologically discrete set in the sense that all its points are isolated points. But there are discrete random variables for which this countable set is dense on the real line. Among the most well-known discrete probability distributions that are used for statistical modeling are the Poisson distribution, the Bernoulli distribution, the binomial distribution, the geometric distribution, and the negative binomial distribution. In addition, the discrete uniform distribution is commonly used in computer programs that make equal-probability random selections between a number of choices.
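For example, the equal-probability selections mentioned above, as well as general finite discrete distributions, can be sampled with Python's standard library; the supports and weights below are illustrative.

import random

# Equal-probability selection (discrete uniform) and a weighted discrete
# distribution; the support and weights are illustrative only.
random.seed(3)

die_roll = random.randint(1, 6)          # discrete uniform on {1, ..., 6}
option = random.choice(["a", "b", "c"])  # discrete uniform over a list

# A general discrete distribution: values 1, 3, 7 with probabilities 0.2, 0.5, 0.3.
value = random.choices([1, 3, 7], weights=[0.2, 0.5, 0.3], k=1)[0]
print(die_roll, option, value)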

Alternative description

Equivalently to the above, a discrete random variable can be defined as a random variable whose cumulative distribution function (cdf) increases only by jump discontinuities—that is, its cdf increases only where it "jumps" to a higher value, and is constant between those jumps. The points where jumps occur are precisely the values which the random variable may take. The number of such jumps may be finite or countably infinite. The set of locations of such jumps need not be topologically discrete; for example, the cdf might jump at each rational number.

Consequently, a discrete probability distribution is often represented as a generalized probability density function involving Dirac delta functions, which substantially unifies the treatment of continuous and discrete distributions. This is especially useful when dealing with probability distributions involving both a continuous and a discrete part.

Representation in terms of indicator functions

For a discrete random variable X, let u_0, u_1, ... be the values it can take with non-zero probability. Denote

Ω_i = {ω : X(ω) = u_i}, i = 0, 1, 2, ...

These are disjoint sets, and by formula (1)

Pr(∪_i Ω_i) = ∑_i Pr(Ω_i) = ∑_i Pr(X = u_i) = 1.

It follows that the probability that X takes any value except for u_0, u_1, ... is zero, and thus one can write X as

X = ∑_i u_i 1_{Ω_i}

except on a set of probability zero, where 1_A is the indicator function of A. This may serve as an alternative definition of discrete random variables.

See also
• Stochastic vector
• Continuous probability distribution


Continuous probability distribution

In probability theory, a probability distribution is called continuous if its cumulative distribution function is continuous. This is equivalent to saying that for random variables X with the distribution in question, Pr[X = a] = 0 for all real numbers a, i.e.: the probability that X attains the value a is zero, for any number a. If the distribution of X is continuous then X is called a continuous random variable.

While for a discrete probability distribution an event with probability zero is impossible (e.g. rolling 3.5 on a standard die is impossible, and has probability zero), this is not true in the case of a continuous random variable. For example, if one measures the width of an oak leaf, the result 3.5 cm is possible, but has probability zero because there are infinitely many possible values even between 3 cm and 4 cm. Each of these individual outcomes has probability zero, yet the probability that the outcome will fall into that interval is nonzero. This apparent paradox is resolved by the fact that the probability that X attains some value within an infinite set, such as an interval, cannot be found by naively adding the probabilities for individual values. Formally, each value has an infinitesimally small probability, which statistically is equivalent to zero.

Comparison with absolute continuity

The term "continuous" is sometimes used as a synonym for "absolutely continuous with respect to Lebesgue" (see Radon–Nikodym theorem). An absolutely continuous distribution (with respect to Lebesgue) has a probability density function.[1] For a random variable X, being absolutely continuous is equivalent to saying that the probability that X attains a value in any given subset S of its range with Lebesgue measure zero is equal to zero. This does not follow from the condition Pr[X = a] = 0 for all real numbers a, since there are uncountable sets with Lebesgue-measure zero (e.g. the Cantor set). A random variable with the Cantor distribution is continuous (according to the first convention) but is not absolutely continuous. In practical applications, random variables are often either discrete, absolutely continuous, or mixtures thereof. However, the Cantor distribution is neither discrete nor a weighted average of discrete and absolutely continuous distributions.

The normal distribution, continuous uniform distribution, Beta distribution, and Gamma distribution are well known absolutely continuous distributions. The normal distribution, also called the Gaussian or the bell curve, is ubiquitous in nature and statistics due to the central limit theorem: every variable that can be modelled as a sum of many small independent variables is approximately normal.

External links
• Continuous Random Variables [2], John Appleby, School of Mathematical Sciences, Dublin City University.
• Hazewinkel, Michiel, ed. (2001), "Continuous distribution" [3], Encyclopaedia of Mathematics, Springer, ISBN 978-1556080104

References
[1] Feller: An Introduction to Probability Theory and its Applications, volume 2, page 139
[2] http://webpages.dcu.ie/~applebyj/ms207/CNSRV1.pdf
[3] http://eom.springer.de/c/c025620.htm


Probability mass function

In probability theory, a probability mass function (pmf) is a function that gives the probability that a discrete random variable is exactly equal to some value. A pmf differs from a probability density function (pdf) in that the values of a pdf, defined only for continuous random variables, are not probabilities as such. Instead, the integral of a pdf over a range of possible values (a, b] gives the probability of the random variable falling within that range. See interval notation for the meaning of (a, b].

[Figure: The graph of a probability mass function. All the values of this function must be non-negative and sum up to 1.]

Mathematical description

Suppose that X: S → R is a discrete random variable defined on a sample space S. Then the probability mass function fX: R → [0, 1] for X is defined as

fX(x) = Pr(X = x).

[Figure: The probability mass function of a fair die. All the numbers on the die have an equal chance of appearing on top when the die is rolled.]

Note that fX is defined for all real numbers, including those not in the image of X; indeed, fX(x) = 0 for all x ∉ X(S). Since the image of X is countable, the probability mass function fX(x) is zero for all but a countable number of values of x. The discontinuity of probability mass functions reflects the fact that the cumulative distribution function of a discrete random variable is also discontinuous. Where it is differentiable, the derivative is zero, just as the probability mass function is zero at all such points.

Example

Suppose that S is the sample space of all outcomes of a single toss of a fair coin, and X is the random variable defined on S assigning 0 to "tails" and 1 to "heads". Since the coin is fair, the probability mass function is

fX(x) = 1/2 for x ∈ {0, 1}, and fX(x) = 0 otherwise.
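A probability mass function with countable support is naturally represented in code as a lookup table that defaults to zero; the following Python sketch encodes the fair-coin pmf above.

from fractions import Fraction

# A probability mass function for a single fair-coin toss, with 0 for "tails"
# and 1 for "heads"; every x outside the image of X gets probability 0.
pmf = {0: Fraction(1, 2), 1: Fraction(1, 2)}

def f_X(x):
    """Probability mass function: defined for all real x."""
    return pmf.get(x, Fraction(0))

assert sum(pmf.values()) == 1  # the values must sum to 1
print(f_X(1), f_X(0.5))        # prints 1/2 and 0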

See also
• Discrete probability distribution


References • Johnson, N.L., Kotz, S., Kemp A. (1993) Univariate Discrete Distributions (2nd Edition). Wiley. ISBN 0-471-54897-9 (p 36)

Continuous function

In mathematics, a continuous function is a function for which, intuitively, small changes in the input result in small changes in the output. Otherwise, a function is said to be "discontinuous". A continuous function with a continuous inverse function is called "bicontinuous". An intuitive (though imprecise) idea of continuity is given by the common statement that a continuous function is a function whose graph can be drawn without lifting the chalk from the blackboard.

Continuity of functions is one of the core concepts of topology, which is treated in full generality below. The introductory portion of this article focuses on the special case where the inputs and outputs of functions are real numbers. In addition, this article discusses the definition for the more general case of functions between two metric spaces. In order theory, especially in domain theory, one considers a notion of continuity known as Scott continuity. Other forms of continuity do exist but they are not discussed in this article.

As an example, consider the function h(t) which describes the height of a growing flower at time t. This function is continuous. In fact, there is a dictum of classical physics which states that in nature everything is continuous. By contrast, if M(t) denotes the amount of money in a bank account at time t, then the function jumps whenever money is deposited or withdrawn, so the function M(t) is discontinuous. (However, if one assumes a discrete set as the domain of the function M, for instance the set of points of time at 4:00 PM on business days, then M becomes a continuous function, as every function whose domain is a discrete subset of the reals is.)

Real-valued continuous functions

Historical infinitesimal definition

Cauchy defined continuity of a function in the following intuitive terms: an infinitesimal change in the independent variable corresponds to an infinitesimal change of the dependent variable (see Cours d'analyse, page 34).

Definition in terms of limits

Suppose we have a function that maps real numbers to real numbers and whose domain is some interval, like the functions h and M above. Such a function can be represented by a graph in the Cartesian plane; the function is continuous if, roughly speaking, the graph is a single unbroken curve with no "holes" or "jumps".

In general, we say that the function f is continuous at some point c of its domain if, and only if, the following holds:

• The limit of f(x) as x approaches c through the domain of f exists and is equal to f(c); in mathematical notation,

lim_{x→c} f(x) = f(c).

If the point c in the domain of f is not a limit point of the domain, then this condition is vacuously true, since x cannot approach c through values not equal to c. Thus, for example, every function whose domain is the set of all integers is continuous.

We call a function continuous if, and only if, it is continuous at every point of its domain. More generally, we say that a function is continuous on some subset of its domain if it is continuous at every point of that subset.

The notation C(Ω) or C0(Ω) is sometimes used to denote the set of all continuous functions with domain Ω. Similarly, C1(Ω) is used to denote the set of differentiable functions whose derivative is continuous, C²(Ω) for the twice-differentiable functions whose second derivative is continuous, and so on (see differentiability class). In the field of computer graphics, these three levels are sometimes called g0 (continuity of position), g1 (continuity of tangency), and g2 (continuity of curvature). The notation C(n, α)(Ω) occurs in the definition of a more subtle concept, that of Hölder continuity.

Weierstrass definition (epsilon-delta) of continuous functions

Without resorting to limits, one can define continuity of real functions as follows. Again consider a function ƒ that maps a set of real numbers to another set of real numbers, and suppose c is an element of the domain of ƒ. The function ƒ is said to be continuous at the point c if the following holds: For any number ε > 0, however small, there exists some number δ > 0 such that for all x in the domain of ƒ with c − δ < x < c + δ, the value of ƒ(x) satisfies

|ƒ(x) − ƒ(c)| < ε.

Alternatively written: Given subsets I, D of R, continuity of ƒ : I → D at c ∈ I means that for every ε > 0 there exists a δ > 0 such that for all x ∈ I:

|x − c| < δ implies |ƒ(x) − ƒ(c)| < ε.
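The epsilon-delta definition can be explored numerically. The sketch below (with the arbitrary choices f(x) = x², c = 1 and ε = 0.1) halves a candidate δ until all sampled points of (c − δ, c + δ) keep ƒ(x) within ε of ƒ(c); sampling is of course only suggestive, not a proof.

# Numeric illustration of the epsilon-delta definition for f(x) = x**2 at
# c = 1; f, c, and epsilon are arbitrary example choices.
def f(x: float) -> float:
    return x * x

c, epsilon = 1.0, 0.1
delta = 1.0
while True:
    xs = [c - delta + 2 * delta * i / 1000 for i in range(1001)]  # sample the interval
    if all(abs(f(x) - f(c)) < epsilon for x in xs):
        break
    delta /= 2  # shrink the candidate delta and try again

print(f"delta = {delta} works for epsilon = {epsilon}")  # e.g. delta = 0.03125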

A form of this epsilon-delta definition of continuity was first given by Bernard Bolzano in 1817. Preliminary forms of a related definition of the limit were given by Cauchy,[1] though the formal definition and the distinction between pointwise continuity and uniform continuity were first given by Karl Weierstrass. More intuitively, we can say that if we want to get all the ƒ(x) values to stay in some small neighborhood around ƒ(c), we simply need to choose a small enough neighborhood for the x values around c, and we can do that no matter how small the ƒ(x) neighborhood is; ƒ is then continuous at c. In modern terms, this is generalized by the definition of continuity of a function with respect to a basis for the topology, here the metric topology.

Heine definition of continuity

The following definition of continuity is due to Heine. A real function ƒ is continuous if for any sequence (xn) such that

lim_{n→∞} xn = L,

it holds that

lim_{n→∞} ƒ(xn) = ƒ(L).

(We assume that all the points xn as well as L belong to the domain of ƒ.) One can say, briefly, that a function is continuous if, and only if, it preserves limits.

Weierstrass's and Heine's definitions of continuity are equivalent on the reals. The usual (easier) proof makes use of the axiom of choice, but in the case of global continuity of real functions it was proved by Wacław Sierpiński that the axiom of choice is not actually needed.[2]

In the more general setting of topological spaces, the concept analogous to the Heine definition of continuity is called sequential continuity. In general, the condition of sequential continuity is weaker than the analogue of Cauchy continuity, which is just called continuity (see continuity (topology) for details). However, if instead of sequences one uses nets (sets indexed by a directed set, not only the natural numbers), then the resulting concept is equivalent to the general notion of continuity in topology. Sequences are sufficient on metric spaces because they are first-countable spaces (every point has a countable neighborhood basis, hence representative points in each neighborhood are enough to ensure continuity), but general topological spaces are not first-countable, hence sequences do not suffice, and nets must be used.


Definition using oscillation

[Figure: The failure of a function to be continuous at a point is quantified by its oscillation.]

Continuity can also be defined in terms of oscillation: a function ƒ is continuous at a point x0 if and only if its oscillation at that point is zero;[3] in symbols,

ω_ƒ(x0) = 0.

A benefit of this definition is that it quantifies discontinuity: the oscillation gives how much the function is discontinuous at a point. This definition is useful in descriptive set theory to study the set of discontinuities and continuous points – the continuous points are the intersection of the sets where the oscillation is less than ε (hence a Gδ set) – and gives a very quick proof of one direction of the Lebesgue integrability condition.[4]

The oscillation is equivalent to the ε-δ definition by a simple re-arrangement, and by using a limit (lim sup, lim inf) to define oscillation: if (at a given point) for a given ε0 there is no δ that satisfies the ε-δ definition, then the oscillation is at least ε0, and conversely if for every ε there is a desired δ, the oscillation is 0. The oscillation definition can be naturally generalized to maps from a topological space to a metric space.

Definition using the hyperreals

Non-standard analysis is a way of making Newton-Leibniz-style infinitesimals mathematically rigorous. The real line is augmented by the addition of infinite and infinitesimal numbers to form the hyperreal numbers. In nonstandard analysis, continuity can be defined as follows.

A function ƒ from the reals to the reals is continuous if its natural extension to the hyperreals has the property that for real x and infinitesimal dx, ƒ(x+dx) − ƒ(x) is infinitesimal.[5]

In other words, an infinitesimal increment of the independent variable corresponds to an infinitesimal change of the dependent variable, giving a modern expression to Augustin-Louis Cauchy's definition of continuity.

Examples

• All polynomial functions are continuous.
• If a function has a domain which is not an interval, the notion of a continuous function as one whose graph you can draw without taking your pencil off the paper is not quite correct. Consider the functions f(x) = 1/x and g(x) = (sin x)/x. Neither function is defined at x = 0, so each has domain R \ {0} of real numbers except 0, and each function is continuous. The question of continuity at x = 0 does not arise, since x = 0 is neither in the domain of f nor in the domain of g. The function f cannot be extended to a continuous function whose domain is R, since no matter what value is assigned at 0, the resulting function will not be continuous. On the other hand, since the limit of g at 0 is 1, g can be extended continuously to R by defining its value at 0 to be 1.

• The exponential functions, logarithms, square root function, trigonometric functions and absolute value function are continuous. Rational functions, however, are not necessarily continuous on all of R.
• An example of a rational continuous function is f(x) = 1/(x − 2). The question of continuity at x = 2 does not arise, since x = 2 is not in the domain of f.
• An example of a discontinuous function is the function f defined by f(x) = 1 if x > 0, f(x) = 0 if x ≤ 0. Pick for instance ε = 1⁄2. There is no δ-neighborhood around x = 0 that will force all the f(x) values to be within ε of f(0). Intuitively we can think of this type of discontinuity as a sudden jump in function values.
• Another example of a discontinuous function is the signum or sign function.
• A more complicated example of a discontinuous function is Thomae's function.
• A variant of Dirichlet's function, namely f(x) = x for rational x and f(x) = 0 for irrational x, is continuous at only one point, namely x = 0.[6]

Facts about continuous functions

If two functions f and g are continuous, then f + g, fg, and f/g are continuous. (Note: the only possible points x of discontinuity of f/g are the solutions of the equation g(x) = 0; but then any such x does not belong to the domain of the function f/g, hence f/g is continuous on its entire domain, or, in other words, is continuous.) The composition f ∘ g of two continuous functions is continuous.

If a function is differentiable at some point c of its domain, then it is also continuous at c. The converse is not true: a function that is continuous at c need not be differentiable there. Consider for instance the absolute value function at c = 0.

Intermediate value theorem

The intermediate value theorem is an existence theorem, based on the real number property of completeness, and states: If the real-valued function f is continuous on the closed interval [a, b] and k is some number between f(a) and f(b), then there is some number c in [a, b] such that f(c) = k.

For example, if a child grows from 1 m to 1.5 m between the ages of two and six years, then, at some time between two and six years of age, the child's height must have been 1.25 m.

As a consequence, if f is continuous on [a, b] and f(a) and f(b) differ in sign, then, at some point c in [a, b], f(c) must equal zero; a numerical sketch of this idea appears below.

Extreme value theorem

The extreme value theorem states that if a function f is defined on a closed interval [a,b] (or any closed and bounded set) and is continuous there, then the function attains its maximum, i.e. there exists c ∈ [a,b] with f(c) ≥ f(x) for all x ∈ [a,b]. The same is true of the minimum of f. These statements are not, in general, true if the function is defined on an open interval (a,b) (or any set that is not both closed and bounded), as, for example, the continuous function f(x) = 1/x, defined on the open interval (0,1), does not attain a maximum, being unbounded above.
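The sign-change consequence of the intermediate value theorem is the basis of the bisection method for locating roots. Below is a minimal Python sketch, assuming f is continuous on [a, b] with f(a) and f(b) of opposite signs.

# Bisection method: repeatedly halve an interval on which a continuous
# function changes sign; the intermediate value theorem guarantees a root.
def bisect(f, a: float, b: float, tol: float = 1e-10) -> float:
    fa, fb = f(a), f(b)
    assert fa * fb < 0, "f(a) and f(b) must differ in sign"
    while b - a > tol:
        m = (a + b) / 2
        fm = f(m)
        if fm == 0:
            return m
        if fa * fm < 0:      # the root lies in [a, m]
            b, fb = m, fm
        else:                # the root lies in [m, b]
            a, fa = m, fm
    return (a + b) / 2

# Example: x**2 - 2 changes sign on [1, 2], so bisection finds sqrt(2).
print(bisect(lambda x: x * x - 2, 1.0, 2.0))  # ~1.41421356...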


Directional continuity

[Figure: A right-continuous function.]

[Figure: A left-continuous function.]

A function may happen to be continuous in only one direction, either from the "left" or from the "right". A right-continuous function is a function which is continuous at all points when approached from the right. Technically, the formal definition is similar to the definition above for a continuous function but modified as follows:

The function ƒ is said to be right-continuous at the point c if the following holds: For any number ε > 0 however small, there exists some number δ > 0 such that for all x in the domain with c < x < c + δ, the value of ƒ(x) will satisfy

|ƒ(x) − ƒ(c)| < ε.

Notice that x must be larger than c, that is on the right of c. If x were also allowed to take values less than c, this would be the definition of continuity. This restriction makes it possible for the function to have a discontinuity at c, but still be right continuous at c, as pictured. Likewise a left-continuous function is a function which is continuous at all points when approached from the left, that is, c − δ < x < c. A function is continuous if and only if it is both right-continuous and left-continuous.

Continuous functions between metric spaces

Now consider a function f from one metric space (X, dX) to another metric space (Y, dY). Then f is continuous at the point c in X if for any positive real number ε, there exists a positive real number δ such that all x in X satisfying dX(x, c) < δ will also satisfy dY(f(x), f(c)) < ε.

This can also be formulated in terms of sequences and limits: the function f is continuous at the point c if for every sequence (xn) in X with limit lim xn = c, we have lim f(xn) = f(c). Continuous functions transform limits into limits.

This latter condition can be weakened as follows: f is continuous at the point c if and only if for every convergent sequence (xn) in X with limit c, the sequence (f(xn)) is a Cauchy sequence, and c is in the domain of f. Continuous functions transform convergent sequences into Cauchy sequences.

The set of points at which a function between metric spaces is continuous is a Gδ set – this follows from the ε-δ definition of continuity.


Continuous functions between topological spaces

The above definitions of continuous functions can be generalized to functions from one topological space to another in a natural way; a function f : X → Y, where X and Y are topological spaces, is continuous if and only if for every open set V ⊆ Y, the inverse image

f⁻¹(V) = {x ∈ X | f(x) ∈ V}

is open.

[Figure: Continuity of a function at a point.]

However, this definition is often difficult to use directly. Instead, suppose we have a function f from X to Y, where X, Y are topological spaces. We say f is continuous at x for some x ∈ X if for any neighborhood V of f(x), there is a neighborhood U of x such that f(U) ⊆ V. Although this definition appears complex, the intuition is that no matter how "small" V becomes, we can always find a U containing x that will map inside it. If f is continuous at every x ∈ X, then we simply say f is continuous.

In a metric space, it is equivalent to consider the neighbourhood system of open balls centered at x and f(x) instead of all neighborhoods. This leads to the standard ε-δ definition of a continuous function from real analysis, which says roughly that a function is continuous if all points close to x map to points close to f(x). This only really makes sense in a metric space, however, which has a notion of distance.

Note, however, that if the target space is Hausdorff, it is still true that f is continuous at a if and only if the limit of f as x approaches a is f(a). At an isolated point, every function is continuous.

Definitions

Several equivalent definitions for a topological structure exist and thus there are several equivalent ways to define a continuous function.

Open and closed set definition

The most common notion of continuity in topology defines continuous functions as those functions for which the preimages (or inverse images) of open sets are open. Similar to the open set formulation is the closed set formulation, which says that preimages (or inverse images) of closed sets are closed.

Neighborhood definition

Definitions based on preimages are often difficult to use directly. Instead, suppose we have a function f : X → Y, where X and Y are topological spaces.[7] We say f is continuous at x for some x ∈ X if for any neighborhood V of f(x), there is a neighborhood U of x such that f(U) ⊆ V. Although this definition appears complicated, the intuition is that no matter how "small" V becomes, we can always find a U containing x that will map inside it. If f is continuous at every x ∈ X, then we simply say f is continuous.

In a metric space, it is equivalent to consider the neighbourhood system of open balls centered at x and f(x) instead of all neighborhoods. This leads to the standard δ-ε definition of a continuous function from real analysis, which says roughly that a function is continuous if all points close to x map to points close to f(x). This only really makes sense in a metric space, however, which has a notion of distance. Note, however, that if the target space is Hausdorff, it is still true that f is continuous at a if and only if the limit of f as x approaches a is f(a). At an isolated point, every function is continuous.

Sequences and nets

In several contexts, the topology of a space is conveniently specified in terms of limit points. In many instances, this is accomplished by specifying when a point is the limit of a sequence, but for some spaces that are too large in some sense, one specifies also when a point is the limit of more general sets of points indexed by a directed set, known as nets. A function is continuous only if it takes limits of sequences to limits of sequences. In the former case, preservation of limits is also sufficient; in the latter, a function may preserve all limits of sequences yet still fail to be continuous, and preservation of nets is a necessary and sufficient condition.

In detail, a function f : X → Y is sequentially continuous if whenever a sequence (xn) in X converges to a limit x, the sequence (f(xn)) converges to f(x). Thus sequentially continuous functions "preserve sequential limits". Every continuous function is sequentially continuous. If X is a first-countable space, then the converse also holds: any function preserving sequential limits is continuous. In particular, if X is a metric space, sequential continuity and continuity are equivalent. For non first-countable spaces, sequential continuity might be strictly weaker than continuity. (The spaces for which the two properties are equivalent are called sequential spaces.) This motivates the consideration of nets instead of sequences in general topological spaces. Continuous functions preserve limits of nets, and in fact this property characterizes continuous functions.

Closure operator definition

Given two topological spaces (X, cl) and (X′, cl′), where cl and cl′ are two closure operators, a function

f : (X, cl) → (X′, cl′)

is continuous if for all subsets A of X

f(cl(A)) ⊆ cl′(f(A)).

One might therefore suspect that, given two topological spaces (X, int) and (X′, int′), where int and int′ are two interior operators, a function

f : (X, int) → (X′, int′)

is continuous if for all subsets A of X

f(int(A)) ⊆ int′(f(A)),

or perhaps if

int′(f(A)) ⊆ f(int(A));

however, neither of these conditions is either necessary or sufficient for continuity. Instead, we must resort to inverse images: given two topological spaces (X, int) and (X′, int′), where int and int′ are two interior operators, a function

f : (X, int) → (X′, int′)

is continuous if for all subsets A of X′

f⁻¹(int′(A)) ⊆ int(f⁻¹(A)).

We can also write that, given two topological spaces (X, cl) and (X′, cl′), where cl and cl′ are two closure operators, a function

f : (X, cl) → (X′, cl′)

is continuous if for all subsets A of X′

cl(f⁻¹(A)) ⊆ f⁻¹(cl′(A)).

Closeness relation definition

Given two topological spaces (X, δ) and (X′, δ′), where δ and δ′ are two closeness relations, a function

f : (X, δ) → (X′, δ′)

is continuous if for all points x of X and all subsets A of X,

x δ A implies f(x) δ′ f(A).

This is another way of writing the closure operator definition.

Useful properties of continuous maps

Some facts about continuous maps between topological spaces:
• If f : X → Y and g : Y → Z are continuous, then so is the composition g ∘ f : X → Z.
• If f : X → Y is continuous and
  • X is compact, then f(X) is compact.
  • X is connected, then f(X) is connected.
  • X is path-connected, then f(X) is path-connected.
  • X is Lindelöf, then f(X) is Lindelöf.
  • X is separable, then f(X) is separable.
• The identity map idX : (X, τ2) → (X, τ1) is continuous if and only if τ1 ⊆ τ2 (see also comparison of topologies).

Other notes If a set is given the discrete topology, all functions with that space as a domain are continuous. If the domain set is given the indiscrete topology and the range set is at least T0, then the only continuous functions are the constant functions. Conversely, any function whose range is indiscrete is continuous. Given a set X, a partial ordering can be defined on the possible topologies on X. A continuous function between two topological spaces stays continuous if we strengthen the topology of the domain space or weaken the topology of the codomain space. Thus we can consider the continuity of a given function a topological property, depending only on the topologies of its domain and codomain spaces. For a function f from a topological space X to a set S, one defines the final topology on S by letting the open sets of S be those subsets A of S for which f−1(A) is open in X. If S has an existing topology, f is continuous with respect to this topology if and only if the existing topology is coarser than the final topology on S. Thus the final topology can be characterized as the finest topology on S which makes f continuous. If f is surjective, this topology is canonically

identified with the quotient topology under the equivalence relation defined by f. This construction can be generalized to an arbitrary family of functions X → S.

Dually, for a function f from a set S to a topological space, one defines the initial topology on S by letting the open sets of S be the preimages f⁻¹(A) of sets A that are open in X. If S has an existing topology, f is continuous with respect to this topology if and only if the existing topology is finer than the initial topology on S. Thus the initial topology can be characterized as the coarsest topology on S which makes f continuous. If f is injective, this topology is canonically identified with the subspace topology of S, viewed as a subset of X. This construction can be generalized to an arbitrary family of functions S → X.

Symmetric to the concept of a continuous map is an open map, for which images of open sets are open. In fact, if an open map f has an inverse, that inverse is continuous, and if a continuous map g has an inverse, that inverse is open. If a function is a bijection, then it has an inverse function. The inverse of a continuous bijection is open, but need not be continuous. If it is, this special function is called a homeomorphism. If a continuous bijection has as its domain a compact space and its codomain is Hausdorff, then it is automatically a homeomorphism.

Continuous functions between partially ordered sets

In order theory, continuity of a function between posets is Scott continuity. Let X be a complete lattice; then a function f : X → X is continuous if, for each subset Y of X, we have sup f(Y) = f(sup Y).

Continuous binary relation

A binary relation R on A is continuous if R(a, b) whenever there are sequences (ak) and (bk) in A which converge to a and b respectively for which R(ak, bk) for all k. Clearly, if one treats R as a characteristic function in two variables, this definition of continuous is identical to that for continuous functions.

Continuity space

A continuity space[8] [9] is a generalization of metric spaces and posets, which uses the concept of quantales, and that can be used to unify the notions of metric spaces and domains.[10]

See also
• Absolute continuity
• Bounded linear operator
• Classification of discontinuities
• Coarse function
• Continuous functor
• Continuous stochastic process
• Dini continuity
• Discrete function
• Equicontinuity
• Lipschitz continuity
• Normal function
• Piecewise
• Scott continuity
• Semicontinuity
• Smooth function
• Symmetrically continuous function
• Uniform continuity


References
• Visual Calculus [11] by Lawrence S. Husch, University of Tennessee (2001)

[1] Grabiner, Judith V. (March 1983). "Who Gave You the Epsilon? Cauchy and the Origins of Rigorous Calculus" (http://www.maa.org/pubs/Calc_articles/ma002.pdf). The American Mathematical Monthly 90 (3): 185–194. doi:10.2307/2975545.
[2] "Heine continuity implies Cauchy continuity without the Axiom of Choice" (http://www.apronus.com/math/cauchyheine.htm). Apronus.com.
[3] Introduction to Real Analysis (http://ramanujan.math.trinity.edu/wtrench/texts/TRENCH_REAL_ANALYSIS.PDF), updated April 2010, William F. Trench, Theorem 3.5.2, p. 172
[4] Introduction to Real Analysis (http://ramanujan.math.trinity.edu/wtrench/texts/TRENCH_REAL_ANALYSIS.PDF), updated April 2010, William F. Trench, 3.5 "A More Advanced Look at the Existence of the Proper Riemann Integral", pp. 171–177
[5] http://www.math.wisc.edu/~keisler/calc.html
[6] http://www.quantiphile.com/2010/09/13/a-function-that-is-continuous-at-only-one-point/
[7] f is a function f : X → Y between two topological spaces (X,TX) and (Y,TY). That is, the function f is defined on the elements of the set X, not on the elements of the topology TX. However continuity of the function does depend on the topologies used.
[8] Quantales and continuity spaces (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.48.851&rep=rep1&type=pdf), RC Flagg, Algebra Universalis, 1997
[9] All topologies come from generalized metrics, R Kopperman, American Mathematical Monthly, 1988
[10] Continuity spaces: Reconciling domains and metric spaces, B Flagg, R Kopperman, Theoretical Computer Science, 1997
[11] http://archives.math.utk.edu/visual.calculus/

Measure (mathematics)

In mathematics, more specifically in measure theory, a measure on a set is a systematic way to assign to each suitable subset a number, intuitively interpreted as the size of the subset. In this sense, a measure is a generalization of the concepts of length, area, volume, et cetera. A particularly important example is the Lebesgue measure on a Euclidean space, which assigns the conventional length, area and volume of Euclidean geometry to suitable subsets of Rn, n = 1, 2, 3, .... For instance, the Lebesgue measure of [0, 1] in the real numbers is its length in the everyday sense of the word, specifically 1.

To qualify as a measure (see Definition below), a function that assigns a non-negative real number or +∞ to a set's subsets must satisfy a few conditions. One important condition is countable additivity. This condition states that the size of the union of a sequence of disjoint subsets is equal to the sum of the sizes of the subsets. However, it is in general impossible to consistently associate a size to each subset of a given set and also satisfy the other axioms of a measure. This problem was resolved by defining measure only on a sub-collection of all subsets; the subsets on which the measure is to be defined are called measurable, and they are required to form a sigma-algebra, meaning that unions, intersections and complements of sequences of measurable subsets are measurable. Non-measurable sets in a Euclidean space, on which the Lebesgue measure cannot be consistently defined, are necessarily complex to the point of incomprehensibility, in a sense badly mixed up with their complement; indeed, their existence is a non-trivial consequence of the axiom of choice.

[Figure: Informally, a measure has the property of being monotone in the sense that if A is a subset of B, the measure of A is less than or equal to the measure of B. Furthermore, the measure of the empty set is required to be 0.]


Measure theory was developed in successive stages during the late 19th and early 20th centuries by Emile Borel, Henri Lebesgue, Johann Radon and Maurice Fréchet, among others. The main applications of measures are in the foundations of the Lebesgue integral, in Andrey Kolmogorov's axiomatisation of probability theory and in ergodic theory. In integration theory, specifying a measure allows one to define integrals on spaces more general than subsets of Euclidean space; moreover, the integral with respect to the Lebesgue measure on Euclidean spaces is more general and has a richer theory than its predecessor, the Riemann integral. Probability theory considers measures that assign to the whole set the size 1, and considers measurable subsets to be events whose probability is given by the measure. Ergodic theory considers measures that are invariant under, or arise naturally from, a dynamical system.

Definition

Let Σ be a σ-algebra over a set X. A function μ from Σ to the extended real number line is called a measure if it satisfies the following properties:

• Non-negativity: μ(E) ≥ 0 for all E ∈ Σ.
• Null empty set: μ(∅) = 0.
• Countable additivity (or σ-additivity): for all countable collections {E_i} of pairwise disjoint sets in Σ:

μ(∪_{i=1}^∞ E_i) = ∑_{i=1}^∞ μ(E_i).

The second condition may be treated as a special case of countable additivity, if the empty collection is allowed as a countable collection (and the empty sum is interpreted as 0). Otherwise, if the empty collection is disallowed (but finite collections are allowed), the second condition still follows from countable additivity provided, however, that there is at least one set having finite measure. The pair (X, Σ) is called a measurable space, the members of Σ are called measurable sets, and the triple (X, Σ, μ) is called a measure space. If only the second and third conditions of the definition of measure above are met, and μ takes on at most one of the values ±∞, then μ is called a signed measure. A probability measure is a measure with total measure one (i.e., μ(X) = 1); a probability space is a measure space with a probability measure. For measure spaces that are also topological spaces various compatibility conditions can be placed for the measure and the topology. Most measures met in practice in analysis (and in many cases also in probability theory) are Radon measures. Radon measures have an alternative definition in terms of linear functionals on the locally convex space of continuous functions with compact support. This approach is taken by Bourbaki (2004) and a number of other authors. For more details see Radon measure.


Properties

Several further properties can be derived from the definition of a countably additive measure.

Monotonicity

A measure μ is monotonic: if E1 and E2 are measurable sets with E1 ⊆ E2, then

μ(E1) ≤ μ(E2).

Measures of infinite unions of measurable sets

A measure μ is countably subadditive: if E1, E2, E3, … is a countable sequence of sets in Σ, not necessarily disjoint, then

μ(∪_{n=1}^∞ E_n) ≤ ∑_{n=1}^∞ μ(E_n).

A measure μ is continuous from below: if E1, E2, E3, … are measurable sets and En is a subset of En+1 for all n, then the union of the sets En is measurable, and

μ(∪_{n=1}^∞ E_n) = lim_{n→∞} μ(E_n).

Measures of infinite intersections of measurable sets

A measure μ is continuous from above: if E1, E2, E3, … are measurable sets and En+1 is a subset of En for all n, then the intersection of the sets En is measurable; furthermore, if at least one of the En has finite measure, then

μ(∩_{n=1}^∞ E_n) = lim_{n→∞} μ(E_n).

This property is false without the assumption that at least one of the En has finite measure. For instance, for each n ∈ N, let

E_n = [n, ∞) ⊆ R,

which all have infinite Lebesgue measure, but the intersection is empty.

Sigma-finite measures

A measure space (X, Σ, μ) is called finite if μ(X) is a finite real number (rather than ∞). It is called σ-finite if X can be decomposed into a countable union of measurable sets of finite measure. A set in a measure space has σ-finite measure if it is a countable union of sets with finite measure.

For example, the real numbers with the standard Lebesgue measure are σ-finite but not finite. Consider the closed intervals [k, k+1] for all integers k; there are countably many such intervals, each has measure 1, and their union is the entire real line. Alternatively, consider the real numbers with the counting measure, which assigns to each finite set of reals the number of points in the set. This measure space is not σ-finite, because every set with finite measure contains only finitely many points, and it would take uncountably many such sets to cover the entire real line. The σ-finite measure spaces have some very convenient properties; σ-finiteness can be compared in this respect to the Lindelöf property of topological spaces. They can be also thought of as a vague generalization of the idea that a measure space may have 'uncountable measure'.


Completeness

A measurable set X is called a null set if μ(X) = 0. A subset of a null set is called a negligible set. A negligible set need not be measurable, but every measurable negligible set is automatically a null set. A measure is called complete if every negligible set is measurable. A measure can be extended to a complete one by considering the σ-algebra of subsets Y which differ by a negligible set from a measurable set X, that is, such that the symmetric difference of X and Y is contained in a null set. One defines μ(Y) to equal μ(X).

Examples

Some important measures are listed here.
• The counting measure is defined by μ(S) = number of elements in S.
• The Lebesgue measure on R is a complete translation-invariant measure on a σ-algebra containing the intervals in R such that μ([0,1]) = 1; and every other measure with these properties extends Lebesgue measure.
• Circular angle measure is invariant under rotation.
• The Haar measure for a locally compact topological group is a generalization of the Lebesgue measure (and also of counting measure and circular angle measure) and has similar uniqueness properties.
• The Hausdorff measure is a refinement of the Lebesgue measure to some fractal sets.
• Every probability space gives rise to a measure which takes the value 1 on the whole space (and therefore takes all its values in the unit interval [0,1]). Such a measure is called a probability measure. See probability axioms.
• The Dirac measure δa (cf. Dirac delta function) is given by δa(S) = χS(a), where χS is the characteristic function of S. The measure of a set is 1 if it contains the point a and 0 otherwise.

Other 'named' measures used in various theories include: Borel measure, Jordan measure, ergodic measure, Euler measure, Gaussian measure, Baire measure, Radon measure and Young measure. In physics an example of a measure is spatial distribution of mass (see e.g., gravity potential), or another non-negative extensive property, conserved (see conservation law for a list of these) or not. Negative values lead to signed measures, see "generalizations" below. Liouville measure, known also as the natural volume form on a symplectic manifold, is useful in classical statistical and Hamiltonian mechanics. Gibbs measure is widely used in statistical mechanics, often under the name canonical ensemble.
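Two of the measures listed above, the counting measure and the Dirac measure, are simple enough to model directly in code for finite sets; the Python sketch below checks finite additivity on a pair of disjoint sets.

# Finite-set models of two measures from the list above (finite sets only,
# so the arithmetic stays elementary).
def counting_measure(S: frozenset) -> int:
    """mu(S) = number of elements of S."""
    return len(S)

def dirac_measure(a):
    """delta_a(S) = 1 if a is in S, else 0."""
    return lambda S: 1 if a in S else 0

A, B = frozenset({1, 2}), frozenset({3})
# Finite additivity on disjoint sets: mu(A union B) = mu(A) + mu(B).
assert counting_measure(A | B) == counting_measure(A) + counting_measure(B)

delta_2 = dirac_measure(2)
print(delta_2(A), delta_2(B))  # prints 1 0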

Non-measurable sets

If the axiom of choice is assumed to be true, not all subsets of Euclidean space are Lebesgue measurable; examples of such sets include the Vitali set, and the non-measurable sets postulated by the Hausdorff paradox and the Banach–Tarski paradox.

Generalizations

For certain purposes, it is useful to have a "measure" whose values are not restricted to the non-negative reals or infinity. For instance, a countably additive set function with values in the (signed) real numbers is called a signed measure, while such a function with values in the complex numbers is called a complex measure. Measures that take values in Banach spaces have been studied extensively. A measure that takes values in the set of self-adjoint projections on a Hilbert space is called a projection-valued measure; these are used mainly in functional analysis for the spectral theorem. When it is necessary to distinguish the usual measures which take non-negative values from generalizations, the term positive measure is used. Positive measures are closed under conical combination but not general linear combination, while signed measures are the linear closure of positive measures.

Another generalization is the finitely additive measure, sometimes called a content. This is the same as a measure except that instead of requiring countable additivity we require only finite additivity. Historically, this definition was used first, but proved to be not so useful. It turns out that in general, finitely additive measures are connected with notions such as Banach limits, the dual of L∞ and the Stone–Čech compactification. All these are linked in one way or another to the axiom of choice. A charge is a generalization in both directions: it is a finitely additive, signed measure.

The remarkable result in integral geometry known as Hadwiger's theorem states that the space of translation-invariant, finitely additive, not-necessarily-nonnegative set functions defined on finite unions of compact convex sets in Rn consists (up to scalar multiples) of one "measure" that is "homogeneous of degree k" for each k = 0, 1, 2, ..., n, and linear combinations of those "measures". "Homogeneous of degree k" means that rescaling any set by any factor c > 0 multiplies the set's "measure" by ck. The one that is homogeneous of degree n is the ordinary n-dimensional volume. The one that is homogeneous of degree n − 1 is the "surface volume". The one that is homogeneous of degree 1 is a mysterious function called the "mean width", a misnomer. The one that is homogeneous of degree 0 is the Euler characteristic.

See also
• Outer measure
• Inner measure
• Hausdorff measure
• Product measure
• Pushforward measure
• Lebesgue measure
• Vector measure
• Almost everywhere
• Lebesgue integration
• Caratheodory extension theorem
• Measurable function
• Geometric measure theory
• Volume form
• Fuzzy measure theory

References
• R. G. Bartle, 1995. The Elements of Integration and Lebesgue Measure. Wiley Interscience.
• Bourbaki, Nicolas (2004), Integration I, Springer Verlag, ISBN 3-540-41129-1. Chapter III.
• R. M. Dudley, 2002. Real Analysis and Probability. Cambridge University Press.
• Folland, Gerald B. (1999), Real Analysis: Modern Techniques and Their Applications, John Wiley and Sons, ISBN 0-471-31716-0. Second edition.
• D. H. Fremlin, 2000. Measure Theory [1]. Torres Fremlin.
• Paul Halmos, 1950. Measure Theory. Van Nostrand and Co.
• R. Duncan Luce and Louis Narens (1987). "Measurement, theory of," The New Palgrave: A Dictionary of Economics, v. 3, pp. 428–32.
• M. E. Munroe, 1953. Introduction to Measure and Integration. Addison Wesley.
• K. P. S. Bhaskara Rao and M. Bhaskara Rao (1983), Theory of Charges: A Study of Finitely Additive Measures, London: Academic Press, pp. x + 315, ISBN 0-12-095780-9
• Shilov, G. E., and Gurevich, B. L., 1978. Integral, Measure, and Derivative: A Unified Approach, Richard A. Silverman, trans. Dover Publications. ISBN 0-486-63519-8. Emphasizes the Daniell integral.


External links
• Tutorial: Measure Theory for Dummies [2]

References
[1] http://www.essex.ac.uk/maths/people/fremlin/mt.htm
[2] http://www.ee.washington.edu/techsite/papers/documents/UWEETR-2006-0008.pdf

Bias of an estimator

In statistics, the bias (or bias function) of an estimator is the difference between this estimator's expected value and the true value of the parameter being estimated. An estimator or decision rule with zero bias is called unbiased; otherwise the estimator is said to be biased.

In ordinary English, the term bias is pejorative. In statistics, there are problems for which it may be good to use an estimator with a small, but nonzero, bias. In some cases, an estimator with a small bias may have lesser mean squared error or be median-unbiased (rather than mean-unbiased, the standard unbiasedness property). The property of median-unbiasedness is invariant under transformations while the property of mean-unbiasedness may be lost under nonlinear transformations.

Definition

Suppose θ̂ is an estimator of a parameter θ. Then the bias of this estimator is defined to be

Bias(θ̂) = E[θ̂] − θ,

where E[·] denotes expected value. An estimator is said to be unbiased if its bias is equal to zero for all values of the parameter θ.

There are more general notions of bias and unbiasedness. What this article calls "bias" is called "mean-bias", to distinguish mean-bias from the other notions, notably "median-unbiased" estimators. The general theory of unbiased estimators is briefly discussed near the end of this article.

In a simulation experiment concerning the properties of an estimator, the bias of the estimator may be assessed using the mean signed difference.

Examples

Sample variance

Suppose X1, ..., Xn are independent and identically distributed (i.i.d.) random variables with expectation μ and variance σ2. If the sample mean and sample variance are defined as

X̄ = (1/n) ∑_{i=1}^n X_i,    S² = (1/n) ∑_{i=1}^n (X_i − X̄)²,

then S2 is a biased estimator of σ2, because

E[S²] = ((n − 1)/n) σ² ≠ σ².

In other words, the expected value of the sample variance does not equal the population variance σ2, unless multiplied by a normalization factor. The sample mean, on the other hand, is an unbiased estimator of the population mean μ.

The reason that S2 is biased stems from the fact that the sample mean is an Ordinary Least Squares (OLS) estimator for μ: it is the number m which makes the sum Σ(Xi − m)2 as small as possible. That is, when any other number is plugged into this sum, for example m = μ, the sum can only increase. Therefore σ2 will always be greater than the sample variance in expectation, since σ2 is the expected value of (Xi − μ)2.

Note that the usual definition of sample variance,

S² = (1/(n − 1)) ∑_{i=1}^n (X_i − X̄)²,

is an unbiased estimator of the population variance. This can be seen by noticing that

∑_{i=1}^n (X_i − X̄)² = ∑_{i=1}^n X_i² − n X̄²,

and hence

E[∑_{i=1}^n (X_i − X̄)²] = n(σ² + μ²) − n(σ²/n + μ²) = (n − 1) σ²,

which gives the result. (This pdf [1] contains the full proof by Scott D. Anderson.)
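A simulation along the lines suggested above makes the bias visible: averaging the 1/n estimator over many samples falls short of σ², while the 1/(n − 1) estimator does not. The population (normal with σ² = 4) and the sample size n = 5 are illustrative choices.

import random

# Simulation comparing the 1/n variance estimator with the 1/(n-1) estimator.
random.seed(4)
n, trials, sigma2 = 5, 200_000, 4.0

biased_sum = unbiased_sum = 0.0
for _ in range(trials):
    sample = [random.gauss(0.0, sigma2 ** 0.5) for _ in range(n)]
    m = sum(sample) / n                     # sample mean
    ss = sum((x - m) ** 2 for x in sample)  # sum of squared deviations
    biased_sum += ss / n                    # divides by n: biased
    unbiased_sum += ss / (n - 1)            # divides by n - 1: unbiased

print(f"1/n estimator averages     {biased_sum / trials:.3f} (theory {(n - 1) / n * sigma2:.3f})")
print(f"1/(n-1) estimator averages {unbiased_sum / trials:.3f} (theory {sigma2:.3f})")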

Estimating a Poisson probability A far more extreme case of a biased estimator being better than any unbiased estimator arises from the Poisson distribution:[2] [3] : Suppose X has a Poisson distribution with expectation λ. Suppose it is desired to estimate

(For example, when incoming calls at a telephone switchboard are modeled as a Poisson process, and λ is the average number of calls per minute, then e−2λ is the probability that no calls arrive in the next two minutes.) Since the expectation of an unbiased estimator δ(X) is equal to the estimand, i.e.

the only function of the data constituting an unbiased estimator is To see this, note that when decomposing e−λ from the above expression for expectation, the sum that is left is a Taylor Series expansion of e−λ as well, yielding e−λe−λ = e−2λ (see Characterizations of the exponential function). If the observed value of X is 100, then the estimate is 1, although the true value of the quantity being estimated is obviously very likely to be near 0, which is the opposite extreme. And if X is observed to be 101, then the estimate is even more absurd: it is −1, although the quantity being estimated obviously must be positive. The (biased) maximum likelihood estimator

is far better than this unbiased estimator. Not only is its value always positive, but it is also more accurate in the sense that its mean squared error

is smaller; compare the unbiased estimator's MSE of

$$1 - e^{-4\lambda}.$$

The MSEs are functions of the true value λ. The bias of the maximum-likelihood estimator is

$$\operatorname{E}\!\left[e^{-2X}\right] - e^{-2\lambda} = e^{\lambda(e^{-2}-1)} - e^{-2\lambda}.$$
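The comparison can be reproduced by simulation. In the sketch below (λ = 1.5 is an arbitrary choice), the unbiased estimator is correct on average but wildly inaccurate, while the MLE has far smaller mean squared error:

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 1.5
target = np.exp(-2 * lam)                 # quantity to estimate: e^{-2 lambda}
x = rng.poisson(lam, size=1_000_000)

unbiased = (-1.0) ** x                    # the only unbiased estimator, (-1)^X
mle      = np.exp(-2.0 * x)               # biased maximum-likelihood estimator

print(unbiased.mean(), mle.mean(), target)   # unbiased hits the target on average
print(((unbiased - target) ** 2).mean())     # MSE ~ 1 - e^{-4 lambda}, near 1
print(((mle - target) ** 2).mean())          # much smaller MSE
```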


Maximum of a discrete uniform distribution
The bias of maximum-likelihood estimators can be substantial. Consider a case where n tickets numbered from 1 through n are placed in a box and one is selected at random, giving a value X. If n is unknown, then the maximum-likelihood estimator of n is X, even though the expectation of X is only (n + 1)/2; we can be certain only that n is at least X, and is probably more. In this case, the natural unbiased estimator is 2X − 1.
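A short simulation (the number of tickets is an arbitrary choice) makes the contrast visible:

```python
import numpy as np

rng = np.random.default_rng(7)
n_true = 100                                        # tickets in the box
draws = rng.integers(1, n_true + 1, size=500_000)   # one ticket per experiment

mle = draws                 # maximum-likelihood estimator: X itself
unbiased = 2 * draws - 1    # natural unbiased estimator: 2X - 1

print(mle.mean())           # ~ (n + 1)/2 = 50.5, badly biased
print(unbiased.mean())      # ~ 100, unbiased
```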

Median-unbiased estimators, and bias with respect to other loss functions
Any mean-unbiased estimator minimizes the risk (expected loss) with respect to the squared-error loss function, as observed by Gauss. A median-unbiased estimator minimizes the risk with respect to the absolute loss function, as observed by Laplace. Other loss functions are used in statistical theory, particularly in robust statistics. The theory of median-unbiased estimators was revived by George W. Brown [4] in 1947: "An estimate of a one-dimensional parameter θ will be said to be median-unbiased, if, for fixed θ, the median of the distribution of the estimate is at the value θ; i.e., the estimate underestimates just as often as it overestimates. This requirement seems for most purposes to accomplish as much as the mean-unbiased requirement and has the additional property that it is invariant under one-to-one transformation."[4] Further properties of median-unbiased estimators have been noted by Lehmann, Birnbaum, van der Vaart and Pfanzagl. In particular, median-unbiased estimators exist in cases where mean-unbiased and maximum-likelihood estimators do not. Besides being invariant under one-to-one transformations, median-unbiased estimators have surprising robustness.

Effect of transformations
Note that when a transformation is applied to a mean-unbiased estimator, the result need not be a mean-unbiased estimator of its corresponding population statistic. That is, for a non-linear function f and a mean-unbiased estimator U of a parameter p, the composite estimator f(U) need not be a mean-unbiased estimator of f(p). For example, the square root of the unbiased estimator of the population variance is not a mean-unbiased estimator of the population standard deviation.

See also
• Omitted-variable bias
• Consistent estimator
• Estimation theory
• Expected loss
• Expected value
• Loss function
• Median
• Statistical decision theory


References
• Brown, George W. [4] "On Small-Sample Estimation." The Annals of Mathematical Statistics, Vol. 18, No. 4 (Dec., 1947), pp. 582–585. JSTOR 2236236
• Lehmann, E. L. "A General Concept of Unbiasedness." The Annals of Mathematical Statistics, Vol. 22, No. 4 (Dec., 1951), pp. 587–592. JSTOR 2236928
• Birnbaum, Allan. 1961. "A Unified Theory of Estimation, I." The Annals of Mathematical Statistics, Vol. 32, No. 1 (Mar., 1961), pp. 112–135
• van der Vaart, H. R. 1961. "Some Extensions of the Idea of Bias." The Annals of Mathematical Statistics, Vol. 32, No. 2 (Jun., 1961), pp. 436–447.
• Pfanzagl, Johann. 1994. Parametric Statistical Theory. Walter de Gruyter.
• Stuart, Alan; Ord, Keith; Arnold, Steven F. (1999). Classical Inference and the Linear Model. Kendall's Advanced Theory of Statistics. 2A (Sixth ed.). London: Arnold. pp. xxii+885. MR1687411. ISBN 0-340-66230-1.
• Voinov, V. G. and Nikulin, M. S. "Unbiased Estimators and Their Applications", in two volumes (vol. 1, Univariate case; vol. 2, Multivariate case). Kluwer Academic Publishers: Dordrecht, 1993, 1996.

References
[1] http://biology.ucf.edu/~pascencio/classes/Methods/Proof%20that%20Sample%20Variance%20is%20Unbiased.pdf
[2] J.P. Romano and A.F. Siegel, Counterexamples in Probability and Statistics, Wadsworth & Brooks/Cole, Monterey, CA, 1986
[3] Hardy, M. (1 March 2003). "An Illuminating Counterexample" (http://jstor.org/stable/3647938). American Mathematical Monthly 110 (3): 234–238. doi:10.2307/3647938. ISSN 00029890.
[4] Brown (1947), page 583

Probability
Probability is a way of expressing knowledge or belief that an event will occur or has occurred. The concept has been given an exact mathematical meaning in probability theory, which is used extensively in such areas of study as mathematics, statistics, finance, gambling, science, and philosophy to draw conclusions about the likelihood of potential events and the underlying mechanics of complex systems.

Interpretations
The word probability does not have a consistent direct definition. In fact, there are two broad categories of probability interpretations, whose adherents possess different (and sometimes conflicting) views about the fundamental nature of probability:
1. Frequentists talk about probabilities only when dealing with experiments that are random and well-defined. The probability of a random event denotes the relative frequency of occurrence of an experiment's outcome when repeating the experiment. Frequentists consider probability to be the relative frequency "in the long run" of outcomes.[1]
2. Bayesians, however, assign probabilities to any statement whatsoever, even when no random process is involved. Probability, for a Bayesian, is a way to represent an individual's degree of belief in a statement, or an objective degree of rational belief, given the evidence.


Etymology
The word probability derives from the Latin word probabilitas, which can also mean probity, a measure of the authority of a witness in a legal case in Europe, often correlated with the witness's nobility. In a sense, this differs much from the modern meaning of probability, which, in contrast, is used as a measure of the weight of empirical evidence, and is arrived at from inductive reasoning and statistical inference.[2] [3]

History
The scientific study of probability is a modern development. Gambling shows that there has been an interest in quantifying the ideas of probability for millennia, but exact mathematical descriptions of use in those problems only arose much later. According to Richard Jeffrey, "Before the middle of the seventeenth century, the term 'probable' (Latin probabilis) meant approvable, and was applied in that sense, univocally, to opinion and to action. A probable action or opinion was one such as sensible people would undertake or hold, in the circumstances."[4] However, in legal contexts especially, 'probable' could also apply to propositions for which there was good evidence.[5]

Aside from some elementary considerations made by Girolamo Cardano in the 16th century, the doctrine of probabilities dates to the correspondence of Pierre de Fermat and Blaise Pascal (1654). Christiaan Huygens (1657) gave the earliest known scientific treatment of the subject. Jakob Bernoulli's Ars Conjectandi (posthumous, 1713) and Abraham de Moivre's Doctrine of Chances (1718) treated the subject as a branch of mathematics. See Ian Hacking's The Emergence of Probability and James Franklin's The Science of Conjecture for histories of the early development of the very concept of mathematical probability.

The theory of errors may be traced back to Roger Cotes's Opera Miscellanea (posthumous, 1722), but a memoir prepared by Thomas Simpson in 1755 (printed 1756) first applied the theory to the discussion of errors of observation. The reprint (1757) of this memoir lays down the axioms that positive and negative errors are equally probable, and that there are certain assignable limits within which all errors may be supposed to fall; continuous errors are discussed and a probability curve is given.

Pierre-Simon Laplace (1774) made the first attempt to deduce a rule for the combination of observations from the principles of the theory of probabilities. He represented the law of probability of errors by a curve y = φ(x), x being any error and y its probability, and laid down three properties of this curve:

1. it is symmetric as to the y-axis;
2. the x-axis is an asymptote, the probability of an infinite error being 0;
3. the area enclosed is 1, it being certain that an error exists.

He also gave (1781) a formula for the law of facility of error (a term due to Lagrange, 1774), but one which led to unmanageable equations. Daniel Bernoulli (1778) introduced the principle of the maximum product of the probabilities of a system of concurrent errors.

The method of least squares is due to Adrien-Marie Legendre (1805), who introduced it in his Nouvelles méthodes pour la détermination des orbites des comètes (New Methods for Determining the Orbits of Comets). In ignorance of Legendre's contribution, an Irish-American writer, Robert Adrain, editor of "The Analyst" (1808), first deduced the law of facility of error,

$$\phi(x) = c\,e^{-h^2 x^2},$$

h being a constant depending on precision of observation, and c a scale factor ensuring that the area under the curve equals 1. He gave two proofs, the second being essentially the same as John Herschel's (1850). Gauss gave the first proof which seems to have been known in Europe (the third after Adrain's) in 1809. Further proofs were given by Laplace (1810, 1812), Gauss (1823), James Ivory (1825, 1826), Hagen (1837), Friedrich Bessel (1838), W. F. Donkin (1844, 1856), and Morgan Crofton (1870). Other contributors were Ellis (1844), De Morgan (1864),


Glaisher (1872), and Giovanni Schiaparelli (1875). Peters's (1856) formula for r, the probable error of a single observation, is well known.

In the nineteenth century, authors on the general theory included Laplace, Sylvestre Lacroix (1816), Littrow (1833), Adolphe Quetelet (1853), Richard Dedekind (1860), Helmert (1872), Hermann Laurent (1873), Liagre, Didion, and Karl Pearson. Augustus De Morgan and George Boole improved the exposition of the theory. Andrey Markov introduced the notion of Markov chains (1906), which play an important role in the theory of stochastic processes and its applications. The modern theory of probability, based on measure theory, was developed by Andrey Kolmogorov (1931). On the geometric side (see integral geometry), contributors to The Educational Times were influential (Miller, Crofton, McColl, Wolstenholme, Watson, and Artemas Martin).

Mathematical treatment
In mathematics, the probability of an event A is represented by a real number in the range from 0 to 1 and written as P(A), p(A) or Pr(A).[6] An impossible event has a probability of 0, and a certain event has a probability of 1. However, the converses are not always true: probability-0 events are not always impossible, nor are probability-1 events certain. The rather subtle distinction between "certain" and "probability 1" is treated at greater length in the article on "almost surely". The opposite or complement of an event A is the event [not A] (that is, the event of A not occurring); its probability is given by P(not A) = 1 − P(A).[7] As an example, the chance of not rolling a six on a six-sided die is 1 − (chance of rolling a six) = 1 − 1/6 = 5/6. See Complementary event for a more complete treatment.

If both the events A and B occur on a single performance of an experiment, this is called the intersection or joint probability of A and B, denoted as P(A ∩ B). If two events A and B are independent, then the joint probability is

P(A ∩ B) = P(A) P(B);

for example, if two coins are flipped, the chance of both being heads is 1/2 × 1/2 = 1/4.[8]

If either event A or event B or both events occur on a single performance of an experiment, this is called the union of the events A and B, denoted as P(A ∪ B). If two events are mutually exclusive, then the probability of either occurring is

P(A ∪ B) = P(A) + P(B).

For example, the chance of rolling a 1 or 2 on a six-sided die is 1/6 + 1/6 = 1/3. If the events are not mutually exclusive, then

P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

For example, when drawing a single card at random from a regular deck of cards, the chance of getting a heart or a face card (J, Q, K) (or one that is both) is 13/52 + 12/52 − 3/52 = 11/26, because of the 52 cards of a deck 13 are hearts, 12 are face cards, and 3 are both: here the possibilities included in the "3 that are both" are included in each of the "13 hearts" and the "12 face cards" but should only be counted once.

Conditional probability is the probability of some event A, given the occurrence of some other event B. Conditional probability is written P(A|B), and is read "the probability of A, given B". It is defined by[9]

P(A|B) = P(A ∩ B) / P(B).

If P(B) = 0, then P(A|B) is undefined.


Summary of probabilities

Event        Probability
A            P(A) ∈ [0, 1]
not A        P(not A) = 1 − P(A)
A or B       P(A ∪ B) = P(A) + P(B) − P(A ∩ B); = P(A) + P(B) if A and B are mutually exclusive
A and B      P(A ∩ B) = P(A|B) P(B); = P(A) P(B) if A and B are independent
A given B    P(A|B) = P(A ∩ B) / P(B)
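These rules can be verified by direct enumeration. The sketch below reproduces the heart-or-face-card example with exact fractions, so no rounding obscures the check:

```python
from fractions import Fraction
from itertools import product

# Enumerate a standard 52-card deck and check the union rule on the
# "heart or face card" example from the text.
ranks = ['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']
suits = ['hearts', 'diamonds', 'clubs', 'spades']
deck = list(product(ranks, suits))

def pr(event):
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

heart = lambda c: c[1] == 'hearts'
face  = lambda c: c[0] in ('J', 'Q', 'K')

print(pr(lambda c: heart(c) or face(c)))                           # 11/26
print(pr(heart) + pr(face) - pr(lambda c: heart(c) and face(c)))   # same, by the rule
```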

Theory
Like other theories, the theory of probability is a representation of probabilistic concepts in formal terms—that is, in terms that can be considered separately from their meaning. These formal terms are manipulated by the rules of mathematics and logic, and any results are then interpreted or translated back into the problem domain. There have been at least two successful attempts to formalize probability, namely the Kolmogorov formulation and the Cox formulation. In Kolmogorov's formulation (see probability space), sets are interpreted as events and probability itself as a measure on a class of sets. In Cox's theorem, probability is taken as a primitive (that is, not further analyzed) and the emphasis is on constructing a consistent assignment of probability values to propositions. In both cases, the laws of probability are the same, except for technical details. There are other methods for quantifying uncertainty, such as the Dempster-Shafer theory or possibility theory, but those are essentially different and not compatible with the laws of probability as they are usually understood.

Applications
Two major applications of probability theory in everyday life are in risk assessment and in trade on commodity markets. Governments typically apply probabilistic methods in environmental regulation, where it is called "pathway analysis", often measuring well-being using methods that are stochastic in nature, and choosing projects to undertake based on statistical analyses of their probable effect on the population as a whole. A good example is the effect of the perceived probability of any widespread Middle East conflict on oil prices, which has ripple effects in the economy as a whole. An assessment by a commodity trader that a war is more likely vs. less likely sends prices up or down, and signals other traders of that opinion. Accordingly, the probabilities are not assessed independently nor necessarily very rationally. The theory of behavioral finance emerged to describe the effect of such groupthink on pricing, on policy, and on peace and conflict.

It can reasonably be said that the discovery of rigorous methods to assess and combine probability assessments has had a profound effect on modern society. Accordingly, it may be of some importance to most citizens to understand how odds and probability assessments are made, and how they contribute to reputations and to decisions, especially in a democracy.

Another significant application of probability theory in everyday life is reliability. Many consumer products, such as automobiles and consumer electronics, utilize reliability theory in the design of the product in order to reduce the probability of failure. The probability of failure may be closely associated with the product's warranty.


Relation to randomness
In a deterministic universe, based on Newtonian concepts, there is no probability if all conditions are known. In the case of a roulette wheel, if the force of the hand and the period of that force are known, then the number on which the ball will stop would be a certainty. Of course, this also assumes knowledge of inertia and friction of the wheel, weight, smoothness and roundness of the ball, variations in hand speed during the turning, and so forth. A probabilistic description can thus be more useful than Newtonian mechanics for analyzing the pattern of outcomes of repeated rolls of a roulette wheel. Physicists face the same situation in the kinetic theory of gases, where the system, while deterministic in principle, is so complex (with the number of molecules typically of the order of magnitude of the Avogadro constant, 6.02·10^23) that only a statistical description of its properties is feasible.

A revolutionary discovery of 20th-century physics was the random character of all physical processes that occur at sub-atomic scales and are governed by the laws of quantum mechanics. The wave function itself evolves deterministically as long as no observation is made, but, according to the prevailing Copenhagen interpretation, the randomness caused by the wave function collapsing when an observation is made is fundamental. This means that probability theory is required to describe nature. Others never came to terms with the loss of determinism. Albert Einstein famously remarked in a letter to Max Born: "Jedenfalls bin ich überzeugt, daß der Alte nicht würfelt." ("I am convinced that God does not play dice.") Although alternative viewpoints exist, such as that of quantum decoherence being the cause of an apparent random collapse, at present there is a firm consensus among physicists that probability theory is necessary to describe quantum phenomena.

See also
• Black Swan theory
• Calculus of predispositions
• Chance
• Class membership probabilities
• Decision theory
• Equiprobable
• Fuzzy measure theory
• Game theory
• Gaming mathematics
• Information theory
• Important publications in probability
• Measure theory
• Negative probability
• Probabilistic argumentation
• Probabilistic logic
• Random fields
• Random variable
• List of scientific journals in probability
• List of statistical topics
• Stochastic process
• Wiener process


References
• Kallenberg, O. (2005) Probabilistic Symmetries and Invariance Principles. Springer-Verlag, New York. 510 pp. ISBN 0-387-25115-4
• Kallenberg, O. (2002) Foundations of Modern Probability, 2nd ed. Springer Series in Statistics. 650 pp. ISBN 0-387-95313-2
• Olofsson, Peter (2005) Probability, Statistics, and Stochastic Processes, Wiley-Interscience. 504 pp. ISBN 0-471-67969-0

Quotations
• Damon Runyon: "It may be that the race is not always to the swift, nor the battle to the strong - but that is the way to bet."
• Pierre-Simon Laplace: "It is remarkable that a science which began with the consideration of games of chance should have become the most important object of human knowledge." Théorie Analytique des Probabilités, 1812.
• Richard von Mises: "The unlimited extension of the validity of the exact sciences was a characteristic feature of the exaggerated rationalism of the eighteenth century" (in reference to Laplace). Probability, Statistics, and Truth, p. 9. Dover edition, 1981 (republication of second English edition, 1957).

External links
• Probability and Statistics EBook [10]
• Edwin Thompson Jaynes. Probability Theory: The Logic of Science. Preprint: Washington University, (1996). HTML index with links to PostScript files [11] and PDF [12] (first three chapters)
• People from the History of Probability and Statistics (Univ. of Southampton) [13]
• Probability and Statistics on the Earliest Uses Pages (Univ. of Southampton) [14]
• Earliest Uses of Symbols in Probability and Statistics [15] on Earliest Uses of Various Mathematical Symbols [16]
• A tutorial on probability and Bayes' theorem devised for first-year Oxford University students [17]
• pdf file of An Anthology of Chance Operations (1963) [18] at UbuWeb
• Probability Theory Guide for Non-Mathematicians [19]
• Understanding Risk and Probability [20] with BBC raw

References
[1] The Logic of Statistical Inference, Ian Hacking, 1965
[2] The Emergence of Probability: A Philosophical Study of Early Ideas about Probability, Induction and Statistical Inference, Ian Hacking, Cambridge University Press, 2006, ISBN 0521685575, 9780521685573
[3] The Cambridge History of Seventeenth-century Philosophy, Daniel Garber, 2003
[4] Jeffrey, R.C., Probability and the Art of Judgment, Cambridge University Press. (1992). pp. 54–55. ISBN 0-521-39459-7
[5] Franklin, J., The Science of Conjecture: Evidence and Probability Before Pascal, Johns Hopkins University Press. (2001). pp. 22, 113, 127
[6] Olofsson, Peter. (2005) Page 8.
[7] Olofsson, page 9
[8] Olofsson, page 35.
[9] Olofsson, page 29.
[10] http://wiki.stat.ucla.edu/socr/index.php/EBook
[11] http://omega.albany.edu:8008/JaynesBook.html
[12] http://bayes.wustl.edu/etj/prob/book.pdf
[13] http://www.economics.soton.ac.uk/staff/aldrich/Figures.htm
[14] http://www.economics.soton.ac.uk/staff/aldrich/Probability%20Earliest%20Uses.htm
[15] http://jeff560.tripod.com/stat.html
[16] http://jeff560.tripod.com/mathsym.html
[17] http://www.celiagreen.com/charlesmccreery/statistics/bayestutorial.pdf
[18] http://ubu.com/historical/young/index.html


[19] http://probability.infarom.ro
[20] http://www.bbc.co.uk/raw/money/express_unit_risk/

Pierre-Simon Laplace
Pierre-Simon, marquis de Laplace

Pierre-Simon Laplace (1749–1827). Posthumous portrait by Madame Feytaud, 1842.

Born: 23 March 1749, Beaumont-en-Auge, Normandy, France
Died: 5 March 1827 (aged 77), Paris, France
Nationality: France
Fields: Astronomer and Mathematician
Institutions: École Militaire (1769–1776)
Alma mater: University of Caen
Academic advisors: Jean d'Alembert, Christophe Gadbled, Pierre Le Canu
Doctoral students: Siméon Denis Poisson
Known for: Work in Celestial Mechanics, Laplace's equation, Laplacian, Laplace transform, Laplace distribution, Laplace's demon, Laplace expansion, Young–Laplace equation, Laplace number, Laplace limit, Laplace invariant, Laplace principle

Pierre-Simon, marquis de Laplace (23 March 1749 – 5 March 1827) was a French mathematician and astronomer whose work was pivotal to the development of mathematical astronomy and statistics. He summarized and extended the work of his predecessors in his five volume Mécanique Céleste (Celestial Mechanics) (1799–1825). This work translated the geometric study of classical mechanics to one based on calculus, opening up a broader range of problems. In statistics, the so-called Bayesian interpretation of probability was mainly developed by Laplace.[1]

He formulated Laplace's equation, and pioneered the Laplace transform which appears in many branches of mathematical physics, a field that he took a leading role in forming. The Laplacian differential operator, widely used in applied mathematics, is also named after him. He restated and developed the nebular hypothesis of the origin of the solar system and was one of the first scientists to postulate the existence of black holes and the notion of gravitational collapse. He is remembered as one of the greatest scientists of all time, sometimes referred to as a French Newton or Newton of France, with a phenomenal natural mathematical faculty superior to any of his contemporaries.[2] He became a count of the First French Empire in 1806 and was named a marquis in 1817, after the Bourbon Restoration.

Early life
Many details of the life of Laplace were lost when the family château burned in 1925.[3] Laplace was born in Beaumont-en-Auge, Normandy in 1749. According to W. W. Rouse Ball (A Short Account of the History of Mathematics, 4th edition, 1908), he was the son of a small cottager or perhaps a farm-labourer, and owed his education to the interest excited in some wealthy neighbours by his abilities and engaging presence. It would seem that from a pupil he became an usher in the school at Beaumont; but, having procured a letter of introduction to d'Alembert, he went to Paris to push his fortune. However, Karl Pearson[3] is scathing about the inaccuracies in Rouse Ball's account and states: "Indeed Caen was probably in Laplace's day the most intellectually active of all the towns of Normandy. It was here that Laplace was educated and was provisionally a professor. It was here he wrote his first paper published in the Mélanges of the Royal Society of Turin, Tome iv. 1766–1769, at least two years before he went at 22 or 23 to Paris in 1771. Thus before he was 20 he was in touch with Lagrange in Turin. He did not go to Paris a raw self-taught country lad with only a peasant background! In 1765 at the age of sixteen Laplace left the 'School of the Duke of Orleans' in Beaumont and went to the University of Caen, where he appears to have studied for five years. The 'Ecole militaire' of Beaumont did not replace the old school until 1776."

His parents were from comfortable families. His father was Pierre Laplace, and his mother was Marie-Anne Sochon. The Laplace family was involved in agriculture until at least 1750, but Pierre Laplace senior was also a cider merchant and syndic of the town of Beaumont. Pierre Simon Laplace attended a school in the village run by a Benedictine priory, his father intending that he would be ordained in the Roman Catholic Church, and at sixteen he was sent to further his father's intention at the University of Caen, reading theology.[4]

At the university, he was mentored by two enthusiastic teachers of mathematics, Christophe Gadbled and Pierre Le Canu, who awoke his zeal for the subject. Laplace never graduated in theology but left for Paris with a letter of introduction from Le Canu to Jean le Rond d'Alembert.[4] According to his great-great-grandson,[3] d'Alembert received him rather poorly, and to get rid of him gave him a thick mathematics book, saying to come back when he had read it. When Laplace came back a few days later, d'Alembert was even less friendly and did not hide his opinion that it was impossible that Laplace could have read and understood the book. But upon questioning him, he realized that it was true, and from that time he took Laplace under his care. Another version is that Laplace solved overnight a problem that d'Alembert set him for submission the following week, then solved a harder problem the following night. D'Alembert was impressed and recommended him for a teaching place in the École Militaire.[5]

With a secure income and undemanding teaching, Laplace now threw himself into original research and, in the next seventeen years, 1771–1787, he produced much of his original work in astronomy.[6]


Laplace further impressed the Marquis de Condorcet, and even in 1771 Laplace felt that he was entitled to membership in the French Academy of Sciences. However, in that year, admission went to Alexandre-Théophile Vandermonde and in 1772 to Jacques Antoine Joseph Cousin. Laplace was disgruntled, and at the beginning of 1773, d'Alembert wrote to Lagrange in Berlin to ask if a position could be found for Laplace there. However, Condorcet became permanent secretary of the Académie in February and Laplace was elected associate member on 31 March, at age 24.[7] He married Marie-Charlotte de Courty de Romanges in his late thirties and the couple had a daughter, Sophie, and a son, Charles-Émile (b. 1789).[3] [8]

Analysis, probability and astronomical stability
Laplace's early published work in 1771 started with differential equations and finite differences, but he was already starting to think about the mathematical and philosophical concepts of probability and statistics.[9] However, before his election to the Académie in 1773, he had already drafted two papers that would establish his reputation. The first, Mémoire sur la probabilité des causes par les événements, was ultimately published in 1774, while the second paper, published in 1776, further elaborated his statistical thinking and also began his systematic work on celestial mechanics and the stability of the solar system. The two disciplines would always be interlinked in his mind. "Laplace took probability as an instrument for repairing defects in knowledge."[10] Laplace's work on probability and statistics is discussed below with his mature work on the Analytic theory of probabilities.

Stability of the solar system
Sir Isaac Newton had published his Philosophiae Naturalis Principia Mathematica in 1687, in which he gave a derivation of Kepler's laws, which describe the motion of the planets, from his laws of motion and his law of universal gravitation. However, though Newton had privately developed the methods of calculus, all his published work used cumbersome geometric reasoning, unsuitable to account for the more subtle higher-order effects of interactions between the planets. Newton himself had doubted the possibility of a mathematical solution to the whole, even concluding that periodic divine intervention was necessary to guarantee the stability of the solar system. Dispensing with the hypothesis of divine intervention would be a major activity of Laplace's scientific life.[11] It is now generally regarded that Laplace's methods on their own, though vital to the development of the theory, are not sufficiently precise to demonstrate the stability of the Solar System,[12] and indeed, the Solar System is now understood to be chaotic, although it actually appears to be fairly stable.

One particular problem from observational astronomy was the apparent instability whereby Jupiter's orbit appeared to be shrinking while that of Saturn was expanding. The problem had been tackled by Leonhard Euler in 1748 and Joseph Louis Lagrange in 1763, but without success.[13] In 1776, Laplace published a memoir in which he first explored the possible influences of a purported luminiferous ether or of a law of gravitation that did not act instantaneously. He ultimately returned to an intellectual investment in Newtonian gravity.[14] Euler and Lagrange had made a practical approximation by ignoring small terms in the equations of motion. Laplace noted that though the terms themselves were small, when integrated over time they could become important. Laplace carried his analysis into the higher-order terms, up to and including the cubic. Using this more exact analysis, Laplace concluded that any two planets and the sun must be in mutual equilibrium and thereby launched his work on the stability of the solar system.[15] Gerald James Whitrow described the achievement as "the most important advance in physical astronomy since Newton".[11]

Laplace had a wide knowledge of all sciences and dominated all discussions in the Académie.[16] Laplace seems to have regarded analysis merely as a means of attacking physical problems, though the ability with which he invented the necessary analysis is almost phenomenal. As long as his results were true he took but little trouble to explain the steps by which he arrived at them; he never studied elegance or symmetry in his processes, and it was sufficient for him if he could by any means solve the particular question he was discussing.[6]


On the figure of the Earth
During the years 1784–1787 he published some memoirs of exceptional power. Prominent among these is one read in 1783, reprinted as Part II of Théorie du Mouvement et de la figure elliptique des planètes in 1784, and in the third volume of the Méchanique céleste. In this work, Laplace completely determined the attraction of a spheroid on a particle outside it. This is memorable for the introduction into analysis of spherical harmonics or Laplace's coefficients, and also for the development of the use of the potential, a name first used by George Green in 1828.[6]

Spherical harmonics
In 1783, in a paper sent to the Académie, Adrien-Marie Legendre had introduced what are now known as associated Legendre functions.[6] If two points in a plane have polar co-ordinates (r, θ) and (r', θ'), where r' ≥ r, then, by elementary manipulation, the reciprocal of the distance between the points, d, can be written as:

$$\frac{1}{d} = \frac{1}{r'}\left[1 - 2\,\frac{r}{r'}\cos(\theta - \theta') + \left(\frac{r}{r'}\right)^{2}\right]^{-1/2}.$$

This expression can be expanded in powers of r/r' using Newton's generalized binomial theorem to give:

$$\frac{1}{d} = \frac{1}{r'}\sum_{k=0}^{\infty} P_k^0\!\left(\cos(\theta - \theta')\right)\left(\frac{r}{r'}\right)^{k}.$$

The sequence of functions $P_k^0(\cos\phi)$ is the set of so-called "associated Legendre functions", and their usefulness arises from the fact that every function of the points on a circle can be expanded as a series of them.[6] Laplace, with scant regard for credit to Legendre, made the non-trivial extension of the result to three dimensions to yield a more general set of functions, the spherical harmonics or Laplace coefficients. The latter term is not now in common use. Every function of the points on a sphere can be expanded as a series of them.[6]
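The expansion can be checked numerically. The sketch below (the coordinates are arbitrary test values) computes the Legendre polynomials by Bonnet's recursion, noting that $P_k^0$ coincides with the ordinary Legendre polynomial $P_k$, and compares the truncated series against the directly computed reciprocal distance:

```python
import math

def legendre(k, x):
    """P_k(x) via Bonnet's recursion: (j+1) P_{j+1} = (2j+1) x P_j - j P_{j-1}."""
    p_prev, p = 1.0, x
    if k == 0:
        return p_prev
    for j in range(1, k):
        p_prev, p = p, ((2 * j + 1) * x * p - j * p_prev) / (j + 1)
    return p

# Two points in polar coordinates with r' >= r; phi = theta - theta'
r, rp, phi = 0.6, 1.3, 0.8
d = math.sqrt(r**2 + rp**2 - 2 * r * rp * math.cos(phi))

series = sum((r**k / rp**(k + 1)) * legendre(k, math.cos(phi)) for k in range(40))
print(1.0 / d, series)   # the two values agree to machine precision
```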

Potential theory
This paper is also remarkable for the development of the idea of the scalar potential.[6] The gravitational force acting on a body is, in modern language, a vector, having magnitude and direction. A potential function is a scalar function that defines how the vectors will behave. A scalar function is computationally and conceptually easier to deal with than a vector function. Alexis Clairault had first suggested the idea in 1743 while working on a similar problem, though he was using Newtonian-type geometric reasoning. Laplace described Clairault's work as being "in the class of the most beautiful mathematical productions".[17] However, Rouse Ball alleges that the idea "was appropriated from Joseph Louis Lagrange, who had used it in his memoirs of 1773, 1777 and 1780".[6] Laplace applied the language of calculus to the potential function and showed that it always satisfies the differential equation:[6]

$$\nabla^2 V = \frac{\partial^2 V}{\partial x^2} + \frac{\partial^2 V}{\partial y^2} + \frac{\partial^2 V}{\partial z^2} = 0.$$

His subsequent work on gravitational attraction was based on this result. The quantity ∇²V has been termed the concentration of V, and its value at any point indicates the "excess" of the value of V there over its mean value in the neighbourhood of the point. Laplace's equation, a special case of Poisson's equation, appears ubiquitously in mathematical physics. The concept of a potential occurs in fluid dynamics, electromagnetism and other areas. Rouse Ball speculated that it might be seen as "the outward sign" of one of the "prior forms" in Kant's theory of perception.[6] The spherical harmonics turn out to be critical to practical solutions of Laplace's equation. Laplace's equation in spherical coordinates, such as are used for mapping the sky, can be simplified, using the method of separation of variables, into a radial part, depending solely on distance from the centre point, and an angular or spherical part. The solution to the spherical part of the equation can be expressed as a series of Laplace's spherical harmonics, simplifying practical computation.
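The "excess over the neighbourhood mean" interpretation of ∇²V is easy to check numerically. In the sketch below (the test function and point are arbitrary choices), a function satisfying Laplace's equation shows no such excess:

```python
# Discrete check of the "concentration" interpretation: for a harmonic
# function (here V = x^2 - y^2, which satisfies Laplace's equation in 2D),
# the value at a point equals the mean over a small surrounding
# neighbourhood, so the discrete Laplacian vanishes.
h = 1e-3
x0, y0 = 0.7, 0.3
V = lambda x, y: x**2 - y**2

neighbour_mean = (V(x0 + h, y0) + V(x0 - h, y0) +
                  V(x0, y0 + h) + V(x0, y0 - h)) / 4
print(neighbour_mean - V(x0, y0))   # ~ 0: no "excess" over the mean
```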

Planetary and lunar inequalities
Jupiter-Saturn great inequality
Laplace presented a memoir on planetary inequalities in three sections, in 1784, 1785, and 1786. This dealt mainly with the identification and explanation of the perturbations now known as the "great Jupiter-Saturn inequality". Laplace solved a longstanding problem in the study and prediction of the movements of these planets. He showed by general considerations, first, that the mutual action of two planets could never cause large changes in the eccentricities and inclinations of their orbits; but then, even more importantly, that peculiarities arose in the Jupiter-Saturn system because of the near approach to commensurability of the mean motions of Jupiter and Saturn. (Commensurability, in this context, means related by ratios of small whole numbers. Two periods of Saturn's orbit around the Sun almost equal five of Jupiter's. The corresponding difference between multiples of the mean motions, (2nJ − 5nS), corresponds to a period of nearly 900 years, and it occurs as a small divisor in the integration of a very small perturbing force with this same period. As a result, the integrated perturbations with this period are disproportionately large, about 0.8° (degrees of arc in orbital longitude) for Saturn and about 0.3° for Jupiter.) Further developments of these theorems on planetary motion were given in his two memoirs of 1788 and 1789, but with the aid of Laplace's discoveries, the tables of the motions of Jupiter and Saturn could at last be made much more accurate. It was on the basis of Laplace's theory that Delambre computed his astronomical tables.[6]
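The near-commensurability, and the long period it produces, can be recovered from the orbital periods (the figures below are modern approximate values, not taken from the text):

```python
# Orbital periods in years (modern approximate values)
T_jupiter, T_saturn = 11.862, 29.457

n_j = 360.0 / T_jupiter      # mean motion of Jupiter, degrees per year
n_s = 360.0 / T_saturn       # mean motion of Saturn, degrees per year

beat = 2 * n_j - 5 * n_s     # the near-commensurable combination (2n_J - 5n_S)
print(abs(beat))             # ~ 0.4 degrees per year
print(360.0 / abs(beat))     # ~ 880 years, the text's "nearly 900 years"
```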

Lunar inequalities
Laplace also produced an analytical solution (as it turned out later, a partial solution) to a significant problem regarding the motion of the Moon. Edmond Halley had been the first to suggest, in 1695,[18] that the mean motion of the Moon was apparently getting faster, by comparison with ancient eclipse observations, but he gave no data. (It was not yet known in Halley's or Laplace's times that what is actually occurring includes a slowing-down of the Earth's rate of rotation: see also Ephemeris time - History. When measured as a function of mean solar time rather than uniform time, the effect appears as a positive acceleration.) In 1749 Richard Dunthorne confirmed Halley's suspicion after re-examining ancient records, and produced the first quantitative estimate for the size of this apparent effect:[19] a centurial rate of +10" (arcseconds) in lunar longitude (a surprisingly good result for its time, not far different from values assessed later, e.g. in 1786 by de Lalande,[20] and to compare with values from about 10" to nearly 13" being derived about a century later).[21] [22] The effect became known as the secular acceleration of the Moon, but until Laplace, its cause remained unknown. Laplace gave an explanation of the effect in 1787, showing how an acceleration arises from changes (a secular reduction) in the eccentricity of the Earth's orbit, which in turn is one of the effects of planetary perturbations on the Earth. Laplace's initial computation accounted for the whole effect, thus seeming to tie up the theory neatly with both modern and ancient observations. However, in 1853, J C Adams caused the question to be re-opened by finding an error in Laplace's computations: it turned out that only about half of the Moon's apparent acceleration could be


accounted for on Laplace's basis by the change in the Earth's orbital eccentricity.[23] (Adams showed that Laplace had in effect considered only the radial force on the Moon and not the tangential, and the partial result had hence overestimated the acceleration; the remaining (negative) terms, when accounted for, showed that Laplace's cause could not explain more than about half of the acceleration. The other half was subsequently shown to be due to tidal acceleration.[24]) Laplace used his results concerning the lunar acceleration when completing his attempted "proof" of the stability of the whole solar system on the assumption that it consists of a collection of rigid bodies moving in a vacuum.[6] All the memoirs above alluded to were presented to the Académie des sciences, and they are printed in the Mémoires présentés par divers savants.[6]

Celestial mechanics
Laplace now set himself the task to write a work which should "offer a complete solution of the great mechanical problem presented by the solar system, and bring theory to coincide so closely with observation that empirical equations should no longer find a place in astronomical tables." The result is embodied in the Exposition du système du monde and the Mécanique céleste.[6]

The former was published in 1796, and gives a general explanation of the phenomena, but omits all details. It contains a summary of the history of astronomy. This summary procured for its author the honour of admission to the forty of the French Academy and is commonly esteemed one of the masterpieces of French literature, though it is not altogether reliable for the later periods of which it treats.[6]

Laplace developed the nebular hypothesis of the formation of the solar system, first suggested by Emanuel Swedenborg and expanded by Immanuel Kant, a hypothesis that continues to dominate accounts of the origin of planetary systems. According to Laplace's description of the hypothesis, the solar system had evolved from a globular mass of incandescent gas rotating around an axis through its centre of mass. As it cooled, this mass contracted, and successive rings broke off from its outer edge. These rings in their turn cooled, and finally condensed into the planets, while the sun represented the central core which was still left. On this view, Laplace predicted that the more distant planets would be older than those nearer the sun.[6] [25] As mentioned, the idea of the nebular hypothesis had been outlined by Immanuel Kant in 1755,[25] and he had also suggested "meteoric aggregations" and tidal friction as causes affecting the formation of the solar system. Laplace was probably aware of this, but, like many writers of his time, he generally did not reference the work of others.[3]

Laplace's analytical discussion of the solar system is given in his Méchanique céleste, published in five volumes. The first two volumes, published in 1799, contain methods for calculating the motions of the planets, determining their figures, and resolving tidal problems. The third and fourth volumes, published in 1802 and 1805, contain applications of these methods, and several astronomical tables. The fifth volume, published in 1825, is mainly historical, but it gives as appendices the results of Laplace's latest researches. Laplace's own investigations embodied in it are so numerous and valuable that it is regrettable to have to add that many results are appropriated from other writers with scanty or no acknowledgement, and the conclusions – which have been described as the organized result of a century of patient toil – are frequently mentioned as if they were due to Laplace.[6] Jean-Baptiste Biot, who assisted Laplace in revising it for the press, says that Laplace himself was frequently unable to recover the details in the chain of reasoning, and, if satisfied that the conclusions were correct, he was content to insert the constantly recurring formula, "Il est aisé à voir que..." ("It is easy to see that..."). The Mécanique céleste is not only the translation of Newton's Principia into the language of the differential calculus, but it completes parts of which Newton had been unable to fill in the details. The work was carried forward in a more finely tuned form in Félix Tisserand's Traité de mécanique céleste (1889–1896), but Laplace's treatise will always remain a standard authority.[6]


Arcueil
In 1806, Laplace bought a house in Arcueil, then a village and not yet absorbed into the Paris conurbation. Claude Louis Berthollet was a near neighbour and the pair formed the nucleus of an informal scientific circle, latterly known as the Society of Arcueil. Because of their closeness to Napoleon, Laplace and Berthollet effectively controlled advancement in the scientific establishment and admission to the more prestigious offices. The Society built up a complex pyramid of patronage.[26] In 1806, he was also elected a foreign member of the Royal Swedish Academy of Sciences.

Laplace's house at Arcueil

Napoleon
An account of a famous interaction between Laplace and Napoleon is provided by Rouse Ball:[6] "Laplace went in state to Napoleon to accept a copy of his work, and the following account of the interview is well authenticated, and so characteristic of all the parties concerned that I quote it in full. Someone had told Napoleon that the book contained no mention of the name of God; Napoleon, who was fond of putting embarrassing questions, received it with the remark, 'M. Laplace, they tell me you have written this large book on the system of the universe, and have never even mentioned its Creator.' Laplace, who, though the most supple of politicians, was as stiff as a martyr on every point of his philosophy, drew himself up and answered bluntly, 'Je n'avais pas besoin de cette hypothèse-là.' ("I had no need of that hypothesis.") Napoleon, greatly amused, told this reply to Lagrange, who exclaimed, 'Ah! c'est une belle hypothèse; ça explique beaucoup de choses.' ("Ah, it is a fine hypothesis; it explains many things.")"

Black holes
Laplace also came close to propounding the concept of the black hole. He pointed out that there could be massive stars whose gravity is so great that not even light could escape from their surface (see escape velocity).[27] Laplace also speculated that some of the nebulae revealed by telescopes may not be part of the Milky Way and might actually be galaxies themselves. Thus, he anticipated Edwin Hubble's major discovery 100 years in advance.

Analytic theory of probabilities
In 1812, Laplace issued his Théorie analytique des probabilités, in which he laid down many fundamental results in statistics. In 1819, he published a popular account of his work on probability. This book bears the same relation to the Théorie des probabilités that the Système du monde does to the Méchanique céleste.[6]

Probability-generating function
The method of estimating the ratio of the number of favourable cases to the whole number of possible cases had been previously indicated by Laplace in a paper written in 1779. It consists of treating the successive values of any function as the coefficients in the expansion of another function, with reference to a different variable. The latter is therefore called the probability-generating function of the former. Laplace then shows how, by means of interpolation, these coefficients may be determined from the generating function. Next he attacks the converse problem, and from the coefficients he finds the generating function; this is effected by the solution of a finite difference equation.[6]
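As a small modern illustration of the idea (the fair-die example is an illustrative choice, not Laplace's), the probabilities of a discrete random variable appear as the coefficients of its generating function and can be read back off by expansion:

```python
from sympy import symbols, Rational, expand, Poly

x = symbols('x')

# Probability-generating function of a fair six-sided die: the
# coefficient of x^k is the probability of rolling k.
G = expand(sum(Rational(1, 6) * x**k for k in range(1, 7)))

probs = Poly(G, x).all_coeffs()[::-1]   # coefficients of x^0, x^1, ..., x^6
print(probs)                            # [0, 1/6, 1/6, 1/6, 1/6, 1/6, 1/6]
```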


Least squares
This treatise includes an exposition of the method of least squares, a remarkable testimony to Laplace's command over the processes of analysis. The method of least squares for the combination of numerous observations had been given empirically by Carl Friedrich Gauss (around 1794) and Legendre (in 1805), but the fourth chapter of this work contains a formal proof of it, on which the whole of the theory of errors has been since based. This was effected only by a most intricate analysis specially invented for the purpose, but the form in which it is presented is so meagre and unsatisfactory that, in spite of the uniform accuracy of the results, it was at one time questioned whether Laplace had actually gone through the difficult work he so briefly and often incorrectly indicates.[6]
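In modern notation the method chooses the coefficients that minimize the sum of squared residuals. A minimal sketch using NumPy's `lstsq` (the line and noise level are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)

# Noisy observations of a line y = 2x + 1; least squares recovers the
# coefficients by minimizing the sum of squared residuals.
x = np.linspace(0, 10, 50)
y = 2 * x + 1 + rng.normal(0, 0.5, size=x.size)

A = np.column_stack([x, np.ones_like(x)])       # design matrix
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coef)                                      # close to [2, 1]
```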

Inductive probability
While he conducted much research in physics, another major theme of his life's endeavours was probability theory. In his Essai philosophique sur les probabilités (1814), Laplace set out a mathematical system of inductive reasoning based on probability, which we would today recognise as Bayesian. He begins the text with a series of principles of probability, the first six being:
1) Probability is the ratio of the "favored events" to the total possible events.
2) The first principle assumes equal probabilities for all events. When this is not true, we must first determine the probabilities of each event. Then, the probability is the sum of the probabilities of all possible favored events.
3) For independent events, the probability of the occurrence of all is the probability of each multiplied together.
4) For events not independent, the probability of event B following event A (or event A causing B) is the probability of A multiplied by the probability that, given A, B will occur.
5) The probability that A will occur, given that B has occurred, is the probability of A and B occurring divided by the probability of B.
6) Three corollaries are given for the sixth principle, which amount to Bayesian probability. Where event Ai ∈ {A1, A2, ... An} exhausts the list of possible causes for event B, Pr(B) = Pr(A1, A2, ... An). Then

$$\Pr(A_i \mid B) = \Pr(A_i)\,\frac{\Pr(B \mid A_i)}{\sum_j \Pr(A_j)\Pr(B \mid A_j)}.$$

One well-known formula arising from his system is the rule of succession, given as principle seven. Suppose that some trial has only two possible outcomes, labeled "success" and "failure". Under the assumption that little or nothing is known a priori about the relative plausibilities of the outcomes, Laplace derived a formula for the probability that the next trial will be a success:

$$\Pr(\text{next outcome is success}) = \frac{s+1}{n+2},$$

where s is the number of previously observed successes and n is the total number of observed trials. It is still used as an estimator for the probability of an event if we know the event space, but have only a small number of samples. The rule of succession has been subject to much criticism, partly due to the example which Laplace chose to illustrate it. He calculated that the probability that the sun will rise tomorrow, given that it has never failed to in the past, was

$$\Pr(\text{sun will rise tomorrow}) = \frac{d+1}{d+2},$$

where d is the number of times the sun has risen in the past. This result has been derided as absurd, and some authors have concluded that all applications of the Rule of Succession are absurd by extension. However, Laplace was fully aware of the absurdity of the result; immediately following the example, he wrote, "But this number [i.e., the probability that the sun will rise tomorrow] is far greater for him who, seeing in the totality of phenomena the principle regulating the days and seasons, realizes that nothing at the present moment can arrest the course of it."[28]
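The rule itself is a one-line computation; in the sketch below, the sunrise count is a hypothetical figure chosen purely for illustration:

```python
from fractions import Fraction

def rule_of_succession(successes, trials):
    """Laplace's estimate that the next trial succeeds: (s + 1) / (n + 2)."""
    return Fraction(successes + 1, trials + 2)

print(rule_of_succession(0, 0))     # 1/2: total prior ignorance
print(rule_of_succession(9, 10))    # 10/12 = 5/6

d = 1_000_000                       # hypothetical count of past sunrises
print(rule_of_succession(d, d))     # (d + 1) / (d + 2), just below 1
```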


Laplace's demon
Laplace strongly believed in causal determinism, which is expressed in the following quote from the introduction to the Essai:

"We may regard the present state of the universe as the effect of its past and the cause of its future. An intellect which at a certain moment would know all forces that set nature in motion, and all positions of all items of which nature is composed, if this intellect were also vast enough to submit these data to analysis, it would embrace in a single formula the movements of the greatest bodies of the universe and those of the tiniest atom; for such an intellect nothing would be uncertain and the future just like the past would be present before its eyes."
—Pierre Simon Laplace, A Philosophical Essay on Probabilities[29]

This intellect is often referred to as Laplace's Superman or Laplace's demon (in the same vein as Maxwell's demon). Note that the description of the hypothetical intellect described above by Laplace as a demon does not come from Laplace, but from later biographers: Laplace saw himself as a scientist who hoped that humanity would progress in a better scientific understanding of the world, which, if and when eventually completed, would still need a tremendous calculating power to compute it all in a single instant.

Laplace transforms
As early as 1744, Euler, followed by Lagrange, had started looking for solutions of differential equations in the form:[30]

$$z = \int X(x)\,e^{ax}\,dx.$$

In 1785, Laplace took the key forward step in using integrals of this form in order to transform a whole difference equation, rather than simply as a form for the solution, and found that the transformed equation was easier to solve than the original.[31] [32]
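In modern terms the transform sends f(t) to F(s) = ∫₀^∞ f(t) e^{−st} dt, turning differential relations into algebraic ones in s. A minimal sketch, assuming SymPy's `laplace_transform` (the exponential example is illustrative):

```python
from sympy import symbols, exp, laplace_transform

t, s, a = symbols('t s a', positive=True)

# L{e^{-a t}}(s) = 1 / (s + a): differentiation in t becomes
# multiplication by s after the transform.
F = laplace_transform(exp(-a * t), t, s, noconds=True)
print(F)    # 1/(a + s)
```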

Other discoveries and accomplishments
Mathematics
Amongst the other discoveries of Laplace in pure and applicable mathematics are:
• Discussion, contemporaneously with Alexandre-Théophile Vandermonde, of the general theory of determinants (1772);[6]
• Proof that every equation of an even degree must have at least one real quadratic factor;[6]
• Solution of the linear partial differential equation of the second order;[6]
• He was the first to consider the difficult problems involved in equations of mixed differences, and to prove that the solution of an equation in finite differences of the first degree and the second order might always be obtained in the form of a continued fraction;[6] and
• In his theory of probabilities:
  • Evaluation of several common definite integrals;[6] and
  • General proof of the Lagrange reversion theorem.[6]


Surface tension
Laplace built upon the qualitative work of Thomas Young to develop the theory of capillary action and the Young-Laplace equation.

Speed of sound
Laplace in 1816 was the first to point out that the speed of sound in air depends on the heat capacity ratio. Newton's original theory gave too low a value, because it does not take account of the adiabatic compression of the air, which results in a local rise in temperature and pressure. Laplace's investigations in practical physics were confined to those carried on by him jointly with Lavoisier in the years 1782 to 1784 on the specific heat of various bodies.[6]
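Laplace's correction multiplies the square of Newton's isothermal speed by the heat capacity ratio γ. A quick check with standard round-number values (not taken from the text):

```python
import math

# Approximate standard conditions at 0 degrees C
p     = 101_325.0   # pressure, Pa
rho   = 1.293       # density of air, kg/m^3
gamma = 1.4         # heat capacity ratio of (mostly diatomic) air

c_newton  = math.sqrt(p / rho)          # isothermal (Newton): ~280 m/s, too low
c_laplace = math.sqrt(gamma * p / rho)  # adiabatic (Laplace): ~331 m/s
print(c_newton, c_laplace)
```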

Political ambitions
According to W. W. Rouse Ball, as Napoleon's power increased Laplace begged him to give him the post of Minister of the Interior. However, this is disputed by Pearson.[3] Napoleon, who desired the support of men of science, did make him Minister of the Interior in November 1799, but a little less than six weeks saw the close of Laplace's political career. Napoleon later (in his Mémoires de Sainte Hélène) wrote of his dismissal as follows:[6]

"Géomètre de premier rang, Laplace ne tarda pas à se montrer administrateur plus que médiocre; dès son premier travail nous reconnûmes que nous nous étions trompé. Laplace ne saisissait aucune question sous son véritable point de vue: il cherchait des subtilités partout, n'avait que des idées problématiques, et portait enfin l'esprit des 'infiniment petits' jusque dans l'administration." (Geometrician of the first rank, Laplace was not long in showing himself a worse than average administrator; from his first actions in office we recognized our mistake. Laplace did not consider any question from the right angle: he sought subtleties everywhere, conceived only problems, and finally carried the spirit of "infinitesimals" into the administration.)

Lucien, Napoleon's brother, was given the post. Although Laplace was removed from office, it was desirable to retain his allegiance. He was accordingly raised to the senate, and to the third volume of the Mécanique céleste he prefixed a note that, of all the truths therein contained, the most precious to the author was the declaration he thus made of his devotion towards the peacemaker of Europe. In copies sold after the Bourbon Restoration this was struck out. (Pearson points out that the censor would not have allowed it anyway.) In 1814 it was evident that the empire was falling; Laplace hastened to tender his services to the Bourbons, and in 1817 during the Restoration he was rewarded with the title of marquis.

According to Rouse Ball, the contempt that his more honest colleagues felt for his conduct in the matter may be read in the pages of Paul Louis Courier. His knowledge was useful on the numerous scientific commissions on which he served, and probably accounts for the manner in which his political insincerity was overlooked.[6]

He died in Paris in 1827. His brain was removed by his physician, François Magendie, and kept for many years, eventually being displayed in a roving anatomical museum in Britain. It was reportedly smaller than the average brain.[3]


Honours
• Asteroid 4628 Laplace is named for him.[33]
• He is one of only seventy-two people to have their name engraved on the Eiffel Tower.
• The European Space Agency's working title for the international Europa Jupiter System Mission is "Laplace".

Quotes
• What we know is not much. What we do not know is immense. (attributed)
• I had no need of that hypothesis. ("Je n'avais pas besoin de cette hypothèse-là", as a reply to Napoleon, who had asked why he hadn't mentioned God in his book on astronomy.)
• "It is therefore obvious that ..." (frequently used in the Celestial Mechanics when he had proved something and mislaid the proof, or found it clumsy. Notorious as a signal for something true, but hard to prove.)
• The weight of evidence for an extraordinary claim must be proportioned to its strangeness.[34]
• "... (This simplicity of ratios will not appear astonishing if we consider that) all the effects of nature are only mathematical results of a small number of immutable laws."[29]

Bibliography
By Laplace
• Œuvres complètes de Laplace [35], 14 vol. (1878–1912), Paris: Gauthier-Villars (copy from Gallica, in French)
• Théorie du mouvement et de la figure elliptique des planètes (1784) Paris (not in Œuvres complètes)
• Précis de l'histoire de l'astronomie [36]

English translations
• Bowditch, N. (trans.) (1829–1839) Mécanique céleste, 4 vols, Boston
  • New edition by Reprint Services, ISBN 078122022X
• Bowditch, N. (trans.) [1829–1839] (1966–1969) Celestial Mechanics, 5 vols, including the original French
• Pound, J. (trans.) (1809) The System of the World, 2 vols, London: Richard Phillips
  • The System of the World (v.1) [37]
  • The System of the World (v.2) [38]
• Pound, J. (trans.) [1809] (2007) The System of the World, vol. 1, Kessinger, ISBN 1432653679
• Toplis, J. (trans.) (1814) A treatise upon analytical mechanics [39] Nottingham: H. Barnett
• Truscott, F. W. & Emory, F. L. (trans.) (2007) [1902]. A Philosophical Essay on Probabilities. ISBN 1602063281. Translated from the French 6th ed. (1840)
  • A Philosophical Essay on Probabilities (1902) [40] at the Internet Archive

About Laplace and his work
• Andoyer, H. (1922). L'œuvre scientifique de Laplace. Paris: Payot. (in French)
• Bigourdan, G. (1931). "La jeunesse de P.-S. Laplace" (in French). La Science moderne 9: 377–384.
• Crosland, M. (1967). The Society of Arcueil: A View of French Science at the Time of Napoleon I. Cambridge, MA: Harvard University Press. ISBN 043554201X.
• Dale, A. I. (1982). "Bayes or Laplace? An examination of the origin and early application of Bayes' theorem". Archive for the History of the Exact Sciences 27: 23–47.
• David, F. N. (1965) "Some notes on Laplace", in Neyman, J. & LeCam, L. M. (eds) Bernoulli, Bayes and Laplace, Berlin, pp. 30–44


• Deakin, M. A. B. (1981). "The development of the Laplace transform". Archive for the History of the Exact Sciences 25: 343–390. doi:10.1007/BF01395660.
• — (1982). "The development of the Laplace transform". Archive for the History of the Exact Sciences 26: 351–381. doi:10.1007/BF00418754.
• Dhombres, J. (1989). "La théorie de la capillarité selon Laplace: mathématisation superficielle ou étendue" (in French). Revue d'Histoire des sciences et de leurs applications 62: 43–70.
• Duveen, D. & Hahn, R. (1957). "Laplace's succession to Bezout's post of Examinateur des élèves de l'artillerie". Isis 48: 416–427. doi:10.1086/348608.
• Finn, B. S. (1964). "Laplace and the speed of sound". Isis 55: 7–19. doi:10.1086/349791.
• Fourier, J. B. J. (1827). "Éloge historique de M. le Marquis de Laplace". Mémoires de l'Académie Royale des Sciences 10: lxxxi–cii. Delivered 15 June 1829, published in 1831. (in French)
• Gillispie, C. C. (1972). "Probability and politics: Laplace, Condorcet, and Turgot". Proceedings of the American Philosophical Society 116(1): 1–20.
• — (1997) Pierre Simon Laplace 1749–1827: A Life in Exact Science, Princeton: Princeton University Press, ISBN 0-691-01185-0
• Grattan-Guinness, I. (2005), "'Exposition du système du monde' and 'Traité de méchanique céleste'", in his Landmark Writings in Western Mathematics. Elsevier: 242–57.
• Hahn, R. (1955). "Laplace's religious views". Archives internationales d'histoire des sciences 8: 38–40.
• — (1982). Calendar of the Correspondence of Pierre Simon Laplace (Berkeley Papers in the History of Science, vol. 8). Berkeley, CA: University of California.
• — (1994). New Calendar of the Correspondence of Pierre Simon Laplace (Berkeley Papers in the History of Science, vol. 16). Berkeley, CA: University of California.
• — (2005) Pierre Simon Laplace 1749–1827: A Determined Scientist, Cambridge, MA: Harvard University Press, ISBN 0-674-01892-3
• Israel, Werner (1987). "Dark stars: the evolution of an idea". In Hawking, Stephen W.; Israel, Werner. 300 Years of Gravitation. Cambridge University Press. pp. 199–276
• O'Connor, John J.; Robertson, Edmund F., "Pierre-Simon Laplace" [41], MacTutor History of Mathematics archive, University of St Andrews. (1999)
• Rouse Ball, W. W. [1908] (2003) "Pierre Simon Laplace (1749–1827)" [42], in A Short Account of the History of Mathematics, 4th ed., Dover, ISBN 0486206300
• Stigler, S. M. (1975). "Napoleonic statistics: the work of Laplace" [43]. Biometrika 62 (2): 503–517. doi:10.2307/2335393.
• — (1978). "Laplace's early work: chronology and citations". Isis 69(2): 234–254.
• Whitrow, G. J. (2001) "Laplace, Pierre-Simon, marquis de", Encyclopaedia Britannica, Deluxe CDROM edition
• Whittaker, E. T. (1949a). "Laplace" [44]. Mathematical Gazette 33 (303): 1–12. doi:10.2307/3608408.
• — (1949b). "Laplace". American Mathematical Monthly 56(6): 369–372.
• Wilson, C. (1985). "The Great Inequality of Jupiter and Saturn: from Kepler to Laplace". Archive for the History of the Exact Sciences 33(1–3): 15–290. doi:10.1007/BF00328048.
• Young, T. (1821). Elementary Illustrations of the Celestial Mechanics of Laplace: Part the First, Comprehending the First Book [45]. London: John Murray. (available from Google Books)



External links
• "Laplace, Pierre (1749–1827)" [46]. Eric Weisstein's World of Scientific Biography. Wolfram Research. Retrieved 2007-08-24.
• "Pierre-Simon Laplace" [41] in the MacTutor History of Mathematics archive.
• "Bowditch's English translation of Laplace's preface" [47]. Méchanique Céleste. The MacTutor History of Mathematics archive. Retrieved 2007-09-04.
• Guide to the Pierre Simon Laplace Papers [48] at The Bancroft Library
• Pierre-Simon Laplace [49] at the Mathematics Genealogy Project
• English translation [50] of a large part of Laplace's work in probability and statistics, provided by Richard Pulskamp [51]

References
[1] Stephen M. Stigler (1986) The history of statistics. Harvard University Press. Chapter 3.
[2] [Anon.] (1911) "Pierre Simon, Marquis De Laplace" (http://www.1911encyclopedia.org/Pierre_Simon,_Marquis_De_Laplace), Encyclopaedia Britannica
[3] "Laplace, being Extracts from Lectures delivered by Karl Pearson", Biometrika, vol. 21, Dec. 1929, pp. 202–16
[4] O'Connor, John J.; Robertson, Edmund F., "Pierre-Simon Laplace" (http://www-history.mcs.st-andrews.ac.uk/Biographies/Laplace.html), MacTutor History of Mathematics archive, University of St Andrews, accessed 25 August 2007
[5] Gillispie (1997) pp. 3–4
[6] Rouse Ball (1908)
[7] Gillispie (1997) p. 5
[8] "Pierre-Simon Laplace". Catholic Encyclopedia. New York: Robert Appleton Company. 1913.
[9] Gillispie (1989) pp. 7–12
[10] Gillispie (1989) pp. 14–15
[11] Whitrow (2001)
[12] Celletti, A. & Perozzi, E. (2007). Celestial Mechanics: The Waltz of the Planets. Berlin: Springer. pp. 91–93. ISBN 0-387-30777-X.
[13] Whittaker (1949b)
[14] Gillispie (1989) pp. 29–35
[15] Gillispie (1989) pp. 35–36
[16] School of Mathematics and Statistics (http://www-history.mcs.st-andrews.ac.uk/Biographies/Laplace.html), University of St Andrews, Scotland.
[17] Grattan-Guinness, I. (2003). Companion Encyclopedia of the History and Philosophy of the Mathematical Sciences (http://books.google.com/?id=f5FqsDPVQ2MC&pg=PA1098&lpg=PA1098&dq=laplace+potential+1784). Baltimore: Johns Hopkins University Press. pp. 1097–1098. ISBN 0801873967.
[18] E. Halley (1695), "Some Account of the Ancient State of the City of Palmyra, with Short Remarks upon the Inscriptions Found there" (http://rstl.royalsocietypublishing.org/content/19/215-235/160.full.pdf), Phil. Trans., vol. 19 (1695–1697), pages 160–175; esp. at pages 174–175.
[19] Richard Dunthorne (1749), "A Letter from the Rev. Mr. Richard Dunthorne to the Reverend Mr. Richard Mason F. R. S. and Keeper of the Wood-Wardian Museum at Cambridge, concerning the Acceleration of the Moon" (http://rstl.royalsocietypublishing.org/content/46/491-496/162.full.pdf), Philosophical Transactions (1683–1775), Vol. 46 (1749–1750) #492, pp. 162–172; also given in Philosophical Transactions (abridgements) (1809), vol. 9 (for 1744–49), pp. 669–675 (http://www.archive.org/stream/philosophicaltra09royarich#page/669/mode/2up) as "On the Acceleration of the Moon, by the Rev. Richard Dunthorne".
[20] J. de Lalande (1786): "Sur les equations seculaires du soleil et de la lune" (http://www.academie-sciences.fr/membres/in_memoriam/Lalande/Lalande_pdf/Mem1786_p390.pdf), Memoires de l'Academie Royale des Sciences, pp. 390–397, at page 395.
[21] J. D. North (2008), Cosmos: an illustrated history of astronomy and cosmology, University of Chicago Press, chapter 14, at page 454 (http://books.google.com/books?id=qq8Luhs7rTUC&pg=PA454).
[22] See also P. Puiseux (1879), "Sur l'acceleration seculaire du mouvement de la Lune" (http://archive.numdam.org/article/ASENS_1879_2_8__361_0.pdf), Annales Scientifiques de l'Ecole Normale Superieure, 2nd series vol. 8 (1879), pp. 361–444, at pages 361–5.
[23] J. C. Adams (1853), "On the Secular Variation of the Moon's Mean Motion" (http://rstl.royalsocietypublishing.org/content/143/397.full.pdf), in Phil. Trans. R. Soc. Lond., vol. 143 (1853), pages 397–406.
[24] Roy, A. E. (2005). Orbital Motion (http://books.google.com/?id=Hzv7k2vH6PgC&pg=PA313&lpg=PA313&dq=laplace+secular+acceleration). London: CRC Press. p. 313. ISBN 0750310154.
[25] Owen, T. C. (2001) "Solar system: origin of the solar system", Encyclopaedia Britannica, Deluxe CDROM edition
[26] Crosland (1967) p. 1
[27] See Israel (1987), sec. 7.2.




[28] Laplace, Pierre Simon, A Philosophical Essay on Probabilities, translated from the 6th French edition by Frederick Wilson Truscott and Frederick Lincoln Emory, Dover Publications (New York, 1951)
[29] Laplace, Pierre Simon, A Philosophical Essay on Probabilities, translated from the 6th French edition by Frederick Wilson Truscott and Frederick Lincoln Emory, Dover Publications (New York, 1951) p. 4
[30] Grattan-Guinness, in Gillispie (1997) p. 260
[31] Grattan-Guinness, in Gillispie (1997) pp. 261–262
[32] Deakin (1981)
[33] Schmadel, L. D. (2003). Dictionary of Minor Planet Names (5th rev. ed.). Berlin: Springer-Verlag. ISBN 3540002383.
[34] "A sense of place in the heartland" (http://www.jsonline.com/story/index.aspx?id=497783&format=print), The Milwaukee Journal Sentinel Online
[35] http://gallica.bnf.fr/ark:/12148/bpt6k775950
[36] http://books.google.com/books?id=QYpOb3N7zBMC
[37] http://books.google.com/books?id=yW3nd4DSgYYC
[38] http://books.google.com/books?id=f7Kv2iFUNJoC
[39] http://books.google.com/books?id=c2YSAAAAIAAJ
[40] http://www.archive.org/details/philosophicaless00lapliala
[41] http://www-history.mcs.st-andrews.ac.uk/Biographies/Laplace.html
[42] http://www.maths.tcd.ie/pub/HistMath/People/Laplace/RouseBall/RB_Laplace.html
[43] http://jstor.org/stable/2335393
[44] http://jstor.org/stable/3608408
[45] http://books.google.com/?id=20AJAAAAIAAJ&dq=laplace
[46] http://scienceworld.wolfram.com/biography/Laplace.html
[47] http://www-history.mcs.st-andrews.ac.uk/history/Extras/Laplace_mechanique_celeste.html
[48] http://www.oac.cdlib.org/findaid/ark:/13030/kt8q2nf3g7/
[49] http://genealogy.math.ndsu.nodak.edu/id.php?id=108295
[50] http://www.cs.xu.edu/math/Sources/Laplace/index.html
[51] http://www.cs.xu.edu/math/Sources/index.html

Integral

Integration is an important concept in mathematics and, together with differentiation, is one of the two main operations in calculus. Given a function ƒ of a real variable x and an interval [a, b] of the real line, the definite integral

$$\int_a^b f(x)\,dx$$

A definite integral of a function can be represented as the signed area of the region bounded by its graph.

is defined informally to be the net signed area of the region in the xy-plane bounded by the graph of ƒ, the x-axis, and the vertical lines x = a and x = b. The term integral may also refer to the notion of antiderivative, a function F whose derivative is the given function ƒ. In this case, it is called an indefinite integral, while the integrals discussed in this article are termed definite

integrals. Some authors maintain a distinction between antiderivatives and indefinite integrals.
The principles of integration were formulated independently by Isaac Newton and Gottfried Leibniz in the late 17th century. Through the fundamental theorem of calculus, which they independently developed, integration is connected with differentiation: if ƒ is a continuous real-valued function defined on a closed interval [a, b], then, once an antiderivative F of ƒ is known, the definite integral of ƒ over that interval is given by

$$\int_a^b f(x)\,dx = F(b) - F(a).$$

Integrals and derivatives became the basic tools of calculus, with numerous applications in science and engineering. A rigorous mathematical definition of the integral was given by Bernhard Riemann. It is based on a limiting procedure which approximates the area of a curvilinear region by breaking the region into thin vertical slabs. Beginning in the nineteenth century, more sophisticated notions of integrals began to appear, where the type of the function as well as the domain over which the integration is performed has been generalised. A line integral is defined for functions of two or three variables, and the interval of integration [a, b] is replaced by a certain curve connecting two points on the plane or in the space. In a surface integral, the curve is replaced by a piece of a surface in the three-dimensional space. Integrals of differential forms play a fundamental role in modern differential geometry. These generalizations of integral first arose from the needs of physics, and they play an important role in the formulation of many physical laws, notably those of electrodynamics. There are many modern concepts of integration, among these, the most common is based on the abstract mathematical theory known as Lebesgue integration, developed by Henri Lebesgue.

History

Pre-calculus integration
Integration can be traced as far back as ancient Egypt ca. 1800 BC, with the Moscow Mathematical Papyrus demonstrating knowledge of a formula for the volume of a pyramidal frustum. The first documented systematic technique capable of determining integrals is the method of exhaustion of Eudoxus (ca. 370 BC), which sought to find areas and volumes by breaking them up into an infinite number of shapes for which the area or volume was known. This method was further developed and employed by Archimedes and used to calculate areas for parabolas and an approximation to the area of a circle. Similar methods were independently developed in China around the 3rd century AD by Liu Hui, who used them to find the area of the circle. This method was later used in the 5th century by Chinese father-and-son mathematicians Zu Chongzhi and Zu Geng to find the volume of a sphere.[1] That same century, the Indian mathematician Aryabhata used a similar method in order to find the volume of a cube.[2]
The next major step in integral calculus came in Iraq when the 11th-century mathematician Ibn al-Haytham (known as Alhazen in Europe) devised what is now known as "Alhazen's problem", which leads to an equation of the fourth degree, in his Book of Optics. While solving this problem, he performed an integration in order to find the volume of a paraboloid. Using mathematical induction, he was able to generalize his result for the integrals of polynomials up to the fourth degree. He thus came close to finding a general formula for the integrals of polynomials, but he was not concerned with any polynomials higher than the fourth degree.[3] Some ideas of integral calculus are also found in the Siddhanta Shiromani, a 12th-century astronomy text by Indian mathematician Bhāskara II.
The next significant advances in integral calculus did not begin to appear until the 16th century. At this time the work of Cavalieri with his method of indivisibles, and work by Fermat, began to lay the foundations of modern calculus. Further steps were made in the early 17th century by Barrow and Torricelli, who provided the first hints of a connection between integration and differentiation. At around the same time, there was also a great deal of work being done by Japanese mathematicians, particularly by Seki Kōwa.[4] He made a number of contributions, namely in methods of determining areas of figures using integrals, extending the method of exhaustion.



Newton and Leibniz
The major advance in integration came in the 17th century with the independent discovery of the fundamental theorem of calculus by Newton and Leibniz. The theorem demonstrates a connection between integration and differentiation. This connection, combined with the comparative ease of differentiation, can be exploited to calculate integrals. In particular, the fundamental theorem of calculus allows one to solve a much broader class of problems. Equal in importance is the comprehensive mathematical framework that both Newton and Leibniz developed. Given the name infinitesimal calculus, it allowed for precise analysis of functions within continuous domains. This framework eventually became modern calculus, whose notation for integrals is drawn directly from the work of Leibniz.

Formalizing integrals
While Newton and Leibniz provided a systematic approach to integration, their work lacked a degree of rigour. Bishop Berkeley memorably attacked infinitesimals as "the ghosts of departed quantities". Calculus acquired a firmer footing with the development of limits and was given a suitable foundation by Cauchy in the first half of the 19th century. Integration was first rigorously formalized, using limits, by Riemann. Although all bounded piecewise continuous functions are Riemann integrable on a bounded interval, subsequently more general functions were considered, to which Riemann's definition does not apply, and Lebesgue formulated a different definition of integral, founded in measure theory (a subfield of real analysis). Other definitions of integral, extending Riemann's and Lebesgue's approaches, were proposed.

Notation
Isaac Newton used a small vertical bar above a variable to indicate integration, or placed the variable inside a box. The vertical bar was easily confused with ẋ or x′, which Newton used to indicate differentiation, and the box notation was difficult for printers to reproduce, so these notations were not widely adopted. The modern notation for the indefinite integral was introduced by Gottfried Leibniz in 1675 (Burton 1988, p. 359; Leibniz 1899, p. 154). He adapted the integral symbol, ∫, from an elongated letter s, standing for summa (Latin for "sum" or "total"). The modern notation for the definite integral, with limits above and below the integral sign, was first used by Joseph Fourier in Mémoires of the French Academy around 1819–20, reprinted in his book of 1822 (Cajori 1929, pp. 249–250; Fourier 1822, §231).

Terminology and notation
If a function has an integral, it is said to be integrable. The function for which the integral is calculated is called the integrand. The region over which a function is being integrated is called the domain of integration. Usually this domain will be an interval, in which case it is enough to give the limits of that interval, which are called the limits of integration. If the integral does not have a domain of integration, it is considered indefinite (one with a domain is considered definite). In general, the integrand may be a function of more than one variable, and the domain of integration may be an area, volume, a higher dimensional region, or even an abstract space that does not have a geometric structure in any usual sense. The simplest case, the integral of a real-valued function f of one real variable x on the interval [a, b], is denoted by

$$\int_a^b f(x)\,dx.$$

The ∫ sign represents integration; a and b are the lower limit and upper limit, respectively, of integration, defining the domain of integration; f is the integrand, to be evaluated as x varies over the interval [a,b]; and dx is the variable of integration. In correct mathematical typography, the dx is separated from the integrand by a space (as shown). Some authors use an upright d (that is, dx instead of dx).




The variable of integration dx has different interpretations depending on the theory being used. For example, it can be seen as strictly a notation indicating that x is a dummy variable of integration, as a reflection of the weights in the Riemann sum, a measure (in Lebesgue integration and its extensions), an infinitesimal (in non-standard analysis) or as an independent mathematical quantity: a differential form. More complicated cases may vary the notation slightly. In so-called modern Arabic mathematical notation, which aims at pre-university levels of education in the Arab world and is written from right to left, an inverted integral symbol is used (W3C 2006).

Introduction
Integrals appear in many practical situations. Consider a swimming pool. If it is rectangular with a flat bottom, then from its length, width, and depth we can easily determine the volume of water it can contain (to fill it), the area of its surface (to cover it), and the length of its edge (to rope it). But if it is oval with a rounded bottom, all of these quantities call for integrals. Practical approximations may suffice for such trivial examples, but precision engineering (of any discipline) requires exact and rigorous values for these elements.
To start off, consider the curve y = f(x) between x = 0 and x = 1, with f(x) = √x. We ask: What is the area under the function f, in the interval from 0 to 1? and call this (yet unknown) area the integral of f. The notation for this integral will be

$$\int_0^1 \sqrt{x}\,dx.$$

Approximations to integral of √x from 0 to 1, with ■ 5 right samples (above) and ■ 12 left samples (below)

As a first approximation, look at the unit square given by the sides x = 0 to x = 1 and y = f(0) = 0 and y = f(1) = 1. Its area is exactly 1. As it is, the true value of the integral must be somewhat less. Decreasing the width of the approximation rectangles gives a better result; so cross the interval in five steps, using the approximation points 0, 1⁄5, 2⁄5, and so on to 1. Fit a box for each step using the right end height of each curve piece, thus √1⁄5, √2⁄5, and so on to √1 = 1. Summing the areas of these rectangles, we get a better approximation for the sought integral, namely

$$\sqrt{\tfrac{1}{5}}\left(\tfrac{1}{5}-0\right) + \sqrt{\tfrac{2}{5}}\left(\tfrac{2}{5}-\tfrac{1}{5}\right) + \cdots + \sqrt{\tfrac{5}{5}}\left(1-\tfrac{4}{5}\right) \approx 0.7497.$$

Notice that we are taking a sum of finitely many function values of f, multiplied with the differences of two subsequent approximation points. We can easily see that the approximation is still too large. Using more steps produces a closer approximation, but will never be exact: replacing the 5 subintervals by twelve as depicted, we will get an approximate value for the area of 0.6203, which is too small. The key idea is the transition from adding finitely many differences of approximation points multiplied by their respective function values to using infinitely many fine, or infinitesimal steps. As for the actual calculation of integrals, the fundamental theorem of calculus, due to Newton and Leibniz, is the fundamental link between the operations of differentiating and integrating. Applied to the square root curve, f(x) = x1/2, it says to look at the antiderivative F(x) = 2⁄3x3/2, and simply take F(1) − F(0), where 0 and 1 are the boundaries

of the interval [0,1]. So the exact value of the area under the curve is computed formally as

$$\int_0^1 \sqrt{x}\,dx = F(1) - F(0) = \tfrac{2}{3}\cdot 1^{3/2} - \tfrac{2}{3}\cdot 0^{3/2} = \tfrac{2}{3}.$$

(This is a case of a general rule: for f(x) = x^q with q ≠ −1, the related function, the so-called antiderivative, is F(x) = x^(q+1)/(q + 1).) The notation

$$\int f(x)\,dx$$

conceives the integral as a weighted sum, denoted by the elongated s, of function values, f(x), multiplied by infinitesimal step widths, the so-called differentials, denoted by dx. The multiplication sign is usually omitted. Historically, after the failure of early efforts to rigorously interpret infinitesimals, Riemann formally defined integrals as a limit of weighted sums, so that the dx suggested the limit of a difference (namely, the interval width). Shortcomings of Riemann's dependence on intervals and continuity motivated newer definitions, especially the Lebesgue integral, which is founded on an ability to extend the idea of "measure" in much more flexible ways. Thus the notation

$$\int_A f(x)\,d\mu$$

refers to a weighted sum in which the function values are partitioned, with μ measuring the weight to be assigned to each value. Here A denotes the region of integration. Differential geometry, with its "calculus on manifolds", gives the familiar notation yet another interpretation. Now f(x) and dx become a differential form, ω = f(x) dx, a new differential operator d, known as the exterior derivative, appears, and the fundamental theorem becomes the more general Stokes' theorem,

$$\int_\Omega d\omega = \int_{\partial\Omega} \omega,$$

from which Green's theorem, the divergence theorem, and the fundamental theorem of calculus follow. More recently, infinitesimals have reappeared with rigor, through modern innovations such as non-standard analysis. Not only do these methods vindicate the intuitions of the pioneers; they also lead to new mathematics. Although there are differences between these conceptions of integral, there is considerable overlap. Thus, the area of the surface of the oval swimming pool can be handled as a geometric ellipse, a sum of infinitesimals, a Riemann integral, a Lebesgue integral, or as a manifold with a differential form. The calculated result will be the same for all.
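The relationship between the finite approximations and the exact value above can be illustrated with a short program. The following minimal Python sketch (an illustration; the helper name riemann_sum and its signature are arbitrary choices) reproduces the two approximations of the √x example and compares them with the exact value 2⁄3 obtained from the antiderivative:

    import math

    def riemann_sum(f, a, b, n, rule="left"):
        """Approximate the integral of f over [a, b] using n equal
        subintervals, sampled at left or right endpoints."""
        h = (b - a) / n
        offset = 0 if rule == "left" else 1
        return h * sum(f(a + (i + offset) * h) for i in range(n))

    print(riemann_sum(math.sqrt, 0, 1, 5, "right"))   # ~0.7497 (too large)
    print(riemann_sum(math.sqrt, 0, 1, 12, "left"))   # ~0.6203 (too small)
    print(2 / 3)                                      # exact value

Increasing n makes both estimates approach 2⁄3, mirroring the transition to infinitesimally fine steps described above.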




Formal definitions
There are many ways of formally defining an integral, not all of which are equivalent. The differences exist mostly to deal with differing special cases which may not be integrable under other definitions, but also occasionally for pedagogical reasons. The most commonly used definitions of integral are Riemann integrals and Lebesgue integrals.

Riemann integral
The Riemann integral is defined in terms of Riemann sums of functions with respect to tagged partitions of an interval. Let [a, b] be a closed interval of the real line; then a tagged partition of [a, b] is a finite sequence

$$a = x_0 \le t_1 \le x_1 \le t_2 \le x_2 \le \cdots \le x_{n-1} \le t_n \le x_n = b.$$

Integral approached as Riemann sum based on tagged partition, with irregular sampling positions and widths (max in red). True value is 3.76; estimate is 3.648.

This partitions the interval [a,b] into n sub-intervals [xi−1, xi] indexed by i, each of which is "tagged" with a distinguished point ti ∈ [xi−1, xi]. A Riemann sum of a function f with respect to such a tagged partition is defined as

$$\sum_{i=1}^{n} f(t_i)\,(x_i - x_{i-1});$$

Riemann sums converging as intervals halve, whether sampled at ■ right, ■ minimum, ■ maximum, or ■ left.

thus each term of the sum is the area of a rectangle with height equal to the function value at the distinguished point of the given sub-interval, and width the same as the sub-interval width. Let Δi = xi−xi−1 be the width of sub-interval



i; then the mesh of such a tagged partition is the width of the largest sub-interval formed by the partition, maxi=1…n Δi. The Riemann integral of a function f over the interval [a,b] is equal to S if: For all ε > 0 there exists δ > 0 such that, for any tagged partition of [a,b] with mesh less than δ, we have

$$\left| S - \sum_{i=1}^{n} f(t_i)\,\Delta_i \right| < \varepsilon.$$

When the chosen tags give the maximum (respectively, minimum) value of each interval, the Riemann sum becomes an upper (respectively, lower) Darboux sum, suggesting the close connection between the Riemann integral and the Darboux integral.
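The ε–δ definition can be probed numerically: for a tagged partition with small mesh, the Riemann sum is close to the integral no matter where the tags are placed. A minimal Python sketch (the function name tagged_riemann_sum is an arbitrary choice):

    import random

    def tagged_riemann_sum(f, partition, tags):
        """Riemann sum for a tagged partition: partition = [x0, ..., xn]
        and tags[i] lies in [x_i, x_{i+1}]."""
        return sum(f(t) * (x1 - x0)
                   for t, x0, x1 in zip(tags, partition, partition[1:]))

    f = lambda x: x * x
    n = 1000
    partition = [i / n for i in range(n + 1)]      # uniform partition of [0, 1]
    tags = [random.uniform(lo, hi)                 # arbitrary tag in each cell
            for lo, hi in zip(partition, partition[1:])]
    print(tagged_riemann_sum(f, partition, tags))  # close to 1/3 for a small mesh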

Lebesgue integral
The Riemann integral is not defined for a wide range of functions and situations of importance in applications (and of interest in theory). For example, the Riemann integral can easily integrate density to find the mass of a steel beam, but cannot accommodate a steel ball resting on it. This motivates other definitions, under which a broader assortment of functions is integrable (Rudin 1987). The Lebesgue integral, in particular, achieves great flexibility by directing attention to the weights in the weighted sum. The definition of the Lebesgue integral thus begins with a measure, μ. In the simplest case, the Lebesgue measure μ(A) of an interval A = [a,b] is its width, b − a, so that the Lebesgue integral agrees with the (proper) Riemann integral when both exist. In more complicated cases, the sets being measured can be highly fragmented, with no continuity and no resemblance to intervals. To exploit this flexibility, Lebesgue integrals reverse the approach to the weighted sum. As Folland (1984, p. 56) puts it, "To compute the Riemann integral of f, one partitions the domain [a,b] into subintervals", while in the Lebesgue integral, "one is in effect partitioning the range of f". One common approach first defines the integral of the indicator function of a measurable set A by:

$$\int 1_A \,d\mu = \mu(A).$$

This extends by linearity to a measurable simple function s, which attains only a finite number, n, of distinct non-negative values:

$$s(x) = \sum_{i=1}^{n} a_i\, 1_{A_i}(x)$$

(where the image of Ai under the simple function s is the constant value ai). Thus if E is a measurable set one defines

$$\int_E s\,d\mu = \sum_{i=1}^{n} a_i\,\mu(A_i \cap E).$$

Then for any non-negative measurable function f one defines

$$\int_E f\,d\mu = \sup\left\{ \int_E s\,d\mu : 0 \le s \le f,\ s\ \text{simple} \right\};$$

that is, the integral of f is set to be the supremum of all the integrals of simple functions that are less than or equal to f. A general measurable function f is split into its positive and negative values by defining

$$f^{+}(x) = \max(f(x), 0), \qquad f^{-}(x) = \max(-f(x), 0).$$



Finally, f is Lebesgue integrable if

$$\int_E |f|\,d\mu < \infty,$$

and then the integral is defined by

$$\int_E f\,d\mu = \int_E f^{+}\,d\mu - \int_E f^{-}\,d\mu.$$
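The phrase "partitioning the range of f" can be made concrete through the equivalent "layer cake" formulation: for a non-negative function, ∫ f dμ = ∫₀^∞ μ({x : f(x) > y}) dy. The Python sketch below (the names are arbitrary) approximates the integral of f(x) = x² on [0, 1] this way, slicing the range rather than the domain; here μ({f > y}) = 1 − √y can be written down explicitly:

    import math

    def layer_cake_integral(measure_above, y_max, m):
        """Approximate the Lebesgue integral of a non-negative function by
        partitioning its *range* into m slices of height dy and summing
        measure_above(y) * dy, where measure_above(y) = mu({x : f(x) > y})."""
        dy = y_max / m
        return sum(measure_above(j * dy) * dy for j in range(m))

    measure_above = lambda y: 1 - math.sqrt(y)    # mu({x in [0,1] : x*x > y})
    print(layer_cake_integral(measure_above, 1.0, 100_000))   # -> 1/3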

When the measure space on which the functions are defined is also a locally compact topological space (as is the case with the real numbers R), measures compatible with the topology in a suitable sense (Radon measures, of which the Lebesgue measure is an example) and integral with respect to them can be defined differently, starting from the integrals of continuous functions with compact support. More precisely, the compactly supported functions form a vector space that carries a natural topology, and a (Radon) measure can be defined as any continuous linear functional on this space; the value of a measure at a compactly supported function is then also by definition the integral of the function. One then proceeds to expand the measure (the integral) to more general functions by continuity, and defines the measure of a set as the integral of its indicator function. This is the approach taken by Bourbaki (2004) and a certain number of other authors. For details see Radon measures.

Other integrals
Although the Riemann and Lebesgue integrals are the most widely used definitions of the integral, a number of others exist, including:
• The Riemann–Stieltjes integral, an extension of the Riemann integral.
• The Lebesgue–Stieltjes integral, further developed by Johann Radon, which generalizes the Riemann–Stieltjes and Lebesgue integrals.
• The Daniell integral, which subsumes the Lebesgue integral and Lebesgue–Stieltjes integral without the dependence on measures.
• The Henstock–Kurzweil integral, variously defined by Arnaud Denjoy, Oskar Perron, and (most elegantly, as the gauge integral) Jaroslav Kurzweil, and developed by Ralph Henstock.
• The Itō integral and Stratonovich integral, which define integration with respect to stochastic processes such as Brownian motion.

Properties

Linearity
• The collection of Riemann integrable functions on a closed interval [a, b] forms a vector space under the operations of pointwise addition and multiplication by a scalar, and the operation of integration

$$f \mapsto \int_a^b f(x)\,dx$$

is a linear functional on this vector space. Thus, firstly, the collection of integrable functions is closed under taking linear combinations; and, secondly, the integral of a linear combination is the linear combination of the integrals,

$$\int_a^b (\alpha f + \beta g)(x)\,dx = \alpha \int_a^b f(x)\,dx + \beta \int_a^b g(x)\,dx.$$



• Similarly, the set of real-valued Lebesgue integrable functions on a given measure space E with measure μ is closed under taking linear combinations and hence forms a vector space, and the Lebesgue integral

$$f \mapsto \int_E f\,d\mu$$

is a linear functional on this vector space, so that

$$\int_E (\alpha f + \beta g)\,d\mu = \alpha \int_E f\,d\mu + \beta \int_E g\,d\mu.$$

• More generally, consider the vector space of all measurable functions on a measure space (E, μ), taking values in a locally compact complete topological vector space V over a locally compact topological field K, f : E → V. Then one may define an abstract integration map assigning to each function f an element of V or the symbol ∞,

$$f \mapsto \int_E f\,d\mu,$$

that is compatible with linear combinations. In this situation the linearity holds for the subspace of functions whose integral is an element of V (i.e. "finite"). The most important special cases arise when K is R, C, or a finite extension of the field Qp of p-adic numbers, and V is a finite-dimensional vector space over K, and when K=C and V is a complex Hilbert space. Linearity, together with some natural continuity properties and normalisation for a certain class of "simple" functions, may be used to give an alternative definition of the integral. This is the approach of Daniell for the case of real-valued functions on a set X, generalized by Nicolas Bourbaki to functions with values in a locally compact topological vector space. See (Hildebrandt 1953) for an axiomatic characterisation of the integral.

Inequalities for integrals
A number of general inequalities hold for Riemann-integrable functions defined on a closed and bounded interval [a, b] and can be generalized to other notions of integral (Lebesgue and Daniell).
• Upper and lower bounds. An integrable function f on [a, b] is necessarily bounded on that interval. Thus there are real numbers m and M so that m ≤ f(x) ≤ M for all x in [a, b]. Since the lower and upper sums of f over [a, b] are therefore bounded by, respectively, m(b − a) and M(b − a), it follows that

$$m(b-a) \le \int_a^b f(x)\,dx \le M(b-a).$$

• Inequalities between functions. If f(x) ≤ g(x) for each x in [a, b] then each of the upper and lower sums of f is bounded above by the upper and lower sums, respectively, of g. Thus

$$\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx.$$

This is a generalization of the above inequalities, as M(b − a) is the integral of the constant function with value M over [a, b].
• Subintervals. If [c, d] is a subinterval of [a, b] and f(x) is non-negative for all x, then

$$\int_c^d f(x)\,dx \le \int_a^b f(x)\,dx.$$

• Products and absolute values of functions. If f and g are two functions then we may consider their pointwise products and powers, and absolute values:

$$(fg)(x) = f(x)\,g(x), \qquad f^2(x) = (f(x))^2, \qquad |f|(x) = |f(x)|.$$

If f is Riemann-integrable on [a, b] then the same is true for |f|, and

$$\left| \int_a^b f(x)\,dx \right| \le \int_a^b |f(x)|\,dx.$$

Moreover, if f and g are both Riemann-integrable then f², g², and fg are also Riemann-integrable, and

$$\left( \int_a^b (fg)(x)\,dx \right)^{2} \le \left( \int_a^b f(x)^2\,dx \right) \left( \int_a^b g(x)^2\,dx \right).$$



This inequality, known as the Cauchy–Schwarz inequality, plays a prominent role in Hilbert space theory, where the left hand side is interpreted as the inner product of two square-integrable functions f and g on the interval [a, b].
• Hölder's inequality. Suppose that p and q are two real numbers, 1 ≤ p, q ≤ ∞ with 1/p + 1/q = 1, and f and g are two Riemann-integrable functions. Then the functions |f|^p and |g|^q are also integrable and the following Hölder's inequality holds:

$$\left| \int f(x)\,g(x)\,dx \right| \le \left( \int |f(x)|^p\,dx \right)^{1/p} \left( \int |g(x)|^q\,dx \right)^{1/q}.$$

For p = q = 2, Hölder's inequality becomes the Cauchy–Schwarz inequality.
• Minkowski inequality. Suppose that p ≥ 1 is a real number and f and g are Riemann-integrable functions. Then |f|^p, |g|^p and |f + g|^p are also Riemann-integrable and the following Minkowski inequality holds:

$$\left( \int |f(x)+g(x)|^p\,dx \right)^{1/p} \le \left( \int |f(x)|^p\,dx \right)^{1/p} + \left( \int |g(x)|^p\,dx \right)^{1/p}.$$

An analogue of this inequality for the Lebesgue integral is used in the construction of Lp spaces.

Conventions
In this section f is a real-valued Riemann-integrable function. The integral

$$\int_a^b f(x)\,dx$$

over an interval [a, b] is defined if a < b. This means that the upper and lower sums of the function f are evaluated on a partition a = x0 ≤ x1 ≤ . . . ≤ xn = b whose values xi are increasing. Geometrically, this signifies that integration takes place "left to right", evaluating f within intervals [xi, xi+1] where an interval with a higher index lies to the right of one with a lower index. The values a and b, the end-points of the interval, are called the limits of integration of f. Integrals can also be defined if a > b:
• Reversing limits of integration. If a > b then define

$$\int_a^b f(x)\,dx = -\int_b^a f(x)\,dx.$$

This, with a = b, implies:
• Integrals over intervals of length zero. If a is a real number then

$$\int_a^a f(x)\,dx = 0.$$

The first convention is necessary in consideration of taking integrals over subintervals of [a, b]; the second says that an integral taken over a degenerate interval, or a point, should be zero. One reason for the first convention is that the integrability of f on an interval [a, b] implies that f is integrable on any subinterval [c, d], but in particular integrals have the property that:
• Additivity of integration on intervals. If c is any element of [a, b], then

$$\int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx.$$

With the first convention the resulting relation

$$\int_a^c f(x)\,dx = \int_a^b f(x)\,dx + \int_b^c f(x)\,dx$$

is then well-defined for any cyclic permutation of a, b, and c.



Instead of viewing the above as conventions, one can also adopt the point of view that integration is performed of differential forms on oriented manifolds only. If M is such an oriented m-dimensional manifold, and M′ is the same manifold with opposed orientation and ω is an m-form, then one has:

$$\int_M \omega = -\int_{M'} \omega.$$

These conventions correspond to interpreting the integrand as a differential form, integrated over a chain. In measure theory, by contrast, one interprets the integrand as a function f with respect to a measure and integrates over a subset A, without any notion of orientation; one writes

$$\int_A f\,d\mu$$

to indicate integration over a subset

A. This is a minor distinction in one dimension, but becomes subtler on higher dimensional manifolds; see Differential form: Relation with measures for details.

Fundamental theorem of calculus
The fundamental theorem of calculus is the statement that differentiation and integration are inverse operations: if a continuous function is first integrated and then differentiated, the original function is retrieved. An important consequence, sometimes called the second fundamental theorem of calculus, allows one to compute integrals by using an antiderivative of the function to be integrated.

Statements of theorems
• Fundamental theorem of calculus. Let f be a real-valued integrable function defined on a closed interval [a, b]. If F is defined for x in [a, b] by

$$F(x) = \int_a^x f(t)\,dt,$$

then F is continuous on [a, b]. If f is continuous at x in [a, b], then F is differentiable at x, and F′(x) = f(x).
• Second fundamental theorem of calculus. Let f be a real-valued integrable function defined on a closed interval [a, b]. If F is a function such that F′(x) = f(x) for all x in [a, b] (that is, F is an antiderivative of f), then

$$\int_a^b f(x)\,dx = F(b) - F(a).$$

In particular, these are true whenever f is continuous on [a, b].
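Both statements can be checked numerically. In the Python sketch below (an illustration; trapezoid is a stand-in numeric integrator), F(x) approximates ∫₀ˣ cos t dt, its difference quotient recovers cos x, and F(b) − F(a) matches the antiderivative sin x:

    import math

    def trapezoid(f, a, b, n=10_000):
        """Composite trapezoid approximation of the integral of f on [a, b]."""
        h = (b - a) / n
        return h * (0.5 * f(a) + sum(f(a + i * h) for i in range(1, n)) + 0.5 * f(b))

    F = lambda x: trapezoid(math.cos, 0.0, x)    # F(x) ~ integral of cos over [0, x]

    x, h = 1.0, 1e-5
    print((F(x + h) - F(x - h)) / (2 * h))       # ~ cos(1): first theorem
    print(math.cos(x))
    print(F(2.0) - F(0.5))                       # ~ sin(2) - sin(0.5): second theorem
    print(math.sin(2.0) - math.sin(0.5))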



Extensions

Improper integrals
A "proper" Riemann integral assumes the integrand is defined and finite on a closed and bounded interval, bracketed by the limits of integration. An improper integral occurs when one or more of these conditions is not satisfied. In some cases such integrals may be defined by considering the limit of a sequence of proper Riemann integrals on progressively larger intervals. If the interval is unbounded, for instance at its upper end, then the improper integral is the limit as that endpoint goes to infinity:

$$\int_a^\infty f(x)\,dx = \lim_{t \to \infty} \int_a^t f(x)\,dx.$$

The improper integral of 1⁄((x+1)√x) over (0, ∞) has unbounded intervals for both domain and range.

If the integrand is only defined or finite on a half-open interval, for instance (a, b], then again a limit may provide a finite result:

$$\int_a^b f(x)\,dx = \lim_{s \to a^{+}} \int_s^b f(x)\,dx.$$

That is, the improper integral is the limit of proper integrals as one endpoint of the interval of integration approaches either a specified real number, or ∞, or −∞. In more complicated cases, limits are required at both endpoints, or at interior points. Consider, for example, the function

$$f(x) = \frac{1}{(x+1)\sqrt{x}},$$

integrated from 0 to ∞ (shown right). At the lower bound, as x goes to

0 the function goes to ∞, and the upper bound is itself ∞, though the function goes to 0. Thus this is a doubly improper integral. Integrated, say, from 1 to 3, an ordinary Riemann sum suffices to produce a result of π⁄6. To integrate from 1 to ∞, a Riemann sum is not possible. However, any finite upper bound, say t (with t > 1), gives a well-defined result, 2 arctan(√t) − π⁄2. This has a finite limit as t goes to infinity, namely π⁄2. Similarly, the integral from 1⁄3 to 1 allows a Riemann sum as well, coincidentally again producing π⁄6. Replacing 1⁄3 by an arbitrary positive value s (with s < 1) is equally safe, giving

π⁄2 − 2 arctan(√s). This, too, has a finite limit as s goes to zero,

namely π⁄2. Combining the limits of the two fragments, the result of this improper integral is

$$\int_0^\infty \frac{dx}{(x+1)\sqrt{x}} = \frac{\pi}{2} + \frac{\pi}{2} = \pi.$$

This process does not guarantee success; a limit may fail to exist, or may be unbounded. For example, over the bounded interval 0 to 1 the integral of 1⁄x does not converge; and over the unbounded interval 1 to ∞ the integral of 1⁄√x does not converge. It may also happen that an integrand is unbounded at an interior point, in which case the integral must be split at that point, and the limit integrals on both sides must exist and must be bounded. Thus

$$\int_{-1}^{1} \frac{dx}{\sqrt[3]{x^2}} = \lim_{s \to 0^{+}} \int_{-1}^{-s} \frac{dx}{\sqrt[3]{x^2}} + \lim_{t \to 0^{+}} \int_{t}^{1} \frac{dx}{\sqrt[3]{x^2}} = 3 + 3 = 6.$$



But the similar integral

$$\int_{-1}^{1} \frac{dx}{x}$$

cannot be assigned a value in this way, as the integrals above and below zero do not independently converge. (However, see Cauchy principal value.)
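The limit process for the doubly improper integral above can be watched numerically. The Python sketch below (illustrative only; it takes the integrand reconstructed above and a plain trapezoid rule, integrating one decade at a time so step sizes stay proportionate) shows the proper integrals over [1, t] creeping toward their limit π⁄2 as t grows:

    import math

    f = lambda x: 1.0 / ((x + 1.0) * math.sqrt(x))

    def trapezoid(g, a, b, n=20_000):
        h = (b - a) / n
        return h * (0.5 * g(a) + sum(g(a + i * h) for i in range(1, n)) + 0.5 * g(b))

    total, left = 0.0, 1.0
    for right in (1e1, 1e2, 1e3, 1e4, 1e5, 1e6):
        total += trapezoid(f, left, right)   # accumulate one decade at a time
        left = right
        print(f"integral over [1, {right:g}] ~ {total:.6f}")
    print("limit pi/2 =", math.pi / 2)       # convergence is slow, roughly 2/sqrt(t)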

Multiple integration
Integrals can be taken over regions other than intervals. In general, an integral over a set E of a function f is written:

$$\int_E f(x)\,dx.$$

Double integral as volume under a surface.

Here x need not be a real number, but can be another suitable quantity, for instance, a vector in R3. Fubini's theorem shows that such integrals can be rewritten as an iterated integral. In other words, the integral can be calculated by integrating one coordinate at a time. Just as the definite integral of a positive function of one variable represents the area of the region between the graph of the function and the x-axis, the double integral of a positive function of two variables represents the volume of the region between the surface defined by the function and the plane which contains its domain. (The same volume can be obtained via the triple integral — the integral of a function in three variables — of the constant function f(x, y, z) = 1 over the above mentioned region between the surface and the plane.) If the number of variables is higher, then the integral represents a hypervolume, a volume of a solid of more than three dimensions that cannot be graphed. For example, the volume of the cuboid of sides 4 × 6 × 5 may be obtained in two ways:
• By the double integral

$$\iint_D 5\,dx\,dy$$

of the function f(x, y) = 5 calculated in the region D in the xy-plane which is the base of the cuboid. For example, if a rectangular base of such a cuboid is given via the xy inequalities 2 ≤ x ≤ 7, 4 ≤ y ≤ 9, our above

double integral now reads

$$\int_4^9 \int_2^7 5\,dx\,dy.$$

From here, integration is conducted with respect to either x or y first; in this example, integration is first done with respect to x, as the interval corresponding to x is the inner integral. Once the first (inner) integration is completed, the result is again integrated with respect to the other variable. The result will equate to the volume under the surface.
• By the triple integral

$$\iiint_{\text{cuboid}} 1\,dx\,dy\,dz$$

of the constant function 1 calculated on the cuboid itself.
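For the rectangular base given above, the iterated integration can be mimicked with a simple midpoint grid. A Python sketch (names arbitrary):

    def midpoint_double_integral(f, x_range, y_range, nx=200, ny=200):
        """Midpoint-rule approximation of the double integral of f(x, y)
        over the rectangle x_range x y_range."""
        (x0, x1), (y0, y1) = x_range, y_range
        hx, hy = (x1 - x0) / nx, (y1 - y0) / ny
        return hx * hy * sum(f(x0 + (i + 0.5) * hx, y0 + (j + 0.5) * hy)
                             for i in range(nx) for j in range(ny))

    # Constant height 5 over the base 2 <= x <= 7, 4 <= y <= 9:
    print(midpoint_double_integral(lambda x, y: 5.0, (2, 7), (4, 9)))  # 125.0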

Line integrals
The concept of an integral can be extended to more general domains of integration, such as curved lines and surfaces. Such integrals are known as line integrals and surface integrals respectively. These have important applications in physics, as when dealing with vector fields. A line integral (sometimes called a path integral) is an integral where the function to be integrated is evaluated along a curve. Various different line integrals are in use. In the case of a closed curve it is also called a contour integral. The function to be integrated may be a scalar field or a vector field. The value of the line integral is the sum of values of the field at all points on the curve, weighted by some scalar function on the curve (commonly arc length or, for a vector field, the scalar product of the vector field with a differential vector in the curve). This weighting distinguishes the line integral from simpler integrals defined on intervals.
A line integral sums together elements along a curve.
Many simple formulas in physics have natural continuous analogs in terms of line integrals; for example, the fact that work is equal to force, F, multiplied by displacement, s, may be expressed (in terms of vector quantities) as:

$$W = \mathbf{F} \cdot \mathbf{s}.$$

For an object moving along a path C in a vector field F (such as an electric field or gravitational field), the total work done by the field on the object is obtained by summing up the differential work done in moving from s to s + ds. This gives the line integral

$$W = \int_C \mathbf{F} \cdot d\mathbf{s}.$$
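For a parametrized curve r(t), the line integral reduces to an ordinary integral of F(r(t)) · r′(t) over the parameter. The Python sketch below (illustrative; the rotational field and circular path are arbitrary choices) computes the work done by F(x, y) = (−y, x) around the unit circle, which is 2π:

    import math

    def line_integral(F, r, dr, t0, t1, n=10_000):
        """Approximate the work integral of F along the curve r(t), t0 <= t <= t1,
        by summing F(r(t)) . r'(t) dt at parameter midpoints."""
        h = (t1 - t0) / n
        total = 0.0
        for i in range(n):
            t = t0 + (i + 0.5) * h
            (Fx, Fy), (dx, dy) = F(*r(t)), dr(t)
            total += (Fx * dx + Fy * dy) * h
        return total

    F  = lambda x, y: (-y, x)                     # a rotational vector field
    r  = lambda t: (math.cos(t), math.sin(t))     # unit circle
    dr = lambda t: (-math.sin(t), math.cos(t))    # its derivative
    print(line_integral(F, r, dr, 0.0, 2 * math.pi))   # ~ 2*pi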


Surface integrals
A surface integral is a definite integral taken over a surface (which may be a curved set in space); it can be thought of as the double integral analog of the line integral. The function to be integrated may be a scalar field or a vector field. The value of the surface integral is the sum of the field at all points on the surface. This can be achieved by splitting the surface into surface elements, which provide the partitioning for Riemann sums.
The definition of surface integral relies on splitting the surface into small surface elements.
For an example of applications of surface integrals, consider a vector field v on a surface S; that is, for each point x in S, v(x) is a vector. Imagine that we have a fluid flowing through S, such that v(x) determines the velocity of the fluid at x. The flux is defined as the quantity of fluid flowing through S in unit amount of time. To find the flux, we need to take the dot product of v with the unit surface normal to S at each point, which will give us a scalar field, which we integrate over the surface:

$$\int_S \mathbf{v} \cdot d\mathbf{S} = \int_S (\mathbf{v} \cdot \mathbf{n})\,dS.$$

The fluid flux in this example may be from a physical fluid such as water or air, or from electrical or magnetic flux. Thus surface integrals have applications in physics, particularly with the classical theory of electromagnetism.
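For a parametrized surface r(u, w), the flux integrand v · n dS becomes v(r) · (r_u × r_w) du dw. The Python sketch below (illustrative; the uniform field and flat square are arbitrary choices) computes the flux of v = (0, 0, 3) through the unit square in the plane z = 0, which is 3:

    def flux(v, r, ru, rw, u_range, w_range, n=100):
        """Approximate the flux of the field v through the surface r(u, w)
        by summing v . (ru x rw) du dw over a midpoint grid."""
        (u0, u1), (w0, w1) = u_range, w_range
        hu, hw = (u1 - u0) / n, (w1 - w0) / n
        total = 0.0
        for i in range(n):
            for j in range(n):
                u, w = u0 + (i + 0.5) * hu, w0 + (j + 0.5) * hw
                a, b = ru(u, w), rw(u, w)
                nx = a[1] * b[2] - a[2] * b[1]    # normal vector a x b
                ny = a[2] * b[0] - a[0] * b[2]
                nz = a[0] * b[1] - a[1] * b[0]
                vx, vy, vz = v(*r(u, w))
                total += (vx * nx + vy * ny + vz * nz) * hu * hw
        return total

    r_  = lambda u, w: (u, w, 0.0)       # flat unit square in the plane z = 0
    ru_ = lambda u, w: (1.0, 0.0, 0.0)   # dr/du
    rw_ = lambda u, w: (0.0, 1.0, 0.0)   # dr/dw
    print(flux(lambda x, y, z: (0.0, 0.0, 3.0), r_, ru_, rw_, (0, 1), (0, 1)))  # 3.0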

Integrals of differential forms
A differential form is a mathematical concept in the fields of multivariable calculus, differential topology and tensors. The modern notation for the differential form, as well as the idea of the differential forms as being the wedge products of exterior derivatives forming an exterior algebra, was introduced by Élie Cartan. We initially work in an open set in Rn. A 0-form is defined to be a smooth function f. When we integrate a function f over an m-dimensional subspace S of Rn, we write it as

$$\int_S f\,dx^1 \cdots dx^m.$$

(The superscripts are indices, not exponents.) We can consider dx1 through dxn to be formal objects themselves, rather than tags appended to make integrals look like Riemann sums. Alternatively, we can view them as covectors, and thus a measure of "density" (hence integrable in a general sense). We call the dx1, …, dxn basic 1-forms. We define the wedge product, "∧", a bilinear "multiplication" operator on these elements, with the alternating property that

$$dx^a \wedge dx^a = 0$$

for all indices a. Note that alternation along with linearity and associativity implies dx^b ∧ dx^a = −dx^a ∧ dx^b. This also ensures that the result of the wedge product has an orientation. We define the set of all these products to be basic 2-forms, and similarly we define the set of products of the form dx^a ∧ dx^b ∧ dx^c to be basic 3-forms. A general k-form is then a weighted sum of basic k-forms, where the weights are the smooth functions f. Together these form a vector space with basic k-forms as the basis vectors, and 0-forms (smooth functions) as the field of scalars. The wedge product then extends to k-forms in the natural way. Over Rn at most n covectors can be linearly independent, thus a k-form with k > n will always be zero, by the alternating property. In addition to the wedge product, there is also the exterior derivative operator d. This operator maps k-forms to (k+1)-forms. For a k-form ω = f dx^a over Rn, we define the action of d by:

$$d\omega = \sum_{i=1}^{n} \frac{\partial f}{\partial x^i}\, dx^i \wedge dx^a,$$




with extension to general k-forms occurring linearly. This more general approach allows for a more natural coordinate-free approach to integration on manifolds. It also allows for a natural generalisation of the fundamental theorem of calculus, called Stokes' theorem, which we may state as

$$\int_\Omega d\omega = \int_{\partial\Omega} \omega,$$

where ω is a general k-form, and ∂Ω denotes the boundary of the region Ω. Thus, in the case that ω is a 0-form and Ω is a closed interval of the real line, this reduces to the fundamental theorem of calculus. In the case that ω is a 1-form and Ω is a two-dimensional region in the plane, the theorem reduces to Green's theorem. Similarly, using 2-forms, and 3-forms and Hodge duality, we can arrive at Stokes' theorem and the divergence theorem. In this way we can see that differential forms provide a powerful unifying view of integration.

Summations
The discrete equivalent of integration is summation. Summations and integrals can be put on the same foundations using the theory of Lebesgue integrals or time scale calculus.

Methods

Computing integrals
The most basic technique for computing definite integrals of one real variable is based on the fundamental theorem of calculus. Let f(x) be the function of x to be integrated over a given interval [a, b]. Then, find an antiderivative of f; that is, a function F such that F′ = f on the interval. By the fundamental theorem of calculus—provided the integrand and integral have no singularities on the path of integration—

$$\int_a^b f(x)\,dx = F(b) - F(a).$$

The integral is not actually the antiderivative, but the fundamental theorem provides a way to use antiderivatives to evaluate definite integrals. The most difficult step is usually to find the antiderivative of f. It is rarely possible to glance at a function and write down its antiderivative. More often, it is necessary to use one of the many techniques that have been developed to evaluate integrals. Most of these techniques rewrite one integral as a different one which is hopefully more tractable. Techniques include:
• Integration by substitution
• Integration by parts
• Changing the order of integration
• Integration by trigonometric substitution
• Integration by partial fractions
• Integration by reduction formulae
• Integration using parametric derivatives
• Integration using Euler's formula
• Differentiation under the integral sign
• Contour integration

Alternate methods exist to compute more complex integrals. Many nonelementary integrals can be expanded in a Taylor series and integrated term by term. Occasionally, the resulting infinite series can be summed analytically. The method of convolution using Meijer G-functions can also be used, assuming that the integrand can be written as a product of Meijer G-functions. There are also many less common ways of calculating definite integrals; for instance,

Parseval's identity can be used to transform an integral over a rectangular region into an infinite sum. Occasionally, an integral can be evaluated by a trick; for an example of this, see Gaussian integral. Computations of volumes of solids of revolution can usually be done with disk integration or shell integration. Specific results which have been worked out by various techniques are collected in the list of integrals.

Symbolic algorithms
Many problems in mathematics, physics, and engineering involve integration where an explicit formula for the integral is desired. Extensive tables of integrals have been compiled and published over the years for this purpose. With the spread of computers, many professionals, educators, and students have turned to computer algebra systems that are specifically designed to perform difficult or tedious tasks, including integration. Symbolic integration presents a special challenge in the development of such systems.
A major mathematical difficulty in symbolic integration is that in many cases, a closed formula for the antiderivative of a rather simple-looking function does not exist. For instance, it is known that the antiderivatives of the functions exp(x²), x^x and (sin x)/x cannot be expressed in closed form involving only rational and exponential functions, logarithm, trigonometric and inverse trigonometric functions, and the operations of multiplication and composition; in other words, none of the three given functions is integrable in elementary functions. Differential Galois theory provides general criteria that allow one to determine whether the antiderivative of an elementary function is elementary. Unfortunately, it turns out that functions with closed expressions of antiderivatives are the exception rather than the rule. Consequently, computerized algebra systems have no hope of being able to find an antiderivative for a randomly constructed elementary function. On the positive side, if the 'building blocks' for antiderivatives are fixed in advance, it may still be possible to decide whether the antiderivative of a given function can be expressed using these blocks and operations of multiplication and composition, and to find the symbolic answer whenever it exists. The Risch algorithm, implemented in Mathematica and other computer algebra systems, does just that for functions and antiderivatives built from rational functions, radicals, logarithm, and exponential functions.
Some special integrands occur often enough to warrant special study. In particular, it may be useful to have, in the set of antiderivatives, the special functions of physics (like the Legendre functions, the hypergeometric function, the Gamma function, the incomplete Gamma function and so on; see Symbolic integration for more details). Extending the Risch–Norman algorithm so that it includes these functions is possible but challenging.
Most humans are not able to integrate such general formulae, so in a sense computers are more skilled at integrating highly complicated formulae. Very complex formulae are unlikely to have closed-form antiderivatives, so how much of an advantage this presents is a philosophical question that is open for debate.
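These behaviours can be observed in practice with a computer algebra system. The following sketch assumes the open-source SymPy library (not mentioned in the text above) is installed; its integrate routine implements Risch-style heuristics:

    import sympy as sp

    x = sp.symbols('x')

    # An elementary antiderivative is found symbolically:
    print(sp.integrate(x * sp.cos(x), x))        # x*sin(x) + cos(x)

    # Non-elementary antiderivatives are expressed via special functions:
    print(sp.integrate(sp.exp(x**2), x))         # sqrt(pi)*erfi(x)/2
    print(sp.integrate(sp.sin(x) / x, x))        # Si(x)

    # A definite integral may still have a simple closed form:
    print(sp.integrate(sp.exp(-x**2), (x, -sp.oo, sp.oo)))   # sqrt(pi)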

Numerical quadrature
The integrals encountered in a basic calculus course are deliberately chosen for simplicity; those found in real applications are not always so accommodating. Some integrals cannot be found exactly, some require special functions which themselves are a challenge to compute, and others are so complex that finding the exact answer is too slow. This motivates the study and application of numerical methods for approximating integrals, which today use floating-point arithmetic on digital electronic computers. Many of the ideas arose much earlier, for hand calculations; but the speed of general-purpose computers like the ENIAC created a need for improvements. The goals of numerical integration are accuracy, reliability, efficiency, and generality. Sophisticated methods can vastly outperform a naive method by all four measures (Dahlquist & Björck 2008; Kahaner, Moler & Nash 1989; Stoer & Bulirsch 2002). Consider, for example, the integral

$$\int_{-2}^{2} \left( \frac{322 + 3x\,(98 + x\,(37 + x))}{500} - \frac{24x}{5\,(x^2 + 1)} \right) dx,$$




which has the exact answer 94⁄25 = 3.76. (In ordinary practice the answer is not known in advance, so an important task — not explored here — is to decide when an approximation is good enough.) A “calculus book” approach divides the integration range into, say, 16 equal pieces, and computes function values.

Spaced function values

x      −2.00     −1.50     −1.00     −0.50     0.00      0.50      1.00      1.50      2.00
f(x)   2.22800   2.45663   2.67200   2.32475   0.64400   −0.92575  −0.94000  −0.16963  0.83600

x      −1.75     −1.25     −0.75     −0.25     0.25      0.75      1.25      1.75
f(x)   2.33041   2.58562   2.62934   1.64019   −0.32444  −1.09159  −0.60387  0.31734

Using the left end of each piece, the rectangle method sums 16 function values and multiplies by the step width, h, here 0.25, to get an approximate value of 3.94325 for the integral. The accuracy is not impressive, but calculus formally uses pieces of infinitesimal width, so initially this may seem little cause for concern. Indeed, repeatedly doubling the number of steps eventually produces an approximation of 3.76001. However, 2¹⁸ pieces are required, a great computational expense for such little accuracy; and a reach for greater accuracy can force steps so small that arithmetic precision becomes an obstacle.
Numerical quadrature methods: rectangle, trapezoid, Romberg, Gauss.
A better approach replaces the horizontal tops of the rectangles with slanted tops touching the function at the ends of each piece. This trapezium rule is almost as easy to calculate; it sums all 17 function values, but weights the first and last by one half, and again multiplies by the step width. This immediately improves the approximation to 3.76925, which is noticeably more accurate. Furthermore, only 2¹⁰ pieces are needed to achieve 3.76000, substantially less computation than the rectangle method for comparable accuracy.
Romberg's method builds on the trapezoid method to great effect. First, the step lengths are halved incrementally, giving trapezoid approximations denoted by T(h0), T(h1), and so on, where hk+1 is half of hk. For each new step size, only half the new function values need to be computed; the others carry over from the previous size (as shown in the table above). But the really powerful idea is to interpolate a polynomial through the approximations, and extrapolate to T(0). With this method a numerically exact answer here requires only four pieces (five function values)! The Lagrange polynomial interpolating {(hk, T(hk))}k=0…2 = {(4.00, 6.128), (2.00, 4.352), (1.00, 3.908)} is 3.76 + 0.148h², producing the extrapolated value 3.76 at h = 0.
Gaussian quadrature often requires noticeably less work for superior accuracy. In this example, it can compute the function values at just two x positions, ±2⁄√3, then double each value and sum to get the numerically exact answer. The explanation for this dramatic success lies in error analysis, and a little luck. An n-point Gaussian method is exact for polynomials of degree up to 2n−1. The function in this example is a degree 3 polynomial, plus a term that cancels because the chosen endpoints are symmetric around zero. (Cancellation also benefits the Romberg method.) Shifting the range left a little, so the integral is from −2.25 to 1.75, removes the symmetry. Nevertheless, the trapezoid method is rather slow, the polynomial interpolation method of Romberg is acceptable, and the Gaussian method requires the least work — if the number of points is known in advance. As well, rational interpolation can use the same trapezoid evaluations as the Romberg method to greater effect.
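The trapezoid and Romberg calculations just described are easy to reproduce. The Python sketch below (illustrative; it uses the integrand reconstructed earlier, whose integral over [−2, 2] is exactly 3.76) recovers the 17-point trapezoid estimate 3.76925, and shows Richardson extrapolation of the n = 1, 2, 4 trapezoid values reaching 3.76 with only five function evaluations:

    def f(x):
        # the example integrand reconstructed from the tabulated values above
        return (322 + 3 * x * (98 + x * (37 + x)) - 2400 * x / (x * x + 1)) / 500

    def trapezoid(g, a, b, n):
        h = (b - a) / n
        return h * (0.5 * g(a) + sum(g(a + i * h) for i in range(1, n)) + 0.5 * g(b))

    def romberg(g, a, b, levels):
        """Trapezoid estimates at halved step sizes, refined toward step 0
        by Richardson extrapolation."""
        table = [[trapezoid(g, a, b, 2 ** k)] for k in range(levels)]
        for k in range(1, levels):
            for j in range(1, k + 1):
                fine, coarse = table[k][j - 1], table[k - 1][j - 1]
                table[k].append(fine + (fine - coarse) / (4 ** j - 1))
        return table[-1][-1]

    print(trapezoid(f, -2, 2, 16))   # 3.76925, the 17-point trapezoid estimate
    print(romberg(f, -2, 2, 3))      # 3.76 from just five function values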


Quadrature method cost comparison

Method      Points    Rel. Err.
Trapezoid   1048577   −5.3×10⁻¹³
Romberg     257       −6.3×10⁻¹⁵
Rational    129       8.8×10⁻¹⁵
Gauss       36        3.1×10⁻¹⁵

In practice, each method must use extra evaluations to ensure an error bound on an unknown function; this tends to offset some of the advantage of the pure Gaussian method, and motivates the popular Gauss–Kronrod quadrature formulae. Symmetry can still be exploited by splitting this integral into two ranges, from −2.25 to −1.75 (no symmetry), and from −1.75 to 1.75 (symmetry). More broadly, adaptive quadrature partitions a range into pieces based on function properties, so that data points are concentrated where they are needed most.

Simpson's rule, named for Thomas Simpson (1710–1761), uses a parabolic arc to approximate the integrand, and in many cases it is more accurate than the trapezoid rule. For a single piece [a, b] the rule states that

∫ₐᵇ f(x) dx ≈ ((b − a)⁄6) [f(a) + 4f((a + b)⁄2) + f(b)],

with an error of −((b − a)⁵⁄2880) f⁽⁴⁾(ξ) for some ξ between a and b.
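A minimal Python sketch of the composite Simpson rule, offered as an illustration rather than as part of the original text; the integrand x³ is a hypothetical stand-in, chosen because the error term above vanishes for cubics:

# Composite Simpson rule on [a, b] with an even number of pieces n.
def simpson(f, a, b, n=16):
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)  # weight 4 at odd nodes, 2 at even
    return s * h / 3

print(simpson(lambda x: x**3, 0.0, 1.0))  # 0.25: exact, since the 4th derivative is 0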

The computation of higher-dimensional integrals (for example, volume calculations) makes important use of such alternatives as Monte Carlo integration. A calculus text is no substitute for numerical analysis, but the reverse is also true. Even the best adaptive numerical code sometimes requires a user to help with the more demanding integrals. For example, improper integrals may require a change of variable or methods that can avoid infinite function values, and known properties like symmetry and periodicity may provide critical leverage.
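A rough sketch, not from the original text, of Monte Carlo integration over a two-dimensional box; the integrand x·y is a hypothetical stand-in, and the statistical error shrinks only like 1/√n:

import random

# Estimate the integral of f over the box [ax, bx] x [ay, by] by averaging
# f at n uniformly random sample points and scaling by the box area.
def mc_integrate_2d(f, ax, bx, ay, by, n=100000):
    area = (bx - ax) * (by - ay)
    total = sum(f(random.uniform(ax, bx), random.uniform(ay, by))
                for _ in range(n))
    return area * total / n

print(mc_integrate_2d(lambda x, y: x * y, 0, 1, 0, 1))  # close to 1/4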

See also
• Lists of integrals – integrals of the most common functions
• Multiple integral
• Numerical integration
• Integral equation
• Riemann integral
• Riemann–Stieltjes integral
• Henstock–Kurzweil integral
• Lebesgue integration
• Darboux integral
• Riemann sum
• Symbolic integration


References
• Apostol, Tom M. (1967), Calculus, Vol. 1: One-Variable Calculus with an Introduction to Linear Algebra (2nd ed.), Wiley, ISBN 978-0-471-00005-1
• Bourbaki, Nicolas (2004), Integration I, Springer Verlag, ISBN 3-540-41129-1. In particular chapters III and IV.
• Burton, David M. (2005), The History of Mathematics: An Introduction (6th ed.), McGraw-Hill, p. 359, ISBN 978-0-07-305189-5
• Cajori, Florian (1929), A History of Mathematical Notations Volume II [5], Open Court Publishing, pp. 247–252, ISBN 978-0-486-67766-8
• Dahlquist, Germund; Björck, Åke (2008), "Chapter 5: Numerical Integration" [6], Numerical Methods in Scientific Computing, Volume I, Philadelphia: SIAM
• Folland, Gerald B. (1984), Real Analysis: Modern Techniques and Their Applications (1st ed.), John Wiley & Sons, ISBN 978-0-471-80958-6
• Fourier, Jean Baptiste Joseph (1822), Théorie analytique de la chaleur [7], Chez Firmin Didot, père et fils, p. §231. Available in translation as Fourier, Joseph (1878), The analytical theory of heat [8], Freeman, Alexander (trans.), Cambridge University Press, pp. 200–201
• Heath, T. L., ed. (2002), The Works of Archimedes [9], Dover, ISBN 978-0-486-42084-4 (Originally published by Cambridge University Press, 1897, based on J. L. Heiberg's Greek version.)
• Hildebrandt, T. H. (1953), "Integration in abstract spaces" [10], Bulletin of the American Mathematical Society 59 (2): 111–139, ISSN 0273-0979
• Kahaner, David; Moler, Cleve; Nash, Stephen (1989), "Chapter 5: Numerical Quadrature", Numerical Methods and Software, Prentice Hall, ISBN 978-0-13-627258-8
• Leibniz, Gottfried Wilhelm (1899), Gerhardt, Karl Immanuel, ed., Der Briefwechsel von Gottfried Wilhelm Leibniz mit Mathematikern. Erster Band [11], Berlin: Mayer & Müller
• Miller, Jeff, Earliest Uses of Symbols of Calculus [12], retrieved 2009-11-22
• O'Connor, J. J.; Robertson, E. F. (1996), A history of the calculus [13], retrieved 2007-07-09
• Rudin, Walter (1987), "Chapter 1: Abstract Integration", Real and Complex Analysis (International ed.), McGraw-Hill, ISBN 978-0-07-100276-9
• Saks, Stanisław (1964), Theory of the integral [14] (English translation by L. C. Young. With two additional notes by Stefan Banach. Second revised ed.), New York: Dover
• Stoer, Josef; Bulirsch, Roland (2002), "Chapter 3: Topics in Integration", Introduction to Numerical Analysis (3rd ed.), Springer, ISBN 978-0-387-95452-3
• W3C (2006), Arabic mathematical notation [15]

External links
• Riemann Sum [16] by Wolfram Research

Online tools
• Wolfram Integrator [17], free online symbolic integration with Mathematica
• Mathematical Assistant on Web [18], symbolic computations online. Allows integration in small steps (with hints for the next step: integration by parts, substitution, partial fractions, application of formulas and others), powered by Maxima
• Function Calculator [19] from WIMS [20]
• Online integral calculator [21], numberempire.com
• Calculus : Integrate [22], quickmath.com


Online books
• Keisler, H. Jerome, Elementary Calculus: An Approach Using Infinitesimals [23], University of Wisconsin
• Stroyan, K.D., A Brief Introduction to Infinitesimal Calculus [24], University of Iowa
• Mauch, Sean, Sean's Applied Math Book [25], CIT, an online textbook that includes a complete introduction to calculus
• Crowell, Benjamin, Calculus [26], Fullerton College, an online textbook
• Garrett, Paul, Notes on First-Year Calculus [27]
• Hussain, Faraz, Understanding Calculus [28], an online textbook
• Kowalk, W.P., Integration Theory [29], University of Oldenburg. A new concept to an old problem. Online textbook
• Sloughter, Dan, Difference Equations to Differential Equations [30], an introduction to calculus
• Numerical Methods of Integration [31] at Holistic Numerical Methods Institute
• P.S. Wang, Evaluation of Definite Integrals by Symbolic Manipulation [32] (1972), a cookbook of definite integral techniques

References
[1] Shea, Marilyn (May 2007), Biography of Zu Chongzhi (http://hua.umf.maine.edu/China/astronomy/tianpage/0014ZuChongzhi9296bw.html), University of Maine, retrieved 9 January 2009; Katz, Victor J. (2004), A History of Mathematics, Brief Version, Addison-Wesley, pp. 125–126, ISBN 978-0-321-16193-2
[2] Victor J. Katz (1995), "Ideas of Calculus in Islam and India", Mathematics Magazine 68 (3): 163–174 [165]
[3] Victor J. Katz (1995), "Ideas of Calculus in Islam and India", Mathematics Magazine 68 (3): 163–174 [165–9 & 173–4]
[4] http://www2.gol.com/users/coynerhm/0598rothman.html
[5] http://www.archive.org/details/historyofmathema027671mbp
[6] http://www.mai.liu.se/~akbjo/NMbook.html
[7] http://books.google.com/books?id=TDQJAAAAIAAJ
[8] http://www.archive.org/details/analyticaltheory00fourrich
[9] http://www.archive.org/details/worksofarchimede029517mbp
[10] http://projecteuclid.org/euclid.bams/1183517761
[11] http://name.umdl.umich.edu/AAX2762.0001.001
[12] http://jeff560.tripod.com/calculus.html
[13] http://www-history.mcs.st-andrews.ac.uk/HistTopics/The_rise_of_calculus.html
[14] http://matwbn.icm.edu.pl/kstresc.php?tom=7&wyd=10&jez=
[15] http://www.w3.org/TR/arabic-math/
[16] http://mathworld.wolfram.com/RiemannSum.html
[17] http://integrals.wolfram.com
[18] http://user.mendelu.cz/marik/maw/index.php?lang=en&form=integral
[19] http://wims.unice.fr/wims/wims.cgi?module=tool/analysis/function.en
[20] http://wims.unice.fr
[21] http://www.numberempire.com/integralcalculator.php
[22] http://www.quickmath.com/webMathematica3/quickmath/page.jsp?s1=calculus&s2=integrate&s3=basic
[23] http://www.math.wisc.edu/~keisler/calc.html
[24] http://www.math.uiowa.edu/~stroyan/InfsmlCalculus/InfsmlCalc.htm
[25] http://www.its.caltech.edu/~sean/book/unabridged.html
[26] http://www.lightandmatter.com/calc/
[27] http://www.math.umn.edu/~garrett/calculus/
[28] http://www.understandingcalculus.com
[29] http://einstein.informatik.uni-oldenburg.de/20910.html
[30] http://math.furman.edu/~dcs/book
[31] http://numericalmethods.eng.usf.edu/topics/integration.html
[32] http://www.lcs.mit.edu/publications/specpub.php?id=660


Function (mathematics)

The mathematical concept of a function expresses the intuitive idea that one quantity (the argument of the function, also known as the input) completely determines another quantity (the value, or the output). A function assigns a unique value to each input of a specified type. The argument and the value may be real numbers, but they can also be elements from any given sets: the domain and the codomain of the function. An example of a function with the real numbers as both its domain and codomain is the function f(x) = 2x, which assigns to every real number the real number with twice its value. In this case, it is written that f(5) = 10.

In addition to elementary functions on numbers, functions include maps between algebraic structures like groups and maps between geometric objects like manifolds. In the abstract set-theoretic approach, a function is a relation between the domain and the codomain that associates each element in the domain with exactly one element in the codomain. An example of a function with domain {A,B,C} and codomain {1,2,3} associates A with 1, B with 2, and C with 3.

Graph of an example function. Both the domain and the range in the picture are the set of real numbers between −1 and 1.5.

There are many ways to describe or represent functions: by a formula, by an algorithm that computes it, by a plot or a graph. A table of values is a common way to specify a function in statistics, physics, chemistry, and other sciences. A function may also be described through its relationship to other functions, for example, as the inverse function or a solution of a differential equation. There are uncountably many different functions from the set of natural numbers to itself, most of which cannot be expressed with a formula or an algorithm.

In a setting where they have numerical outputs, functions may be added and multiplied, yielding new functions. Collections of functions with certain properties, such as continuous functions and differentiable functions, usually required to be closed under certain operations, are called function spaces and are studied as objects in their own right, in such disciplines as real analysis and complex analysis. An important operation on functions, which distinguishes them from numbers, is the composition of functions.

Overview

Because functions are so widely used, many traditions have grown up around their use. The symbol for the input to a function is often called the independent variable or argument and is often represented by the letter x or, if the input is a particular time, by the letter t. The symbol for the output is called the dependent variable or value and is often represented by the letter y. The function itself is most often called f, and thus the notation y = f(x) indicates that a function named f has an input named x and an output named y.


The set of all permitted inputs to a given function is called the domain of the function. The set of all resulting outputs is called the image or range of the function. The image is often a subset of some larger set, called the codomain of a function. Thus, for example, the function f(x) = x² could take as its domain the set of all real numbers, as its image the set of all non-negative real numbers, and as its codomain the set of all real numbers. In that case, we would describe f as a real-valued function of a real variable. Sometimes, especially in computer science, the term "range" refers to the codomain rather than the image, so care needs to be taken when using the word.

It is usual practice in mathematics to introduce functions with temporary names like ƒ. For example, ƒ(x) = 2x+1 implies ƒ(3) = 7; when a name for the function is not needed, the form y = 2x+1 may be used. If a function is often used, it may be given a more permanent name as, for example,

A function ƒ takes an input, x, and returns an output ƒ(x). One metaphor describes the function as a "machine" or "black box" that converts the input into the output.

Functions need not act on numbers: the domain and codomain of a function may be arbitrary sets. One example of a function that acts on non-numeric inputs takes English words as inputs and returns the first letter of the input word as output. Furthermore, functions need not be described by any expression, rule or algorithm: indeed, in some cases it may be impossible to define such a rule. For example, the association between inputs and outputs in a choice function often lacks any fixed rule, although each input element is still associated to one and only one output. A function of two or more variables is considered in formal mathematics as having a domain consisting of ordered pairs or tuples of the argument values. For example Sum(x,y) = x+y operating on integers is the function Sum with a domain consisting of pairs of integers. Sum then has a domain consisting of elements like (3,4), a codomain of integers, and an association between the two that can be described by a set of ordered pairs like ((3,4), 7). Evaluating Sum(3,4) then gives the value 7 associated with the pair (3,4). A family of objects indexed by a set is equivalent to a function. For example, the sequence 1, 1/2, 1/3, ..., 1/n, ... can be written as the ordered sequence <1/n> where n is a natural number, or as a function f(n) = 1/n from the set of natural numbers into the set of rational numbers. Dually, a surjective function partitions its domain into disjoint sets indexed by the codomain. This partition is known as the kernel of the function, and the parts are called the fibers or level sets of the function at each element of the codomain. (A non-surjective function divides its domain into disjoint and possibly-empty subsets).

Definition

One precise definition of a function is that it consists of an ordered triple of sets, which may be written as (X, Y, F). X is the domain of the function, Y is the codomain, and F is a set of ordered pairs. In each of these ordered pairs (a, b), the first element a is from the domain, the second element b is from the codomain, and every element in the domain is the first element in one and only one ordered pair. The set of all b is known as the image of the function. Some authors use the term "range" to mean the image, others to mean the codomain. The notation ƒ: X → Y indicates that ƒ is a function with domain X and codomain Y. In most practical situations, the domain and codomain are understood from context, and only the relationship between the input and output is given. Thus

ƒ: X → Y, x ⟼ ƒ(x)

is usually written as

y = ƒ(x).


The graph of a function is its set of ordered pairs. Such a set can be plotted on a pair of coordinate axes; for example, (3, 9) is the point of intersection of the lines x = 3 and y = 9. A function is a special case of a more general mathematical concept, the relation, for which the restriction that each element of the domain appear as the first element in one and only one ordered pair is removed (or, in other words, the restriction that each input be associated to exactly one output). A relation is "single-valued" or "functional" when for each element of the domain set, the graph contains at most one ordered pair (and possibly none) with it as a first element. A relation is called "left-total" or simply "total" when for each element of the domain, the graph contains at least one ordered pair with it as a first element (and possibly more than one). A relation that is both left-total and single-valued is a function.

In some parts of mathematics, including recursion theory and functional analysis, it is convenient to study partial functions, in which some values of the domain have no association in the graph; i.e., single-valued relations. For example, the function f such that f(x) = 1/x does not define a value for x = 0, and so is only a partial function from the real line to the real line. The term total function can be used to stress the fact that every element of the domain does appear as the first element of an ordered pair in the graph. In other parts of mathematics, non-single-valued relations are similarly conflated with functions: these are called multivalued functions, with the corresponding term single-valued function for ordinary functions.

Some authors (especially in set theory) define a function as simply its graph f, with the restriction that the graph should not contain two distinct ordered pairs with the same first element. Indeed, given such a graph, one can construct a suitable triple by taking the set of all first elements as the domain and the set of all second elements as the codomain: this automatically causes the function to be total and surjective. However, most authors in advanced mathematics outside of set theory prefer the greater power of expression afforded by defining a function as an ordered triple of sets. Many operations in set theory, such as the power set, have the class of all sets as their domain; therefore, although they are informally described as functions, they do not fit the set-theoretical definition outlined above.
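For finite sets the two conditions are easy to test directly; here is a minimal Python sketch, not part of the original article, using the {A, B, C} → {1, 2, 3} example from the introduction:

# A relation is given as a set of ordered pairs.  It is a function from X
# to Y exactly when it is left-total (every x in X appears as a first
# element) and single-valued (no x appears twice).
def is_function(graph, X, Y):
    firsts = [a for (a, b) in graph]
    return (set(firsts) == set(X)                 # left-total
            and len(firsts) == len(set(firsts))   # single-valued
            and all(b in Y for (a, b) in graph))  # values lie in the codomain

g = {('A', 1), ('B', 2), ('C', 3)}
print(is_function(g, {'A', 'B', 'C'}, {1, 2, 3}))               # True
print(is_function(g | {('A', 2)}, {'A', 'B', 'C'}, {1, 2, 3}))  # False: not single-valued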

Vocabulary

A specific input in a function is called an argument of the function. For each argument value x, the corresponding unique y in the codomain is called the function value at x, output of ƒ for an argument x, or the image of x under ƒ. The image of x may be written as ƒ(x) or as y. The graph of a function ƒ is the set of all ordered pairs (x, ƒ(x)), for all x in the domain X. If X and Y are subsets of R, the real numbers, then this definition coincides with the familiar sense of "graph" as a picture or plot of the function, with the ordered pairs being the Cartesian coordinates of points. A function can also be called a map or a mapping. Some authors, however, use the terms "function" and "map" to refer to different types of functions. Other specific types of functions include functionals and operators.


Notation

Formal description of a function typically involves the function's name, its domain, its codomain, and a rule of correspondence. Thus we frequently see a two-part notation, an example being

ƒ: N → R, n ⟼ n⁄π

where the first part is read:
• "ƒ is a function from N to R" (one often writes informally "Let ƒ: X → Y" to mean "Let ƒ be a function from X to Y"), or
• "ƒ is a function on N into R", or
• "ƒ is an R-valued function of an N-valued variable",
and the second part is read:
• "n maps to n⁄π".
Here the function named "ƒ" has the natural numbers as domain, the real numbers as codomain, and maps n to itself divided by π. Less formally, this long form might be abbreviated ƒ(n) = n⁄π,

where f(n) is read as "f as function of n" or "f of n". There is some loss of information: we no longer are explicitly given the domain N and codomain R. It is common to omit the parentheses around the argument when there is little chance of confusion, thus: sin x; this is known as prefix notation. Writing the function after its argument, as in x ƒ, is known as postfix notation; for example, the factorial function is customarily written n!, even though its generalization, the gamma function, is written Γ(n). Parentheses are still used to resolve ambiguities and denote precedence, though in some formal settings the consistent use of either prefix or postfix notation eliminates the need for any parentheses.

Functions with multiple inputs and outputs

The concept of function can be extended to an object that takes a combination of two (or more) argument values to a single result. This intuitive concept is formalized by a function whose domain is the Cartesian product of two or more sets. For example, consider the function that associates two integers to their product: ƒ(x, y) = x·y. This function can be defined formally as having domain Z×Z, the set of all integer pairs; codomain Z; and, for graph, the set of all pairs ((x,y), x·y). Note that the first component of any such pair is itself a pair (of integers), while the second component is a single integer. The function value of the pair (x,y) is ƒ((x,y)). However, it is customary to drop one set of parentheses and consider ƒ(x,y) a function of two variables, x and y. Functions of two variables may be plotted on the three-dimensional Cartesian coordinate system as ordered triples of the form (x, y, f(x,y)).

The concept can still further be extended by considering a function that also produces output that is expressed as several variables. For example, consider the function swap(x, y) = (y, x) with domain R×R and codomain R×R as well. The pair (y, x) is a single value in the codomain seen as a Cartesian product.


Currying

An alternative approach to handling functions with multiple arguments is to transform them into a chain of functions that each takes a single argument. For instance, one can interpret Add(3,5) to mean "first produce a function that adds 3 to its argument, and then apply the 'Add 3' function to 5". This transformation is called currying: Add 3 is curry(Add) applied to 3 (a code sketch of this idea appears at the end of this section). There is a bijection between the function spaces C^(A×B) and (C^B)^A. When working with curried functions it is customary to use prefix notation with function application considered left-associative, since juxtaposition of multiple arguments, as in (ƒ x y), naturally maps to evaluation of a curried function. Conversely, the → and ⟼ symbols are considered to be right-associative, so that curried functions may be defined by a notation such as ƒ: Z → Z → Z = x ⟼ y ⟼ x·y.

Binary operations

The familiar binary operations of arithmetic, addition and multiplication, can be viewed as functions from R×R to R. This view is generalized in abstract algebra, where n-ary functions are used to model the operations of arbitrary algebraic structures. For example, an abstract group is defined as a set X and a function ƒ from X×X to X that satisfies certain properties. Traditionally, addition and multiplication are written in infix notation: x+y and x×y instead of +(x, y) and ×(x, y).
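Returning to currying, here is a minimal Python sketch, not part of the original article, in which curry turns the two-argument Add into a chain of one-argument functions:

def add(x, y):
    return x + y

# curry(f) returns a function of x that returns a function of y.
def curry(f):
    return lambda x: lambda y: f(x, y)

add3 = curry(add)(3)     # the "Add 3" function
print(add3(5))           # 8, the same as add(3, 5)
print(curry(add)(3)(5))  # 8, written as a chain of applications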

Injective and surjective functions

Three important kinds of function are the injections (or one-to-one functions), which have the property that if ƒ(a) = ƒ(b) then a must equal b; the surjections (or onto functions), which have the property that for every y in the codomain there is an x in the domain such that ƒ(x) = y; and the bijections, which are both one-to-one and onto. This nomenclature was introduced by the Bourbaki group. When the definition of a function by its graph only is used, since the codomain is not defined, the "surjection" must be accompanied with a statement about the set the function maps onto. For example, we might say ƒ maps onto the set of all real numbers.


Function composition

The function composition of two or more functions takes the output of one or more functions as the input of others. The functions ƒ: X → Y and g: Y → Z can be composed by first applying ƒ to an argument x to obtain y = ƒ(x) and then applying g to y to obtain z = g(y). The composite function formed in this way from general ƒ and g may be written g ∘ ƒ.

A composite function g(f(x)) can be visualized as the combination of two "machines". The first takes input x and outputs f(x). The second takes f(x) and outputs g(f(x)).

This notation follows the form (g ∘ ƒ)(x) = g(ƒ(x)).

The function on the right acts first and the function on the left acts second, reversing English reading order. We remember the order by reading the notation as "g of ƒ". The order is important, because rarely do we get the same result both ways. For example, suppose ƒ(x) = x² and g(x) = x+1. Then g(ƒ(x)) = x²+1, while ƒ(g(x)) = (x+1)², which is x²+2x+1, a different function. In a similar way, the function given above by the formula y = 5x−20x³+16x⁵ can be obtained by composing several functions, namely the addition, negation, and multiplication of real numbers. An alternative to the colon notation, convenient when functions are being composed, writes the function name above the arrow. For example, if ƒ is followed by g, where g produces the complex number eⁱˣ, we may write X --ƒ--> Y --g--> ℂ.

A more elaborate form of this is the commutative diagram.
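To make the order of composition concrete, here is a minimal Python sketch, not part of the original article, using the ƒ(x) = x² and g(x) = x + 1 example above:

# compose(g, f) applies f first, matching the "g of f" reading.
def compose(g, f):
    return lambda x: g(f(x))

f = lambda x: x ** 2
g = lambda x: x + 1
print(compose(g, f)(3))  # g(f(3)) = 3**2 + 1 = 10
print(compose(f, g)(3))  # f(g(3)) = (3 + 1)**2 = 16, a different function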


Identity function

The unique function over a set X that maps each element to itself is called the identity function for X, and typically denoted by idX. Each set has its own identity function, so the subscript cannot be omitted unless the set can be inferred from context. Under composition, an identity function is "neutral": if ƒ is any function from X to Y, then

idY ∘ ƒ = ƒ = ƒ ∘ idX.

Restrictions and extensions

Informally, a restriction of a function ƒ is the result of trimming its domain. More precisely, if ƒ is a function from a set X to Y, and S is any subset of X, the restriction of ƒ to S is the function ƒ|S from S to Y such that ƒ|S(s) = ƒ(s) for all s in S. If g is a restriction of ƒ, then it is said that ƒ is an extension of g. The overriding of f: X → Y by g: W → Y (also called overriding union) is an extension of g denoted as (f ⊕ g): (X ∪ W) → Y. Its graph is the set-theoretical union of the graphs of g and f|X \ W. Thus, it relates any element of the domain of g to its image under g, and any other element of the domain of f to its image under f. Overriding is an associative operation; it has the empty function as an identity element. If f|X ∩ W and g|X ∩ W are pointwise equal (e.g., the domains of f and g are disjoint), then the union of f and g is defined and is equal to their overriding union. This definition agrees with the definition of union for binary relations.

Inverse function

If ƒ is a function from X to Y then an inverse function for ƒ, denoted by ƒ⁻¹, is a function in the opposite direction, from Y to X, with the property that a round trip (a composition) returns each element to itself. Not every function has an inverse; those that do are called invertible. The inverse function exists if and only if ƒ is a bijection. As a simple example, if ƒ converts a temperature in degrees Celsius C to degrees Fahrenheit F, the function converting degrees Fahrenheit to degrees Celsius would be a suitable ƒ⁻¹.
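A minimal Python sketch, not part of the original article, of this example; the conversion formulas are the standard ones:

def c_to_f(c):
    return c * 9 / 5 + 32    # Celsius to Fahrenheit

def f_to_c(f):
    return (f - 32) * 5 / 9  # the inverse direction

print(c_to_f(100.0))         # 212.0
print(f_to_c(c_to_f(37.0)))  # 37.0 (up to rounding): the round trip returns the input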

The notation for composition is similar to multiplication; in fact, sometimes it is denoted using juxtaposition, gƒ, without an intervening circle. With this analogy, identity functions are like the multiplicative identity, 1, and inverse functions are like reciprocals (hence the notation). For functions that are injections or surjections, generalized inverse functions can be defined, called left and right inverses respectively. Left inverses map to the identity when composed to the left; right inverses when composed to the right.

Image of a set

The concept of the image can be extended from the image of a point to the image of a set. If A is any subset of the domain, then ƒ(A) is the subset of im ƒ consisting of all images of elements of A. We say that ƒ(A) is the image of A under f. Use of ƒ(A) to denote the image of a subset A⊆X is consistent so long as no subset of the domain is also an element of the domain. In some fields (e.g., in set theory, where ordinals are also sets of ordinals) it is convenient or even necessary to distinguish the two concepts; the customary notation is ƒ[A] for the set { ƒ(x): x ∈ A }; some authors write ƒ`x instead of ƒ(x), and ƒ``A instead of ƒ[A]. Notice that the image of ƒ is the image ƒ(X) of its domain, and that the image of ƒ is a subset of its codomain.


Inverse image

The inverse image (or preimage, or more precisely, complete inverse image) of a subset B of the codomain Y under a function ƒ is the subset of the domain X defined by

ƒ⁻¹(B) = {x ∈ X : ƒ(x) ∈ B}.

So, for example, the preimage of {4, 9} under the squaring function is the set {−3,−2,2,3}. In general, the preimage of a singleton set (a set with exactly one element) may contain any number of elements. For example, if ƒ is the constant function ƒ(x) = 7, then the preimage of {5} is the empty set but the preimage of {7} is the entire domain. Thus the preimage of an element in the codomain is a subset of the domain. The usual convention about the preimage of an element is that ƒ⁻¹(b) means ƒ⁻¹({b}), i.e.,

ƒ⁻¹(b) = {x ∈ X : ƒ(x) = b}.

In the same way as for the image, some authors use square brackets to avoid confusion between the inverse image and the inverse function. Thus they would write ƒ⁻¹[B] and ƒ⁻¹[b] for the preimage of a set and a singleton. The preimage of a singleton set is sometimes called a fiber. The term kernel can refer to a number of related concepts.
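Over a finite domain, images and preimages can be computed by direct search; a minimal Python sketch, not part of the original article, reproducing the squaring example above:

def image(f, A):
    return {f(x) for x in A}

def preimage(f, B, domain):
    return {x for x in domain if f(x) in B}

square = lambda x: x * x
domain = range(-10, 11)                  # a finite stand-in for the domain
print(preimage(square, {4, 9}, domain))  # {-3, -2, 2, 3}, as in the text
print(image(square, {-2, 2}))            # {4}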

Specifying a function

A function can be defined by any mathematical condition relating each argument to the corresponding output value. If the domain is finite, a function ƒ may be defined by simply tabulating all the arguments x and their corresponding function values ƒ(x). More commonly, a function is defined by a formula, or (more generally) an algorithm: a recipe that tells how to compute the value of ƒ(x) given any x in the domain. There are many other ways of defining functions. Examples include piecewise definitions, induction or recursion, algebraic or analytic closure, limits, analytic continuation, infinite series, and as solutions to integral and differential equations. The lambda calculus provides a powerful and flexible syntax for defining and combining functions of several variables.

Computability

Functions that send integers to integers, or finite strings to finite strings, can sometimes be defined by an algorithm, which gives a precise description of a set of steps for computing the output of the function from its input. Functions definable by an algorithm are called computable functions. For example, the Euclidean algorithm gives a precise process to compute the greatest common divisor of two positive integers. Many of the functions studied in the context of number theory are computable. Fundamental results of computability theory show that there are functions that can be precisely defined but are not computable. Moreover, in the sense of cardinality, almost all functions from the integers to integers are not computable. The number of computable functions from integers to integers is countable, because the number of possible algorithms is. The number of all functions from integers to integers is higher: the same as the cardinality of the real numbers. Thus most functions from integers to integers are not computable. Specific examples of uncomputable functions are known, including the busy beaver function and functions related to the halting problem and other undecidable problems.
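As a concrete computable function, here is a minimal Python sketch, not part of the original article, of the Euclidean algorithm mentioned above:

# Greatest common divisor of two positive integers by repeated remainders.
def gcd(a, b):
    while b != 0:
        a, b = b, a % b
    return a

print(gcd(252, 105))  # 21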


Function spaces

The set of all functions from a set X to a set Y is denoted by X → Y, by [X → Y], or by Y^X. The latter notation is motivated by the fact that, when X and Y are finite and of size |X| and |Y|, then the number of functions X → Y is |Y^X| = |Y|^|X|. This is an example of the convention from enumerative combinatorics that provides notations for sets based on their cardinalities. Other examples are the multiplication sign X×Y used for the Cartesian product, where |X×Y| = |X|·|Y|; the factorial sign X!, used for the set of permutations, where |X!| = |X|!; and the binomial coefficient sign C(X, n), used for the set of n-element subsets, where |C(X, n)| = C(|X|, n). If ƒ: X → Y, it may reasonably be concluded that ƒ ∈ [X → Y].
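A minimal Python sketch, not part of the original article, that enumerates every function from a two-element set to a three-element set and confirms the counting formula:

from itertools import product

X = ['a', 'b']
Y = [1, 2, 3]
# Each choice of outputs, one per element of X, determines one function.
functions = [dict(zip(X, outputs)) for outputs in product(Y, repeat=len(X))]
print(len(functions))    # 9
print(len(Y) ** len(X))  # 9 = |Y| raised to |X|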

Pointwise operations

If ƒ: X → R and g: X → R are functions with a common domain X and a common codomain that is a ring R, then the sum function ƒ + g: X → R and the product function ƒ ⋅ g: X → R can be defined as follows:

(ƒ + g)(x) = ƒ(x) + g(x),   (ƒ ⋅ g)(x) = ƒ(x) ⋅ g(x)

for all x in X. This turns the set of all such functions into a ring. The binary operations in that ring have as domain ordered pairs of functions, and as codomain functions. This is an example of climbing up in abstraction, to functions of more complex types. By taking some other algebraic structure A in the place of R, we can turn the set of all functions from X to A into an algebraic structure of the same type in an analogous way.
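A minimal Python sketch, not part of the original article, of these pointwise operations:

# Pointwise sum and product of two functions sharing a domain.
def f_plus_g(f, g):
    return lambda x: f(x) + g(x)

def f_times_g(f, g):
    return lambda x: f(x) * g(x)

f = lambda x: x + 1
g = lambda x: 2 * x
print(f_plus_g(f, g)(3))   # (3 + 1) + (2 * 3) = 10
print(f_times_g(f, g)(3))  # (3 + 1) * (2 * 3) = 24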

Other properties

There are many other special classes of functions that are important to particular branches of mathematics, or particular applications. Here is a partial list:
• bijection, injection and surjection, or individually: injective, surjective, and bijective function
• continuous
• differentiable, integrable
• linear, polynomial, rational
• algebraic, transcendental
• trigonometric
• fractal
• odd or even
• convex, monotonic, unimodal
• holomorphic, meromorphic, entire
• vector-valued
• computable

History

Functions prior to Leibniz

Historically, some mathematicians can be regarded as having foreseen and come close to a modern formulation of the concept of function. Among them is Oresme (1323–1382) . . . In his theory, some general ideas about independent and dependent variable quantities seem to be present.[1] [2] Ponte further notes that "The emergence of a notion of function as an individualized mathematical entity can be traced to the beginnings of infinitesimal calculus".[1]

The notion of "function" in analysis

As a mathematical term, "function" was coined by Gottfried Leibniz, in a 1673 letter, to describe a quantity related to a curve, such as a curve's slope at a specific point.[3] [4] The functions Leibniz considered are today called differentiable functions. For this type of function, one can talk about limits and derivatives; both are measurements of the output or the change in the output as it depends on the input or the change in the input. Such functions are the basis of calculus.

Johann Bernoulli, "by 1718, had come to regard a function as any expression made up of a variable and some constants",[5] and Leonhard Euler during the mid-18th century used the word to describe an expression or formula involving variables and constants, e.g., x²+3x+2.[6] Alexis Claude Clairaut (in approximately 1734) and Euler introduced the familiar notation "f(x)".[6]

At first, the idea of a function was rather limited. Joseph Fourier, for example, claimed that every function had a Fourier series, something no mathematician would claim today. By broadening the definition of functions, mathematicians were able to study "strange" mathematical objects such as continuous functions that are nowhere differentiable. These functions were first thought to be only theoretical curiosities, and they were collectively called "monsters" as late as the turn of the 20th century. However, powerful techniques from functional analysis have shown that these functions are, in a precise sense, more common than differentiable functions. Such functions have since been applied to the modeling of physical phenomena such as Brownian motion.

During the 19th century, mathematicians started to formalize all the different branches of mathematics. Weierstrass advocated building calculus on arithmetic rather than on geometry, which favoured Euler's definition over Leibniz's (see arithmetization of analysis). Dirichlet and Lobachevsky are traditionally credited with independently giving the modern "formal" definition of a function as a relation in which every first element has a unique second element. Eves asserts that "the student of mathematics usually meets the Dirichlet definition of function in his introductory course in calculus",[7] but Dirichlet's claim to this formalization is disputed by Imre Lakatos:

There is no such definition in Dirichlet's works at all. But there is ample evidence that he had no idea of this concept. In his [1837], for instance, when he discusses piecewise continuous functions, he says that at points of discontinuity the function has two values: ... (Proofs and Refutations, 151, Cambridge University Press 1976.)

In the context of "the Differential Calculus" George Boole defined (circa 1849) the notion of a function as follows: "That quantity whose variation is uniform . . . is called the independent variable. That quantity whose variation is referred to the variation of the former is said to be a function of it. The Differential calculus enables us in every case to pass from the function to the limit. This it does by a certain Operation. But in the very Idea of an Operation is . . . the idea of an inverse operation.
To effect that inverse operation in the present instance is the business of the Int[egral] Calculus."[8]

The logician's "function" prior to 1850

Logicians of this time were primarily involved with analyzing syllogisms (the 2000-year-old Aristotelian forms and otherwise), or as Augustus De Morgan (1847) stated it: "the examination of that part of reasoning which depends upon the manner in which inferences are formed, and the investigation of general maxims and rules for constructing arguments".[9] At this time the notion of (logical) "function" is not explicit, but at least in the work of De Morgan and George Boole it is implied: we see abstraction of the argument forms, the introduction of variables, the introduction of a symbolic algebra with respect to these variables, and some of the notions of set theory.

De Morgan's 1847 "FORMAL LOGIC OR, The Calculus of Inference, Necessary and Probable" observes that "[a] logical truth depends upon the structure of the statement, and not upon the particular matters spoken of"; he wastes no time (preface page i) abstracting: "In the form of the proposition, the copula is made as abstract as the terms". He


immediately (p. 1) casts what he calls "the proposition" (present-day propositional function or relation) into a form such as "X is Y", where the symbols X, "is", and Y represent, respectively, the subject, copula, and predicate. While the word "function" does not appear, the notion of "abstraction" is there, "variables" are there, the notion of inclusion in his symbolism "all of the Δ is in the О" (p. 9) is there, and lastly a new symbolism for logical analysis of the notion of "relation" (he uses the word with respect to this example "X)Y" (p. 75)) is there:

" A1 X)Y To take an X it is necessary to take a Y" [or To be an X it is necessary to be a Y]
" A1 Y)X To take a Y it is sufficient to take an X" [or To be a Y it is sufficient to be an X], etc.

In his 1848 The Nature of Logic Boole asserts that "logic . . . is in a more especial sense the science of reasoning by signs", and he briefly discusses the notions of "belonging to" and "class": "An individual may possess a great variety of attributes and thus belonging to a great variety of different classes".[10] Like De Morgan he uses the notion of "variable" drawn from analysis; he gives an example of "represent[ing] the class oxen by x and that of horses by y and the conjunction and by the sign + . . . we might represent the aggregate class oxen and horses by x + y".[11]

The logicians' "function" 1850–1950

Eves observes "that logicians have endeavored to push down further the starting level of the definitional development of mathematics and to derive the theory of sets, or classes, from a foundation in the logic of propositions and propositional functions".[12] But by the late 19th century the logicians' research into the foundations of mathematics was undergoing a major split. The direction of the first group, the Logicists, can probably be summed up best by Bertrand Russell 1903:9 -- "to fulfil two objects, first, to show that all mathematics follows from symbolic logic, and secondly to discover, as far as possible, what are the principles of symbolic logic itself."

The second group of logicians, the set-theorists, emerged with Georg Cantor's "set theory" (1870–1890) but were driven forward partly as a result of Russell's discovery of a paradox that could be derived from Frege's conception of "function", but also as a reaction against Russell's proposed solution.[13] Zermelo's set-theoretic response was his 1908 Investigations in the foundations of set theory I -- the first axiomatic set theory; here too the notion of "propositional function" plays a role.

George Boole's The Laws of Thought 1854; John Venn's Symbolic Logic 1881

In his An Investigation into the laws of thought Boole now defined a function in terms of a symbol x as follows: "8. Definition. -- Any algebraic expression involving symbol x is termed a function of x, and may be represented by the abbreviated form f(x)".[14] Boole then used algebraic expressions to define both algebraic and logical notions, e.g., 1−x is logical NOT(x), xy is the logical AND(x, y), x + y is the logical OR(x, y), x(x+y) is xx+xy, and "the special law" xx = x² = x.[15]

In his 1881 Symbolic Logic Venn was using the words "logical function" and the contemporary symbolism (x = f(y), y = f⁻¹(x), cf. page xxi) plus the circle-diagrams historically associated with Venn to describe "class relations",[16] the notions "'quantifying' our predicate", "propositions in respect of their extension", "the relation of inclusion and exclusion of two classes to one another", and "propositional function" (all on p. 10), the bar over a variable to indicate not-x (page 43), etc. Indeed he equated unequivocally the notion of "logical function" with "class" [modern "set"]: "... on the view adopted in this book, f(x) never stands for anything but a logical class. It may be a compound class aggregated of many simple classes; it may be a class indicated by certain inverse logical operations, it may be composed of two groups of classes equal to one another, or what is the same thing, their difference declared equal to zero, that is, a logical equation. But however composed or derived, f(x) with us will never be anything else than a general expression for such logical classes of things as may fairly find a place in ordinary Logic".[17]


Frege's Begriffsschrift 1879

Gottlob Frege's Begriffsschrift (1879) preceded Giuseppe Peano (1889), but Peano had no knowledge of Frege 1879 until after he had published his 1889.[18] Both writers strongly influenced Bertrand Russell (1903). Russell in turn influenced much of 20th-century mathematics and logic through his Principia Mathematica (1913), jointly authored with Alfred North Whitehead.

At the outset Frege abandons the traditional "concepts subject and predicate", replacing them with argument and function respectively, which he believes "will stand the test of time. It is easy to see how regarding a content as a function of an argument leads to the formation of concepts. Furthermore, the demonstration of the connection between the meanings of the words if, and, not, or, there is, some, all, and so forth, deserves attention".[19]

Frege begins his discussion of "function" with an example: Begin with the expression[20] "Hydrogen is lighter than carbon dioxide". Now remove the sign for hydrogen (i.e., the word "hydrogen") and replace it with the sign for oxygen (i.e., the word "oxygen"); this makes a second statement. Do this again (using either statement) and substitute the sign for nitrogen (i.e., the word "nitrogen") and note that "This changes the meaning in such a way that "oxygen" or "nitrogen" enters into the relations in which "hydrogen" stood before".[21] There are three statements:

• "Hydrogen is lighter than carbon dioxide."
• "Oxygen is lighter than carbon dioxide."
• "Nitrogen is lighter than carbon dioxide."

Now observe in all three a "stable component, representing the totality of [the] relations";[22] call this the function, i.e., "... is lighter than carbon dioxide" is the function. Frege calls the argument of the function "[t]he sign [e.g., hydrogen, oxygen, or nitrogen], regarded as replaceable by others, that denotes the object standing in these relations".[23] He notes that we could have derived the function as "Hydrogen is lighter than . . ." as well, with an argument position on the right; the exact observation is made by Peano (see more below). Finally, Frege allows for the case of two (or more) arguments. For example, remove "carbon dioxide" to yield the invariant part (the function) as:

• "... is lighter than ..."

The one-argument function Frege generalizes into the form Φ(A) where A is the argument and Φ( ) represents the function, whereas the two-argument function he symbolizes as Ψ(A, B) with A and B the arguments and Ψ( , ) the function, and cautions that "in general Ψ(A, B) differs from Ψ(B, A)". Using his unique symbolism he translates for the reader the following symbolism: "We can read |--- Φ(A) as "A has the property Φ. |--- Ψ(A, B) can be translated by "B stands in the relation Ψ to A" or "B is a result of an application of the procedure Ψ to the object A".[24]

Peano 1889 The Principles of Arithmetic 1889

Peano defined the notion of "function" in a manner somewhat similar to Frege, but without the precision.[25] First Peano defines the sign "K means class, or aggregate of objects",[26] the objects of which satisfy three simple equality-conditions,[27] a = a, (a = b) = (b = a), IF ((a = b) AND (b = c)) THEN (a = c). He then introduces φ, "a sign or an aggregate of signs such that if x is an object of the class s, the expression φx denotes a new object".
Peano adds two conditions on these new objects: First, that the three equality-conditions hold for the objects φx; secondly, that "if x and y are objects of class s and if x = y, we assume it is possible to deduce φx = φy".[28] Given all these conditions are met, φ is a "function presign". Likewise he identifies a "function postsign". For example if φ is the function presign a+, then φx yields a+x, or if φ is the function postsign +a then xφ yields x+a.[29]


Bertrand Russell's The Principles of Mathematics 1903

While the influence of Cantor and Peano was paramount,[30] in Appendix A "The Logical and Arithmetical Doctrines of Frege" of The Principles of Mathematics, Russell arrives at a discussion of Frege's notion of function, "...a point in which Frege's work is very important, and requires careful examination".[31] In response to his 1902 exchange of letters with Frege about the contradiction he discovered in Frege's Begriffsschrift, Russell tacked this section on at the last moment.

For Russell the bedeviling notion is that of "variable": "6. Mathematical propositions are not only characterized by the fact that they assert implications, but also by the fact that they contain variables. The notion of the variable is one of the most difficult with which logic has to deal. For the present, I openly wish to make it plain that there are variables in all mathematical propositions, even where at first sight they might seem to be absent. . . . We shall find always, in all mathematical propositions, that the words any or some occur; and these words are the marks of a variable and a formal implication".[32]

As expressed by Russell "the process of transforming constants in a proposition into variables leads to what is called generalization, and gives us, as it were, the formal essence of a proposition ... So long as any term in our proposition can be turned into a variable, our proposition can be generalized; and so long as this is possible, it is the business of mathematics to do it";[33] these generalizations Russell named propositional functions.[34] Indeed he cites and quotes from Frege's Begriffsschrift and presents a vivid example from Frege's 1891 Function und Begriff: That "the essence of the arithmetical function 2x³ + x is what is left when the x is taken away, i.e., in the above instance 2( )³ + ( ). The argument x does not belong to the function but the two taken together make the whole".[31]

Russell agreed with Frege's notion of "function" in one sense: "He regards functions -- and in this I agree with him -- as more fundamental than predicates and relations", but Russell rejected Frege's "theory of subject and assertion", in particular "he thinks that, if a term a occurs in a proposition, the proposition can always be analysed into a and an assertion about a".[31]

Evolution of Russell's notion of "function" 1908–1913

Russell would carry his ideas forward in his 1908 Mathematical logic as based on the theory of types and into his and Whitehead's 1910–1913 Principia Mathematica. By the time of Principia Mathematica Russell, like Frege, considered the propositional function fundamental: "Propositional functions are the fundamental kind from which the more usual kinds of function, such as "sin x" or log x or "the father of x" are derived. These derivative functions . . . are called "descriptive functions". The functions of propositions . . . are a particular case of propositional functions".[35]

Propositional functions: Because his terminology is different from the contemporary, the reader may be confused by Russell's "propositional function". An example may help. Russell writes a propositional function in its raw form, e.g., as φŷ: "ŷ is hurt". (Observe the circumflex or "hat" over the variable y). For our example, we will assign just 4 values to the variable ŷ: "Bob", "This bird", "Emily the rabbit", and "y".
Substitution of one of these values for variable ŷ yields a proposition; this proposition is called a "value" of the propositional function. In our example there are four values of the propositional function, e.g., "Bob is hurt", "This bird is hurt", "Emily the rabbit is hurt" and "y is hurt." A proposition, if it is significant, i.e., if its truth is determinate, has a truth-value of truth or falsity. If a proposition's truth value is "truth" then the variable's value is said to satisfy the propositional function. Finally, per Russell's definition, "a class [set] is all objects satisfying some propositional function" (p. 23). Note the word "all" -- this is how the contemporary notions of "For all ∀" and "there exists at least one instance ∃" enter the treatment (p. 15).

To continue the example: Suppose (from outside the mathematics/logic) one determines that the propositions "Bob is hurt" has a truth value of "falsity", "This bird is hurt" has a truth value of "truth", "Emily the rabbit is hurt" has an indeterminate truth value because "Emily the rabbit" doesn't exist, and "y is hurt" is ambiguous as to its truth value because the argument y itself is ambiguous. While the two propositions "Bob is hurt" and "This bird is hurt" are


significant (both have truth values), only the value "This bird" of the variable ŷ satisfies the propositional function φŷ: "ŷ is hurt". When one goes to form the class α: φŷ: "ŷ is hurt", only "This bird" is included, given the four values "Bob", "This bird", "Emily the rabbit" and "y" for variable ŷ and their respective truth-values: falsity, truth, indeterminate, ambiguous.

Russell defines functions of propositions with arguments, and truth-functions f(p).[36] For example, suppose one were to form the "function of propositions with arguments" p1: "NOT(p) AND q" and assign its variables the values of p: "Bob is hurt" and q: "This bird is hurt". (We are restricted to the logical linkages NOT, AND, OR and IMPLIES, and we can only assign "significant" propositions to the variables p and q). Then the "function of propositions with arguments" is p1: NOT("Bob is hurt") AND "This bird is hurt". To determine the truth value of this "function of propositions with arguments" we submit it to a "truth function", e.g., f(p1): f(NOT("Bob is hurt") AND "This bird is hurt"), which yields a truth value of "truth".

The notion of a "many-one functional relation": Russell first discusses the notion of "identity", then defines a descriptive function (pages 30ff) as the unique value ιx that satisfies the (2-variable) propositional function (i.e., "relation") φŷ. N.B. The reader should be warned here that the order of the variables is reversed! y is the independent variable and x is the dependent variable, e.g., x = sin(y).[37] Russell symbolizes the descriptive function as "the object standing in relation to y": R'y =DEF (ιx)(x R y). Russell repeats that "R'y is a function of y, but not a propositional function [sic]; we shall call it a descriptive function. All the ordinary functions of mathematics are of this kind. Thus in our notation "sin y" would be written "sin 'y", and "sin" would stand for the relation sin 'y has to y".[38]

Hardy 1908

Hardy 1908, pp. 26–28 defined a function as a relation between two variables x and y such that "to some values of x at any rate correspond values of y." He neither required the function to be defined for all values of x nor to associate each value of x to a single value of y. This broad definition of a function encompasses more relations than are ordinarily considered functions in contemporary mathematics.

The Formalist's "function": David Hilbert's axiomatization of mathematics (1904–1927)

David Hilbert set himself the goal of "formalizing" classical mathematics "as a formal axiomatic theory, and this theory shall be proved to be consistent, i.e., free from contradiction".[39] In his 1927 The Foundations of Mathematics Hilbert frames the notion of function in terms of the existence of an "object":

13. A(a) --> A(ε(A)) Here ε(A) stands for an object of which the proposition A(a) certainly holds if it holds of any object at all; let us call ε the logical ε-function".[40] [The arrow indicates "implies".]

Hilbert then illustrates the three ways in which the ε-function is to be used: firstly as the "for all" and "there exists" notions, secondly to represent the "object of which [a proposition] holds", and lastly how to cast it into the choice function.

Recursion theory and computability: But the unexpected outcome of Hilbert's and his student Bernays's effort was failure; see Gödel's incompleteness theorems of 1931. At about the same time, in an effort to solve Hilbert's Entscheidungsproblem, mathematicians set about to define what was meant by an "effectively calculable function" (Alonzo Church 1936), i.e., an "effective method" or "algorithm", that is, an explicit, step-by-step procedure that would succeed in computing a function. Various models for algorithms appeared in rapid succession, including Church's lambda calculus (1936), Stephen Kleene's μ-recursive functions (1936) and Alan Turing's (1936–7) notion of replacing human "computers" with utterly mechanical "computing machines" (see Turing machines). It was shown that all of these models could compute the same class of computable functions. Church's thesis holds that this class of functions exhausts all the number-theoretic functions that can be calculated by an algorithm. The outcomes of these efforts were vivid demonstrations that, in Turing's words, "there can be no general process for determining


whether a given formula U of the functional calculus K [Principia Mathematica] is provable";[41] see more at Independence (mathematical logic) and Computability theory.

Development of the set-theoretic definition of "function"

Set theory began with the work of the logicians with the notion of "class" (modern "set"), for example De Morgan (1847), Jevons (1880), Venn 1881, Frege 1879 and Peano (1889). It was given a push by Georg Cantor's attempt to define the infinite in set-theoretic treatment (1870–1890) and a subsequent discovery of an antinomy (contradiction, paradox) in this treatment (Cantor's paradox), by Russell's discovery (1902) of an antinomy in Frege's 1879 (Russell's paradox), by the discovery of more antinomies in the early 20th century (e.g., the 1897 Burali-Forti paradox and the 1905 Richard paradox), and by resistance to Russell's complex treatment of logic[42] and dislike of his axiom of reducibility[43] (1908, 1910–1913) that he proposed as a means to evade the antinomies.

Russell's paradox 1902

In 1902 Russell sent a letter to Frege pointing out that Frege's 1879 Begriffsschrift allowed a function to be an argument of itself: "On the other hand, it may also be that the argument is determinate and the function indeterminate . . .."[44] From this unconstrained situation Russell was able to form a paradox: "You state ... that a function, too, can act as the indeterminate element. This I formerly believed, but now this view seems doubtful to me because of the following contradiction. Let w be the predicate: to be a predicate that cannot be predicated of itself. Can w be predicated of itself?"[45]

Frege responded promptly that "Your discovery of the contradiction caused me the greatest surprise and, I would almost say, consternation, since it has shaken the basis on which I intended to build arithmetic".[46] From this point forward development of the foundations of mathematics became an exercise in how to dodge "Russell's paradox", framed as it was in "the bare [set-theoretic] notions of set and element".[47]

Zermelo's set theory (1908) modified by Skolem (1922)

The notion of "function" appears as Zermelo's axiom III, the Axiom of Separation (Axiom der Aussonderung). This axiom constrains us to use a propositional function Φ(x) to "separate" a subset MΦ from a previously formed set M: "AXIOM III. (Axiom of separation). Whenever the propositional function Φ(x) is definite for all elements of a set M, M possesses a subset MΦ containing as elements precisely those elements x of M for which Φ(x) is true".[48] As there is no universal set (sets originate by way of Axiom II from elements of (non-set) domain B), "...this disposes of the Russell antinomy so far as we are concerned".[49] But Zermelo's "definite criterion" is imprecise, and was fixed by Weyl, Fraenkel, Skolem, and von Neumann.[50]

In fact Skolem in his 1922 referred to this "definite criterion" or "property" as a "definite proposition": "... a finite expression constructed from elementary propositions of the form a ε b or a = b by means of the five operations [logical conjunction, disjunction, negation, universal quantification, and existential quantification]".[51] van Heijenoort summarizes: "A property is definite in Skolem's sense if it is expressed . . . by a well-formed formula in the simple predicate calculus of first order in which the sole predicate constants are ε and possibly =. ...
Today an axiomatization of set theory is usually embedded in a logical calculus, and it is Weyl's and Skolem's approach to the formulation of the axiom of separation that is generally adopted.[52] In this quote the reader may observe a shift in terminology: nowhere is mentioned the notion of "propositional function", but rather one sees the words "formula", "predicate calculus", "predicate", and "logical calculus." This shift in terminology is discussed more in the section that covers "function" in contemporary set theory.
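Indeed, in the first-order formulation just described, the axiom of separation is typically rendered as follows (our sketch of the modern notation, not Zermelo's symbolism):

```latex
\forall M \, \exists M_{\Phi} \, \forall x \, \bigl( x \in M_{\Phi} \;\leftrightarrow\; x \in M \wedge \Phi(x) \bigr)
```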

The Wiener–Hausdorff–Kuratowski "ordered pair" definition 1914–1921

The history of the notion of "ordered pair" is not clear. As noted above, Frege (1879) proposed an intuitive ordering in his definition of a two-argument function Ψ(A, B). Norbert Wiener in his 1914 (see below) observes that his own treatment essentially "revert(s) to Schröder's treatment of a relation as a class of ordered couples".[53] Russell (1903) considered the definition of a relation (such as Ψ(A, B)) as a "class of couples" but rejected it:

"There is a temptation to regard a relation as definable in extension as a class of couples. This has the formal advantage that it avoids the necessity for the primitive proposition asserting that every couple has a relation holding between no other pairs of terms. But it is necessary to give sense to the couple, to distinguish the referent [domain] from the relatum [converse domain]: thus a couple becomes essentially distinct from a class of two terms, and must itself be introduced as a primitive idea. . . . It seems therefore more correct to take an intensional view of relations, and to identify them rather with class-concepts than with classes."[54]

By 1910–1913 and Principia Mathematica Russell had given up on the requirement for an intensional definition of a relation, stating that "mathematics is always concerned with extensions rather than intensions" and "Relations, like classes, are to be taken in extension".[55] To demonstrate the notion of a relation in extension Russell now embraced the notion of ordered couple: "We may regard a relation ... as a class of couples ... the relation determined by φ(x, y) is the class of couples (x, y) for which φ(x, y) is true".[56] In a footnote he clarified his notion and arrived at this definition: "Such a couple has a sense, i.e., the couple (x, y) is different from the couple (y, x) unless x = y. We shall call it a "couple with sense," ... it may also be called an ordered couple".[56] But he goes on to say that he would not introduce the ordered couples further into his "symbolic treatment"; he proposes his "matrix" and his unpopular axiom of reducibility in their place.

An attempt to solve the problem of the antinomies led Russell to propose his "doctrine of types" in an appendix B of his 1903 The Principles of Mathematics.[57] In a few years he would refine this notion and propose in his 1908 The Theory of Types two axioms of reducibility, the purpose of which was to reduce (single-variable) propositional functions and (dual-variable) relations to a "lower" form (and ultimately into a completely extensional form); he and Alfred North Whitehead would carry this treatment over to Principia Mathematica 1910–1913 with a further refinement called "a matrix".[58] The first axiom is *12.1; the second is *12.11. To quote Wiener, the second axiom *12.11 "is involved only in the theory of relations".[59] Both axioms, however, were met with skepticism and resistance; see more at Axiom of reducibility.

By 1914 Norbert Wiener, using Whitehead and Russell's symbolism, eliminated axiom *12.11 (the "two-variable" (relational) version of the axiom of reducibility) by expressing a relation as an ordered pair "using the null set. At approximately the same time, Hausdorff (1914, p. 32) gave the definition of the ordered pair (a, b) as { {a,1}, {b, 2} }. A few years later Kuratowski (1921) offered a definition that has been widely used ever since, namely { {a, b}, {a} }".[60] As noted by Suppes (1960), "This definition . . . was historically important in reducing the theory of relations to the theory of sets".[61] Observe that while Wiener "reduced" the relational *12.11 form of the axiom of reducibility he did not reduce nor otherwise change the propositional-function form *12.1; indeed he declared this "essential to the treatment of identity, descriptions, classes and relations".[62]

Schönfinkel's notion of "function" as a many-one "correspondence" 1924

Where exactly the general notion of "function" as a many-one relationship derives from is unclear. Russell in his 1920 Introduction to Mathematical Philosophy states that "It should be observed that all mathematical functions result from one-many [sic -- contemporary usage is many-one] relations . . . Functions in this sense are descriptive functions".[63] A reasonable possibility is the Principia Mathematica notion of "descriptive function" -- R'y =DEF (ιx)(x R y): "the singular object that has a relation R to y". Whatever the case, by 1924 Moses Schönfinkel had expressed the notion, claiming it to be "well known":


Function (mathematics) "As is well known, by function we mean in the simplest case a correspondence between the elements of some domain of quantities, the argument domain, and those of a domain of function values ... such that to each argument value there corresponds at most one function value".[64] According to Willard Quine, Schönfinkel's 1924 "provide[s] for ... the whole sweep of abstract set theory. The crux of the matter is that Schönfinkel lets functions stand as arguments. ¶ For Schönfinkel, substantially as for Frege, classes are special sorts of functions. They are propositional functions, functions whose values are truth values. All functions, propositional and otherwise, are for Schönfinkel one-place functions".[65] Remarkably, Schönfinkel reduces all mathematics to an extremely compact functional calculus consisting of only three functions: Constancy, fusion (i.e., composition), and mutual exclusivity. Quine notes that Haskell Curry (1958) carried this work forward "under the head of combinatory logic".[66] von Neumann's set theory 1925 By 1925 Abraham Fraenkel (1922) and Thoralf Skolem (1922) had amended Zermelo's set theory of 1908. But von Neumann was not convinced that this axiomatization could not lead to the antinomies.[67] So he proposed his own theory, his 1925 An axiomatization of set theory. It explicitly contains a "contemporary", set-theoretic version of the notion of "function": "[Unlike Zermelo's set theory] [w]e prefer, however, to axiomatize not "set" but "function". The latter notion certainly includes the former. (More precisely, the two notions are completely equivalent, since a function can be regarded as a set of pairs, and a set as a function that can take two values.)".[68] His axiomatization creates two "domains of objects" called "arguments" (I-objects) and "functions" (II-objects); where they overlap are the "argument functions" (I-II objects). He introduces two "universal two-variable operations" -- (i) the operation [x, y]: ". . . read 'the value of the function x for the argument y) and (ii) the operation (x, y): ". . . (read 'the ordered pair x, y'") whose variables x and y must both be arguments and that itself produces an argument (x,y)". To clarify the function pair he notes that "Instead of f(x) we write [f,x] to indicate that f, just like x, is to be regarded as a variable in this procedure". And to avoid the "antinomies of naive set theory, in Russell's first of all . . . we must forgo treating certain functions as arguments".[69] He adopts a notion from Zermelo to restrict these "certain functions"[70]

Since 1950

Notion of "function" in contemporary set theory

Both axiomatic and naive forms of Zermelo's set theory as modified by Fraenkel (1922) and Skolem (1922) define "function" as a relation, define a relation as a set of ordered pairs, and define an ordered pair as a set of two "dissymmetric" sets. While the reader of Suppes (1960) Axiomatic Set Theory or Halmos (1970) Naive Set Theory observes the use of function-symbolism in the axiom of separation, e.g., φ(x) (in Suppes) and S(x) (in Halmos), they will see no mention of "proposition" or even "first order predicate calculus". In their place are "expressions of the object language", "atomic formulae", "primitive formulae", and "atomic sentences".

Kleene 1952 defines the words as follows: "In word languages, a proposition is expressed by a sentence. Then a 'predicate' is expressed by an incomplete sentence or sentence skeleton containing an open place. For example, "___ is a man" expresses a predicate ... The predicate is a propositional function of one variable. Predicates are often called 'properties' ... The predicate calculus will treat of the logic of predicates in this general sense of 'predicate', i.e., as propositional function".[71]

The reason for the disappearance of the words "propositional function", e.g., in Suppes (1960) and Halmos (1970), is explained by Alfred Tarski 1946 together with further explanation of the terminology:

"An expression such as x is an integer, which contains variables and, on replacement of these variables by constants becomes a sentence, is called a SENTENTIAL [i.e., propositional cf his index] FUNCTION. But mathematicians, by the way, are not very fond of this expression, because they use the term "function" with a different meaning. ... sentential functions and sentences composed entirely of mathematical symbols (and not words of everyday languange), such as: x + y = 5 are usually referred to by mathematicians as FORMULAE. In place of "sentential function" we shall sometimes simply say "sentence" --- but only in cases where there is no danger of any misunderstanding".[72] For his part Tarski calls the relational form of function a "FUNCTIONAL RELATION or simply a FUNCTION" .[73] After a discussion of this "functional relation" he asserts that: "The concept of a function which we are considering now differs essentially from the concepts of a sentential [propositional] and of a designatory function .... Strictly speaking ... [these] do not belong to the domain of logic or mathematics; they denote certain categories of expressions which serve to compose logical and mathematical statements, but they do not denote things treated of in those statements... . The term "function" in its new sense, on the other hand, is an expression of a purely logical character; it designates a certain type of things dealt with in logic and mathematics."[74] See more about "truth under an interpretation" at Alfred Tarski. Further developments The idea of structure-preserving functions, or homomorphisms, led to the abstract notion of morphism, the key concept of category theory. More recently, the concept of functor has been used as an analogue of a function in category theory.[75]

See also

• List of mathematical functions
• Functional predicate
• Function composition
• Functional
• Functional decomposition
• Functor
• Generalized function
• Implicit function
• Parametric equation
• Plateau
• Proportionality
• Vertical line test

References

• Anton, Howard (1980), Calculus with Analytical Geometry, Wiley, ISBN 978-0-471-03248-9
• Bartle, Robert G. (1976), The Elements of Real Analysis (2nd ed.), Wiley, ISBN 978-0-471-05464-1
• Husch, Lawrence S. (2001), Visual Calculus [11], University of Tennessee, retrieved 2007-09-27
• Katz, Robert (1964), Axiomatic Analysis, D. C. Heath and Company.
• Ponte, João Pedro (1992), "The history of the concept of function and some educational implications" [76], The Mathematics Educator 3 (2): 3–8, ISSN 1062-9017
• Thomas, George B.; Finney, Ross L. (1995), Calculus and Analytic Geometry (9th ed.), Addison-Wesley, ISBN 978-0-201-53174-9
• Youschkevitch, A. P. (1976), "The concept of function up to the middle of the 19th century", Archive for History of Exact Sciences 16 (1): 37–85, doi:10.1007/BF00348305.
• Monna, A. F. (1972), "The concept of function in the 19th and 20th centuries, in particular with regard to the discussions between Baire, Borel and Lebesgue", Archive for History of Exact Sciences 9 (1): 57–84, doi:10.1007/BF00348540.
• Kleiner, Israel (1989), "Evolution of the Function Concept: A Brief Survey" [77], The College Mathematics Journal (Mathematical Association of America) 20 (4): 282–300, doi:10.2307/2686848.

• Ruthing, D. (1984), "Some definitions of the concept of function from Bernoulli, Joh. to Bourbaki, N.", Mathematical Intelligencer 6 (4): 72–77.
• Dubinsky, Ed; Harel, Guershon (1992), The Concept of Function: Aspects of Epistemology and Pedagogy, Mathematical Association of America, ISBN 0883850818.
• Malik, M. A. (1980), "Historical and pedagogical aspects of the definition of function", International Journal of Mathematical Education in Science and Technology 11 (4): 489–492, doi:10.1080/0020739800110404.
• Boole, George (1854), An Investigation of the Laws of Thought on which are founded the Mathematical Theories of Logic and Probabilities, Walton and Maberly, London UK; Macmillan and Company, Cambridge UK. Republished as a googlebook.
• Eves, Howard (1990), Foundations and Fundamental Concepts of Mathematics: Third Edition, Dover Publications, Inc., Mineola, NY, ISBN 0-486-69609-X (pbk)
• Frege, Gottlob (1879), Begriffsschrift: eine der arithmetischen nachgebildete Formelsprache des reinen Denkens, Halle
• Grattan-Guinness, Ivor and Bornet, Gérard (1997), George Boole: Selected Manuscripts on Logic and its Philosophy, Springer-Verlag, Berlin, ISBN 3-7643-5456-9
• Halmos, Paul R. (1970), Naive Set Theory, Springer-Verlag, New York, ISBN 0-387-90092-6.
• Hardy, Godfrey Harold (1908), A Course of Pure Mathematics, Cambridge University Press (published 1993), ISBN 978-0-521-09227-2
• Reichenbach, Hans (1947), Elements of Symbolic Logic, Dover Publications Inc., New York NY, ISBN 0-486-24004-5.
• Russell, Bertrand (1903), The Principles of Mathematics: Vol. 1, Cambridge at the University Press, Cambridge, UK, republished as a googlebook.
• Russell, Bertrand (1920), Introduction to Mathematical Philosophy (second edition), Dover Publications Inc., New York NY, ISBN 0-486-27724-0 (pbk).
• Suppes, Patrick (1960), Axiomatic Set Theory, Dover Publications, Inc., New York NY, ISBN 0-486-61630-4. cf his Chapter 1 Introduction.
• Tarski, Alfred (1946), Introduction to Logic and to the Methodology of Deductive Sciences, republished 1995 by Dover Publications, Inc., New York, NY, ISBN 0-486-28462-X
• Venn, John (1881), Symbolic Logic, Macmillan and Co., London UK. Republished as a googlebook.
• van Heijenoort, Jean (1967, 3rd printing 1976), From Frege to Gödel: A Source Book in Mathematical Logic, 1879–1931, Harvard University Press, Cambridge, MA, ISBN 0-674-32449-8 (pbk)
• Gottlob Frege (1879), Begriffsschrift, a formula language, modeled upon that of arithmetic, for pure thought, with commentary by van Heijenoort, pages 1–82
• Giuseppe Peano (1889), The principles of arithmetic, presented by a new method, with commentary by van Heijenoort, pages 83–97
• Bertrand Russell (1902), Letter to Frege, with commentary by van Heijenoort, pages 124–125. Wherein Russell announces his discovery of a "paradox" in Frege's work.
• Gottlob Frege (1902), Letter to Russell, with commentary by van Heijenoort, pages 126–128.
• David Hilbert (1904), On the foundations of logic and arithmetic, with commentary by van Heijenoort, pages 129–138.
• Jules Richard (1905), The principles of mathematics and the problem of sets, with commentary by van Heijenoort, pages 142–144. The Richard paradox.
• Bertrand Russell (1908a), Mathematical logic as based on the theory of types, with commentary by Willard Quine, pages 150–182.
• Ernst Zermelo (1908), A new proof of the possibility of a well-ordering, with commentary by van Heijenoort, pages 183–198. Wherein Zermelo rails against Poincaré's (and therefore Russell's) notion of impredicative definition.

• Ernst Zermelo (1908a), Investigations in the foundations of set theory I, with commentary by van Heijenoort, pages 199–215. Wherein Zermelo attempts to solve Russell's paradox by structuring his axioms to restrict the universal domain B (from which objects and sets are pulled by definite properties) so that it itself cannot be a set, i.e., his axioms disallow a universal set.
• Norbert Wiener (1914), A simplification of the logic of relations, with commentary by van Heijenoort, pages 224–227
• Thoralf Skolem (1922), Some remarks on axiomatized set theory, with commentary by van Heijenoort, pages 290–301. Wherein Skolem defines Zermelo's vague "definite property".
• Moses Schönfinkel (1924), On the building blocks of mathematical logic, with commentary by Willard Quine, pages 355–366. The start of combinatory logic.
• John von Neumann (1925), An axiomatization of set theory, with commentary by van Heijenoort, pages 393–413. Wherein von Neumann creates "classes" as distinct from "sets" (the "classes" are Zermelo's "definite properties"), and now there is a universal set, etc.
• David Hilbert (1927), The foundations of mathematics, with commentary by van Heijenoort, pages 464–479.
• Whitehead, Alfred North and Russell, Bertrand (1913, 1962 edition), Principia Mathematica to *56, Cambridge at the University Press, London UK, no ISBN or US card catalog number.

External links

• The Wolfram Functions Site [78] gives formulae and visualizations of many mathematical functions.
• Shodor: Function Flyer [79], interactive Java applet for graphing and exploring functions.
• xFunctions [80], a Java applet for exploring functions graphically.
• Draw Function Graphs [81], online drawing program for mathematical functions.
• Functions [82] from cut-the-knot.
• Function at ProvenMath [83].
• Comprehensive web-based function graphing & evaluation tool [84]

References

[1] The history of the function concept in mathematics (http://www.educ.fc.ul.pt/docentes/jponte/docs-uk/92 Ponte (Functions).doc) J. P. Ponte, 1992
[2] Another short but useful history is found in Eves 1990 pages 234–235
[3] Thompson, S. P.; Gardner, M.; Calculus Made Easy. 1998. Pages 10–11. ISBN 0312185480.
[4] Eves dates Leibniz's first use to the year 1694 and also similarly relates the usage to "as a term to denote any quantity connected with a curve, such as the coordinates of a point on the curve, the slope of the curve, and so on" (Eves 1990:234).
[5] Eves 1990:234
[6] Eves 1990:235
[7] Eves asserts that Dirichlet "arrived at the following formulation: "[The notion of] a variable is a symbol that represents any one of a set of numbers; if two variables x and y are so related that whenever a value is assigned to x there is automatically assigned, by some rule or correspondence, a value to y, then we say y is a (single-valued) function of x. The variable x . . . is called the independent variable and the variable y is called the dependent variable. The permissible values that x may assume constitute the domain of definition of the function, and the values taken on by y constitute the range of values of the function . . . it stresses the basic idea of a relationship between two sets of numbers" Eves 1990:235.
[8] Boole circa 1849 Elementary Treatise on Logic not mathematical including philosophy of mathematical reasoning, in Grattan-Guinness and Bornet 1997:40
[9] De Morgan 1847:1
[10] Boole 1848 in Grattan-Guinness and Bornet 1997:1, 2
[11] Boole 1848 in Grattan-Guinness and Bornet 1997:6
[12] Eves 1990:222
[13] Some of this criticism is intense: see the introduction by Willard Quine preceding Russell 1908 Mathematical logic as based on the theory of types in van Heijenoort 1967:151. See also von Neumann's introduction to his 1925 Axiomatization of Set Theory in van Heijenoort 1967:395
[14] Boole 1854:86
[15] cf Boole 1854:31–34. Boole discusses this "special law" with its two algebraic roots x = 0 or 1, on page 37.

[16] Although he gives others credit, cf Venn 1881:6
[17] Venn 1881:86–87
[18] cf van Heijenoort's introduction to Peano 1889 in van Heijenoort 1967. For most of his logical symbolism and notions of propositions Peano credits "many writers, especially Boole". In footnote 1 he credits Boole 1847, 1848, 1854, Schröder 1877, Peirce 1880, Jevons 1883, MacColl 1877, 1878, 1878a, 1880; cf van Heijenoort 1967:86.
[19] Frege 1879 in van Heijenoort 1967:7
[20] Frege's exact words are "expressed in our formula language" and "expression", cf Frege 1879 in van Heijenoort 1967:21–22.
[21] This example is from Frege 1879 in van Heijenoort 1967:21–22
[22] Frege 1879 in van Heijenoort 1967:21–22
[23] Frege cautions that the function will have "argument places" where the argument should be placed as distinct from other places where the same sign might appear. But he does not go deeper into how to signify these positions and Russell 1903 observes this.
[24] Gottlob Frege (1879) in van Heijenoort 1967:21–24
[25] "...Peano intends to cover much more ground than Frege does in his Begriffsschrift and his subsequent works, but he does not till that ground to any depth comparable to what Frege does in his self-allotted field", van Heijenoort 1967:85
[26] van Heijenoort 1967:89.
[27] van Heijenoort 1967:91.
[28] All symbols used here are from Peano 1889 in van Heijenoort 1967:91.
[29] cf van Heijenoort 1967:91
[30] "In Mathematics, my chief obligations, as is indeed evident, are to Georg Cantor and Professor Peano. If I had become acquainted sooner with the work of Professor Frege, I should have owed a great deal to him, but as it is I arrived independently at many results which he had already established", Russell 1903:viii. He also highlights Boole's 1854 Laws of Thought and Ernst Schröder's three volumes of "non-Peanesque methods" 1890, 1891, and 1895, cf Russell 1903:10
[31] Russell 1903:505
[32] Russell 1903:5–6
[33] Russell 1903:7
[34] Russell 1903:19
[35] Russell 1910–1913:15
[36] Whitehead and Russell 1910–1913:6, 8 respectively
[37] Something similar appears in Tarski 1946. Tarski refers to a "relational function" as a "ONE-MANY [sic!] or FUNCTIONAL RELATION or simply a FUNCTION". Tarski comments about this reversal of variables on page 99.
[38] Whitehead and Russell 1910–1913:31. This paper is important enough that van Heijenoort reprinted it as Whitehead and Russell 1910 Incomplete symbols: Descriptions with commentary by W. V. Quine in van Heijenoort 1967:216–223
[39] Kleene 1952:53
[40] Hilbert in van Heijenoort 1967:466
[41] Turing 1936–7 in Martin Davis The Undecidable 1965:145
[42] cf Kleene 1952:45
[43] "The nonprimitive and arbitrary character of this axiom drew forth severe criticism, and much of subsequent refinement of the logistic program lies in attempts to devise some method of avoiding the disliked axiom of reducibility" Eves 1990:268.
[44] Frege 1879 in van Heijenoort 1967:23
[45] Russell (1902) Letter to Frege in van Heijenoort 1967:124
[46] Frege (1902) Letter to Russell in van Heijenoort 1967:127
[47] van Heijenoort's commentary to Russell's Letter to Frege in van Heijenoort 1967:124
[48] The original uses an Old High German symbol in place of Φ, cf Zermelo 1908a in van Heijenoort 1967:202
[49] Zermelo 1908a in van Heijenoort 1967:203
[50] cf van Heijenoort's commentary before Zermelo 1908 Investigations in the foundations of set theory I in van Heijenoort 1967:199
[51] Skolem 1922 in van Heijenoort 1967:292–293
[52] van Heijenoort's introduction to Abraham Fraenkel's The notion "definite" and the independence of the axiom of choice in van Heijenoort 1967:285.
[53] But Wiener offers no date or reference, cf Wiener 1914 in van Heijenoort 1967:226
[54] Russell 1903:99
[55] both quotes from Whitehead and Russell 1913:26
[56] Whitehead and Russell 1913:26
[57] Russell 1903:523–529
[58] *12 The Hierarchy of Types and the axiom of Reducibility in Principia Mathematica 1913:161
[59] Wiener 1914 in van Heijenoort 1967:224
[60] commentary by van Heijenoort preceding Norbert Wiener's (1914) A simplification of the logic of relations in van Heijenoort 1967:224.
[61] Suppes 1960:32. This same point appears in van Heijenoort's commentary before Wiener (1914) in van Heijenoort 1967:224.
[62] Wiener 1914 in van Heijenoort 1967:224
[63] Russell 1920:46

[64] Schönfinkel (1924) On the building blocks of mathematical logic in van Heijenoort 1967:359
[65] commentary by W. V. Quine preceding Schönfinkel (1924) On the building blocks of mathematical logic in van Heijenoort 1967:356.
[66] cf Curry and Feys 1958; Quine in van Heijenoort 1967:357.
[67] von Neumann's critique of the history observes the split between the logicists (e.g., Russell et al.), the set-theorists (e.g., Zermelo et al.) and the formalists (e.g., Hilbert), cf von Neumann 1925 in van Heijenoort 1967:394–396.
[68] von Neumann 1925 in van Heijenoort 1967:396
[69] All quotes from von Neumann 1925 in van Heijenoort 1967:397–398
[70] This notion is not easy to summarize; see more at van Heijenoort 1967:397.
[71] Kleene 1952:143–145
[72] Tarski 1946:5
[73] Tarski 1946:98
[74] Tarski 1946:102
[75] John C. Baez; James Dolan (1998). Categorification (http://arxiv.org/abs/math/9802029).
[76] http://www.math.tarleton.edu/Faculty/Brawner/550%20MAED/History%20of%20functions.pdf
[77] http://jstor.org/stable/2686848
[78] http://functions.wolfram.com/
[79] http://www.shodor.org/interactivate/activities/FunctionFlyer/
[80] http://math.hws.edu/xFunctions/
[81] http://rechneronline.de/function-graphs/
[82] http://www.cut-the-knot.org/do_you_know/FunctionMain.shtml
[83] http://www.apronus.com/provenmath/cartesian.htm
[84] http://sporkforge.com/math/fcn_graph_eval.php

Calculus

Calculus (Latin, calculus, a small stone used for counting) is a branch of mathematics focused on limits, functions, derivatives, integrals, and infinite series. This subject constitutes a major part of modern mathematics education. It has two major branches, differential calculus and integral calculus, which are related by the fundamental theorem of calculus. Calculus is the study of change,[1] in the same way that geometry is the study of shape and algebra is the study of operations and their application to solving equations. A course in calculus is a gateway to other, more advanced courses in mathematics devoted to the study of functions and limits, broadly called mathematical analysis. Calculus has widespread applications in science, economics, and engineering and can solve many problems for which algebra alone is insufficient.

Historically, calculus was called "the calculus of infinitesimals", or "infinitesimal calculus". More generally, calculus (plural calculi) may refer to any method or system of calculation guided by the symbolic manipulation of expressions. Some examples of other well-known calculi are propositional calculus, variational calculus, lambda calculus, pi calculus, and join calculus.

History

Ancient

The ancient period introduced some of the ideas of integral calculus, but does not seem to have developed these ideas in a rigorous or systematic way. Calculating volumes and areas, the basic function of integral calculus, can be traced back to the Egyptian Moscow papyrus (c. 1820 BC), in which an Egyptian successfully calculated the volume of a pyramidal frustum.[2] [3] From the school of Greek mathematics, Eudoxus (c. 408–355 BC) used the method of exhaustion, which prefigures the concept of the limit, to calculate areas and volumes, while Archimedes (c. 287–212 BC) developed this idea further, inventing heuristics which resemble integral calculus.[4] The method of exhaustion was later reinvented in China by Liu Hui in the 3rd century AD in order to find the area of a circle.[5] In the 5th century AD, Zu Chongzhi established a method which would later be called Cavalieri's principle to find the volume of a sphere.[6]

Medieval

[Figure caption: Isaac Newton is one of the most famous contributors to the development of calculus, with, among other things, the use of calculus in his laws of motion and gravitation.]

Around AD 1000, the Islamic mathematician Ibn al-Haytham (Alhacen) was the first to derive the formula for the sum of the fourth powers of an arithmetic progression, using a method that is readily generalizable to finding the formula for the sum of any higher integral powers, which he used to perform an integration.[7] In the 11th century, the Chinese polymath Shen Kuo developed 'packing' equations that dealt with integration. In the 12th century, the Indian mathematician Bhāskara II developed an early derivative representing infinitesimal change, and he described an early form of Rolle's theorem.[8] Also in the 12th century, the Persian mathematician Sharaf al-Dīn al-Tūsī discovered the derivative of cubic polynomials, an important result in differential calculus.[9] In the 14th century, Indian mathematician Madhava of Sangamagrama, along with other mathematician-astronomers of the Kerala school of astronomy and mathematics, described special cases of Taylor series,[10] which are treated in the text Yuktibhasa.[11] [12] [13]

Modern

In Europe, the foundational work was a treatise due to Bonaventura Cavalieri, who argued that volumes and areas should be computed as the sums of the volumes and areas of infinitesimally thin cross-sections. The ideas were similar to Archimedes' in The Method, but this treatise was lost until the early part of the twentieth century. Cavalieri's work was not well respected since his methods can lead to erroneous results, and the infinitesimal quantities he introduced were disreputable at first.

The formal study of calculus combined Cavalieri's infinitesimals with the calculus of finite differences developed in Europe at around the same time. The combination was achieved by John Wallis, Isaac Barrow, and James Gregory, the latter two proving the second fundamental theorem of calculus around 1675.

The product rule and chain rule, the notion of higher derivatives, Taylor series, and analytical functions were introduced by Isaac Newton in an idiosyncratic notation which he used to solve problems of mathematical physics. In his publications, Newton rephrased his ideas to suit the mathematical idiom of the time, replacing calculations with infinitesimals by equivalent geometrical arguments which were considered beyond reproach. He used the methods of calculus to solve the problem of planetary motion, the shape of the surface of a rotating fluid, the oblateness of the earth, the motion of a weight sliding on a cycloid, and many other problems discussed in his Principia Mathematica. In other work, he developed series expansions for functions, including fractional and irrational powers, and it was clear that he understood the principles of the Taylor series. He did not publish all these discoveries, and at this time infinitesimal methods were still considered disreputable.

These ideas were systematized into a true calculus of infinitesimals by Gottfried Wilhelm Leibniz, who was originally accused of plagiarism by Newton.[14] He is now regarded as an independent inventor of and contributor to calculus. His contribution was to provide a clear set of rules for manipulating infinitesimal quantities, allowing the computation of second and higher derivatives, and providing the product rule and chain rule, in their differential and integral forms. Unlike Newton, Leibniz paid a lot of attention to the formalism; he often spent days determining appropriate symbols for concepts.

[Figure caption: Gottfried Wilhelm Leibniz was originally accused of plagiarizing Sir Isaac Newton's unpublished work (only in Britain, not in continental Europe), but is now regarded as an independent inventor of and contributor to calculus.]

Leibniz and Newton are usually both credited with the invention of calculus. Newton was the first to apply calculus to general physics and Leibniz developed much of the notation used in calculus today. The basic insights that both Newton and Leibniz provided were the laws of differentiation and integration, second and higher derivatives, and the notion of an approximating polynomial series. By Newton's time, the fundamental theorem of calculus was known.

When Newton and Leibniz first published their results, there was great controversy over which mathematician (and therefore which country) deserved credit. Newton derived his results first, but Leibniz published first. Newton claimed Leibniz stole ideas from his unpublished notes, which Newton had shared with a few members of the Royal Society. This controversy divided English-speaking mathematicians from continental mathematicians for many years, to the detriment of English mathematics. A careful examination of the papers of Leibniz and Newton shows that they arrived at their results independently, with Leibniz starting first with integration and Newton with differentiation. Today, both Newton and Leibniz are given credit for developing calculus independently. It is Leibniz, however, who gave the new discipline its name. Newton called his calculus "the science of fluxions".

Since the time of Leibniz and Newton, many mathematicians have contributed to the continuing development of calculus. In the 19th century, calculus was put on a much more rigorous footing by mathematicians such as Cauchy, Riemann, and Weierstrass (see (ε, δ)-definition of limit). It was also during this period that the ideas of calculus were generalized to Euclidean space and the complex plane. Lebesgue generalized the notion of the integral so that virtually any function has an integral, while Laurent Schwartz extended differentiation in much the same way.

Calculus is a ubiquitous topic in most modern high schools and universities around the world.[15]

Significance

While some of the ideas of calculus were developed earlier in Egypt, Greece, China, India, Iraq, Persia, and Japan, the modern use of calculus began in Europe during the 17th century, when Isaac Newton and Gottfried Wilhelm Leibniz built on the work of earlier mathematicians to introduce its basic principles. The development of calculus was built on earlier concepts of instantaneous motion and area underneath curves.

Applications of differential calculus include computations involving velocity and acceleration, the slope of a curve, and optimization. Applications of integral calculus include computations involving area, volume, arc length, center of mass, work, and pressure. More advanced applications include power series and Fourier series. Calculus can be used to compute the trajectory of a shuttle docking at a space station or the amount of snow in a driveway.

Calculus is also used to gain a more precise understanding of the nature of space, time, and motion. For centuries, mathematicians and philosophers wrestled with paradoxes involving division by zero or sums of infinitely many numbers. These questions arise in the study of motion and area. The ancient Greek philosopher Zeno gave several famous examples of such paradoxes. Calculus provides tools, especially the limit and the infinite series, which resolve the paradoxes.

Foundations

In mathematics, foundations refers to the rigorous development of a subject from precise axioms and definitions. Working out a rigorous foundation for calculus occupied mathematicians for much of the century following Newton and Leibniz, and is still to some extent an active area of research today.

There is more than one rigorous approach to the foundation of calculus. The usual one today is via the concept of limits defined on the continuum of real numbers. An alternative is nonstandard analysis, in which the real number system is augmented with infinitesimal and infinite numbers, as in the original Newton-Leibniz conception. The foundations of calculus are included in the field of real analysis, which contains full definitions and proofs of the theorems of calculus as well as generalizations such as measure theory and distribution theory.

Principles

Limits and infinitesimals

Calculus is usually developed by manipulating very small quantities. Historically, the first method of doing so was by infinitesimals. These are objects which can be treated like numbers but which are, in some sense, "infinitely small". An infinitesimal number dx could be greater than 0, but less than any number in the sequence 1, 1/2, 1/3, ... and less than any positive real number. Any integer multiple of an infinitesimal is still infinitely small, i.e., infinitesimals do not satisfy the Archimedean property. From this point of view, calculus is a collection of techniques for manipulating infinitesimals. This approach fell out of favor in the 19th century because it was difficult to make the notion of an infinitesimal precise. However, the concept was revived in the 20th century with the introduction of non-standard analysis and smooth infinitesimal analysis, which provided solid foundations for the manipulation of infinitesimals.

In the 19th century, infinitesimals were replaced by limits. Limits describe the value of a function at a certain input in terms of its values at nearby inputs. They capture small-scale behavior, just like infinitesimals, but use the ordinary real number system. In this treatment, calculus is a collection of techniques for manipulating certain limits. Infinitesimals get replaced by very small numbers, and the infinitely small behavior of the function is found by taking the limiting behavior for smaller and smaller numbers. Limits are the easiest way to provide rigorous foundations for calculus, and for this reason they are the standard approach.
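A small numerical sketch of the limit idea (the function sin(h)/h is our illustrative choice, not the article's): the expression is undefined at h = 0, yet its values at nearby inputs settle toward the limit 1.

```python
import math

# sin(h)/h cannot be evaluated at h = 0, but its values at inputs
# approaching 0 settle toward the limit 1.
for h in [0.1, 0.01, 0.001, 0.0001]:
    print(h, math.sin(h) / h)
```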

Differential calculus

[Figure caption: Tangent line at (x, f(x)). The derivative f′(x) of a curve at a point is the slope (rise over run) of the line tangent to that curve at that point.]

Differential calculus is the study of the definition, properties, and applications of the derivative of a function. The process of finding the derivative is called differentiation. Given a function and a point in the domain, the derivative at that point is a way of encoding the small-scale behavior of the function near that point. By finding the derivative of a function at every point in its domain, it is possible to produce a new function, called the derivative function or just the derivative of the original function. In mathematical jargon, the derivative is a linear operator which inputs a function and outputs a second function. This is more abstract than many of the processes studied in elementary algebra, where functions usually input a number and output another number. For example, if the doubling function is given the input three, then it outputs six, and if the squaring function is given the input three, then it outputs nine. The derivative, however, can take the squaring function as an input. This means that the derivative takes all the information of the squaring function—such as that two is sent to four, three is sent to nine, four is sent to sixteen, and so on—and uses this information to produce another function. (The function it produces turns out to be the doubling function.)

The most common symbol for a derivative is an apostrophe-like mark called prime. Thus, the derivative of the function f is f′, pronounced "f prime". For instance, if f(x) = x² is the squaring function, then f′(x) = 2x is its derivative, the doubling function.

If the input of the function represents time, then the derivative represents change with respect to time. For example, if f is a function that takes a time as input and gives the position of a ball at that time as output, then the derivative of f is how the position is changing in time, that is, it is the velocity of the ball.

If a function is linear (that is, if the graph of the function is a straight line), then the function can be written y = mx + b, where:

m = Δy / Δx = (change in y) / (change in x)

This gives an exact value for the slope of a straight line. If the graph of the function is not a straight line, however, then the change in y divided by the change in x varies. Derivatives give an exact meaning to the notion of change in output with respect to change in input. To be concrete, let f be a function, and fix a point a in the domain of f. (a, f(a)) is a point on the graph of the function. If h is a number close to zero, then a + h is a number close to a. Therefore (a + h, f(a + h)) is close to (a, f(a)). The slope between these two points is

m = (f(a + h) − f(a)) / ((a + h) − a) = (f(a + h) − f(a)) / h

This expression is called a difference quotient. A line through two points on a curve is called a secant line, so m is the slope of the secant line between (a, f(a)) and (a + h, f(a + h)). The secant line is only an approximation to the behavior of the function at the point a because it does not account for what happens between a and a + h. It is not possible to discover the behavior at a by setting h to zero because this would require dividing by zero, which is impossible. The derivative is defined by taking the limit as h tends to zero, meaning that it considers the behavior of f for all small values of h and extracts a consistent value for the case when h equals zero:

f′(a) = lim (h → 0) (f(a + h) − f(a)) / h

Geometrically, the derivative is the slope of the tangent line to the graph of f at a. The tangent line is a limit of secant lines, just as the derivative is a limit of difference quotients. For this reason, the derivative is sometimes called the slope of the function f.

Here is a particular example, the derivative of the squaring function at the input 3. Let f(x) = x² be the squaring function. Then

f′(3) = lim (h → 0) ((3 + h)² − 3²) / h = lim (h → 0) (9 + 6h + h² − 9) / h = lim (h → 0) (6 + h) = 6.

[Figure caption: The derivative f′(x) of a curve at a point is the slope of the line tangent to that curve at that point. This slope is determined by considering the limiting value of the slopes of secant lines. Here the function involved (drawn in red) is f(x) = x³ − x. The tangent line (in green) which passes through the point (−3/2, −15/8) has a slope of 23/4. Note that the vertical and horizontal scales in this image are different.]

The slope of the tangent line to the squaring function at the point (3, 9) is 6, that is to say, it is going up six times as fast as it is going to the right. The limit process just described can be performed for any point in the domain of the squaring function. This defines the derivative function of the squaring function, or just the derivative of the squaring function for short. A similar computation to the one above shows that the derivative of the squaring function is the doubling function.
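A brief numerical sketch of this limit process (the function and the step sizes are our illustrative choices, added for this edition):

```python
def difference_quotient(f, a, h):
    """Slope of the secant line through (a, f(a)) and (a + h, f(a + h))."""
    return (f(a + h) - f(a)) / h

f = lambda x: x ** 2  # the squaring function

# As h shrinks toward zero, the secant slopes settle toward the
# tangent slope f'(3) = 6 computed above.
for h in [1.0, 0.1, 0.01, 0.001]:
    print(h, difference_quotient(f, 3.0, h))
```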

Leibniz notation

A common notation, introduced by Leibniz, for the derivative in the example above is

y = x²
dy/dx = 2x.

In an approach based on limits, the symbol dy/dx is to be interpreted not as the quotient of two numbers but as a shorthand for the limit computed above. Leibniz, however, did intend it to represent the quotient of two infinitesimally small numbers, dy being the infinitesimally small change in y caused by an infinitesimally small change dx applied to x. We can also think of d/dx as a differentiation operator, which takes a function as an input and gives another function, the derivative, as the output. For example:

d/dx (x²) = 2x.

In this usage, the dx in the denominator is read as "with respect to x". Even when calculus is developed using limits rather than infinitesimals, it is common to manipulate symbols like dx and dy as if they were real numbers; although it is possible to avoid such manipulations, they are sometimes notationally convenient in expressing operations such as the total derivative.
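As a sketch, a computer algebra system can play the role of the d/dx operator described above; this example assumes the third-party SymPy library is installed:

```python
from sympy import symbols, diff

x = symbols('x')

# diff acts as the differentiation operator d/dx: it takes the
# expression x**2 as input and returns the expression 2*x.
print(diff(x ** 2, x))  # prints: 2*x
```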

Integral calculus

Integral calculus is the study of the definitions, properties, and applications of two related concepts, the indefinite integral and the definite integral. The process of finding the value of an integral is called integration. In technical language, integral calculus studies two related linear operators.

The indefinite integral is the antiderivative, the inverse operation to the derivative. F is an indefinite integral of f when f is a derivative of F. (This use of upper- and lower-case letters for a function and its indefinite integral is common in calculus.)

The definite integral inputs a function and outputs a number, which gives the area between the graph of the input and the x-axis. The technical definition of the definite integral is the limit of a sum of areas of rectangles, called a Riemann sum.

A motivating example is the distance traveled in a given time:

Distance = Speed × Time

If the speed is constant, only multiplication is needed, but if the speed changes, then we need a more powerful method of finding the distance. One such method is to approximate the distance traveled by breaking up the time into many short intervals of time, then multiplying the time elapsed in each interval by one of the speeds in that interval, and then taking the sum (a Riemann sum) of the approximate distance traveled in each interval. The basic idea is that if only a short time elapses, then the speed will stay more or less the same. However, a Riemann sum only gives an approximation of the distance traveled. We must take the limit of all such Riemann sums to find the exact distance traveled.
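A minimal sketch of this Riemann-sum procedure; the speed function and the interval counts below are our illustrative choices:

```python
def riemann_sum(speed, a, b, n):
    """Approximate distance traveled by summing speed * dt over n short intervals."""
    dt = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + i * dt          # left endpoint of the i-th interval
        total += speed(t) * dt  # distance covered at (roughly) that speed
    return total

speed = lambda t: 2 * t  # illustrative speed function

# More intervals give a better approximation; the exact distance
# traveled from t = 0 to t = 3 is 9.
for n in [10, 100, 1000]:
    print(n, riemann_sum(speed, 0.0, 3.0, n))
```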

If f(x) in the diagram on the left represents speed as it varies over time, the distance traveled (between the times represented by a and b) is the area of the shaded region s.

To approximate that area, an intuitive method would be to divide up the distance between a and b into a number of equal segments, the length of each segment represented by the symbol Δx. For each small segment, we can choose one value of the function f(x). Call that value h. Then the area of the rectangle with base Δx and height h gives the distance (time Δx multiplied by speed h) traveled in that segment. Associated with each segment is the average value of the function above it, f(x) = h. The sum of all such rectangles gives an approximation of the area between the axis and the curve, which is an approximation of the total distance traveled. A smaller value for Δx will give more rectangles and in most cases a better approximation, but for an exact answer we need to take a limit as Δx approaches zero.

[Figure caption: Integration can be thought of as measuring the area under a curve, defined by f(x), between two points (here a and b).]

The symbol of integration is ∫, an elongated S (the S stands for "sum"). The definite integral is written as:

∫_a^b f(x) dx

and is read "the integral from a to b of f-of-x with respect to x". The Leibniz notation dx is intended to suggest dividing the area under the curve into an infinite number of rectangles, so that their width Δx becomes the infinitesimally small dx. In a formulation of the calculus based on limits, the notation

∫_a^b f(x) dx

is to be understood as an operator that takes a function as an input and gives a number, the area, as an output; dx is not a number, and is not being multiplied by f(x).

The indefinite integral, or antiderivative, is written:

∫ f(x) dx

Functions differing by only a constant have the same derivative, and therefore the antiderivative of a given function is actually a family of functions differing only by a constant. Since the derivative of the function y = x² + C, where C is any constant, is y′ = 2x, the antiderivative of the latter is given by:

∫ 2x dx = x² + C

An undetermined constant like C in the antiderivative is known as a constant of integration.
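Continuing the SymPy sketch from the differentiation section (again an illustrative assumption that the library is available), the antiderivative can be computed symbolically; note that SymPy leaves the constant of integration implicit:

```python
from sympy import symbols, integrate

x = symbols('x')

# integrate acts as the antiderivative operator: it returns x**2 for
# the input 2*x (the constant of integration C is left implicit).
print(integrate(2 * x, x))  # prints: x**2
```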

Fundamental theorem

The fundamental theorem of calculus states that differentiation and integration are inverse operations. More precisely, it relates the values of antiderivatives to definite integrals. Because it is usually easier to compute an antiderivative than to apply the definition of a definite integral, the Fundamental Theorem of Calculus provides a practical way of computing definite integrals. It can also be interpreted as a precise statement of the fact that differentiation is the inverse of integration.

The Fundamental Theorem of Calculus states: If a function f is continuous on the interval [a, b] and if F is a function whose derivative is f on the interval (a, b), then

∫_a^b f(x) dx = F(b) − F(a).

Furthermore, for every x in the interval (a, b),

d/dx ∫_a^x f(t) dt = f(x).

This realization, made by both Newton and Leibniz, who based their results on earlier work by Isaac Barrow, was key to the massive proliferation of analytic results after their work became known. The fundamental theorem provides an algebraic method of computing many definite integrals—without performing limit processes—by finding formulas for antiderivatives. It is also a prototype solution of a differential equation. Differential equations relate an unknown function to its derivatives, and are ubiquitous in the sciences.
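A small worked example (ours, added for illustration): take f(x) = 2x with antiderivative F(x) = x². The theorem then evaluates a definite integral directly, without any limit process:

```latex
\int_{1}^{3} 2x \, dx = F(3) - F(1) = 3^2 - 1^2 = 8
```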

Applications

[Figure caption: The logarithmic spiral of the Nautilus shell is a classical image used to depict the growth and change related to calculus.]

Calculus is used in every branch of the physical sciences, actuarial science, computer science, statistics, engineering, economics, business, medicine, demography, and in other fields wherever a problem can be mathematically modeled and an optimal solution is desired. It allows one to go from (non-constant) rates of change to the total change or vice versa, and many times in studying a problem we know one and are trying to find the other.

Physics makes particular use of calculus; all concepts in classical mechanics and electromagnetism are interrelated through calculus. The mass of an object of known density, the moment of inertia of objects, as well as the total energy of an object within a conservative field can be found by the use of calculus. An example of the use of calculus in mechanics is Newton's second law of motion: historically stated, it expressly uses the term "rate of change", which refers to the derivative, saying The rate of change of momentum of a body is equal to the resultant force acting on the body and is in the same direction. Commonly expressed today as Force = Mass × acceleration, it involves differential calculus because acceleration is the time derivative of velocity, or the second time derivative of trajectory or spatial position. Starting from knowing how an object is accelerating, we use calculus to derive its path.

Maxwell's theory of electromagnetism and Einstein's theory of general relativity are also expressed in the language of differential calculus. Chemistry also uses calculus in determining reaction rates and radioactive decay. In biology, population dynamics starts with reproduction and death rates to model population changes.

Calculus can be used in conjunction with other mathematical disciplines. For example, it can be used with linear algebra to find the "best fit" linear approximation for a set of points in a domain. Or it can be used in probability theory to determine the probability of a continuous random variable from an assumed density function. In analytic geometry, the study of graphs of functions, calculus is used to find high points and low points (maxima and minima), slope, concavity and inflection points.

Green's Theorem, which gives the relationship between a line integral around a simple closed curve C and a double integral over the plane region D bounded by C, is applied in an instrument known as a planimeter, which is used to calculate the area of a flat surface on a drawing. For example, it can be used to calculate the amount of area taken up by an irregularly shaped flower bed or swimming pool when designing the layout of a piece of property.

In the realm of medicine, calculus can be used to find the optimal branching angle of a blood vessel so as to maximize flow. From the decay laws for a particular drug's elimination from the body, it is used to derive dosing laws. In nuclear medicine, it is used to build models of radiation transport in targeted tumor therapies.

In economics, calculus allows for the determination of maximal profit by providing a way to easily calculate both marginal cost and marginal revenue.

Calculus is also used to find approximate solutions to equations; in practice it is the standard way to solve differential equations and do root finding in most applications. Examples are methods such as Newton's method, fixed point iteration, and linear approximation. For instance, spacecraft use a variation of the Euler method to approximate curved courses within zero gravity environments.
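As a sketch of the first of these techniques, Newton's method refines a guess at a root of f using the derivative; the function and starting guess below are our illustrative choices:

```python
def newton(f, f_prime, x, steps=6):
    """Refine a root estimate x of f via Newton's iteration x - f(x)/f'(x)."""
    for _ in range(steps):
        x = x - f(x) / f_prime(x)
    return x

# Find sqrt(2) as the positive root of f(x) = x**2 - 2, whose
# derivative is f'(x) = 2*x.
print(newton(lambda x: x ** 2 - 2, lambda x: 2 * x, 1.0))  # about 1.41421356
```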

See also

Lists

• List of differentiation identities
• List of calculus topics
• Publications in calculus
• Table of integrals

Related topics

• Calculus of finite differences
• Calculus with polynomials
• Complex analysis
• Differential equation
• Differential geometry
• Elementary calculus
• Fourier series
• Integral equation
• Mathematical analysis
• Mathematics
• Multivariable calculus
• Non-classical analysis
• Non-standard analysis
• Non-standard calculus
• Precalculus (mathematical education)
• Product Integrals
• Stochastic calculus
• Taylor series
• Time-scale calculus

References

Books

• Larson, Ron; Bruce H. Edwards (2010). Calculus, 9th ed., Brooks Cole Cengage Learning. ISBN 9780547167022
• McQuarrie, Donald A. (2003). Mathematical Methods for Scientists and Engineers, University Science Books. ISBN 9781891389245
• Stewart, James (2008). Calculus: Early Transcendentals, 6th ed., Brooks Cole Cengage Learning. ISBN 9780495011668
• Thomas, George B.; Maurice D. Weir; Joel Hass; Frank R. Giordano (2008). Calculus, 11th ed., Addison-Wesley. ISBN 0-321-48987-X

Other resources

Further reading

• Courant, Richard. Introduction to Calculus and Analysis 1. ISBN 978-3540650584
• Edmund Landau. Differential and Integral Calculus, American Mathematical Society. ISBN 0-8218-2830-4
• Robert A. Adams (1999). Calculus: A complete course. ISBN 978-0-201-39607-2
• Albers, Donald J.; Richard D. Anderson and Don O. Loftsgaarden, ed. (1986). Undergraduate Programs in the Mathematics and Computer Sciences: The 1985-1986 Survey, Mathematical Association of America No. 7.
• John Lane Bell: A Primer of Infinitesimal Analysis, Cambridge University Press, 1998. ISBN 978-0-521-62401-5. Uses synthetic differential geometry and nilpotent infinitesimals.
• Florian Cajori, "The History of Notations of the Calculus." Annals of Mathematics, 2nd Ser., Vol. 25, No. 1 (Sep. 1923), pp. 1–46.
• Leonid P. Lebedev and Michael J. Cloud: Approximating Perfection: a Mathematician's Journey into the World of Mechanics, Ch. 1: "The Tools of Calculus", Princeton Univ. Press, 2004.
• Cliff Pickover (2003). Calculus and Pizza: A Math Cookbook for the Hungry Mind. ISBN 978-0-471-26987-8
• Michael Spivak (September 1994). Calculus. Publish or Perish publishing. ISBN 978-0-914098-89-8
• Tom M. Apostol (1967). Calculus, Volume 1, One-Variable Calculus with an Introduction to Linear Algebra. Wiley. ISBN 9780471000051
• Tom M. Apostol (1969). Calculus, Volume 2, Multi-Variable Calculus and Linear Algebra with Applications. Wiley. ISBN 9780471000075
• Silvanus P. Thompson and Martin Gardner (1998). Calculus Made Easy. ISBN 978-0-312-18548-0
• Mathematical Association of America (1988). Calculus for a New Century; A Pump, Not a Filter, The Association, Stony Brook, NY. ED 300 252.
• Thomas/Finney (1996). Calculus and Analytic Geometry, 9th ed., Addison Wesley. ISBN 978-0-201-53174-9
• Weisstein, Eric W. "Second Fundamental Theorem of Calculus." [16] From MathWorld—A Wolfram Web Resource.

Online books

• Crowell, B. (2003). "Calculus", Light and Matter, Fullerton. Retrieved 6 May 2007 from http://www.lightandmatter.com/calc/calc.pdf [17]
• Garrett, P. (2006). "Notes on first year calculus", University of Minnesota. Retrieved 6 May 2007 from http://www.math.umn.edu/~garrett/calculus/first_year/notes.pdf [18]
• Faraz, H. (2006). "Understanding Calculus". Retrieved 6 May 2007 from Understanding Calculus, URL http://www.understandingcalculus.com/ [19] (HTML only)
• Keisler, H. J. (2000). "Elementary Calculus: An Approach Using Infinitesimals". Retrieved 29 August 2010 from http://www.math.wisc.edu/~keisler/calc.html [23]
• Mauch, S. (2004). "Sean's Applied Math Book", California Institute of Technology. Retrieved 6 May 2007 from http://www.cacr.caltech.edu/~sean/applied_math.pdf [20]
• Sloughter, Dan (2000). "Difference Equations to Differential Equations: An introduction to calculus". Retrieved 17 March 2009 from http://synechism.org/drupal/de2de/ [21]
• Stroyan, K. D. (2004). "A brief introduction to infinitesimal calculus", University of Iowa. Retrieved 6 May 2007 from http://www.math.uiowa.edu/~stroyan/InfsmlCalculus/InfsmlCalc.htm [24] (HTML only)
• Strang, G. (1991). "Calculus", Massachusetts Institute of Technology. Retrieved 6 May 2007 from http://ocw.mit.edu/ans7870/resources/Strang/strangtext.htm [22]
• Smith, William V. (2001). "The Calculus". Retrieved 4 July 2008 [23] (HTML only).

External links

• Weisstein, Eric W., "Calculus" [24] from MathWorld.
• Topics on Calculus [25] at PlanetMath.
• Calculus Made Easy (1914) by Silvanus P. Thompson [26] Full text in PDF
• Calculus [27] on In Our Time at the BBC. (listen now [28])
• Calculus.org: The Calculus page [29] at University of California, Davis – contains resources and links to other sites
• COW: Calculus on the Web [30] at Temple University – contains resources ranging from pre-calculus and associated algebra
• Earliest Known Uses of Some of the Words of Mathematics: Calculus & Analysis [31]
• Online Integrator (WebMathematica) [32] from Wolfram Research
• The Role of Calculus in College Mathematics [33] from ERICDigests.org
• OpenCourseWare Calculus [34] from the Massachusetts Institute of Technology
• Infinitesimal Calculus [35] – an article on its historical development, in Encyclopaedia of Mathematics, Michiel Hazewinkel ed.
• Elements of Calculus I [36] and Calculus II for Business [37], OpenCourseWare from the University of Notre Dame with activities, exams and interactive applets.
• Calculus for Beginners and Artists [38] by Daniel Kleitman, MIT
• Calculus Problems and Solutions [39] by D. A. Kouba
• Solved problems in calculus [40]


References
[1] Latorre, Donald R.; Kenelly, John W.; Reed, Iris B.; Biggers, Sherry (2007), Calculus Concepts: An Applied Approach to the Mathematics of Change (http://books.google.com/books?id=bQhX-3k0LS8C), Cengage Learning, p. 2, ISBN 0-618-78981-2, Chapter 1, p. 2 (http://books.google.com/books?id=bQhX-3k0LS8C&pg=PA2)
[2] There is no exact evidence on how it was done; some, including Morris Kline (Mathematical Thought from Ancient to Modern Times, Vol. I), suggest trial and error.
[3] Helmer Aslaksen. Why Calculus? (http://www.math.nus.edu.sg/aslaksen/teaching/calculus.html) National University of Singapore.
[4] Archimedes, Method, in The Works of Archimedes. ISBN 978-0-521-66160-7
[5] Dun, Liu; Fan, Dainian; Cohen, Robert Sonné (1966). A comparison of Archimedes' and Liu Hui's studies of circles (http://books.google.com/books?id=jaQH6_8Ju-MC). Chinese Studies in the History and Philosophy of Science and Technology. 130. Springer. p. 279. ISBN 0-792-33463-9. Chapter, p. 279 (http://books.google.com/books?id=jaQH6_8Ju-MC&pg=PA279)
[6] Zill, Dennis G.; Wright, Scott; Wright, Warren S. (2009). Calculus: Early Transcendentals (http://books.google.com/books?id=R3Hk4Uhb1Z0C) (3rd ed.). Jones & Bartlett Learning. p. xxvii. ISBN 0-763-75995-3. Extract of page 27 (http://books.google.com/books?id=R3Hk4Uhb1Z0C&pg=PR27)
[7] Victor J. Katz (1995). "Ideas of Calculus in Islam and India", Mathematics Magazine 68 (3), pp. 163–174.
[8] Ian G. Pearce. Bhaskaracharya II. (http://turnbull.mcs.st-and.ac.uk/~history/Projects/Pearce/Chapters/Ch8_5.html)
[9] J. L. Berggren (1990). "Innovation and Tradition in Sharaf al-Din al-Tusi's Muadalat", Journal of the American Oriental Society 110 (2), pp. 304–309.
[10] "Madhava" (http://www-gap.dcs.st-and.ac.uk/~history/Biographies/Madhava.html). Biography of Madhava. School of Mathematics and Statistics, University of St Andrews, Scotland. Retrieved 2006-09-13.
[11] "An overview of Indian mathematics" (http://www-history.mcs.st-andrews.ac.uk/HistTopics/Indian_mathematics.html). Indian Maths. School of Mathematics and Statistics, University of St Andrews, Scotland. Retrieved 2006-07-07.
[12] "Science and technology in free India" (http://www.kerala.gov.in/keralcallsep04/p22-24.pdf) (PDF). Government of Kerala, Kerala Call, September 2004. Prof. C. G. Ramachandran Nair. Retrieved 2006-07-09.
[13] Charles Whish (1834), "On the Hindu Quadrature of the circle and the infinite series of the proportion of the circumference to the diameter exhibited in the four Sastras, the Tantra Sahgraham, Yucti Bhasha, Carana Padhati and Sadratnamala", Transactions of the Royal Asiatic Society of Great Britain and Ireland (Royal Asiatic Society of Great Britain and Ireland) 3 (3): 509–523, doi:10.1017/S0950473700001221, JSTOR 25581775
[14] Leibniz, Gottfried Wilhelm. The Early Mathematical Manuscripts of Leibniz. Cosimo, Inc., 2008. p. 228. Copy (http://books.google.com/books?hl=en&lr=&id=7d8_4WPc9SMC&oi=fnd&pg=PA3&dq=Gottfried+Wilhelm+Leibniz+accused+of+plagiarism+by+Newton&ots=09h9BdTlbE&sig=hu5tNKpBJxHcpj8U3kR_T2bZqrY#v=onepage&q=plagairism&f=false)
[15] UNESCO, World Data on Education (http://nt5.scbbs.com/cgi-bin/om_isapi.dll?clientID=137079235&infobase=iwde.nfo&softpage=PL_frame)
[16] http://mathworld.wolfram.com/SecondFundamentalTheoremofCalculus.html
[17] http://www.lightandmatter.com/calc/calc.pdf
[18] http://www.math.umn.edu/~garrett/calculus/first_year/notes.pdf
[19] http://www.understandingcalculus.com/
[20] http://www.cacr.caltech.edu/~sean/applied_math.pdf
[21] http://synechism.org/drupal/de2de/
[22] http://ocw.mit.edu/ans7870/resources/Strang/strangtext.htm
[23] http://www.math.byu.edu/~smithw/Calculus/
[24] http://mathworld.wolfram.com/Calculus.html
[25] http://planetmath.org/encyclopedia/TopicsOnCalculus.html
[26] http://djm.cc/library/Calculus_Made_Easy_Thompson.pdf
[27] http://www.bbc.co.uk/programmes/b00mrfwq
[28] http://www.bbc.co.uk/iplayer/console/b00mrfwq/In_Our_Time_Calculus
[29] http://www.calculus.org
[30] http://cow.math.temple.edu/
[31] http://www.economics.soton.ac.uk/staff/aldrich/Calculus%20and%20Analysis%20Earliest%20Uses.htm
[32] http://integrals.wolfram.com/
[33] http://www.ericdigests.org/pre-9217/calculus.htm
[34] http://ocw.mit.edu/OcwWeb/Mathematics/index.htm
[35] http://eom.springer.de/I/i050950.htm
[36] http://ocw.nd.edu/mathematics/elements-of-calculus-i
[37] http://ocw.nd.edu/mathematics/calculus-ii-for-business
[38] http://math.mit.edu/~djk/calculus_beginners/
[39] http://www.math.ucdavis.edu/~kouba/ProblemsList.html
[40] http://calculus.solved-problems.com/


Average
In mathematics, an average, or central tendency,[1] of a data set is a measure of the "middle" value of the data set. There are many different descriptive statistics that can be chosen as a measurement of the central tendency of the data items. These include the arithmetic mean, the median, and the mode. Other statistical measures, such as the standard deviation and the range, are called measures of spread and describe how spread out the data is. An average is a single value that is meant to typify a list of values. If all the numbers in the list are the same, then this number should be used. If the numbers are not the same, picking any number from the list at random gives a crude representative value; more usefully, an average is calculated by combining the values from the set in a specific way and computing a single number as the average of the set. The most common method is the arithmetic mean, but there are many other types of central tendency, such as the median (which is used most often when the distribution of the values is skewed with a small number of very high values, as seen with house prices or incomes).[2]

Calculation

Arithmetic mean
If n numbers are given, each number denoted by $a_i$, where $i = 1, \ldots, n$, the arithmetic mean is the sum of the $a_i$'s divided by $n$, or

$$\mathrm{AM} = \frac{1}{n}\sum_{i=1}^{n} a_i.$$
The arithmetic mean, often simply called the mean, of two numbers, such as 2 and 8, is obtained by finding a value A such that 2 + 8 = A + A. One may find that A = (2 + 8)/2 = 5. Switching the order of 2 and 8 to read 8 and 2 does not change the resulting value obtained for A. The mean 5 is not less than the minimum 2 nor greater than the maximum 8. If we increase the number of terms in the list for which we want an average, we get, for example, that the arithmetic mean of 2, 8, and 11 is found by solving for the value of A in the equation 2 + 8 + 11 = A + A + A. One finds that A = (2 + 8 + 11)/3 = 7. Changing the order of the three members of the list does not change the result: A = (8 + 11 + 2)/3 = 7 and that 7 is between 2 and 11. This summation method is easily generalized for lists with any number of elements. However, the mean of a list of integers is not necessarily an integer. "The average family has 1.7 children" is a jarring way of making a statement that is more appropriately expressed by "the average number of children in the collection of families examined is 1.7".
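The computation is direct. As a minimal sketch (ours, not from the article), the following Python reproduces the examples above:

```python
# Arithmetic mean: the sum of the values divided by their count.
def arithmetic_mean(values):
    return sum(values) / len(values)

print(arithmetic_mean([2, 8]))      # 5.0
print(arithmetic_mean([2, 8, 11]))  # 7.0
```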

Geometric mean
The geometric mean of n numbers is obtained by multiplying them all together and then taking the nth root. In algebraic terms, the geometric mean of $a_1, a_2, \ldots, a_n$ is defined as

$$\mathrm{GM} = \left(\prod_{i=1}^{n} a_i\right)^{1/n} = \sqrt[n]{a_1 a_2 \cdots a_n}.$$

The geometric mean can be thought of as the antilog of the arithmetic mean of the logs of the numbers. Example: the geometric mean of 2 and 8 is $\sqrt{2 \cdot 8} = \sqrt{16} = 4$.
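A minimal Python sketch (ours) of both forms of the definition; math.prod requires Python 3.8+, and the log form assumes positive inputs:

```python
import math

def geometric_mean(values):
    # nth root of the product of the values
    return math.prod(values) ** (1 / len(values))

def geometric_mean_via_logs(values):
    # antilog (exp) of the arithmetic mean of the logs
    return math.exp(sum(math.log(v) for v in values) / len(values))

print(geometric_mean([2, 8]))           # 4.0
print(geometric_mean_via_logs([2, 8]))  # 4.0 (up to rounding)
```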


Harmonic mean
The harmonic mean for a set of numbers $a_1, a_2, \ldots, a_n$ is defined as the reciprocal of the arithmetic mean of the reciprocals of the $a_i$'s:

$$\mathrm{HM} = \frac{n}{\frac{1}{a_1} + \frac{1}{a_2} + \cdots + \frac{1}{a_n}}.$$

One example where it is useful is calculating average speed. For instance, if the speed going from point A to B was 60 km/h and the speed returning from B to A was 40 km/h, then the average speed is given by the harmonic mean of the two speeds, $\frac{2}{1/60 + 1/40} = 48$ km/h.
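A minimal Python sketch (ours) applied to the average-speed example:

```python
# Harmonic mean: the number of values divided by the sum of their reciprocals.
def harmonic_mean(values):
    return len(values) / sum(1 / v for v in values)

print(harmonic_mean([60, 40]))  # 48.0 km/h
```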

Inequality concerning AM, GM, and HM
A well-known inequality concerning the arithmetic, geometric, and harmonic means of any set of positive numbers is

$$\mathrm{AM} \ge \mathrm{GM} \ge \mathrm{HM},$$

with equality exactly when all the numbers are equal. It is easy to remember by noting that the alphabetical order of the letters A, G, and H is preserved in the inequality. See Inequality of arithmetic and geometric means.
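A quick numeric check (ours) on the running example {2, 8}; math.prod requires Python 3.8+:

```python
import math

data = [2.0, 8.0]
am = sum(data) / len(data)                 # 5.0
gm = math.prod(data) ** (1 / len(data))    # 4.0
hm = len(data) / sum(1 / x for x in data)  # 3.2
assert am >= gm >= hm
print(am, gm, hm)
```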

Mode and median
The most frequently occurring number in a list is called the mode. The mode of the list (1, 2, 2, 3, 3, 3, 4) is 3. The mode is not necessarily well defined: the list (1, 2, 2, 3, 3, 5) has the two modes 2 and 3. The mode can be subsumed under the general method of defining averages by understanding it as taking the list and setting each member of the list equal to the most common value in the list, if there is a most common value. This list is then equated to the resulting list with all values replaced by the same value. Since they are already all the same, this does not require any change. The mode is more meaningful and potentially useful if there are many numbers in the list and the frequency of the numbers progresses smoothly (e.g., if out of a group of 1000 people, 30 people weigh 61 kg, 32 weigh 62 kg, 29 weigh 63 kg, and all the other possible weights occur less frequently, then 62 kg is the mode). The mode has the advantage that it can be used with non-numerical data (e.g., red cars are most frequent), while other averages cannot.

The median is the middle number of the group when the numbers are ranked in order. (If there is an even number of numbers, the mean of the middle two is taken.) Thus to find the median, order the list according to its elements' magnitude and then repeatedly remove the pair consisting of the highest and lowest values until either one or two values are left. If exactly one value is left, it is the median; if two values, the median is the arithmetic mean of these two. This method takes the list 1, 7, 3, 13 and orders it to read 1, 3, 7, 13. Then the 1 and 13 are removed to obtain the list 3, 7. Since there are two elements in this remaining list, the median is their arithmetic mean, (3 + 7)/2 = 5.
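Python's standard library computes both directly; a small sketch (our choice of tool, not the article's), where statistics.multimode requires Python 3.8+:

```python
import statistics

print(statistics.mode([1, 2, 2, 3, 3, 3, 4]))    # 3
print(statistics.multimode([1, 2, 2, 3, 3, 5]))  # [2, 3]: two modes
print(statistics.median([1, 7, 3, 13]))          # 5.0, the mean of 3 and 7
```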

Average Percentage Return
The average percentage return is a type of average used in finance; it is an example of a geometric mean. For example, if we are considering a period of two years, and the investment return in the first year is −10% and the return in the second year is +60%, then the average percentage return, R, can be obtained by solving the equation: (1 − 10%) × (1 + 60%) = (1 − 0.1) × (1 + 0.6) = (1 + R) × (1 + R). The value of R that makes this equation true is 0.2, or 20%. Note that changing the order to find the average percentage returns of +60% and −10% gives the same result as the average percentage returns of −10% and +60%.

This method can be generalized to examples in which the periods are not all of one-year duration. The average percentage return of a set of returns is a variation on the geometric average that provides the intensive property of a return per year corresponding to a list of percentage returns. For example, consider a period of half a year for which the return is −23% and a period of two and one half years for which the return is +13%. The average percentage return for the combined period is the single-year return, R, that solves the following equation: (1 − 0.23)^0.5 × (1 + 0.13)^2.5 = (1 + R)^(0.5+2.5), giving an average percentage return R of 0.0600, or 6.00%.
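A small Python sketch (ours) of this calculation, treating each stated return as a yearly rate compounded over its period length, as in the equations above; the function and parameter names are illustrative:

```python
def average_percentage_return(returns, period_lengths_years):
    # Solve prod((1 + r_i)**t_i) = (1 + R)**sum(t_i) for R.
    growth = 1.0
    for r, t in zip(returns, period_lengths_years):
        growth *= (1 + r) ** t
    total_years = sum(period_lengths_years)
    return growth ** (1 / total_years) - 1

print(average_percentage_return([-0.10, 0.60], [1, 1]))      # approx. 0.20
print(average_percentage_return([-0.23, 0.13], [0.5, 2.5]))  # approx. 0.0600
```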

Types
The table of mathematical symbols explains the symbols used below.

• Arithmetic mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
• Median: the middle value that separates the higher half from the lower half of the data set
• Geometric median: a rotation-invariant extension of the median for points in $\mathbb{R}^n$
• Mode: the most frequent value in the data set
• Geometric mean: $\left(\prod_{i=1}^{n} x_i\right)^{1/n}$
• Harmonic mean: $\frac{n}{\sum_{i=1}^{n} 1/x_i}$
• Quadratic mean (or RMS): $\sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2}$
• Generalized mean: $\left(\frac{1}{n}\sum_{i=1}^{n} x_i^p\right)^{1/p}$
• Weighted mean: $\frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$
• Truncated mean: the arithmetic mean of data values after a certain number or proportion of the highest and lowest data values have been discarded
• Interquartile mean: a special case of the truncated mean, using the interquartile range
• Midrange: $\frac{\max_i x_i + \min_i x_i}{2}$
• Winsorized mean: similar to the truncated mean, but rather than deleting the extreme values, they are set equal to the largest and smallest values that remain
• Annualization: the single-year return that, compounded over the whole period, reproduces the overall return (see Average Percentage Return above)


Solutions to variational problems
Several measures of central tendency can be characterized as solving a variational problem, in the sense of the calculus of variations, namely minimizing variation from the center. That is, given a measure of statistical dispersion, one asks for the measure of central tendency that minimizes variation: such that variation from the center is minimal among all choices of center. In a quip, "dispersion precedes location". In the sense of Lp spaces, the correspondence is:

Lp   Dispersion                  Central tendency
L1   average absolute deviation  median
L2   standard deviation          mean
L∞   maximum deviation           midrange

Thus standard deviation about the mean is lower than standard deviation about any other point, and the maximum deviation about the midrange is lower than the maximum deviation about any other point. The uniqueness of this characterization of the mean follows from convex optimization. Indeed, for a given (fixed) data set x, the function

$$f_2(c) = \lVert x - c \rVert_2$$

represents the dispersion about a constant value c relative to the L2 norm. Because $f_2$ is a strictly convex coercive function, the minimizer exists and is unique. Note that the median in this sense is not in general unique, and in fact any point between the two central points of a discrete distribution minimizes average absolute deviation. The dispersion in the L1 norm, given by

$$f_1(c) = \lVert x - c \rVert_1 = \sum_{i=1}^{n} \lvert x_i - c \rvert,$$

is not strictly convex, whereas strict convexity is needed to ensure uniqueness of the minimizer. In spite of this, the minimizer is unique for the L∞ norm.
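The correspondence can be checked numerically. The following Python sketch (ours, with illustrative data) minimizes each dispersion over a grid of candidate centers:

```python
data = [1.0, 7.0, 3.0, 13.0]
candidates = [k / 100 for k in range(1501)]  # centers 0.00, 0.01, ..., 15.00

def argmin(objective):
    # First grid point attaining the minimal objective value.
    return min(candidates, key=objective)

print(argmin(lambda c: sum((x - c) ** 2 for x in data)))  # 6.0, the mean
print(argmin(lambda c: sum(abs(x - c) for x in data)))    # 3.0, start of the median interval [3, 7]
print(argmin(lambda c: max(abs(x - c) for x in data)))    # 7.0, the midrange
```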

Miscellaneous types
Other more sophisticated averages are: trimean, trimedian, and normalized mean. One can create one's own average metric using the generalized f-mean:

$$y = f^{-1}\!\left(\frac{f(x_1) + f(x_2) + \cdots + f(x_n)}{n}\right),$$

where f is any invertible function. The harmonic mean is an example of this using f(x) = 1/x, and the geometric mean is another, using f(x) = log x. Another example, the expmean (exponential mean), is a mean using the function f(x) = e^x, and it is inherently biased towards the higher values. However, this method for generating means is not general enough to capture all averages. A more general method for defining an average, y, takes any function of a list, g(x_1, x_2, ..., x_n), which is symmetric under permutation of the members of the list, and equates it to the same function with the value of the average replacing each member of the list: g(x_1, x_2, ..., x_n) = g(y, y, ..., y). This most general definition still captures the important property of all averages: the average of a list of identical elements is that element itself. The function g(x_1, x_2, ..., x_n) = x_1 + x_2 + ... + x_n provides the arithmetic mean. The function g(x_1, x_2, ..., x_n) = x_1 · x_2 · ... · x_n provides the geometric mean. The function g(x_1, x_2, ..., x_n) = x_1^{-1} + x_2^{-1} + ... + x_n^{-1} provides the harmonic mean. (See John Bibby (1974), "Axiomatisations of the average and a further generalisation of monotonic sequences", Glasgow Mathematical Journal, vol. 15, pp. 63–65.)
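A minimal Python sketch (ours) of the generalized f-mean; the identity, log/exp, and reciprocal pairs recover the arithmetic, geometric, and harmonic means of the running example {2, 8}:

```python
import math

def f_mean(values, f, f_inv):
    # Apply an invertible f, take the arithmetic mean, then invert.
    return f_inv(sum(f(x) for x in values) / len(values))

data = [2, 8]
print(f_mean(data, lambda x: x, lambda y: y))          # 5.0, arithmetic mean
print(f_mean(data, math.log, math.exp))                # 4.0, geometric mean
print(f_mean(data, lambda x: 1 / x, lambda y: 1 / y))  # 3.2, harmonic mean
```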


In data streams
The concept of an average can be applied to a stream of data as well as to a bounded set, the goal being to find a value about which recent data is in some way clustered. The stream may be distributed in time, as in samples taken by some data acquisition system from which we want to remove noise, or in space, as in pixels in an image from which we want to extract some property. An easy-to-understand and widely used application of average to a stream is the simple moving average, in which we compute the arithmetic mean of the most recent N data items in the stream. To advance one position in the stream, we add 1/N times the new data item and subtract 1/N times the data item N places back in the stream.
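A Python sketch (ours, not from the article) of this update rule; during the first N − 1 updates the window is still filling, so the returned value is the running sum divided by N rather than a mean of the items seen so far:

```python
from collections import deque

class MovingAverage:
    """Simple moving average over the most recent n items (illustrative)."""

    def __init__(self, n):
        self.n = n
        self.window = deque()
        self.mean = 0.0

    def update(self, x):
        # Add 1/n times the new item; once the window is full, subtract
        # 1/n times the item n places back, as described above.
        self.window.append(x)
        self.mean += x / self.n
        if len(self.window) > self.n:
            self.mean -= self.window.popleft() / self.n
        return self.mean

ma = MovingAverage(3)
print([ma.update(x) for x in [1, 2, 3, 4, 5]])  # ends 2.0, 3.0, 4.0
```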

Averages of functions
The concept of average can be extended to functions.[3] In calculus, the average value of an integrable function f on an interval [a, b] is defined by

$$\bar{f} = \frac{1}{b-a}\int_a^b f(x)\,dx.$$
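A numerical sketch (ours) approximating this average with a midpoint Riemann sum; the function and interval are illustrative:

```python
def average_value(f, a, b, steps=10000):
    # Approximate (1/(b-a)) * integral of f over [a, b] with midpoints.
    h = (b - a) / steps
    integral = sum(f(a + (i + 0.5) * h) for i in range(steps)) * h
    return integral / (b - a)

print(average_value(lambda x: x * x, 0.0, 1.0))  # approx. 1/3
```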

Etymology
An early meaning (c. 1500) of the word average is "damage sustained at sea". The root is found in Arabic as awar, in Italian as avaria, in French as avarie and in Dutch as averij. Hence an average adjuster is a person who assesses an insurable loss. Marine damage is either particular average, which is borne only by the owner of the damaged property, or general average, where the owner can claim a proportional contribution from all the parties to the marine venture. The type of calculations used in adjusting general average gave rise to the use of "average" to mean "arithmetic mean". However, according to the Oxford English Dictionary, the earliest usage in English (1489 or earlier) appears to be an old legal term for a tenant's day labour obligation to a sheriff, probably anglicised from "avera" found in the English Domesday Book (1085). This pre-existing term thus lay to hand when an equivalent for avarie was wanted.

References
• Hardy, G. H.; Littlewood, J. E.; Pólya, G. (1988), Inequalities (2nd ed.), Cambridge University Press, ISBN 978-0521358804

External links
• Median as a weighted arithmetic mean of all sample observations [4]
• Calculations and comparison between arithmetic and geometric mean of two values [2]

References
[1] In statistics, the term central tendency is used in some fields of empirical research to refer to what statisticians sometimes call "location".
[2] An axiomatic approach to averages is provided by John Bibby (1974), "Axiomatisations of the average and a further generalization of monotonic sequences", Glasgow Mathematical Journal, vol. 15, pp. 63–65.
[3] G. H. Hardy, J. E. Littlewood, and G. Pólya, Inequalities (2nd ed.), Cambridge University Press, ISBN 978-0521358804, 1988.
[4] http://economicsbulletin.vanderbilt.edu/2004/volume3/EB-04C10011A.pdf



238

License

Creative Commons Attribution-Share Alike 3.0 Unported
http://creativecommons.org/licenses/by-sa/3.0/

