The Interpretation of Ecological Data

The Interpretation of Ecological Data
A Primer on Classification and Ordination

E. C. Pielou
University of Lethbridge

A Wiley-Interscience Publication

JOHN WILEY & SONS
New York · Chichester · Brisbane · Toronto · Singapore

Preface

The aim of this book is to give a full, detailed, introductory account of the methods used by community ecologists to make large, unwieldy masses of multivariate field data comprehensible and interpretable. I am convinced that there is need for such a book. There are more advanced books that cover some of the same material, for example, L. Orlóci's Multivariate Analysis in Vegetation Research (W. Junk, 1978) and A. D. Gordon's Classification: Methods for the Exploratory Analysis of Multivariate Data (Chapman and Hall, 1981), but they assume a much higher level of mathematical ability in the reader than ...

However, it is not unreasonable to expect ecologists to understand what the programs are doing for them even if they do not understand how. There is a world of difference between the person who uses a ready-made program to find the eigenvalues and eigenvectors of a large matrix and who understands what these things are, and the person who delegates the whole task of doing a principal component analysis (for instance) to such a program with no understanding of what the analysis ...

E. C. PIELOU
Lethbridge, Alberta, Canada
April 1984

Contents

1  INTRODUCTION                                                      1
   1.1  Data Matrices and Scatter Diagrams, 3
   1.2  Some Definitions and Other Preliminaries, 8
   1.3  Aim and Scope of This Book, 11

2  CLASSIFICATION BY CLUSTERING                                     13
   2.1  Introduction, 13
   2.2  Nearest-Neighbor Clustering, 15
   2.3  Farthest-Neighbor Clustering, 22
   2.4  Centroid Clustering, 25
   2.5  Minimum Variance Clustering, 32
   2.6  Dissimilarity Measures and Distances, 40
   2.7  Average Linkage Clustering, 63
   2.8  Choosing Among Clustering Methods, 72
   2.9  Rapid Nonhierarchical Clustering, 76
   Appendix: Apollonius's Theorem, 78
   Exercises, 79

3  TRANSFORMING DATA MATRICES                                       83
   3.1  Introduction, 83
   3.2  Vector and Matrix Multiplication, 85
   3.3  The Product of a Data Matrix and its Transpose, 102
   3.4  The Eigenvalues and Eigenvectors of a Square Symmetric Matrix, 116
   3.5  The Eigenanalysis of XX' and X'X, 126
   Exercises, 129

4  ORDINATION                                                      133
   4.1  Introduction, 133
   4.2  Principal Component Analysis, 136
   4.3  Four Different Versions of PCA, 152
   4.4  Principal Coordinate Analysis, 165
   4.5  Reciprocal Averaging or Correspondence Analysis, 176
   4.6  Linear and Nonlinear Data Structures, 188
   4.7  Comparisons and Conclusions, 197
   Exercises, 199

5  DIVISIVE CLASSIFICATION                                         203
   5.1  Introduction, 203
   5.2  Constructing and Partitioning a Minimum Spanning Tree, 205
   5.3  Partitioning a PCA Ordination, 211
   5.4  Partitioning RA and DCA Ordinations, 218
   Exercises, 221

6  DISCRIMINANT ORDINATION                                         223
   6.1  Introduction, 223
   6.2  Unsymmetric Square Matrices, 224
   6.3  Discriminant Ordination of Several Sets of Data, 230
   Exercises, 237

ANSWERS TO EXERCISES                                               239
GLOSSARY                                                           247
BIBLIOGRAPHY                                                       257
INDEX                                                              261

The Interpretation of Ecological Data

Chapter One

Introduction

Probably all ecologists are familiar with field notebooks whose pages look something like Figure 1.1. Probably most ecologists, even those still at the beginnings of their careers, have drawn up tables like it. Their efforts may be neater or messier, depending on the person and the circumstances (wind, rain, mosquitoes, gathering darkness, a rising tide, or any other of the stresses an ecologist is subject to). But such tables are all the same in principle. They show the values of each of several variables (e.g., species quantities) in each of several sampling units (e.g., quadrats).

Tables such as these are the immediate raw material for community study and analysis. Although natural, living communities as they are found in the field are, of course, an ecologist's ultimate raw material, it is impossible to come to grips with them mentally without first representing them symbolically. A table such as that in Figure 1.1, which is part of a data matrix,* is a typical symbolic representation of a natural community. It is the very first representation, the one from which all subsequent analyses, and their representations, flow. Therefore it is the first link in the chain leading from an actual, observed community to a theory concerning the community, and possibly to more inclusive theories concerning ecological communities generally.

The interpretation of such data matrices is the topic of this book. This introductory chapter first describes in general outline the procedures that make data interpretation possible.

*Words italicized in the text are defined in the Glossary at the end of the book.

[Figure 1.1. A page from a field notebook. This one records observations made on the ground in the floodplain of the Athabasca River, Alberta.]

Then, as a necessary preliminary to all that follows, there is a section on terminology. As is inevitable in a rapidly growing subject, a few of the technical terms are used in different senses by different writers. Therefore it is necessary to define, unambiguously, the senses in which they are used in this book, as is done in Section 1.2.

1.1. DATA MATRICES AND SCATTER DIAGRAMS

A data matrix in the most general sense of the term is any table of observations made up of rows and columns. The data matrices most commonly encountered in community ecology are tables showing the amounts of several species in each of a number of sampling units. Thus there are obviously two possible ways of constructing a data matrix: either one may let each row represent a different species and each column a different sampling unit (as is done throughout this book), or vice versa. The method used here is the one favored by the majority of ecologists.

Any matrix (and that includes data matrices) can be denoted by a single symbol that represents, by itself, the whole array of numbers making up the table. A symbol representing a matrix is usually printed in boldface. If the matrix has s rows and n columns, it is described as an s × n matrix or, equivalently, as a matrix of order s × n. The symbols s and n are used throughout this book to denote the orders of data matrices for mnemonic reasons: s stands for species and n for number of sampling units. In specifying the order, or size, of a matrix, the number of rows is always written first and the number of columns second.

Now consider the symbolic representation of a 3 × 4 data matrix, say X, using subscripted x's in place of actual numerical values. Then

X = \begin{pmatrix} x_{11} & x_{12} & x_{13} & x_{14} \\ x_{21} & x_{22} & x_{23} & x_{24} \\ x_{31} & x_{32} & x_{33} & x_{34} \end{pmatrix}.

As may be seen, every element of the matrix, that is, every individual term, has two subscripts. These subscripts specify the position of the element in the matrix: the first subscript gives the number of the row, and the second the number of the column, in which the element occurs. For example, x_{24} denotes the element in the second row and fourth column of X; in general, one writes x_{ij} for the element in the ith row and jth column. This rule is adhered to universally in all mathematical writing where matrices appear.
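The same conventions carry over directly to a data matrix held in a computer. The following short Python sketch (an illustration added here, not part of the original text; the numerical values are invented) shows an s × n matrix and the subscript rule just described. Note that NumPy numbers rows and columns from 0, so the element the book writes as x_{24} is X[1, 3].

    import numpy as np

    # A 3 x 4 data matrix X: s = 3 species (rows), n = 4 sampling units (columns).
    # The values are arbitrary, chosen only to illustrate the indexing rule.
    X = np.array([
        [5, 0, 2, 7],
        [1, 3, 0, 4],
        [0, 6, 8, 2],
    ])

    s, n = X.shape      # number of rows (species) and of columns (quadrats)
    x_24 = X[1, 3]      # the element in the 2nd row and 4th column (here 4)
    print(s, n, x_24)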

The jth column of a data matrix, say, constitutes a list of the quantities of all s species in the jth quadrat (if a species is absent from a quadrat, the corresponding element is zero). Likewise, the ith row is a list of the quantities of species i in each of the n quadrats.

Data interpretation, the subject of this book, consists in perceiving the latent structure in a "raw" data matrix. When a data matrix is large it is not easy to perceive, or even to judge whether it possesses, any "structure"; that is, any systematic pattern such as would appear if, for example, certain groups of species tended to occur together in certain groups of sampling units which, when appropriately arranged, would therefore form recognizable groups similar in their species compositions.

As an example, consider the following 10 × 10 data matrix:

[The raw 10 × 10 data matrix appears here.]

In this matrix, as always, the rows represent species and the columns represent sampling units.

This matrix, undeniably, lacks any evident structure. But now suppose the sampling units (columns) and the species (rows) were rearranged in an appropriate fashion. The result is the "arranged matrix":

[The 10 × 10 arranged matrix appears here; it contains the same entries as the raw matrix, but reordered so that the nonzero values form a band along the main diagonal.]

This matrix contains exactly the same information as the raw matrix; only the orderings of the species, and of the sampling units, have been changed.

For instance, the species labeled #1 (and therefore appearing in the first row) in the raw matrix is labeled #4 (and therefore appears in the fourth row) in the arranged matrix. As to the columns: sampling unit #1, in column 1 of the raw matrix, has been relabeled as #2 and placed in column 2 in the arranged matrix.

The method by which the labeling system that produces the arranged matrix was determined is described in Chapter 4, and these two matrices, with their rows and columns labeled, are given again in Table 4.11. It suffices to remark here that the method is reasonably sophisticated. To derive the arranged matrix from the raw version without a prescribed method (an "algorithm") would be time-consuming and rather like the efforts of a noninitiate trying to solve a Rubik cube.

The preceding arranged matrix X is a clear example of a matrix with "structure." Data interpretation, as the term is used in this book, consists of methods of perceiving the structure in real data matrices even though these matrices, in raw form, may be as seemingly unstructured as the raw version of X. Table (or matrix) arrangement, as demonstrated here, is only one such method; a simple computational sketch of the idea follows. Two other techniques, classification and ordination, do more to reveal the structure of a data matrix than table arrangement does.
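The rearrangement itself is easy to mimic on a computer once the desired orderings are known. The following Python sketch (an illustration, not part of the original text; the small matrix and the permutations are invented) shows how reordering the rows and columns of a "raw" matrix can expose a diagonal band that is invisible in the original ordering.

    import numpy as np

    # A small "arranged" matrix with an obvious diagonal band (invented example).
    arranged = np.array([
        [4, 3, 1, 0, 0],
        [3, 4, 3, 1, 0],
        [1, 3, 4, 3, 1],
        [0, 1, 3, 4, 3],
        [0, 0, 1, 3, 4],
    ])

    rng = np.random.default_rng(0)
    row_order = rng.permutation(5)     # a haphazard species ordering
    col_order = rng.permutation(5)     # a haphazard quadrat ordering

    # The "raw" matrix an ecologist might actually record:
    raw = arranged[np.ix_(row_order, col_order)]

    # Undoing the shuffles recovers the banded structure.
    recovered = raw[np.ix_(np.argsort(row_order), np.argsort(col_order))]
    assert np.array_equal(recovered, arranged)
    print(raw)        # no evident structure
    print(recovered)  # band along the diagonal

In practice, of course, the correct orderings are not known in advance; finding them is exactly the problem that the methods of Chapter 4 address.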

An ecologist might collect data by observing a row of sample plots (quadrats) along an environmental gradient, for example up the side of a mountain or across a saltmarsh from land to sea. In this case the ordering of the quadrats is given in advance and there is no need for an "ordination." Now suppose the ecologist samples the herbaceous vegetation in level mixed forest with randomly scattered quadrats, and suppose also that the environment, though moderately heterogeneous, exhibits no definite gradients. There would be no immediately obvious way to order the quadrats, but it might be reasonable to suppose that a "natural order" existed if only one could discover it. The supposition amounts to assuming that the environment of the forest consists of a mosaic of different habitats (not necessarily with clear boundaries between the mosaic patches) and that these different habitats have, themselves, a natural order in the same way that the successive habitats up a mountainside, or across a saltmarsh, have a natural order. If this supposition is correct, then a technique can presumably be devised for discovering this natural order from the vegetation data. Such a technique yields an ordination.

The preceding paragraph describes the aim of the first practitioners of ecological ordination. Now let us approach the topic from a different starting point. To make the discussion concrete, visualize the data that would be amassed by a forest ecologist estimating the biomasses of the trees of several different species in a number of sampling plots in mixed forest. It is required to ordinate the plots. An obvious way of doing this would be to rank them according to the quantity they contained of the most abundant species. But why stop at one ordination? They could also be ranked according to the quantity they contained of the second most abundant species. An easy way to obtain these rankings would be to draw a scatter diagram in which the quantity of the first species is measured along the x-axis, and of the second species along the y-axis; each sample plot is represented by a point. Clearly, if the points were projected onto the x-axis, their order would correspond with that of the first ordination; likewise, if they were projected onto the y-axis, their order would correspond with that of the second ordination. But a clearer picture of community structure would be given by the scatter diagram itself; nothing is gained by considering the two axes separately. The scatter diagram can be considered as an ordination in two dimensions. It is a pictorial representation of the data, rather than merely a list of plot labels.

It should now be apparent where the argument is leading. If a two-dimensional scatter diagram (showing the amounts in the plots of the two

most important species) is good, then a three-dimensional scatter diagram (showing the three most important species) is certainly better. Though harder to construct (it would have to be made with pins of various lengths stuck in corkboard), it would contain more information. But why stop at three species? Every time a species that is present is disregarded, a certain amount of information is sacrificed. Therefore the best scatter diagram would be one displaying all the data on all s species in the forest, regardless of how large s may be. The only difficulty is that a scatter diagram in more than three dimensions is impossible even to visualize, let alone construct. One is left with a "conceptual scatter diagram," an unsatisfactory object to investigate.

Ecological ordination as it is now practiced, however, [...] not at all obvious. We now turn to practical considerations. To arrive at a two-dimensional ordination of a many-dimensional scatter diagram, one has to operate on the given data, namely an s × n data matrix (recall that s is the number of species and n the number of sampling units). The data may be thought of as giving the coordinates of the points in a scatter diagram plotted in a coordinate frame with s mutually perpendicular axes (one for each species),

and n data points (one for each sampling unit). The coordinates of the jth point, say, are given by the elements in the jth column of the data matrix. The whole collection of data points, forming a "swarm" or "cloud," will be called a data swarm. All the ordination techniques that ecologists use amount to different ways of mapping an s-dimensional data swarm onto a sheet of two-dimensional paper. What is obtained is an ordination of sampling units. The best-known, most widely used methods are described in Chapter 4, following some necessary mathematical preliminaries in Chapter 3.

No doubt the reader has noticed that if it is legitimate to treat a body of data as equivalent to n points in s-dimensional space (s-space), as just described, then it is equally legitimate to treat it as s points in n-space. When this is done, each row (instead of each column) of the data matrix gives the coordinates of a data point. There are s points altogether and they are plotted in a coordinate frame with n mutually perpendicular axes, one for each sampling unit. Each point represents a species. Ordination of this data swarm, therefore, gives an ordination of species.

An ordination of sampling units is known as an R-type ordination, whereas an ordination of species is a Q-type ordination. Similarly, there are R-type and Q-type classifications, but these terms are seldom used.

The techniques for performing R-type and Q-type ordinations are identical, and at first thought the two types of analyses seem equally legitimate. Q-type analyses have one great drawback, however. If one plans to carry out any statistical tests on the data, it is essential that the "objects" sampled be independent of one another. Community sampling is nearly always done in a way that ensures the mutual independence of the sampling units, and the sampling units are the objects in an R-type ordination. But the species in the sampling units are almost certainly not independent, and it is the species that are the objects in a Q-type ordination. Statistical hypothesis testing is outside the scope of this book, and we do not have occasion to consider the randomness and independence of sampling units again. However, the contrast between R-type and Q-type analyses from the statistical point of view should be kept in mind.
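In computational terms the two viewpoints differ only in whether the data matrix or its transpose supplies the point coordinates. A minimal Python sketch (an added illustration with invented values, not from the original text):

    import numpy as np

    # Hypothetical 4-species x 6-quadrat data matrix (values invented).
    X = np.array([
        [3, 0, 1, 4, 0, 2],
        [1, 2, 0, 0, 3, 1],
        [0, 4, 2, 1, 1, 0],
        [2, 1, 3, 0, 0, 4],
    ])
    s, n = X.shape

    # R-type view: each of the n columns is a point (a quadrat) in s-dimensional species space.
    quadrat_points = X.T        # shape (n, s): one row per quadrat

    # Q-type view: each of the s rows is a point (a species) in n-dimensional quadrat space.
    species_points = X          # shape (s, n): one row per species

    print(quadrat_points.shape, species_points.shape)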

1.2. SOME DEFINITIONS AND OTHER PRELIMINARIES

Before proceeding it is most important to give the definitions used in this book of two terms much used in community ecology: sample and clustering.

The word sample is a source of enormous confusion in ecology, which is most unfortunate. To illustrate: imagine that a plant ecologist has made observations on a number of quadrats.* To a statistician, and to many ecologists, each quadrat is a sampling unit and the whole collection of quadrats is a sample. To other ecologists (e.g., Gauch, 1982) each individual quadrat is a "sample" and the whole collection of quadrats is a "sample set." The muddle can be shown most clearly in a 2 × 2 table in which the words in the four cells are the names given to the objects specified in the row labels by the people specified in the column labels. Thus:

                                  Statisticians and
                                  Some Ecologists        Other Ecologists
  Single unit (e.g., quadrat)     Sampling unit          Sample
  Collection of units             Sample                 Sample set
This book uses the terms in the left-hand column. Neither terminology is entirely satisfactory, however, because it is a nuisance to have to use a two-word term (either sampling unit or sample set) to denote a single entity. Therefore, in this book I have used the word quadrat to mean a sampling unit of any kind, and have occasionally interpolated the additional words "or sampling unit" as a reminder. This is a convenient solution to the problem, but it remains to be seen whether it will satisfy ecologists whose sampling units are emphatically not quadrats: for example, students of river fauna who collect their material with Surber samplers; palynologists, whose sampling unit is a sediment core; planktonologists, whose sampling unit is the catch in a sampling net; entomologists, whose sampling unit is the catch in a sweep net or a light trap; diatom specialists, whose sampling unit is a microscope slide; foresters, whose sampling unit is a plot or stand that is much larger than a traditional quadrat although, like a quadrat, it is a delimited area of ground.

*The term quadrat is surely familiar to all ecologists. A. G. Tansley and T. F. Chipp, in their classic Aims and Methods in the Study of Vegetation (British Empire Vegetation Committee, 1926), define a quadrat as simply "a square area temporarily or permanently marked off as a sample of any vegetation it is desired to study closely." A more modern definition would omit the word "square"; a quadrat can be any shape. Note that although the definition says nothing about size, in ordinary usage a quadrat is thought of as smaller than a "plot." However, there is no agreed upon upper limit to the size of a quadrat, nor lower limit to the size of a plot; some "marked off areas" could reasonably be called by either name.

As far as possible I have avoided the word sample because of its ambiguity, but where it does appear it is used in the statistical sense, to mean a whole collection of quadrats.

The word clustering is also ambiguous. Some ecologists treat it as synonymous with agglomerative classification. Other ecologists treat it as synonymous with classification in the general sense, including both agglomerative and divisive methods. Schematically, the two possibilities are as follows:

  Classification                        Classification (= clustering)
    Agglomerative (= clustering)          Agglomerative
    Divisive                              Divisive

This book uses the scheme shown on the left. It should also be remarked here that the word cluster is ambiguous too. Consider a two-dimensional swarm of data points. Some writers would apply the word cluster only to a "natural" (in other words, conspicuous) cluster of points, that is, a group of points that are simultaneously close to one another and far from all the rest. Other writers use the word cluster to mean any group of points assigned to the same class when a classification is done, no matter how arbitrary its boundaries. The word is not used in this book; hence there is no need to make a choice between these definitions.

A note on symbols should make subsequent chapters easier to follow. It was remarked that the term in the ith row and jth column of a data matrix [also known as the (i, j)th term, or as x_{ij}] denotes the quantity of species i in quadrat j. Throughout this book, the symbol i always represents a species and the symbol j a quadrat. Matrices that are not data matrices are encountered in the book. Some of these show the relationship between pairs of species, necessitating the use of two different symbols to represent two different species. In such cases the symbols are h and i. Likewise, some matrices show the relationship between pairs of quadrats, necessitating the use of two different symbols to represent the two different quadrats. In such cases the symbols are j and k.

This convention should be recalled whenever paired subscripts are encountered. It is used in all but a few special contexts that are explained as they arise. Thus a term such as x_{ij}, with i and j as subscripts, relates to species i in quadrat j. A term such as y_{hi}, with h and i as subscripts, relates to a relationship between the two species h and i. And a term such as z_{jk}, with j and k as subscripts, relates to a relationship between quadrats j and k.

It has been remarked several times in this chapter that each element of a data matrix denotes the "amount" or "quantity" of a species in a quadrat. The way in which the quantity of a species should be measured depends, of course, on the kind of organism concerned. For most animals, for most plankton organisms (plant or animal), for pollen grains, and for seedling plants of roughly equal size, the number of individuals is the simplest measure of quantity. The amounts of such organisms as mat-forming plants, and some colonial corals, sponges, and bryozoans, are often best measured as percentage "cover." The amounts of species in which, though the individuals are distinct, they are very unequal in size (such as the trees in uneven-aged forest) are best measured by the biomass of the individuals present in the quadrat.

In all community studies it is important to decide upon the best way to measure species quantities and then to make the measurements carefully. However, these matters are outside the purview of this book and are not referred to again.

1.3. AIM AND SCOPE OF THIS BOOK

The material covered in this book is listed in the Table of Contents. The aim of the book (to paraphrase what has already been said in the Preface) is to explain fully and in detail, at an elementary level, exactly how the techniques described actually work.

Packaged computer programs are readily available that can perform all these analyses quickly and painlessly. Too often users of these ready-made programs do no more than enter their data, select a few options, and accept the results in the printout with no comprehension of how the results were derived. But unless one understands a technique, one cannot intelligently judge the results. Anyone who uses a ready-made program, for instance, to do a principal component analysis, should be capable of doing the identical analysis of a small, manageable, artificial data matrix entirely with a desk calculator, or even with programs written by oneself. Nobody can claim to understand a technique completely who is not capable of doing this.

It will be noticed that the book contains no mention of such topics as sampling errors, confidence intervals, and hypothesis tests. This is because the procedures described are treated as techniques for interpreting bodies of data that are interesting in their own right, and are not regarded merely as samples from some larger population. The techniques can safely be applied, as here described, as long as one realizes that what they reveal is the structure of the data actually in hand. A large body of data can certainly, by itself, provide rewarding ecological insights. But if it is intended to infer the structure of some parent population of which the data in hand are regarded only as a sample, then statistical matters do have to be considered. For example, it would probably be necessary, at the outset, to transform the observations so as to make their distribution multivariate normal. One is then entering the field of multivariate statistical analysis, which is wholly outside the scope of this book.

The distinction just made between interpreting the patterns of given multidimensional data swarms, and analyzing multivariate statistical data, deserves emphasis; it is too often blurred. The two subjects are quite distinct, and mastery of the first is a necessary prerequisite to appreciating the second. Students who wish to go on from the present book into the study of multivariate statistical analysis will find many books to choose from. Morrison (1976) and Tatsuoka (1971) can be especially recommended.

Chapter Two

Classification by Clustering

2.1. INTRODUCTION

The task described in this chapter is that of classifying, by clustering, a collection of sampling units. In all that follows, the sampling units are called quadrats for brevity and convenience. As described in Chapter 1, the data matrix has s rows, representing species, and n columns, representing quadrats. The (i, j)th element of the matrix represents the amount of species i (for i = 1, ..., s) in quadrat j (for j = 1, ..., n).

We wish to classify the n quadrats by clustering or, as it is also called, by agglomeration. To begin, each individual quadrat is treated as a cluster with only the one quadrat as member. As the first step, the two most similar clusters (i.e., quadrats) are united to form a two-member cluster. There are now (n - 1) clusters, one with two members and all the rest still with only one member. Next, the two most similar of these (n - 1) clusters are united so that the total number of clusters becomes (n - 2). The two clusters united may be single quadrats (one-member clusters), in which case two of the (n - 2) clusters have two members and the rest one. Or else one of the two clusters united with another may be the two-member cluster previously formed; in that case one of the (n - 2) clusters has three members and the rest one. Again, the two most similar clusters are united. And again and again and again. The process continues until all the n original quadrats have been agglomerated into a single all-inclusive cluster.
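The agglomeration loop just described can be written down compactly. Here is a minimal Python sketch (an added illustration, not from the original text); the function cluster_distance is a placeholder for whichever measure of intercluster dissimilarity is chosen, the question taken up immediately below.

    # A minimal sketch of agglomerative clustering. cluster_distance(A, B) is
    # assumed to return the dissimilarity between two clusters (lists of quadrats).
    def agglomerate(n, cluster_distance):
        clusters = [[j] for j in range(1, n + 1)]   # start: every quadrat is its own cluster
        history = []                                # record of the fusions made
        while len(clusters) > 1:
            # find the pair of clusters that are currently closest
            best = None
            for a in range(len(clusters)):
                for b in range(a + 1, len(clusters)):
                    d = cluster_distance(clusters[a], clusters[b])
                    if best is None or d < best[0]:
                        best = (d, a, b)
            d, a, b = best
            merged = clusters[a] + clusters[b]
            history.append((clusters[a], clusters[b], d))
            clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
        return history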


Certain decisions need to be made before this process can be carried out. The questions to be answered are:

1. How shall the similarity (or its converse, the dissimilarity) between two individual quadrats be measured?
2. How shall the similarity between two clusters be measured when at least one, and possibly both, clusters have more than one member quadrat?

Both these questions can be answered in numerous ways.

First, to answer question 1. Recall (see page 8) that, given n quadrats and s species, the data can be portrayed, conceptually, as n points (representing the quadrats) in an s-dimensional coordinate frame. Therefore, one possible way of measuring the dissimilarity between two quadrats is to use the Euclidean distance, in this s-space, between the points representing the quadrats. The coordinates of the jth of these n points are (x_{1j}, x_{2j}, ..., x_{sj}). This records the fact that quadrat j contains x_{1j} individuals* of species 1, x_{2j} individuals of species 2, ..., and x_{sj} individuals of species s. The distance in s-dimensional space between the jth and kth points, denoted by d(j, k), is, therefore,

d(j, k) = \sqrt{(x_{1j} - x_{1k})^2 + (x_{2j} - x_{2k})^2 + \cdots + (x_{sj} - x_{sk})^2} = \sqrt{\sum_{i=1}^{s} (x_{ij} - x_{ik})^2}.

This is simply an extension, to a space of s dimensions, of the familiar result of Pythagoras's theorem, whose two-dimensional version is shown in Figure 2.1.

The Euclidean distance between the points representing them is only one of the ways in which the dissimilarity of two quadrats might be measured. It is the measure we adopt in discussing four frequently used clustering techniques. Other ways of measuring dissimilarity are discussed in Section 2.6.

*Instead of measuring the amount of a species in a quadrat by counting the number of individuals, it is sometimes preferable to measure the biomass or, for many plant species, the areal "cover."

15

::x::22 -

-

-

-

-

-

-

-

-

-

- -

-

-

-

-

2

rx,,

: r

!

N

______ I

(/)::X:

w

21

l.)

w

(L (/)

1---::x::l2 1

::X:

11

1

1 -....!

1

1

1

1

1 1 1 1

1 1 1 1

1

1

SPECIES

1

Figure .2.1. .The distance between points 1 and 2, Wl'th coord'mates (x 11 x ) and ( respect1vely, is d(l , 2). From Pythagoras's theorem, ' 21 x12,

) X22 ,

Next, for question 2, on how to measure the dissimilarity (now a distance) between two clusters when each may contain more than one point (i.e., quadrat): the different ways in which this can be done are the distinguishing properties of the first three clustering methods described in the following sections.

2.2. NEAREST-NEIGHBOR CLUSTERING

In nearest-neighbor clustering, also known as single-linkage clustering, the distance between two clusters is taken to be the distance separating the closest pair of points such that one is in one cluster and the other in the other (see Figure 2.2).

EXAMPLE. The following is a demonstration of the procedure applied to an artificially simple data matrix representing the amounts of 2 species in 10


[Figure 2.2. Two possible measures of the distance between two clusters. The nearest-neighbor distance N is the shortest distance, and the farthest-neighbor distance F is the longest distance, between a member of one cluster and a member of the other.]

quadrats. With only two species, it is possible to plot the data points in a plane and outline the successive clusters in the order in which they are created (see Figure 2.3). Table 2.1 shows the data matrix (Data Matrix #1) and, below it, the distance matrix. In the distance matrix the numerical value of, for example, d(3, 5), the distance between the third and the fifth points, appears in the (3, 5)th cell, which is the cell where the third row crosses the fifth column. It is d(3, 5) = 12.5. Since the distance matrix must obviously be symmetrical, only its upper right half is shown.

The smallest distance in the matrix is d(5, 8) = 2.2. Therefore, the first cluster is formed by uniting quadrats 5 and 8. We call the cluster [5, 8]. The distance matrix is now reconstructed as shown in Table 2.2. In this new distance matrix, distances to every point from the newly formed cluster [5, 8] are entered in row 5 and column 5; row 8 and column 8 are filled with asterisks to show that quadrat 8 no longer exists as a separate entity. The distance from [5, 8] to any point, for instance to point 3, is the lesser of d(3, 5) and d(3, 8). Since d(3, 5) = 12.5 and d(3, 8) = 14.4, the distance

TABLE 2.1.
A. DATA MATRIX #1. THE QUANTITIES OF 2 SPECIES IN 10 QUADRATS.

  Quadrat      1   2   3   4   5   6   7   8   9  10
  Species 1   12  20  28  11  22   8  13  20  39  16
  Species 2   30  18  26   5  15  34  24  14  34  11

B. THE DISTANCE MATRIX (THE ROW AND COLUMN LABELS ARE THE QUADRAT NUMBERS).

  Quadrat   1     2     3     4     5     6     7     8     9    10
    1       0  14.4  16.5  25.0  18.0   5.7   6.1  17.9  27.3  19.4
    2             0  11.3  15.8   3.6  20.0   9.2   4.0  24.8   8.1
    3                   0  27.0  12.5  21.5  15.1  14.4  13.6  19.2
    4                         0  14.9  29.2  19.1  12.7  40.3   7.8
    5                               0  23.6  12.7   2.2  25.5   7.2
    6                                     0  11.2  23.3  31.0  24.4
    7                                           0  12.2  27.9  13.3
    8                                                 0  27.6   5.0
    9                                                       0  32.5
   10                                                             0

TABLE 2.2. THE RECONSTRUCTED DISTANCE MATRIX AFTER THE FUSION OF QUADRATS 5 AND 8.

            1     2     3     4  [5,8]     6     7     8     9    10
    1       0  14.4  16.5  25.0  17.9    5.7   6.1     *  27.3  19.4
    2             0  11.3  15.8   3.6   20.0   9.2     *  24.8   8.1
    3                   0  27.0  12.5   21.5  15.1     *  13.6  19.2
    4                         0  12.7   29.2  19.1     *  40.3   7.8
  [5,8]                             0   23.3  12.2     *  25.5   5.0
    6                                      0  11.2     *  31.0  24.4
    7                                            0     *  27.9  13.3
    8                                                  *     *     *
    9                                                        0  32.5
   10                                                              0
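The entries in the [5, 8] row of Table 2.2 can be checked with a few lines of Python (an added illustration, not part of the original text): under the nearest-neighbor rule, d(j, [5, 8]) = min[d(j, 5), d(j, 8)] for every remaining point j.

    import numpy as np

    # Quadrat coordinates from Data Matrix #1 (species 1, species 2), quadrats 1..10.
    pts = np.array([
        [12, 30], [20, 18], [28, 26], [11, 5], [22, 15],
        [8, 34], [13, 24], [20, 14], [39, 34], [16, 11],
    ], dtype=float)

    def d(a, b):
        """Euclidean distance between quadrats a and b (numbered from 1)."""
        return float(np.hypot(*(pts[a - 1] - pts[b - 1])))

    for j in (1, 2, 3, 4, 6, 7, 9, 10):
        print(j, round(min(d(j, 5), d(j, 8)), 1))
    # prints 17.9, 3.6, 12.5, 12.7, 23.3, 12.2, 25.5, 5.0, matching Table 2.2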


[Figure 2.3. (a) The data points of Data Matrix #1 (see Table 2.1). The "contours" show the successive fusions with nearest-neighbor clustering except that, for clarity, the final contour enclosing all 10 points is omitted. (b) The corresponding dendrogram. Details are given in Table 2.3. The height of each node in the dendrogram is the distance between the pair of clusters whose fusion corresponds with the node.]

from point 3 to the cluster [5, 8] is

d(3, [5, 8]) = 12.5.

This value appears in the (3, 5)th cell of the reconstructed distance matrix. Indeed, all the entries in the fifth row and column, which give the distances from [5, 8] to every point j (for j not equal to 5 or 8), are the lesser of d(j, 5) and d(j, 8). The smallest entry in the reconstructed distance matrix is 3.6, in the (2, 5)th cell. Thus the next step in the clustering is the fusion

TABLE 2.3. THE STEPS IN THE NEAREST-NEIGHBOR CLUSTERING OF DATA MATRIX #1.

  Step    Fusions(a)                              Nearest      Distance Between
  Number                                          Points(b)    Clusters
   1      5, 8                                    5, 8           2.2
   2      [5, 8], 2                               2, 5           3.6
   3      [5, 8, 2], 10                           8, 10          5.0
   4      1, 6                                    1, 6           5.7
   5      [1, 6], 7                               1, 7           6.1
   6      [5, 8, 2, 10], 4                        4, 10          7.8
   7      [5, 8, 2, 10, 4], [1, 6, 7]             2, 7           9.2
   8      [5, 8, 2, 10, 4, 1, 6, 7], 3            2, 3          11.3
   9      [5, 8, 2, 10, 4, 1, 6, 7, 3], 9         3, 9          13.6
  10      All points are in one cluster

(a) Unbracketed numbers refer to individual quadrats. The numbers in square brackets are the quadrats in a cluster.
(b) The distance between these two points defines the distance (given in the last column) between the two clusters united at this step.

of the existing two-member cluster [5, 8] with quadrat 2 to form the three-member cluster [2, 5, 8].

The distance matrix is reconstructed again, by adjusting the entries in row and column 2 and putting asterisks in row and column 5. And the procedure continues. The succession of steps is summarized in Table 2.3. At every step a new cluster is formed by the fusion of two previously existing clusters. (This includes one-member "clusters.") The final column in Table 2.3 shows the distance separating the clusters united at each step. The procedure is shown graphically in Figure 2.3a; Figure 2.3b shows the result of the clustering as a tree diagram or dendrogram. The horizontal links in the dendrogram are known as nodes and the vertical lines as internodes. The height of each node above the base is set equal to the distance between the two clusters whose fusion the node represents. These distances are shown on the vertical scale on the left.

It should be noticed that the ordering of the points (quadrats) along the bottom is to some extent optional. Thus if the labels 1 and 6 were interchanged, or 5 and 8, there would be no change in the implications of the dendrogram. The dendrogram may be thought of as a hanging mobile with each node able to swivel freely where it is attached to the internode above it.

If we wish to divide the quadrats into classes, there are obviously several ways in which it could be done, all of them arbitrary. The arbitrariness arises because the points exhibit no natural clustering. The contours in Figure 2.3a do not represent abrupt discontinuities any more than the contour lines on a relief map of hilly country represent steps visible on the ground. There are occasions, however, when an "unnatural" classification (sometimes called a dissection) is needed for practical purposes. For example, a classification is required as a preliminary to vegetation mapping even if, in fact, the plant communities on the ground merge into one another with broad, indistinct ecotones. The lines separating communities on such a map are analogous to contour lines on a relief map, and are no less useful.

How to distinguish clusters, given a dendrogram like that in Figure 2.3b, is a matter of choice. Some common ways of classifying are as follows.

1. The number of classes to be recognized is decided beforehand. Thus suppose it had been decided to classify the 10 points in Data Matrix #1 into 4 classes. The memberships of the classes are found by drawing a horizontal line across the dendrogram at a level where it cuts four internodes. It will be seen that the resultant classes are [9], [3], [7, 1, 6], and [4, 10, 2, 5, 8].
2. The minimum distance that must separate clusters for them to be recognized as distinct may be decided beforehand. Suppose a minimum distance of 10 units were chosen in this example. Then three classes would be recognized, namely [9], [3], and [7, 1, 6, 4, 10, 2, 5, 8] (a computational sketch of this rule follows the list).
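The whole nearest-neighbor procedure, and the second cutting rule above, can be reproduced with a few lines of Python using SciPy (an added illustration, not part of the original text; it assumes SciPy is available). SciPy's method "single" is single-linkage, i.e., nearest-neighbor clustering; "complete" would give the farthest-neighbor method of the next section.

    import numpy as np
    from scipy.cluster.hierarchy import fcluster, linkage
    from scipy.spatial.distance import pdist

    # Quadrat coordinates from Data Matrix #1 (Table 2.1), one row per quadrat.
    pts = np.array([
        [12, 30], [20, 18], [28, 26], [11, 5], [22, 15],
        [8, 34], [13, 24], [20, 14], [39, 34], [16, 11],
    ])

    Z = linkage(pdist(pts), method="single")        # nearest-neighbor dendrogram
    # Cutting rule 2: clusters must be at least 10 distance units apart.
    labels = fcluster(Z, t=10, criterion="distance")
    print(labels)   # three classes: [7,1,6,4,10,2,5,8], [3], and [9] (labels in some order)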

If the internodes of a dendrogram are of conspicuously different lengths, with short ones at the bottom and long ones at the top, then it follows that the points fall naturally into clusters without arbitrariness. Consider an example.

EXAMPLE. Data Matrix #2 (see Table 2.4) shows the amounts of 3 species in 10 quadrats. The data points are shown graphically in Figure 2.4a, and the dendrogram resulting from nearest-neighbor clustering is in Figure 2.4b. The separation into three classes, namely [1, 3, 2], [4, 5, 6, 7], and [8, 9, 10], is obvious in both diagrams, and no formal clustering procedure is needed to

TABLE 2.4. DATA MATRIX #2. THE QUANTITIES OF 3 SPECIES IN 10 QUADRATS.

[The table body gives the amounts of species 1, species 2, and species 3 in each of quadrats 1 through 10.]

[Figure 2.4. (a) The data points of Data Matrix #2 (see Table 2.4). The amount of species 3 in each quadrat is shown by the number of "spokes" attached to each point. (b) The dendrogram yielded by nearest-neighbor clustering.]

22

bowever, is the fact that, Witq . them. What makes the task e_asy, an be displayed in visualizab¡ recogruze f da ta pom ts e e three species, the swarm o . used in figure 2.4b of represent ly on b the dev1ce " · three-dimensional space, or ~ d. ate by the number of spokes,, ing the magnitude of the .third coodr. mensional coordinate frame. When · a two- 1m radiating from the po1Ilts 1Il . 1 tly when the swarm of data Püints · r equ1va en ' . . there are many spec1es 0 ' ¡¡z·ation becomes 1mposs1ble anct a . . · al space, v1sua occupies a many-d1mens10n d a dendrogram. formal procedure is needed to pro uce . . . t often used in practice because it is Nearest-neighbor clu~t~nn~ ishnot ndency for early formed clusters to . · Cha1Il1Ilg is t e e prone to ehammg. . f . gle points one after another in . w b the accret1on to them 0 Slfl gro .Y abe seen in Figure 2.3 where the first, tigh¡ success1on. The eu ect can h 4 icks up the points 2, then 10, and t en , one ata [5 81 t two-membere1us er , P . time, before uniting with another cluster containing more than one pomt.' H a classification is intended to define classes for a purpose such as ve~e~ation mapping, then a method that frequently leads to exaggerated chammg is defective. It results in clusters of very disparate sizes. Thus, as shown before, when the dendrogram in Figure 2.3b is used to define three clusters, two ol them are "singletons" and ali the remaining eight points are lumped together in the third cluster. For vegetation mapping, or for descriptive classifications generally, one usually prefers a method that yiefds clusters ol roughly equal sizes. However, chaining may reveal a true relationship among the quadrats. Therefore, if what is wanted is a dendrogram that is in sorne sense "true to life," a clustering method that reveals natural chaining, if it exists, certainly has an advantage. l~deed~ a dendrogram is more than merely a diagram on which a cl~ssificatton can be based. It is a portrayal in two dimensions of a swarm of pomts occupying many dimen s1ons · (as many as there are species). A ~tendrogr.aghm need not be used to yield a classification. I t can be studied in 1 s own n tas a representation f th · · . among a swarm of 0 d e mterrelat10nships · 1 a a pomts. Sorne workers find a d d two-dimensional 0 d" t' en rogram more informative than a r ma ion.

2.3. FARTHEST-NEIGHBOR CLUSTERING

In farthest-neighbor clustering (also known as complete-linkage clustering) the distance between two clusters is defined as the maximum distance between a point in one cluster and a point in the other (see Figure 2.2).

TABLE 2.5. STEPS IN THE FARTHEST-NEIGHBOR CLUSTERING OF DATA MATRIX #1.

[The table body lists, for each step, the fusions made, the "farthest" points(a), and the distance between the clusters united.]

(a) These are the quadrats whose distance apart, shown in the last column, defines the distance between the two clusters.

To apply the method to Data Matrix #1, we again start with the distance matrix in Table 2.1 and unite the two closest clusters (at this stage, individual points), which are, of course, points 5 and 8 as before. But in compiling each of the sequence of reconstructed distance matrices, we use the greater rather than the lesser of two distances. For example, the distance between cluster [5, 8] and point 2 is defined* as

d(2, [5, 8]) = max[d(2, 5), d(2, 8)] = max(3.6, 4.0) = 4.0

for farthest-neighbor clustering, whereas it was defined as

d(2, [5, 8]) = min[d(2, 5), d(2, 8)] = min(3.6, 4.0) = 3.6

for nearest-neighbor clustering. The two clustering procedures (nearest-neighbor and farthest-neighbor) are the same in all respects except for this changed definition of intercluster distance. The result of clustering the data in Data Matrix #1 by farthest-neighbor clustering is shown in Table 2.5 and Figure 2.5. The figure should be compared with Figure 2.3. The difference is conspicuous.

Farthest-neighbor clustering has the merit that it tends to yield clusters fairly equal in size. This is because the farthest-neighbor distance

*max(x, y) denotes the maximum of x and y, and analogously for min(x, y).

[Figure 2.5. Farthest-neighbor clustering applied to Data Matrix #1. The dendrogram is based on Table 2.5.]

between two populous neighboring clusters is often large in spite of the fact that, as a whole, they may be very similar. Consequently, it is more likely that an isolated, unattached point at a moderate distance will be united with one of them than that they will unite with each other. Hence an anomalous quadrat may become a cluster member quite early in the clustering process, and the fact that it is anomalous will be overlooked.

When true natural clusters exist, the outcomes of nearest-neighbor and farthest-neighbor clustering are usually very similar. Farthest-neighbor clustering applied to Data Matrix #2 gives results indistinguishable from those in Figure 2.4.

2.4. CENTROID CLUSTERING

.d c/ustering is one of several methods designed t O St rik e a h appy · di rn between the extremes of nearest-neighbor clustering on the rne u . hb one . halld and farthest-neig or e1ustenng on the other. Nearest and farthestneighbor rnethod~ hav~ the defe~t that_ they are in~uenced at every step by the chance l~cat10ns m. th~ ~-dunen~10nal coordmate frame of only two . dividual pomts. That IS, It IS the distance between two points only that ind ides the outcome of each step. Centroid clustering overcomes this ec .. drawback by ~sing a defim~10n ~f intercluster distance that takes account of the locations of all the pomts m each cluster. To repeat, there are many ways in which this might be done, and centroid clustering, described in this section, is only one of the ways. A more general discussion of the various roethods and how they are interrelated is given in Section 2.7. In centroid clustering the distance between two clusters is taken to be the distance between their centroids. The centroid of a cluster is the point representing the "average quadrat" of the cluster; that is, it is a hypothetical quadrat containing the average quantity of each species, where the averaging is over all cluster members. Hence if there are m cluster members and sspecies, and if we write ( c1 , c2 , ... , es) for the coordinates of the centroid, (entrOl

then

c1

=

1 - (x 11 + x 12 + · · · +x1m) m

1

m

L X1J' m J=l

= -

and, in general,

For example, the centroid of the three-point cluster [2, 5, 8) in Data Matrix #1 has coordinates [(20 + 22 + 20)/3, (18 + 15 + 14)/3) = (20.67, 15.67). .The clustering procedure is carried out in the same way as f~r n~arest­ netghbor and farthest-neighbor clustering. Thus each step consists I~ the fuston of the two nearest clusters (as before, a cluster may have only a ~mgle lllQQiber); one finds which two clusters are nearest by searching the mterr distance matrix for its smallest element. As in the methods alre~dy ·......i • d f h fusion by entermg ~, the d1stance matrix is reconstructe a ter eac

CLASSIFICATION BY CLUSTER

I~~

26

. . h wly formed cluster to every other cluster. B in 1t the d1stances from l e ne ·d Th · Ut now the distances are those separating cluster centrdo1 s .. b de wfay m Which · constructed 1s escn e a ter we h the successive distance matnces are ave looked at sorne results. . . The dendrogram produced by applying the centro1d clusten~g procedure to Data Matrix # 1 is shown in Figure 2.6. It should be notlced that th dendrogram is intermediate between that yielded b)' nearest-neighbor cluse tering (Figure 2.3b) and that yielded by farthest-neighbor clustering (Figure 2.5b ). Thus in centroid clustering, as in farthest-neighbor clustering, points 3 and 9 unite to forro the cluster [3, 9], whereas in nearest-neighbor clustering these two points are chained, one after the other, to the cluster formed by the remaining eight points. But centroid clustering, like nearest. neighbor. clustering, c~ains point and then point 4 to cluster [2, 5, 8], whereas m farthest-ne1ghbor clustenng, cluster [10, 4] is formed first and 18· only later united with [2, 5, 8]. Ta~le 2.6, which resembles Tables 2.3 and 2.5, shows the ste s . centro1d clustering of Data Matrix # 1· Ob serve t h at mstead . . . . of ,ap e m1 the g¡vmg the dtstance between the two clusters united at each o uw column gIVlng the square of this a· t Th step, there IS a Is anee. e reason for this, together with

I?

20

15

LU

u

z

~

10

Cf)

i5

11ie deftdro o l 6



7

'



&ram PrOduced by

'

~~·

s

l

a

io

4

Ylng centroid clust .

3 9

enng to Data Matrix #l. Jt

r

eEN TR

OID CLUSTERING

27

TABLE 2.6. STEPS IN THE CENTROID CLUSTERING OF DATA MATRIX #1.

  Step      Fusions                                   Square of Intercluster
  Number                                              Distance
   1        5, 8                                        5.00
   2        2, [5, 8]                                  13.25
   3        1, 6                                       32.00
   4        10, [2, 5, 8]                              43.56
   5        7, [1, 6]                                  73.00
   6        4, [2, 5, 8, 10]                          162.50
   7        3, 9                                      185.00
   8        [7, 1, 6], [2, 5, 8, 10, 4]               326.25
   9        [7, 1, 6, 2, 5, 8, 10, 4], [3, 9]         456.83
  10        All points are in one cluster
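The first two entries of Table 2.6 can be verified directly from the definition of a centroid (an added Python illustration, not part of the original text; it uses only the quadrat coordinates of Data Matrix #1 and the squared Euclidean distance).

    import numpy as np

    # Quadrat coordinates of Data Matrix #1, keyed by quadrat number.
    pts = {1: (12, 30), 2: (20, 18), 3: (28, 26), 4: (11, 5), 5: (22, 15),
           6: (8, 34), 7: (13, 24), 8: (20, 14), 9: (39, 34), 10: (16, 11)}

    def centroid(cluster):
        """Coordinates of the 'average quadrat' of a cluster."""
        xs = np.array([pts[q] for q in cluster], dtype=float)
        return xs.mean(axis=0)

    def d2(a, b):
        """Squared Euclidean distance between two coordinate pairs."""
        diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
        return float(diff @ diff)

    print(d2(pts[5], pts[8]))              # step 1: 5.00
    print(d2(pts[2], centroid([5, 8])))    # step 2: 13.25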

the derivation of these values, is explained in the following. Distances rather than distances-squared have been used to fix the heights of the nodes in the dendrogram so that the three dendrograms so far obtained from Data Matrix # 1 may be easily compared. We now consider the computations required in centroid clustering. Determining the distance between two cluster centroids is perfectly straightforward. Suppose there are s species, so that the distance required is that between two points in s-dimensional space or s-space. Let the two centroids be labeled C' and C"; their coordinates, obtained from Equation (2.1), are (e{, cí, ... , e;) and (c-í', e'{, ... , e;'), respectively. Then the square

e" is 2 ( )2 ( / ")2 d2 (C',C") = (cí - cí' ) + cí - e'{ + · · · + es - es ·

of the distance between

e/

and

As an example, recall Data Matrix # 1 (see Table 2.1) and !et. C' and C ", be the clusters [7, 1, 6] and [2, 5, 8, 10, 4], respectively. The coordmates of C are

(e;, cí) = [t(13 + 12 + 8), 1(24 + 30 + 34)] = (11, 29.33); analogous calculations show that the coordinates of C" are

( ci',

en=

(17.8, 12.6).

1

CLASSIFICATION BY CLUSTEl{i~C

28

of the distance between Hence the square

thern is

")2 +(e'2 -

d2(C',C")={c{-C1

e~

)2 -

326 24 · · ..

.

d. error in the final d1g1t) Wlth the This corresponds (except for a ro~n mtgp #8 in Table 2.6. d' oppos1te s e . squared intercluster ist~nc~ t straightforward way of findmg the . . nnc1ple the mos . . . Although thi s 1s, m P ' 1us ter centroids, it 1s mconveruent in two e . between square of the distance . b e the coordinates of the original . . . nce anses ecaus . pract1ce. The mconveme . h lations every time. lt 1s more efficient . h t be used m t e ca1cu t of each successive distance matrix data pomts ave 0 . f lly to denve the e1emen s computa 10na d rs The following is a demonstration of from the elements of its pre ecesso . . D Matrix # 1 after which the the first few steps of the process applied to ata ' generalized version of the equation is given. . . . 2 Each element in the initial distance matnx, m~re prec~sely a d1sta~c~ matrix, is the square of the corresponding element m the d1stance matnx m Table 2.1. This initial matrix is at the top in Table 2.7. lts smallest element (in boldface) is d 2(5, 8) = 5.00. Therefore, once again, step # 1 is the formation of cluster (5, 8]. Now we do the first reconstruction of the distance 2 matrix. As before, since point 8 no longer exists as a separate entity, the elements of row and column 8 are replaced with asterisks. The fifth row and column, now labeled [5, 8], contain the distances-squared from the centroid of [5, 8] to every other point. The required distance-squared from the jth point is (as proved later) d

Thus, when j

2

(J, [5, 8])

=

=

td

2

(j, 5) + td 2 (j, 8) - ~d 2 (5 , 8) .

(2.2)

1,

d2(1, [5, 8]) = td2(1, 5) + !d2(1, 8) - !d2(5 , 8)

= 3~5 + 3~0 Likewise, when j

=

-

d2(2, [5, 8]) = 1f- + and so on.

24 =

321 •25 •

2,

lf - l = 13.25 ,

. These values appear in the row distance2 matrix in Table 2 7 All and column labeled [5, 8] in the second It is seen that the sman . t. 1 othe~ elements remain as they were. es e ement 1 th [5, 8]. Hence the second t . n e new matrix is 13.25 in row 2, s ep m the el · · d the three-member el ustenng is the fusion of 2 an uster [2, s, 8].

27 THE FIRST THREE D!STANCE' MATRICES CONSTRUCTED TABLE · éENTROID CLUSTERING OF DATA MATRIX #1.' pVRJNG . . ' .. tial matrix. Each element is a d1stance between two points. 2 3 4 5 6 (ll Tbeim 1 7 8

o

1

2

208

o

3

272

626

325

32

37

320

9 745

128

250 730

13 157

400

85

16

617

221

850

229 365 162 125

208 162 5 544

185 1625 650 961

377 65 369 61 52 593

149

776

178

o

761

25 1058

o

o

4

5

464

o

557

6 7 8 9

o

o

10

o

o 1O . after the fusion of points 5 and 8. (2) The second matnx, 4 [5 8] 6 1

2

3

'

7

8

131.25 13.25

1 2 34

* * * * * *

181.25

• 190.25 •

[5,8]

549.25

9

154.25

6 7 8

10

704.25 37.25

*

*

*

9

10

9 JO . . af ter the fus10n of2and[5 , 8]. 5

(3) The third matnx, 1 1

[2,5,8]

[2 ,5,8]

280.56

3

4

* 160.56 207.22

* * *

3 4

8

6

7

486.56

128.22

*

* * * *

*

*

* *

5 6 7

672.22 43.56

*

*

*

8 9

1O ªln mctrices 2 and 3 only elemen elements are shown as dots.

.

ts diffenng

· earlier fro111 those Ul

Unchanged ·ces are shoWO.

01 atn

29

CLASSIFICA TION BY CLUSTERlt-.Jc

30

anew. The Th e distance 2 matnx· must now be reconstructed . ¡ elements f of d 1 mn [5 8] are replaced with astensks. The new e ements or the row an row co uand co'1 umn, now labeled [2 ' 5, 8], are found from the for111u1a ·second

d2(J, [2, 5, 8]) = td2(J, 2) + ~d2(j, (5, 8]) - id2(2 , [5 , 8]). (2.3) When j

1,

=

+ t X 321.25 -

d2(1, [2, 5, 8]) = 2~8

~X 13.25

=

280.56 ;

i

=

160.56 ;

when j = 3, d2(3, [2, 5, 8]) = 1 ~ 8 +

tX

181.25 -

X

13.25

and so on. These values appear in the row and column labeled [2, 5, 8] in the third matrix in Table 2.7. Equations (2.2) and (2.3) are particular examples of a general equation which we now derive. It is the equation for the distance 2 from any point (or cluster centroid) P to the centroid Q of an ( m + n )-member cluster created by the fusion of two clusters [M1 , M 2 , ... , Mm] and [N1 , N , .• • , Nn] with m 2 and n members, respectively. The centroids of these clusters are M and N. The set-up is shown in Figure 2.7. Let MP =a, NP = b, and MN =c. Let the angles MQj =a and NQP =

f3

with a

+ f3

=

180º.

2

The distance required is x , where PQ = x. Recall that x2 is needed as ~n eled~ent o; the r~w or column headed [M1 , M 2 , ... , Mm , N , N , ... , N ] 1 2 m a istance matnx undergom· g · · 11 . reconstruct10n as one of the steps m a kn · h clustenng operation. The values of a 2 b2 and c2 1 · . ' ' are own smce t ey are 2 e ements m the d1stance matrix constructed at an earlier step. M

Figure 2.7. Illustration of the derivation of Equation (2.7). Q is the centroid of clusters M and N; it is assumed that n > m and, therefore, Q is closer to N than to M. See text for further details.


As a preliminary to finding x² it is necessary to find MQ (or NQ = c − MQ). Since M and N are, respectively, the centroids of m-member and n-member clusters and Q is at their center of gravity, it is clear that

    MQ = nc/(m + n)    and    NQ = mc/(m + n).

Now, from Apollonius's theorem,*

    (PQ)² + (MQ)² − 2(PQ)(MQ)cos α = (MP)²

and

    (PQ)² + (NQ)² − 2(PQ)(NQ)cos β = (NP)².

Equivalently,

    x² + n²c²/(m + n)² − [2xnc/(m + n)]cos α = a²,        (2.4)

    x² + m²c²/(m + n)² − [2xmc/(m + n)]cos β = b².        (2.5)

Multiply (2.5) by n/m, and do the substitution cos β = cos(180° − α) = −cos α. Then

    nx²/m + nmc²/(m + n)² + [2xnc/(m + n)]cos α = b²n/m.        (2.6)

Add (2.4) and (2.6) to eliminate cos α. The sum is

    x²(1 + n/m) + nc²/(m + n) = a² + b²n/m,

whence

    x²(m + n)/m = a² + b²n/m − nc²/(m + n).

*A proof of Apollonius's theorem is given as an appendix to this chapter.


Multiply through by m/(m + n) to obtain

    x² = (a²m + b²n)/(m + n) − mnc²/(m + n)²

or

    x² = [m/(m + n)]a² + [n/(m + n)]b² − [mn/(m + n)²]c².        (2.7)

It is seen that (2.7) is the required general form of (2.2) and (2.3). It is now apparent that when centroid clustering is being done, the most convenient measure of the dissimilarity between two clusters is the square of the distance separating their centroids rather than the distance itself (but see page 46). The terms x², a², b², and c² in Equation (2.7) are all squared distances, and use of this equation makes it easy to construct each distance² matrix from its predecessor as clustering proceeds from step to step. There is no simple relation among x, a, b, and c when they are not squared. Centroid clustering has been described with greater mathematical rigor by Orlóci (1978), who gives an example. He uses the name "average linkage clustering" for the method. Average linkage clustering is an inclusive term for several similar clustering procedures of which centroid clustering is one. The interrelationships among the several methods are briefly described in Section 2.7.
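For readers who wish to experiment, the following is a minimal sketch (in Python, using NumPy) of how Equation (2.7) can drive a centroid-clustering program. The function name and bookkeeping details are my own illustration, not part of the text; each merge is chosen by the smallest squared centroid distance, and the distance² matrix is updated with (2.7) instead of being recomputed from the raw coordinates, which reproduces, for example, the updates obtained from Equations (2.2) and (2.3).

import numpy as np

def centroid_clustering(points):
    """Agglomerative centroid clustering of a points-by-coordinates array.

    Returns a list of (members_of_cluster_a, members_of_cluster_b, squared_distance)
    describing each fusion in order.
    """
    pts = np.asarray(points, dtype=float)
    n_pts = len(pts)
    # Initial distance^2 matrix (cf. the first matrix in Table 2.7).
    d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)
    clusters = {i: [i] for i in range(n_pts)}   # cluster label -> member points
    sizes = {i: 1 for i in range(n_pts)}
    merges = []
    while len(clusters) > 1:
        # Locate the smallest off-diagonal element of the current matrix.
        labels = list(clusters)
        sub = d2[np.ix_(labels, labels)]
        i, j = np.unravel_index(np.argmin(sub), sub.shape)
        a, b = labels[i], labels[j]
        merges.append((clusters[a], clusters[b], float(d2[a, b])))
        m, n = sizes[a], sizes[b]
        c2 = d2[a, b]
        # Equation (2.7): distance^2 from every other cluster P to the new centroid.
        for p in labels:
            if p in (a, b):
                continue
            x2 = (m * d2[p, a] + n * d2[p, b]) / (m + n) - m * n * c2 / (m + n) ** 2
            d2[p, a] = d2[a, p] = x2
        d2[b, :] = d2[:, b] = np.inf          # cluster b no longer exists separately
        clusters[a] = clusters[a] + clusters[b]
        sizes[a] = m + n
        del clusters[b], sizes[b]
    return merges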

2.5. MINIMUM VARIANCE CLUSTERING

This is the last clustering method that is fully described in this book. Before going into details, it is necessary to define the term within-cluster dispersion and to give two methods for computing it. The first of these methods is the obvious one implied by the definition. The second, nonobvious method is a way of obtaining the identical result by a computationally simpler route.

First, the definition: the within-cluster dispersion of a cluster of points is defined as the sum of the squares of the distances between every point and the centroid of the cluster.

Next, we illustrate the computations.

EXAMPLE. Consider Data Matrix #3, shown in Table 2.8. It lists the quantities of each of two species in five quadrats; it can be represented


TABLE 2.8. DATA MATRIX #3. THE QUANTITIES OF TWO SPECIES IN FIVE QUADRATS.

Quadrat      1    2    3    4    5
Species 1   11   36   16    8   28
Species 2   14   30   20   12   32

graphically by a swarm of five points (representing the quadrats) in a space of two dimensions, that is, a two-dimensional coordinate frame with axes representing the species. Suppose the five points have been combined into a single cluster, as they will have been when the last step in a clustering process is complete. The centroid of the five-point cluster has coordinates

    (c1, c2) = ((11 + 36 + 16 + 8 + 28)/5, (14 + 30 + 20 + 12 + 32)/5) = (19.8, 21.6).

Now write Q[l, 2, 3, 4, 5] for the within-cluster dispersion of the cluster of points 1, 2, 3, 4 and 5; let d 2 (j, C) be the square of the distance from the centroid to the jth point. Then, from the definition, 5

    Q[1, 2, 3, 4, 5] = Σ d²(j, C)    (summed over j = 1, ..., 5)

                     = d²(1, C) + d²(2, C) + ··· + d²(5, C)

                     = [(11 − 19.8)² + (14 − 21.6)²] + [(36 − 19.8)² + (30 − 21.6)²] + ··· + [(28 − 19.8)² + (32 − 21.6)²]

                     = 892.
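As a quick check on the arithmetic, the definition can be applied directly in a few lines of code. The following sketch (Python with NumPy; the function name is my own choice, not from the text) computes the centroid of the five quadrats of Data Matrix #3 and the sum of squared distances to it, reproducing Q[1, 2, 3, 4, 5] = 892.

import numpy as np

# Data Matrix #3: rows are quadrats, columns are species (see Table 2.8).
quadrats = np.array([[11, 14], [36, 30], [16, 20], [8, 12], [28, 32]], dtype=float)

def within_cluster_dispersion(points):
    """Sum of squared distances from every point to the cluster centroid."""
    centroid = points.mean(axis=0)              # (19.8, 21.6) for Data Matrix #3
    return float(((points - centroid) ** 2).sum())

print(within_cluster_dispersion(quadrats))      # 892.0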

A simpler way of obtaining the same result is to use the equation

    Q[1, 2, 3, 4, 5] = (1/n) Σ d²(j, k)    (summed over all pairs with j < k).

(For a proof, see Pielou, 1977, p. 320.) Here n = 5, the number of points in the cluster; d²(j, k) is the squared distance between points j and k; the summation is over every possible pair of points, taking each pair once. This is the reason for putting the condition j < k below the summation sign. It ensures that, for example, d²(1, 2) shall be a component of the sum, but not d²(2, 1), which is merely a repetition of d²(1, 2). There are n(n − 1)/2 distinct pairs of points, and hence 10 distinct components of the sum Σ d²(j, k). Thus

    Q[1, 2, 3, 4, 5] = (1/5){d²(1, 2) + d²(1, 3) + ··· + d²(4, 5)}

with 10 components between the braces. Now

    d²(1, 2) = (11 − 36)² + (14 − 30)² = 881,
    d²(1, 3) = (11 − 16)² + (14 − 20)² = 61,
    ........................................
    d²(4, 5) = (8 − 28)² + (12 − 32)² = 800.

Hence

    Q[1, 2, 3, 4, 5] = (1/5) × 4460 = 892,

as before.

For a two-member cluster, say of points a and b, the within-cluster dispersion is

    Q[a, b] = ½d²(a, b);

that is, it is half the square of the distance between them. For a one-member cluster, say point a by itself, the within-cluster dispersion is zero; that is, Q[a] = 0.

We are now in a position to describe minimum variance clustering. At each step those two clusters are to be united whose fusion yields the least increase in within-cluster dispersion. It is important to notice that what matters is not simply the value of the within-cluster dispersion of a newly formed cluster, but the amount by which this value exceeds the sum of the within-cluster dispersions of the two separate clusters whose fusion formed the new cluster.

For example, consider the two clusters [a, b, c] and [d, e], having within-cluster dispersions of Q[a, b, c] and Q[d, e], respectively. Suppose they are united to form a new cluster whose within-cluster dispersion is Q[a, b, c, d, e]. The increase in within-cluster dispersion that this fusion has brought about, denoted by q([a, b, c], [d, e]), is

    q([a, b, c], [d, e]) = Q[a, b, c, d, e] − Q[a, b, c] − Q[d, e].

It is values such as q([a, b, c], [d, e]) that are the criteria for deciding which two clusters should be united at each step of the clustering process. At every step the clusters to be united are always the two for which the value of q is least.

As with the clustering procedures described in earlier sections, the minimum variance method also requires the construction of a sequence of "criterion" matrices. Then the position in the matrix of the numerically smallest element indicates which clusters are to be united next. The matrices obtained when minimum variance clustering is applied to Data Matrix #3 are shown in Table 2.9. In addition to the sequence of criterion matrices Q1, Q2, Q3, and Q4, and printed above them, is the matrix D². It is a distance² matrix whose elements are the squares of the distances separating every pair of points. The elements of D² are used to construct the successive criterion matrices.

We now carry out minimum variance clustering on the data in Data Matrix #3. Q1, the first criterion matrix, has as its elements the within-cluster dispersion of every possible two-member cluster that could be formed by uniting two individual points. It has already been shown that for a two-member cluster consisting of points j and k, say, the within-cluster dispersion is ½d²(j, k). Therefore, the elements of Q1 are simply one half the values of the corresponding elements of D².

The smallest element in Q1 is 6.5 (shown in boldface) in cell (1, 4). Therefore, the first cluster to be formed is [1, 4]. We now construct the next criterion matrix Q2. It has asterisks in row 4 and column 4 since point 4 no

TABLE 2.9. SUCCESSIVE MATRICES CONSTRUCTED IN THE MINIMUM VARIANCE CLUSTERING OF DATA MATRIX #3.

The distance² matrix D²:

            1       2       3       4       5
    1       0     881      61      13     613
    2               0     500    1108      68
    3                       0     128     288
    4                               0     800
    5                                       0

The sequence of criterion matrices:

Q1:
            1       2       3       4       5
    1       0   440.5    30.5     6.5   306.5
    2               0     250     554      34
    3                       0      64     144
    4                               0     400
    5                                       0

Q2 (after the fusion of points 1 and 4):
          [1,4]     2       3       4       5
    [1,4]   0   660.83   60.83     *    468.83
    2               0     250      *       34
    3                       0      *      144
    4                              *        *
    5                                       0

Q3 (after the fusion of points 2 and 5):
          [1,4]   [2,5]     3       4       5
    [1,4]   0   830.25   60.83     *        *
    [2,5]           0   251.33     *        *
    3                       0      *        *
    4                              *        *
    5                                       *

Q4 (after the fusion of [1,4] and 3):
         [1,4,3]  [2,5]     3       4       5
    [1,4,3]  0  790.67      *      *        *
    [2,5]           0       *      *        *
    3                       *      *        *
    4                              *        *
    5                                       *
Jonger exists as adseparate entity. In the J.t h cell of h (now the row an column for cluster [l ' 4]) is . en tered t e first row and col umn

q(J, [1 , 4])

=

Q[J, 1, 4] - Q[ J·1

- Q[l,4].

These terms are evaluated for every 1· no t equal to 1 3 and 5. Recall that for any j value or 4, that is, for j

=

2,

ºu, 1, 4J H d 2u' 1) + d 2( ],. 4) + d 2(1,4)}, =

Q[J] =O , and Q[l,4] = !d 2 (1,4).

. turn and t ki 1h Therefore, . . letting j take the values 2' 3, and 5 m 2 reqmred d1stances from the matrix 02, it is found that ' ng e

ª

q(2, [1,4])

=

Q[l,2 ,4] - Q[2] _ Q[l, 4]

=

H d2(1, 2) + d1(1, 4) + d2(2, 4)} -

=

t{881+13+1108}-0-lf

=

660.83.

=

Q[l, 3, 4] - Q[3] - Q[l, 4]

=

67 .33 -

=

60.83

=

468.83.

o -1d2(1, 4)

Likewise,

q(3, [l, 4])

and q(S,

[l, 4])

o-

6.5

It will be seen that the values just computed appear in the first row of Q1 (they would also appear in the first column, of course, if the whole matrix were shown, but it is unnecessary to print the matrix in full because it is symmetric). The remaining elements in Q are the same as in Q1 · . 5 15 The smallest element in Q i; 34 the value of q[2, 5]. Hence [Z, 1 the

~

cond cluster to be formed.

2

'

1

CLASSIFICATION BY CLUSTER.IN<; 38

th terms o f Q 3 . Since poin 1 t 5dis .no . ·1ar process, we calculate f herow an d co lumn are rep ace With By a sum the elements in the ti t e the row and column for the longer separate, d row and column becom 1 sters [l 4] and [2, 5] now . ks The secon member e u ' . hin astens . 5] so that the two two. The increase in w1t -clusnew cluster [2, d 'econd positions in the matruct d to make a four-member occupy first an s esult if they were um e ter dispersion that would r cluster is

[ ] 5 2 5] Q[l 4] Q ' ([1,4], [2,5]) = Q[l,2,4, '

q

=

-!{ d2(1,2) + d2(1,4) + d2(1 ' 5) + d2(2, 4) +d2(2, 5) + d2(4, 5) } - 2ld2(1 ' 4)

- ld2(2 , 5) 2

=

830 .25.

The procedure far compu ting the elements o f the Q matrices should now be The clear.smallest element m . Q3 is . 60. 83 in cell ([l, 4], 3). Therefore, the next 4 3 fusion creates the cluster [l, , ~· . d d by the final fusion beThe gain in within-cluster d1spers1on pro uce . 1 1 t in Q It is tween [1, 4, 3] and [2, 5] is 790.67, the only numenca e emen 4· found from the relation q([l,3,4], [2,5]) = Q[l,2,3,4,5] - Q[l,3,4] - Q[2 , 5] . After the final fusion, when all five points have been umte · d m · t o one cluster, the within-cluster dispersion is

º[

1, 2, 3, 4,

51 = ~ ¿ d 2 u, k)

=

s92,

j
as already derived at the beginning of this section. To summarize, let us consider the step-by-step increases that took place in the total within-cluster dispersion (hereafter called the total dispersion) as clustering proceeded. At the start, there were five separa te points (or one-member clusters), ali with zero within-cluster dispersion. Therefore, the total dispersion was zero. Formation of cluster [l, 4] raised the total disper· sion by the amount of its own within-cluster dispersion, namely, 6.5. Likewise, formation of cluster [2, 5] added 34 to this total, bringing it to 40.5. Next, formation of cluster [l, 3, 4] brought an increment of 60.83 to tbe

I MINI MU

M VARIANCE CLUSTERING 39

al (recall that the elements in the criterion matrices

h .

.

tot . . . . ersion that the d1fferent poss1ble fus1ons would b are . t be mcreases m d1sp . . rmg a out, not the d[ .thin-cluster dispers1ons themselves). The final fusion of [l W1 · numbers, ' 3' 41 an 2• S] brought an increment o f 790 .67 . Thus m

6.5

+

34

+ 60.83 + 790.67

892.

=

In words: the total disp~rsion is the sum of the smallest elements in the successive criterion matnces, the Qs. The clustering strategy consists in augmenting the total by the smallest amount possible at each step. Figure 2.8 shows the data points of Data Matrix # 3 and the dendrogram we have just computed. The height of each node is the within-cluster

•5

30

•2

N (,()

20

•3

w

u w

.,

Q_

(,()

•4

'º 30

20

10

40

SPECIES

z o

892

(/)

cr:

w

Q_ (f)

75

o cr:

w

1-

50

(f)

::) _J l.) 1

~

25

I

1-

~ o

'

4

3

2

5

. # 3 (see Table 2. . f D ta Matnx Figure 2.8. The data pomts .º : the data. by nunimum variance clustenng 0

8) and the dendrograro

yielded

CLASSIFICATION BY CLU STER!~~

40

1673 . 8

765. 2

z o (j) a::

174

w

o...

(j)

i5

a:: w

f-

(j)

::::> _J

u

92 . 5

1

z J: t:

64 .7

~

30.5 16 11 .3

2.5 3

9

7

1

6

2

5

8

4

10

to Data Figure 2.9. The den drogram produced by applying mínimum . .variance clustering h d Matrix #1. The scale on the Ieft shows the within-cluster d1spers1on of eac no e.

dispersion of the newly formed cluster that the node represents. Thus the heights are Q[l, 4] = 6.5, Q[2, 5] = 34, Q[l, 4, 3] = 67 .33, and Q[l , 2, 3, 4, 5] = 892. Figure 2.9 shows the results of applying minimum variance clustering to Data Matrix #l. The steps in the computations are not shown here since nine 10 X 10 matrices would be required. The exact value of the within· cluster dispersion of each newly formed cluster is shown on the scale to the left of the dendrogram to serve as a check for readers who wish to carry out minimum variance clustering on these data for themselves. It is interesting to compare this dendrogram with those in Figures 2.3, 2.5, and 2.6 which ali relate to the same data.
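The whole procedure is compact enough to express in a few lines of code. The sketch below (Python with NumPy; the function names and bookkeeping are my own illustration, not from the text) performs minimum variance clustering by evaluating q for every candidate pair of clusters at each step. Applied to Data Matrix #3 it reproduces the fusion sequence [1,4], [2,5], [1,3,4] and the increments 6.5, 34, 60.83, 790.67.

import numpy as np
from itertools import combinations

def dispersion(points, members):
    """Within-cluster dispersion Q: sum of squared distances to the centroid."""
    sub = points[list(members)]
    return float(((sub - sub.mean(axis=0)) ** 2).sum())

def minimum_variance_clustering(points):
    """Greedy minimum variance clustering: at each step unite the two
    clusters whose fusion gives the smallest increase q in dispersion."""
    points = np.asarray(points, dtype=float)
    clusters = [[i] for i in range(len(points))]
    steps = []
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            q = (dispersion(points, clusters[i] + clusters[j])
                 - dispersion(points, clusters[i])
                 - dispersion(points, clusters[j]))
            if best is None or q < best[0]:
                best = (q, i, j)
        q, i, j = best
        merged = clusters[i] + clusters[j]
        steps.append((sorted(p + 1 for p in merged), round(q, 2)))  # 1-based labels
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return steps

data_matrix_3 = [[11, 14], [36, 30], [16, 20], [8, 12], [28, 32]]
print(minimum_variance_clustering(data_matrix_3))
# [([1, 4], 6.5), ([2, 5], 34.0), ([1, 3, 4], 60.83), ([1, 2, 3, 4, 5], 790.67)]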

2.6. DISSIMILARITY MEASURES AND DISTANCES

It was remarked in Section 1 of this chapter that the Euclidean distance between the points representing two quadrats is only one of many possible


ways of defining the dissimilarity of the two quadrats. We used Euclidean distance as a dissimilarity measure in Sections 2, 3, 4, and 5, in which four different clustering procedures were described. This section describes some other possible dissimilarity measures and their advantages and disadvantages.

Metric and Nonmetric Measures

First it must be noticed that dissimilarity measures are of two kinds, metric and nonmetric; the distinction between them is very important. A metric measure or, more briefly, a metric has the geometric properties of a distance. In particular, it is subject to the triangle inequality axiom. This is the common-sense axiom which states that the length of any one side of a triangle must be less than the sum of the lengths of the other two sides. Suppose we write d(A, B) for the length of side AB of triangle ABC and analogously for the other two sides. Then the triangle inequality may be written

    d(A, B) ≤ d(A, C) + d(B, C).

The equality sign applies when A, B, and C are in a straight line or, equivalently, when triangle ABC has been completely flattened to a straight

line. The triangle inequality is obviously true of Euclidean distances. However, measures of the dissimilarity of the contents of two quadrats (or sampling units of any appropriate kind) are often devised without any thought of the geometrical representation of the quadrats as points in a many-dimensional coordinate frame. These measures were not, when first invented, thought of as distances. Only subsequent examination shows whether they are metric, that is, whether they obey the triangle inequality or, in other words, "behave" as distances. Sorne examples are given after we have considered why metric diss~milarity measures are to be preferred to nonmetric ones. As remarked preVIously, . · ·1 't b t een two when a metric measure is used to define the d iss1IDI an Y e w . quadrats, then the dissimilarities behave like distances. As result, it may b · t · a space of many e possible (sometimes) to plot the quadrats as poID s m 1 h dimensions with the distance between every pair of points being equal 1 t_te d" . . . h nonmetnc d1ss1ID1 an y ISsimilanty of the pair (see page 165). But w en
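A small utility of the following kind (my own illustration, not part of the text) makes the axiom easy to test on any symmetric dissimilarity matrix; a measure such as the δ constructed later in this section fails the test.

from itertools import combinations

def satisfies_triangle_inequality(d):
    """Check whether a symmetric dissimilarity matrix d (zeros on the
    diagonal) obeys d(A,B) <= d(A,C) + d(B,C) for every triple of points,
    i.e., whether it could behave as a distance."""
    n = len(d)
    for a, b, c in combinations(range(n), 3):
        if (d[a][b] > d[a][c] + d[b][c] or
                d[a][c] > d[a][b] + d[b][c] or
                d[b][c] > d[a][b] + d[a][c]):
            return False
    return True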

ª

°

ª

measure is used, this cannot be done.

'

CLASSIFICATION BY CLusn

Rl~C

42

~ E ¡·a

distance as already defined were used as th uc 1thean the pattern of points would b e t h e same as th e . ·mil rity measure,h en. t has as its coordinates the amounts of that d1ss1 . e Produced when . eac · th pom quadrat it represents. But 1f sorne other met . different spec1es m e . . l1c dissünilarity measure were used, it w_ould give a dlfferent pattem of Püiuts. However, if a nonmetric dissimilanty measure were used, no swann 01 Of course' if

ª

points could be constructed of ~n~ p~tt~rn whatever .. To see this, let us invent a d1ss11mlanty measure s1mply. for purposes of illustration. Suppose we define the dissimilarity between pomts A and B as

    δ(A, B) = 100 / [max(d) − d(A, B)].

Here d(A, B) is the ordinary Euclidean distance as previously used in this chapter, and max( d) is the distance separating the farthest pair of points. For concreteness, let max( d) = 100. Obviously, increasing values of d(A, B) within the observed range of O to 100 give increasing values of 8(A , B) and, therefore, 8(A, B) could reasonably be used as a measure of dissimilarity. Now imagine three points, A, B, and C. Let the Euclidean distance between each pair be d(A,

B) = 90;

d(A,C) = 75;

d(B, C)

=

50 .

These distances conform with the triangle inequality; that is,

d(A, B) ~ d(A, e)+ d(B , e)

and, therefore, the points can be 1 d 1. P otte ~ a two-dimensional space (for instance, a sheet of a . Now consider the d~s ~er!l m_ t_he form of a tnangle with sides 90, 75, and 50. smu anties defined previously: 8(A, B) =

100 _ 100 100 - d(A , B) - 100 - 90 = 10;

Likewise, 8(A, C) = 4 and 8(B C -

' ) - 2· Clearly, it is not true that

8(A, B)

~ 8(A, C) + 8(B , C)

and ' as a consequence

-~-----------'-ºn_e cannot

construct a triangle witb 8( A. fi ),

I 01ss1MILARITY MEASURES ANO DISTANCES 43

8(A, C), and 8(B, C) as its sides. It is impossible T . . saving that the 8s, although they could serv .' . his is another way of J ~ e as d1ssuni_¡ ·t an Y measures, are 00 n.metric. To repeat, the merit of metric dissirnilarity . is that th d rats to be represented as a swa measures errnit the qua f . . ey often P . . rm o pomts m many-d· h S sional space. uc a representation is not strictl . imen. . Y necessary if ali w do with the data 1s class1fy them. Nearest-neighb d e want to . 1 or an farthest-neighb clustenng, as examp es, can be done just as well with . . or .h . B . nonmetnc d1ssimilari úes as wit metnc. ut often, mdeed usually we wa t t d. · ' n o or mate the data as well as class1fy them. As we see in Chapter 4 ord· r· . . . ' ma 10n procedures use swarms of data pomts as theu raw · desrra · ble that . . material . · Obviously, 1·t is the two methods of analys1s, ordmat10n and classification (or 1 · ) be . . . . e us tenng, carn~d ~ut. o~ 1d.entical bod1es of data, that is, on identical swarms. Hence metnc dissumlanty measures are to be preferred to norunetric ones. Their use permits a clustering procedure and an ordination to be performed on the same swarm of data points. The following are examples of two dissimilarity measures which do not, at first glance, look very different; however, one is metric and the other nonmetric. The better known of the two is the nonmetric measure percentage dissimilarity PD (also known as the percentage difference or percentage distance ). It is the complement of percentage similarity PS (also known as Czekanowski 's index of similarity; see Goodall, 1978a). Since PS, now to be defined, is a percentage, PD is set equal to 100-PS. The percentage similarity of ·a pair of quadrats, say quadrat 1 and

quadrat 2, is defined as follows. Let the number of species found in one or both q~adrats be s· Let X¡1 and X¡2 be the amount of species i m quadrats 1 and 2, respectively ( i

=

1, 2, ... , s ). Then

    PS = 200 × [Σ min(x_i1, x_i2)] / [Σ (x_i1 + x_i2)],        (2.8)

both sums being taken over i = 1, 2, ..., s.

· the table, . . . 1 2 10. Since, as shown m A numencal example IS shown 11:1 ~ab _e : PD = 41.33%. PS = 58.67%, the percentage diss1nulanty IS

CLASSIFICATION BY ClUSTERINc;

44

TABLE 2.10. TO ILLUSTRATE THE CALCULATION OF THE PERCENTAGE DISSIMILARITY PD AND THE PERCENTAGE REMOTENESS PR OF TWO QUADRATS.ª

Species Number i   Quadrat 1   Quadrat 2   min(x_i1, x_i2)   max(x_i1, x_i2)
       1               25           7             7                25
       2               40          16            16                40
       3               18          50            18                50
       4               16          22            16                22
       5                9          22             9                22
    Totals            108         117            66               159

PS = 200 × 66/(108 + 117) = 58.67%. Therefore, PD = 41.33%.
RI = 100 × 66/159 = 41.51%. Therefore, PR = 58.49%.

ªThe entries in the table are the quantities of each species in each quadrat.
Calculation of the second dissimilarity measure mentioned, the metric measure, is also shown in Table 2.10. There appears to be no accepted name for it, so it is here called percentage remoteness, PR. It is the complement of Rufüka's index of similarity, RI (Goodall, 1978a), which is s

    RI = 100 × [Σ min(x_i1, x_i2)] / [Σ max(x_i1, x_i2)],        (2.9)

both sums being taken over i = 1, 2, ..., s. Then PR = 100 − RI.

Both PD and PR take values · h 'f the two quadrats have _m t_ e range O to 100. It is easily seen that 1 min(x . x ) a no spec1es m common, then all terms of the forni 11' i2 re zero and thus PD = p _ ·r the contents of the two . R - 100%. At the other extreme, 1. then PD = PR - O°' Thquadrats are identical so that x. = x. for all '' 10. erefore ·th 11 12 ' e1 er measure could be used as a measure

01 ssrM

rLARITY MEAS U RES ANO DIST ANCES 45

dissirnilarity and if it were not for th . h e super · rnetric measures, t ere would be little t h ionty of metric ov no n . . . . o e oose b er r since PR is metnc, it is superior A etween them How eve ' . . · proof that p · · be found m Levandowsky and Wmter (1971) R is metric can · b f d · ' and a dem · PD is nonme tne can e oun 111 Orlóci (lnS) onstration that book). An example of the use of PR in ecol . l(but see page 57 of this og1ca work has b . . evandowsky (1972); h e used it to measure th e d.lSSlilllla . . · t een given by f L pJankton in water samples collected from te n Y o the phytoshores of Long Island Sound. mporary beach ponds on the of

Another metric dissimilarity measure that ha s mueh to comm d · · city-block distance CD (sometimes called the M an hattan metrzc) _en Itit ·is the . . 15 h sum of the . d1fferences m species amounts ' for a11 spec1es . m . · the tt e . sampling uruts bemg compared. In symbols, ' wo s

CD

=

" 1...J i=l

lx 11. -

x 12 . 1·'

here lx 11 - xi21 denotes. t~e absolute magnitude of the difference between x, 1 and X¡z taken as pos1tive irrespective of the sign of (x 11. - x 12 ) . Thus for . the data m Table 2.10,

    CD = |25 − 7| + |40 − 16| + |18 − 50| + |16 − 22| + |9 − 22|
       = 18 + 24 + 32 + 6 + 13
       = 93.
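To tie these quantitative measures together, here is a short sketch (Python; the function name and variable names are my own choices) that computes PS, PD, RI, PR, and CD for the two quadrats of Table 2.10, reproducing the values worked out above.

quadrat1 = [25, 40, 18, 16, 9]   # species quantities from Table 2.10
quadrat2 = [7, 16, 50, 22, 22]

def similarity_measures(x1, x2):
    """Percentage similarity/dissimilarity, Ruzicka's index / percentage
    remoteness, and city-block distance for two quadrats."""
    mins = sum(min(a, b) for a, b in zip(x1, x2))
    maxs = sum(max(a, b) for a, b in zip(x1, x2))
    ps = 200.0 * mins / (sum(x1) + sum(x2))        # Eq. (2.8)
    ri = 100.0 * mins / maxs                       # Eq. (2.9)
    cd = sum(abs(a - b) for a, b in zip(x1, x2))   # city-block distance
    return {"PS": ps, "PD": 100 - ps, "RI": ri, "PR": 100 - ri, "CD": cd}

print(similarity_measures(quadrat1, quadrat2))
# PS ~ 58.67, PD ~ 41.33, RI ~ 41.51, PR ~ 58.49, CD = 93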

In a way, CD is the most intuitively attractive of the dissimilarity measures. It amounts to a numerical value for the difference an observer ~onsciously sees on looking at two sampling units, for example, two quadrats ~n a salt marsh, two trays full of benthos from Surber samplers, or two plots m forest. Thus suppose one were to inspect two forest plots and count the number of trees of each species in each plot; then, for many people, the spontaneous answer to the question, "How great is the difference betwe~n the plots?" might well be arrived at simply by adding together the differe b · ll · · t0 ccount Thi nces etween the plots in species content, taking a species m sis the c1ty block distance and it has the metnc property. The units in which city-block distance and Euclidean distance are mea~ surect are the same as the units in which species quantities are mea~ure anct 0 f conunu111ty to ' as explained before (page 11), vary trom one type . . . · a another Wh . 1 1 f 0 r a dissnmlanty 1Il · en an author gives a numenca va ue

ª

ª

·

CLASSIFICATIO N BY

CLusnb

"'~e

46

.t almost always omitted. Purists may disappro er the uru s are . Ve h researc pap ' b ro'versal and leads to no rmsunderstanct· ' lllo t0 m seems to e u . but the cus f nd clearly defined at the outset. With p ª ·d d ali um ts are u11Y a . er. provi e h PD and PR the problem of uruts 3 + 4 • Euclidean distance, city-block distance, and percentage remoteness pro vide a more than adequate armory of dissimilarity measures for use wheneve nonnormalized distances are required. I t is now necessary to consider th1 topic of normalized versus nonnormalized (" raw") data and to conside whether, and if so how, ecological data and dissimilarity measures derive1 from them should be normalized for analysis.

Raw versus Normalized Data Data are said to be nonnalized wh distance from the ori . ~n every point is placed at the sam points from one anothgm ?f thh~ co?rd1~ates so that all that distinguishes th er is t err drrect10n fr th . . . . . al to disregardíng the absolute uan . . om e ~ngm. This 1s eqmv e~ the relative quantities. q tlties of each spec1es and considering onl · sometimes thought d . . .To see why thi s is pomts A B and e esrrable, consider Figure 2.10. Tb .' ' represent three d . . commuruty and it is 1 qua rats laid down in a two-spec1e P . e ear that d (A B) d . ropor~ions of the two s e . . ' < (B, C ). However, the relatl~ great d1ssimilarity as p c1es m quadrats B and C are identical Tbei the total quantity ~f t:;,etawsured b?' d ( B ' C ), arises solely from -'the f;ct th8 B Con · verse1Y, the very slightspec1es d' . comb.med is much greater in C thaJl ~ lSSunilari ty represen ted by the short dista11

°

orssrM'

L.ARITY MEASURES ANO DISTANCES 47 1.0 (\J (/)

A(0. 2,0.5)

W l)

w a.. (/)

0.5



C( 2 . 1 , 0 . 3)

B~(0~ .7~,0.~l~l~~~~~~~

1.5

SPECIES

Figure 2.10.

2.0

1

Points A, B, and C described in the text Cl .

early, d(A, B) < d(B, C).

d(A B) arises from the fact that both the quadrats A d B . ' . . . an contam small . amounts of both spec1es, although the ratio of species 1 to sp · . . . . . ec1es 2 is much greater m B than m A, this d1fference is not refiected in their dissimilarity as measured by d(A, B). It can be argued that dissimilarity should be measured in a way that lays more stress on the relative proportions of the species in a quadrat and correspondingly less stress on absolute quantities, in other words, that the raw observations should be normalized. However, this is a matter of opinion and is one of the decisions that must be made before data are analyzed. It should be emphasized that there is no single answer to the question: Should data be normalized before analysis? Whatever the decision, it is a subjective choice. Sorne guidance towards making the choice is offered in the following. First we consider two ways of measuring the dissimilarity between a pair of data points so that only the relative proportions, not the absolute amounts, of the species are taken into account. . . These dissimilarities are the chord distance and the geodesic metrzc. Both are metric measures. They are shown diagramatically in Figure ~.ll. As is used for illustra. · ) . . a1ways, the simple, two-dimens10nal ( two spec1es case f -~ liz d to the s-d11Ilens10nal ( s ion, ªQP the resulting formulas are then genera e species) .. case. . 2 11 ). Let the ongmal The chord distance is derived as follows (see Figure · . A' and B' for data · · f ·1 adius and wnte Pomts be proiected onto a cuele o uru r E ¡·aean distance th · . J d(A' B') the uc i e ProJections of points A and B. Then ' ' d B which we shall between A' and B' is the chord distance between A an 'ts a quadrat in de ' . e represen note by c(A B) In the figure since pomt . as in quadrat Which ' . ' 1 tive proport10ns the species are present in the same reª

CLASSIFICATION BY CLUSTER¡~<:;

A' f.

1.0

. '-....

(\J

\ (/)

w u ~

\

\

' ',

g(A,B) \

\

\

\

c(A,B)

0 .5

\

\

\'\

(/)

\

\

\ ..y

1.5

1.0

0.5

2.0

SPECIES

. 1 p ·nts A B and e are the same as in Figure 2.10. A' and B' are the projections F1gure 21 . . 01 , ' h " d · of A and B onto a circle of unit radius. The chord distance and t e geo es1c metric" (oi geode ic distance) separating A and B are c(A , B) and g(A, B) , respectively.

B, the point C' is identical with B' so that d(A', C') = d(A', B') and d(B',C') =O. Equivalently, c(A, C) = c(A, B) and c(B, C) = O. We now derive c(A, B) in terms of the coordinates of points A and B. Let these coordinates be (xlA, x 2 A) for point A and (x 1B, x2B ) for point B (Recall that the first subscript always refers to the species and the second tJ the quadrat or other sampling unit.) First, for brevity, put OA = a, OA' = a', OB = b, and OB' = b'. Writ(

Ofor angle AOB. Obviously, AOB = A'OB'. From the construction, we know that a' = b' = l. Applying Apollonius's theorem (see page 78) to 6. ABO and ~A'B 'O respectively, shows that . 1

d (A, B)

=

ª

2

+ b2 - 2ab cos 8

(2.10

and

c2(A,B)

=

a'2

+ b'2 - 2a'b'cos8

= l2 + l2 =

-

2 cos 8

(z.11

2(1 - cos8) .

N ow use (2 10) . to express cos (} in t

erms o

f xlA, X2A, X1s '

and Xzs·

01ss1MILARITY MEASURES ANO DISTANCES 49

from Pythagoras's theorem, 2 2 a2 -xlA+x2A;

d2(A, B) = (x1A -

xlB)2 + (x 2A -

x 28 )2 .

Then since from (2.10),

cos8=

ª 2 +b 2 -d 2 (A,B) 2ab

it follows that

cosO

=

(

xfA + x~A) + ( xfB + x~B) -[ (x,A - xlB)2 +(x2A - x2S] 2J(xi_A + xL)(xf8 + x~ 8 ) (2.12)

Hence to evaluate c 2 (A, B) for any pair of points A and B with known coordinates, first find cos (J using (2.12) and then substitute the result in (2.11). For example, in Figure 2.11 the coordinates of A and B are, respectively,

-(0205)

(X lA' X 2A ) -

• '

and

·

(x 18 ,x 28 )=(0.7,0.l).

Therefore,

cosO =

= 0.4990,

(0.2 X 0.7) +(0.5 X 0.1) {(0.2 2

+ 0.5

2

)(0.7

2

+ 0.1

2 )

B) - 1 0010 from (2.11); (A 1.0021 and e ' - · also, fJ = 1.0484 radians or 60.07º · li d to the s spec1·es case when the The equations can be directly genera ze . tual space of s 10 data points form an unvisualizable swarm conc¿ lar axes). Thus dimensions (a coordinate trame with s mutuallY perpen icu

from (2.12), whence c2 (A, B)

=

ª

CLASSIFICATI ON BY CLUSlERlt-.jC

so

{t x?A ;t x?o}

1/ 2

1

. ( ll) . s already given whatever the number of species. Equatlon 2. is a ' h h d d. · 1stance The maximum an d Illi·n1·mum possible values for t e. e or. · of poi.nts in a space of any number of drmens1ons are ¡2 between a parr and o, respectively. This follows from (2.11) and the fact that cos (} must lie in the range [ -1 , l]. Thus when OA and OB are parallel, cos (} === 1, c2(A , B) = o, and c(A, B) = O; when OA and OB are perpendicular to each other, cos (} = O, c2 (A, B) = 2, and c(A , B) = ¡2. Another obvious dissimilarity measure is the geodesic metric, shown as g(A, B) in Figure 2.11. It is the distance from A' to B' along the are of the unit circle; to be exact, one should stipulate that the distance is to be measured along the shorter are, not the longer are formed by the rest of the circle. It is seen that, since the circle has unit radius, the are distance g(A, B) is the same as angle (} measured in radians. To find the angle, om must first evaluate cos(} = SAB' say, and then find g( A, B) from g(A, B) == arccosSAB· SAB is known as the cosine separation of the quadrats (Orlóci, 1978 P·. 199). In the two-species example considered previously and shown ll: Figure _2.11, where the coordinates of A and B are (0.2, 0.5) and (0.7,0.l)

respecl!vely, coslJ

= SAB = 0.4990, as already determined. Hence,

g(A, B)

=

arccosSAB

= 1.0484 units of length.

The range of possible 1 f () 7T/ 2 = 1.576. In the sim ~a ues o ~ and he~ce of g(A, B), is from Ot geodesic metric is th 1 p ~ two-spec1es case illustrated in Figure 2.11 thi (three-dimensional): entght of a~ ~re of the unit circle. In the three speciel . ase e metnc is th 1 h o unit radius and the d e engt of a geodesic on a sphere wor geodesic has its customary meaning, namely, ti
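Both normalized measures are simple to compute from the raw coordinates. The following sketch (Python with NumPy; the function name is my own) evaluates the chord distance and the geodesic metric for the coordinates of points A and B used in Figure 2.11, reproducing cos θ ≈ 0.4990, c(A, B) ≈ 1.0010, and g(A, B) ≈ 1.0484.

import numpy as np

def chord_and_geodesic(a, b):
    """Chord distance and geodesic metric between two quadrat vectors,
    i.e., distances measured after projecting both points onto the unit
    (hyper)sphere."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    cos_theta = a.dot(b) / np.sqrt(a.dot(a) * b.dot(b))   # Eq. (2.12); the cosine separation
    chord = np.sqrt(2.0 * (1.0 - cos_theta))              # Eq. (2.11)
    geodesic = np.arccos(cos_theta)                       # arc length on the unit circle
    return chord, geodesic

print(chord_and_geodesic([0.2, 0.5], [0.7, 0.1]))   # approximately (1.0010, 1.0484)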

01ss1MILARITY MEAS U RES ANO DIST ANCES 51

TABLE 2.11.

DATA MATRIX #4. THE QUANTITIES OF TW

ELEVEN QUADRATS.

O SPECIES IN

Quadrat

1

2

3

4

5

6

7

8

9

10

11

Species 1 Species 2

3 3

4 7

5 7

5.5 5.5

6 4

6 6.5

11 11

11.5 13.5

12

14 11

13.5 15

13

shortest on-the-surface dis~ance, or great circle distance, between two points on a sphere. In the s spec1es case the geodesic metric is a great circle on an s-dimensional hypersphere.

ExAMPLE. We now examine the outcomes of clustering the same set of data twice using the centroid clustering method both times. First, the data are left in raw form and Euclidean distance is used as the dissimilarity measure. Second, the data are normalized and the geodesic metric is used as the dissimilarity measure. The data (Data Matrix # 4) are tabulated in Table 2.11 and plotted in Figure 2.12. There are two "natural" clusters but they differ from each other chiefiy in the quantities of the two species they contain; the relative proportions of the species are not very different. Thus if the points represented randomly placed vegetation quadrats, one would infer that the ar ea sampled was a mosaic of fertile areas and sterile areas, but that the vegetation in these two areas differed mainly in its abundance and hardly at all in its species composition. As one would expect, if clustering is done using Euclidean distance as the dissirrúlarity measure (upper dendrogram in the figure), the two natural clusters are separated clearly; whereas if one uses the geodesic metric as the dissirrúlarity measure (lower dendrograrn), the two clusters are intermingled. Which is "better" is obviously a rnatter of choice. Even the rneaning of "better" is undefined unless the investigator has sorne definite object in view, that is, sorne clearly formulated question for which an answer is sought. Then whatever clustering method gives an unarnbiguous answer to the question (if any method does) is obviously the bes t. Communities which differ from place to place only in overall abundan~e, and not at all in species cornposition, are most unlikely to be found m nature. For instance the abundance of the marine rnacroalgae on rocky '

ª

CLASSIFICATION BY

CLus

TER¡~~

52

seashore varies markedly with the exposure of the shore to waves; sheitere shores support a much Jarger crop than wave-battered _shores. But the,sd ed communities are not sparse and dense vers10ns of the e co ntrast . . . .. salll species mixture; they d1ffer, also, m spec1es compos1t10n: 1 Likewise' the luxuriance of the ground vegetat10n m regions sev . . . . ere[) affected by air pollut10n is consp1cuously less than that m clean areas. B . . . . h . . Ut11 is not only less m amount; it 1s also mue poorer m spec1es. Thus if clustering is done to disclose differences of an unspecified kin raw data are better than normalized data. Differences in overall abundand are not "meaningless" and are not (usually) unaccompanied by at sorne qualitative differences in the community of interest. Normalizing:1 data may inadvertently obscure real, but slight, differences among them the same time as it (intentionally) obliterates the quantitative differenc ª . ~ however, that there may not be situations in hi Th at is .not .to say, 11 w el normaliz at10n is ca ed for; for example' one might wish to classi'fY samph

le~

8.28

3

2 11



15

ª• • 9

N l/)

w



7

10

u

•10 o

w

2

a_

o 2

3

o o

o 3

o

o

o

4

o

5

6

1

o

°4

06

l/)

40

5

0.10

o ~

o 1

5

10

15

0.05

SPECIES

Figure 2 12 T b · · · he data · o tamed by cluste . pomts of Data Mat .

o

5

10

1

0

7

o • 6

9

11

B

o

2 3

was obt · nng the data e . nx # 4 (see Tabl aJll betw amed using the Euclid . e~tro1d clustering wa e 2.11) and two deodrogr een-quadrat similarity; th:a;:,!istance between for both. The upper dendrograrn u d p of raw data points as measure se the geodesjc metric.

-----~-----~--er

- .......

o

eac~ us~~

dendro~

orssrMILARITY MEASURES ANO DISTANCES

53

Iots of the vegetation of a polluted a p . . rea so as to d. repollut10n clustenng. Then, provided th . isclose the probabl P . . 1 e quahtativ d. c:r e e iuerences among reexistmg e usters exceeded the qualitat" ct· P . . Ive Ifference . d tion, normalizat10n would be desirable. It . h s m uced by pollu. · mig t prevent ct·.i::r uantity of vegetat10n m the sample plots f . Iuerences in the q . . . rom overnd · . mg, and masking the ualitative differences that persisted from th q e prepollut" · ' To repeat: whether to use raw or normalized d _10n penod. judgment. ata IS always a matter of

Presence-and-Absence Data In sorne ecological investigations it often seems b tt · . . . . '. e er simp1Y to list the species present m each sampling urut than to attempt to . .. . . measure or estimate the quantitles. When this Is done ' the resulting data are known as presence-absence data, binary data, or (O, 1) data, and the elements in the data matrix consist entirely of Os and ls. Suppose the community being sampled appears to vary appreciably from place to place; then for a given outlay of time and effort one may be able to acquire a larger amount of information, or more useful information, by examining many quadrats quickly rather than a few quadrats slowly and carefully; the quickest way to record a quadrat's contents is, of course, just to list the species in i t. Again, suppose the organisms comprising the community vary enormously in size. They might range from tall trees to dwarf shrubs, for example. It rnight then be impossible to find a quadrat size that was large enough for -use with the trees and, at the same time, small enough for it to be practicable to measure the amounts of each species of ground vegetation. In such a case, use of binary data overcomes the difficulty. As Goodall (1978a) has written, in highly heterogeneous co~un­ ities, "quantitative measures add little useful information" to that yielded

by a simple species list for each quadrat.

. Now consider the graphic representation of a binary data matnx. In the simple, visualizable tw~ and three-species cases, all the data points must ~all · 1 This amounts to saymg on the vertices of a square or a cube, respective Y· d 'gh .bl positions for the ata . f th that there are respectively only four or ei t possi e . , ' . the coordmates o e . pomts in these cases Thus in the two-spec1es case (l 1) In the three-spec1es · four possible data points are (O, O), (O, 1), (1, O), and . ' · (O O) (1 o O) · h ordmates , 0' ' ' ' ' F 2 13) case, the eight possible data pomts ave co (O, 1, O), (O, O, 1), (1, 1, O), (1, O, 1), (O, 1, 1), and (1, 1, 1) (see igure . .

54

argument to binary data from an s-species comm. . ow ex ten d the . . . . . UIUt N (conceptually) in an s-duneos1onal plotted trame. It 1smtuitive¡) ciear that the possible data p01nts are the_ ve~uces of an s-dunensiona

coor~ate

hypercube and the number of these veruces is 2 . Tbere is no objection to using the clustering methods described in earli sections of tbis chapter to the clustering of binary data. However, unless 1 total oumber of species is large, the result of a clustering process may see somewhat arbitrary. Tbis is because only a few values are p ossible for;

1~

distance separating any pair of points. e Consider the simple cases in Figure 2.13. In the two-species case th distance between any pair of noncoincident points must have one of t values, 1 or !i, depending on whether the two points are at the ends 0

;c

SPECIES 2

SPECIES

1

SPECIES 2 (0,1 ,0 )

(0,1,1)

lw ~

/

(1,1,0)

~1: --­

V-

:

/1 (1 ,1,I)

"V'3" (0,0,0~-1

/

/

SPECIES

/

/ /

(0,0,I)

/ ~ 1 ----7(1,0,1)

Figure 2.13 SP_EClES 3 sional (u . Ali possible bin pper) and three-dime~ data(lo points,) and the d'istances s ns1onal . . p· coordinate frames . eparatmg them, in two-d1J1le

------------~_'_~wer

01ss1MILARITY MEASURES ANO DIST ANCES 55

side of the square or at the ends of a diagon 1 1n the thre · case e-spec1es ·"ere are three poss1"ble nonzero distances· th u1 · ese are 1 1·f th ' e two points are at the ends of one edge of the cube·, fi if they are at the d f . of one of the square faces of the cube; /3 if the a en s 0 a diagonal diagonal that crosses the cube. Again the argum t Y re at the ends of a . en can be ge liz d s-spec1es case. When there are s species the d. t nera e to the is anee between a . f . . ' d" . ny parr o noncoinc1dent pomts must have one of only r;, ¡;:;e Th d. . s istmct values namel 1 y.t., v3, ... , vs. e 1stance 1s the square root 0 f th ' y, ' e number 0 f · found in one or the other (but not both) of the quad t species . . ra s represented by the pomts. Thus the distance between the points (1 o 1 0 1 · · d. . ' ' , ' , 1, 1, 1, O) and (O, 1, O, .O, 1, 1, 1, 1 , 1) m nme- 1mens10nal space is v'4 = 2, smce · 't there are four nnsmatches between . these two lists. In the s-speci·es case, 1·f the two quadrats together contarn all. s species but have no speci·es m · common, then . the d1stance between the pomts representing them is IS. It follows that if a clustering procedure starts with construction of a distance matrix (e.g., like that in Table 2.1, page 18), whose elements are the distances between every possible pair of points, then unless the number of species is very large there are likely to be severa! "ties" in the matrix. That is, severa! elements may ali have the same value. If this also happens to be the smallest value, then severa! fusions become "due" simultaneously. The same thing happens with minimum variance clustering; if two or more elements in a criterion matrix (such as Q1 in Table 2.9, page 36) are equal to one another and smalier than ali the others, then again the indicated fusions are due simultaneously. When this happens, the "due" fusions should be carried out simultaneously before the next distance matrix (or criterion matrix) is constructed; otherwise, errors will occur. We now consider other ways of measuring the dissimilarity between pairs of quadrats (data points) when the data are in binary or (O, 1) form. The Euclidean distance between two points which, as we have seen, is always the distance between two vertices of a hypercube, is not the only way of measuring the dissimilarity of the points. One can also use percent~ge dissimilarity PD a~d percentage remoteness PR which were defined earher

ª·

(see page 43). These measures can be calculated as already shown in Table 2.lO (page . 2 X 2 table as we shall now 44) ; altematively they can be denved from a d ' .t t o chosen qua rats see. Suppose a 2 x 2 table is constructed to perfill w d (quadrats 1 and 2 say) from a sample of severa! quadrats to be comparled. 11 the quadrats samp e , ' . d f Assume that species lists have been compile or

ª

CLASSIFICA TI O N BY CLUSTEn "I N~

56

. spec1es . that are not present 1. and . d ts contam . hin quadrats d t and sorne qua ra 1 f pecies represented m t e a a matnx sorne2 d of the to ta 0 s s '' · In other wor s, b 0 th the quad ra ts being compared . H ence th ese . Joint . are absent from t d N ow consider th e following 2 >< ab sences " can be spec1fied and coun e . table: Quadrat 2 Number of species

Quadrat 1

Present

Present a

N umber of species

Absent

e

Absent b

d

. (2.8) on page 43 ' the definition of percentage Recall Equat10n . (. similarit) f e· ther o or 1 for all values of z i.e., or aU PS. Clearly, when xil an d X;2 are 1 s species), s

L min(x;

1,

x;i) =a ,

(2.13)

i=l

the number of species present in both quadrats. Similarly,

t (x

i=l

;i

+ x ;,) =

(a

+ b) + (a + e) =

2a

+ b + e;

(2.14)

this is the number of species in quadrat 1 plus the number in quadrat. 2, counting the a "joint presences" (species present in both quadrats) twrce

over.

Substituting from (2.13) and (2.14) into (2. 8) gives

ab 2ab PS = 200 X 2 = 100 X a+ + e 2 a+ + e as the percentage similarity between two quadrats when the data are in binary fonn. This is identical with Serensen 's similarity index (as a per· ~en~age), one of the best known and most widely used of the similant mdices available to ecologists. It follows that, with binary data, the per· centage dissimilarity PD is the complement of S0rensen's index.

01ss1MtLARITY MEASURES ANO DISTANCES 57

. ind RI ( Next recall (2.9), the ·formula · .1anty . for RuZicka's suru The term in the denommator 1s ex page 44). s

L max(x¡¡, x; 2 ) i=l

=a+ b + . e,

(2.15)

this is the number of . species in the two quadrat s comb'med n t . the joint presences tw1ce. ' o countmg Substituting from (2.13) and (2.15) into (2.9) gives RI

=

100 X

--ª-a+ b + c ·

This is. J accard 's index (as a percentage) ' the oldest simi·lan·tYm · dex used by e~olog1sts (Goodall, 1978a) and as well known as S0fensen's. Thus with bmary dat~ t_he percentage re~oteness PR is identícal with the complement of Jaccard s mdex. lndeed, this complement (as a proportíon rather than a percentage), namely, 1-

a a+b+c

b+c a+b+c

is known as the Marczewski-Steinhaus distance (Orlóci, 1978). It is the ratio of the number of "single" occurrences (species in one but not both of the two quadrats being compared) to the total number of species (those in one or other or both of the quadrats). The numerical example in Table 2.12 illustrates the relatíonships among the various measures discussed. To summarize: when the data are binary, percentage dissimilarity PD is identical to the complement of S0rensen's index, and percentage remoteness PR is identical to the MarczewskiSteinhaus distance MS. (I t is assumed that the measures are either ali in the form of percentages or all in the form of proportions.) This statement enables one to choose wisely between the competing measures. It has already been mentioned (page 45) that PR is metric and PD nonmetric. 1t follows that MS, which is no more than a particular form of PR, is metric; a proof, which is rather long, has been given b;' Levandowsky and Winter (1971). Similarly, the complement of S~rense~ ~ · d D · t e· Orloc1 in ex, which is no more than a particular form of P , is nonme n ' . s, p. 61) demonstrates the truth of this with an example. Hence MS is 97

n

the better dissimilarity measure of the two.
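Since all of these binary-data measures reduce to simple functions of the 2 × 2 cell counts, they are easy to compute directly. The sketch below (Python; the function name is my own) takes the counts a, b, and c and returns the complement of S0rensen's index, the Marczewski-Steinhaus distance, and the Euclidean distance; with the counts of Table 2.12 (which follows) it gives 25%, 40%, and 2.

import math

def binary_dissimilarities(a, b, c):
    """Dissimilarities between two quadrats scored as presence/absence.
    a = joint presences, b and c = species found in only one quadrat;
    joint absences (d) do not enter any of these measures."""
    sorensen_complement = 100.0 * (b + c) / (2 * a + b + c)   # complement of Sorensen's index (= PD)
    ms_distance = 100.0 * (b + c) / (a + b + c)               # Marczewski-Steinhaus distance (= PR)
    euclidean = math.sqrt(b + c)                              # distance between hypercube vertices
    return sorensen_complement, ms_distance, euclidean

print(binary_dissimilarities(6, 1, 3))   # (25.0, 40.0, 2.0)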

TABLE 2.12. ILLUSTRATION OF THE CALCULATION OF DISSIMILARITY MEASURES WITH BINARY DATA. l. The presences and absences of 12 species in 2 quadrats (compare T~ Species ). Number Quadrat 1 Quadrat 2 min(xil, x; 2 ) max(x . ... ) - - - - - - - - - - - - - - - - - - - - -- -

1 1

1 1

1 1

o

o

o

1

1 1 1 1

1

1 2 3 4 5 6 7 8 9 10 11 12

o o

o

Totals

7

9

o 1 1 1

o 1 1

o

o

o o

1 1 1

1

1

-

o o 6

PS = 200 X 6/(7 + 9) = 75%. Therefore PD = 25% RI = 100 ' º· X 6/10 = 60%. Therefore, PR = 40%. 2. The same data in the forro of a 2 X 2 t abl e. Quadrat 2 Species present Qu;drat

¡~~:se~:~ Species absent

Speeies absent

a=6

b=1

e= 3

d= 2

Complement of S0rensen's . d m ex = 100( b MS dist + c)/( 2 a + b + e)= 25% ~-- anee= lOO(b + c)/(a + b + e)= 40% ..

1

1 , ...

1 1

o 1 1 1 1 1 1 1 1

o 10

i2

01SSI M

ILARITY MEASURES ANO DISTANCES 59

Since MS and the complement. of. S0rensen's ind ex can both be d unctions of the cell frequenc1es m the 2 x 2 tabl . . . . expresse as f e given 11 1s mt · enquire whether (assuming binary data) the Euclid d'. erestmg to . . ean 1stance bet wo data pomts can also be expressed m terms of thes f . ween t e requenc1es Rec n hat the distance between quadrats 1 and 2 d (l 2) is th · t · . ' ' ' e square root of the number of spec1es that occur m one or the other ' but no t both , of the quadrats. Hence

ª

d(l, 2) =¡¡;+e. We must now compare the Euclidean distance d with the MarczewskiSteinhaus distance MS in an attempt to decide which (if either) is the better. Since both are metric, sorne other criterion is needed for judging between them. The distinctive characteristic of MS is that it takes no account of species that are absent from both the quadrats being compared. This is regarded as a great advantage by ecologists who argue that presences and absences should not be given equal weight, especially for a community made up of sessile organisms. A "presence" conveys the unambiguous information that the species concerned can and
o

, l . - - - - - - - , ( 1. 1, o)

1 1

~---

¡

/ e

SPECIES 1

(0,0,0)

/ / /

A (0,0,1)

SPECI ES 3 d b tween points C and D Figure 2.14. The Euclídean distance between points A ~d B anMSeare not equal. See text. are equal. But the corresponding Marczewski- Stei·nhaus distances

CLASSIFICATION BY CLUS

lER.1~C

60

that it is absent merely by chance. Thus a dissímilarity measure, or" . tance," that ignores joint absences appears to have an advantage. Euclidean distance . treats presences and absences equally, as dem~ strated in the followmg. The disadvantage of MS, a fatal disadvantage accor.d~g to Orlóci (197 p. 62), is that it has no uniform scale of measure. This is most easily se&, from Figure 2.14 which shows four data pomts A , B , C, and D e three-space. Clearly, the distan ce between points A and B is equal to th distance between points C and D, and both are equal to That is ' e

di~

ti.

d(A,B) = d(C,D)

=

fi.

This . is obvious geometrically. The same result can be derived by con. structmg 2 X 2 tables for each pair of points; thus Pair ( C, D) Quadrat e

Pair (A, B) Quadrat A

~

~

+ Quadrat { + 1 B

-o

+

Quadrat { + O D - O

2 O

( + denotes presences and - absences). Hence d(A, B) = d(C, D) = llJ+C = 2 +

2 1

_

Now consider the MS di'st ff+O - fi as before. anees, say MS(A B) the two pairs of points. From th f ' . .' and MS( C, D ), between e requenc1es m the 2 X 2 tables

'

MS( A, B) =

b + e a+b+c

2+0 1+2+0

2 3

and

MS( C, D) =

2+O 0+2+0 = l.

so that MS(A B) . ' =I= MS(C, D) The d1fferen ce betw . . (the number of s . een MS(A, B) and MS( In general p~cies present in both d e, D) IS due to the terlll a ' even if tw 0 . qua rats) in th d "'S matches" ( b + pairs of points h e enominator of 1v'. · e), the pair with th 1 ave the same number of ''llllS' e arger nu m b er of species for tbe

LARITY MEASURES ANO DISTANCES 01 ss1M I

61

cornbined pair (a + b + e) will seem to be the "closer" . . s measures of dissimilarity. If MS distances are useda . In spite of the theoretlcal contrast between E 1.d . . · uc I ean distance and MS ·stance the d1fference IS probably unimportant in . . . . di ' _ practice; It IS unlikely to ave much eftect on the form of the dendrogram p d . h ro uced by a clustenng procedure. · EXAMPLE· For instance, Figure 2.15 shows the results of . . app1ymg nearest. neighbor clustenng to Data Matnx # 5 (see Table 2 13) Th 1 . . . . · · e e ustenng was performed tWI~~' once with Euclidean distances in the distance matrices (see Section 2.2), givmg t~e dendrogram on the left, and once with MS distances in the distance matnces, giving the dendrogram on the right. As may be seen, they are very similar.

To conclude this section, here is a dendrogram showing the way in which the dissimilarity measures described are related. The measures in boldface are metric, those in italics nonmetric. The arrows lead to dissimilarities usable with binary data from their quantitative "parents." 1

Quantitative Binary ~~

Euclidean distance Marczewski-Steinhaus dist. S(í}rensen complement

~

N onnormalized 1

Normalized

Euclidean distance Percent remoteness Percent dissimilarity

Chord distance Geodesic metric

1

Recall that four of the dissimilarity measures described here are the complements of similarity measures. The way in which they are paired is ·sted below: Similarity Measure

Dissimilarity Measure

Percentage similarity, PS

Percentage dissimilarity, PD

RuZicka's Index, RI

Percentage remoteness, PR

Jaccard's Index

Marczewski-Steinhaus distance, MS

S0rensen's Index

Complement of S0rensen's Index (no other name has been devised)

0.67

,J5 ,.,/4

,./3 ,J2.

0.40

QJ \)

QI

e: o

u

e ~ 1/)

0.33 ~

~

,JI

"O

0 .25 ~

e

o

QI

"O

u:::J w

o

o 2

3

4

8

6

7

1

5

2

3

4

8

6

7

1

·5

Figure 2.15. Two dendrograms produced by applying nearest-neighbor clustering to Data Matrix # 5 (see Table 2.13). Euclidean distance was used as dissimilarity measure for the dendrogram on the left, MS distance for the dendrogram on the right.

DATA MATRIX #5. PRESENCES (1) AND ABSENCES (O) OF 10 SPECIES IN 8 QUADRATS.

TABLE 2.13.

Quadrat

1

Species 1

2 3

o o o

4

5 6

1

7

1

8

o o o

9

10

62

2

3

1 1

4

5

6

7

1

1

o

1

o

1

1

1

1

o

1

o

o o o

1

o

o

o o o

1

1

1

1 1

1

1

1

1

o

1 1

1

1

o

1 1 1

1 1

1

o

1

o

o o o o

o o o o

1

o

o

o

1

o

1

8

o o o

---

E LINKAGE CLUSTERING AVERA G

2.7.

63

AVERAGE LINKAGE CLUSTERING

In Section 2.4 it was remarked that there are m~ny possible ways of defining the distance between tw~ clusters. In the clustenng method described in that section (centroid clustenng) . the distance between two clusters is defined as the distance between theu centroids. I.n this section we consider other definitions and their properties. The Average Distance between Clusters

The most widely used intercluster distance is the average distance. This distance is most easily explained with the aid of a diagram; see Figure 2.16. Tue five data points with their coordinates given beside them show the amounts of species 1 and 2 in each of five quadrats. There are two .obvious clusters, [A, B, C] and [D, E]. The average distance between these clusters, which will be written du([A, B, C], [D, E]), is defined as the arithmetic average of all distances between a point in one cluster and a point in the other. There are six such distances. Therefore, du([A, B, e], [D,

E])

=

H d(A, D) + d(A, E)+ d(B, D) + d(B, E) +d(C,D) + d(C,E)}.

---

15 ./

/ /

N

10 (j)

1

.

\

w

CL

(f)

5

\

\

/

eC(6,IO)

\

I

I

1

\

/

/

...........

'\

• E (16,9) \1 •

0(15,6)

"

/ ........

/------

\

/

8(4,7)

\

.......

__

/

I

/ /

/

5

.

"\

eA(4,11)

1

LLJ

u

......_

/

SPECIES

20

15

10

1

.

.

Figure 2.16. Data points showing the quantities of two spec1es m fi the definition of d u ([A, B' C], [ D, E]) . See text.

ve uadrats to illustrate q ,

CLASSIFICATION BY

CLusTE

64

RIN~

A D) for example, denotes the Euclidean dista Here, as always, d( ' ' . nee between the two individual pomts A and D.

Now

d(A, D) = {(4 - 15)2 + (11 - 8)2 = 11.4018; d(A, E)= V(4 - 16) 2 + (11 - 9)

2

=

12.1655;

.................................. and

2 2 d( e, E)= /(6 - 16) + (10 - 9)

=

10.0499.

It is easily found, after calculating all six interpoint distances, that

du([A, B, C], [D, E])= 11.0079. N ow consider the general case. W e require an equation for ,the averag distance between an m-member cluster [M1 , M 2 , .•. , Mm] andan n-member cluster [N1 , N2 , ... , Nn1· There are clearly mn point-to-point distances to be averaged. Therefore,

or, equivalently,

AVER AG

E UNl
N tíce that the order of summation is immaterial and th o Thus we may write e 1arge brackets are unnecessary.

(2.16) it being understood that the summations are over all values of j and k.

Equation (2.16) is the symbolic form of the definition of average distance. But when these distances are used to decide which pair of clusters should be united at each of the successive steps of a clustering process, it is much more economical computationally to derive each successive intercluster distancematrix from its predecessor rather than by using (2.16) which expresses each distance in terms of the coordinates of the original data points. This may be done as follows (Lance and Williams, 1966): Consider three clusters [M1 , M 2 , .•. , Mm], [N1 , N2 , •.. , Nn], and [P1 , P2 , ... ,PP] with m, n, and p members, respectively. In what follows, the clusters are represented by the more compact symbols [M], [N], and [P]. Suppose [M] and [N] are united to form the new cluster [Q], with q = m + n members. Then, from (2.16),

(2.17)

Now recall that the points belonging to the new cluste~ [Q~ are M1, M2, ... ,Mm, Ni, N2, ... ,Nn. Therefore, we can separate the nght s1de of (2.17) into two components and write

. . m/m = 1 and the second N ow multiply the first term on the nght side by the value of the term by n/ n = 1. This maneuver obviously
ª

CLASSI FICATION BY CllJSlt 66

· n . That is, express10

n

t

d ([ P] [QJ) = m __!_ u ' m pq J = i

t d(~,

k=1

Pk) +

~1~c

p

~ :q j~I k~/(~,Pd n

p

L k=l L d(Mj , Pk) + q np t'1k':I d ~' Pk) m

m J_ q mp J=l

=

nl" "

P

(

1

From (2.16) it is seen that

\frac{1}{mp} \sum_{j=1}^{m} \sum_{k=1}^{p} d(M_j, P_k) = d_u([M], [P])

and

\frac{1}{np} \sum_{j=1}^{n} \sum_{k=1}^{p} d(N_j, P_k) = d_u([N], [P]).

Recall that q = m + n and that [Q] contains all the members of [M] and [N], the two clusters that were combined to form it. Thus

d_u([P], [Q]) = \frac{m}{m + n} d_u([M], [P]) + \frac{n}{m + n} d_u([N], [P]).        (2.18)

As a numerical example, consider the points in Figure 2.16 again. When these points are clustered by any method, it is obvious that points D and E will be united first and then points A and C. After these two fusions have been done there are three clusters, which will be labeled as follows:

[M] is the one-member cluster consisting of point B;
[N] is the two-member cluster consisting of points A and C;
[P] is the two-member cluster consisting of points D and E;
[Q] is the three-member cluster consisting of points B, A, and C formed by uniting clusters [M] and [N].

Thus m = 1, n = 2, q = 3, and p = 2. From the definition in (2.16),

d_u([M], [P]) = \frac{1}{2}\{ d(B, D) + d(B, E) \} = 11.6054;

d_u([N], [P]) = \frac{1}{4}\{ d(A, D) + d(A, E) + d(C, D) + d(C, E) \} = 10.7092.


Hence, from (2.18),

d_u([P], [Q]) = \frac{1}{3} \times 11.6054 + \frac{2}{3} \times 10.7092 = 11.0079.

This is, as it should be, the same as d_u([A, B, C], [D, E]) as given on page 64.
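For readers who wish to check the arithmetic by machine, a few lines of Python suffice. The sketch below is only an illustration (the point coordinates are those read from Figure 2.16, and the function names are arbitrary); it computes d_u directly from (2.16) and again by the updating formula (2.18).

```python
import numpy as np

# Point coordinates read from Figure 2.16 (quantities of species 1 and 2).
pts = {"A": (4, 11), "B": (4, 7), "C": (6, 10), "D": (15, 8), "E": (16, 9)}

def d(p, q):
    """Euclidean distance between two named points."""
    return float(np.hypot(pts[p][0] - pts[q][0], pts[p][1] - pts[q][1]))

def d_u(cluster1, cluster2):
    """Unweighted average distance between two clusters, Equation (2.16)."""
    return sum(d(p, q) for p in cluster1 for q in cluster2) / (len(cluster1) * len(cluster2))

# Directly from (2.16):
print(round(d_u(["A", "B", "C"], ["D", "E"]), 4))              # 11.0079

# Via the updating formula (2.18), with [M] = [B], [N] = [A, C], [P] = [D, E]:
m, n = 1, 2
print(round(m / (m + n) * d_u(["B"], ["D", "E"])
            + n / (m + n) * d_u(["A", "C"], ["D", "E"]), 4))   # 11.0079
```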

Unweighted and Weighted Distances As Clustering Criteria

The average distance between clusters [P] and [Q], previously denoted by d_u([P], [Q]), is not the only way of measuring intercluster distance. Recall and compare Equations (2.18) and (2.7) (page 32). They constitute two different answers to the question: What is the distance between two clusters given that the first has just been created by the fusion of two preexisting clusters, each of which was at a known distance from the second cluster? (Observe that the question asked is not the simpler one: What is the distance between two clusters? The reason is that the answer sought is a formula for computing the elements of each distance matrix from its predecessor.) Let us write [Q] for the first cluster, [M] and [N] for the preexisting clusters from which [Q] was formed, and [P] for the second cluster. The numbers of points in these clusters are q, m, n, and p, respectively, with q = m + n.

The answer to the preceding question depends on how intercluster distance is defined. As we shall see, the defining equations are sometimes expressed in terms of distance d, and sometimes of distance squared d². To make the relationship among the definitions more apparent, the word "dissimilarity" is used here to mean either distance or distance², according to context. The symbol δ is used in the equations to denote either d or d², and after each equation its current meaning is specified.

If dissimilarity is defined as the average distance, the answer to the question is given by (2.18), rewritten with δ in place of d, namely,

δ_u([P], [Q]) = \frac{m}{m + n} δ_u([M], [P]) + \frac{n}{m + n} δ_u([N], [P]).        (2.19)

Here δ denotes d. On the other hand, if dissimilarity is defined as the distance² between the cluster centroids (i.e., as the squared centroid distance), the answer to the question becomes

δ_c([P], [Q]) = \frac{m}{m + n} δ_c([M], [P]) + \frac{n}{m + n} δ_c([N], [P]) - \frac{mn}{(m + n)^2} δ_c([M], [N]).        (2.20)

This is Equation (2.7) with x² = δ_c([P], [Q]), a² = δ_c([M], [P]), b² = δ_c([N], [P]), and c² = δ_c([M], [N]). In (2.20) δ denotes d².

The subscripts in δ_u and δ_c stand for "unweighted" and "centroid," respectively; δ_u may be described as the unweighted average distance. Both these dissimilarities are described as unweighted because they attach equal weight to every individual point. Therefore, the weight of a cluster is treated as proportional to the number of points it contains. As a result, the centroid (center of gravity) of a pair of clusters is not at the midpoint between the centroids of the separate clusters but is closer to the cluster with the larger number of members (see Figure 2.7, page 30).

We now consider "weighted dissimilarities." These are defined in a way that attaches equal weight to every cluster, and hence unequal weights to the individual points. Therefore, the definitions are very easily obtained by setting m = n = 1 in Equations (2.19) and (2.20). Thus from (2.19) we get

δ_w([P], [Q]) = \frac{1}{2} δ_w([M], [P]) + \frac{1}{2} δ_w([N], [P]).        (2.21)

Here δ denotes d and the subscript w stands for "weighted"; δ_w is the weighted average distance. Similarly, (2.20) is replaced by

δ_m([P], [Q]) = \frac{1}{2} δ_m([M], [P]) + \frac{1}{2} δ_m([N], [P]) - \frac{1}{4} δ_m([M], [N]).        (2.22)

Here δ denotes d² and the subscript m stands for "median"; δ_m is the median distance, sometimes known as the weighted centroid distance.

Equation (2.22) can be obtained directly by considering Figure 2.7. If we assume that, whatever the values of m and n, the centroid of the cluster formed by uniting [M] and [N] lies midway between them at distance c/2 from each, then it is clear that

x² = \frac{1}{2}a² + \frac{1}{2}b² - \frac{1}{4}c²,

from which (2.22) follows in the same way that (2.20) follows from (2.7).*

The Four Versions of Average Linkage Clustering

Four ways of measuring intercluster distance have now been described: the unweighted average distance, the weighted average distance, the centroid distance (unweighted), and its weighted equivalent, the median distance. Each of these differently defined distances can be used as the basis of a clustering process. At every step of such a process, the pair of clusters separated by the smallest distance (using whichever definition of distance has been chosen) is united. The four clustering methods that use these distances are known, collectively, as average linkage clustering. Centroid clustering, described in detail in Section 2.4, is one of the four. The relationships among the four are most clearly shown by arraying them in a 2 x 2 table thus (below the name of each method is given the number of the equation to be used in the computations):

                                              Intercluster Distance
                                        Unweighted                  Weighted

  Average of interpoint distances      Unweighted group            Weighted group
                                       average method (2.19)       average method (2.21)

  Distance between centroids           Centroid method (2.20)      Median method (2.22)

The methods were named by Lance and Williams (1966).*

*For summary definitions of these four measures of the dissimilarity between two clusters, one of which has been formed by uniting two preexisting clusters, the reader is referred to the Glossary; see under Average Linkage Clustering Criteria.
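The four updating formulas share a common structure and are easily coded. The fragment below is a bare-bones illustration only (the function and argument names are arbitrary, and no complete clustering program is implied); it returns the dissimilarity between cluster [P] and the newly formed cluster [Q] = [M] + [N], given the three previously tabulated dissimilarities and the cluster sizes m and n.

```python
def updated_dissimilarity(d_MP, d_NP, d_MN, m, n, method):
    """Dissimilarity between [P] and [Q] = [M] + [N], Equations (2.19)-(2.22).

    For the group average methods the d's are distances; for the centroid
    and median methods they should be squared distances (see text).
    """
    if method == "unweighted_group_average":          # Equation (2.19)
        return m / (m + n) * d_MP + n / (m + n) * d_NP
    if method == "centroid":                          # Equation (2.20)
        return (m / (m + n) * d_MP + n / (m + n) * d_NP
                - m * n / (m + n) ** 2 * d_MN)
    if method == "weighted_group_average":            # Equation (2.21)
        return 0.5 * d_MP + 0.5 * d_NP
    if method == "median":                            # Equation (2.22)
        return 0.5 * d_MP + 0.5 * d_NP - 0.25 * d_MN
    raise ValueError("unknown method: " + method)
```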


Three points should be noticed before examples are given; they are discussed in the following paragraphs.

1. All four methods have the great computational advantage of being combinatorial (Lance and Williams, 1966). That is, once the distances between every pair of points in the original swarm of data points have been computed and entered in a distance matrix, the coordinates of the points are not needed again. Each succeeding distance matrix is calculable from its predecessor, using the appropriate equation as indicated in the preceding table.

2. All four methods can quite easily be carried out using either d or d² in place of δ in Equations (2.19), (2.20), (2.21), and (2.22). Thus each method can be made to yield two different dendrograms, since d and d² do not give the same results. However, there seems to be no good reason for using d² rather than d for either of the group average methods. For the centroid and median methods, on the other hand, d² is preferable to d as clustering criterion. As noted on page 32, Equation (2.7) and, likewise, (2.20) and (2.22) with δ set equal to d², have a definite geometric meaning; this is not so if δ is set equal to d. With the centroid and median methods, therefore, it is best to use values of d² as clustering criteria (i.e., to unite the cluster pair for which d² is a minimum at each step). But this

EXAMPLE

Figure 2.17 shows the dendrograms obtained by applying the four clustering methods to Data Matrix #6 (see Table 2.14). The clustering criterion was d for the group average methods and d² for the centroid and median methods. The heights of the nodes in all four dendrograms are equal to values of d. The dendrogram produced by unweighted group average clustering is noticeably different from the other three. It does not follow, however, that this will be true with other data matrices.

Figure 2.17. Four dendrograms produced by applying different forms of average linkage clustering to Data Matrix #6 (see Table 2.14). (A) Centroid clustering; (B) median clustering; (C) unweighted group average clustering; (D) weighted group average clustering. The clustering criterion is d² for A and B, and d for C and D.
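Dendrograms of this kind are easy to produce with standard software. The sketch below uses SciPy's hierarchical clustering routines, whose method names "average", "weighted", "centroid", and "median" correspond to the four criteria of the preceding table ("average" is the unweighted group average method, "weighted" the weighted one). The small data matrix used here is an arbitrary stand-in, not Data Matrix #6 itself (the real values are in Table 2.14), so the resulting trees are only illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

# Quadrats as rows, species as columns.  Arbitrary illustrative numbers only.
quadrats = np.array([[ 5, 40], [ 7, 38], [ 9, 35], [12, 30], [14, 28],
                     [30, 12], [32, 10], [35,  9], [38,  7], [40,  5]], dtype=float)

# The four forms of average linkage clustering; for "centroid" and "median"
# SciPy works from the raw coordinates (Euclidean geometry, as the text requires).
methods = ["average", "weighted", "centroid", "median"]

fig, axes = plt.subplots(2, 2, figsize=(9, 7))
for ax, method in zip(axes.ravel(), methods):
    Z = linkage(quadrats, method=method)
    dendrogram(Z, labels=[str(i + 1) for i in range(len(quadrats))], ax=ax)
    ax.set_title(method)
plt.tight_layout()
plt.show()
```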

The unweighted group average method is probably the clustering procedure most widely used by ecologists. To mention only a single example, it was used by Strauss (1982) to cluster 43 species of fish occurring in the Susquehanna River drainage of Pennsylvania (this is an example of Q-type clustering; see page 8). As Strauss remarks, "any clustering technique might have been used." It is, unfortunately, true that no one clustering method is better than all the others in every respect.

TABLE 2.14. DATA MATRIX #6. THE QUANTITIES OF 2 SPECIES IN 10 QUADRATS.


To choose a method wisely, it is necessary to balance the advantages and disadvantages of each and decide which advantages are most desirable and which disadvantages can be tolerated. The decision is often difficult; choosing the best trade-offs in a given context is always, in the end, somewhat subjective. We now discuss some of the most crucial decisions.

2.8. CHOOSING AMONG CLUSTERING METHODS

Seven clustering techniques have been described in this chapter: nearest- and farthest-neighbor clustering, minimum variance clustering, and the four forms of average linkage clustering, among which centroid clustering is included. There are many other, less well-known methods, devised for special purposes; accounts of them may be found in more advanced books such as Orlóci (1978), Sneath and Sokal (1973), and Whittaker (1978b). One or another of the last five methods described in this chapter should meet the needs of ecologists in all but exceptional contexts. It remains to compare the methods with one another.

Nearest- and Farthest-Neighbor Clustering

These are rarely used nowadays. In these methods the two clusters to be united at any step are determined entirely by the distance between two individual data points, one in each cluster. Thus a cluster is always represented by only one of its points; moreover, the "representative point" (a different one at each step) is always "extreme" rather than "typical" of the cluster it represents.

Minimum Variance Clustering

This is a useful technique when there is reason to suspect that some (or all) of the quadrats belong to one or more homogeneous classes. For example, suppose data had been collected by sampling, with randomly placed quadrats, a rather heterogeneous tract of forest and scrub. One might be uncertain whether all the quadrats should be thought of as unique or whether, on the contrary, they formed several distinct classes with all the quadrats in any one class constituting a random sample from the same population. In the former case every node in a clustering dendrogram is interesting and reveals (it is hoped) "true" relationships among dissimilar things.

In the latter case the first few fusions do no more than unite groups of quadrats that are not truly distinct from one another; the differences among the quadrats within such a group are due entirely to chance, and the order in which they are united is likewise a matter of chance.

With minimum variance clustering it is possible to do a statistical test of each fusion in order to judge whether the points (or clusters) being united are homogeneous (replicate samples from a single parent population) or heterogeneous (samples from different populations). This is equivalent to judging, objectively, the "information value" of each node in a dendrogram. Thus if the lowermost nodes represent fusions of homogeneous points or clusters, they have no information value; obviously, it is useful to distinguish them from nodes representing the fusions that do convey information about the relationships among the clusters and about their relative ecological "closeness." The reader is referred to Goodall (1978b, p. 270) or Orlóci (1978, p. 212) for instructions on how to do the test, which is beyond the scope of this book.

Minimum variance clustering, like farthest-neighbor clustering, tends to give clusters of fairly equal size. If a single data point is equidistant from two cluster centroids and the clusters do not have the same numbers of members, then the data point will unite with the less populous cluster (proved in Orlóci, 1978). The result is that, as clustering proceeds, small clusters acquire new members faster than large ones and chaining is unlikely to happen. This is a great advantage when clustering is done to provide a descriptive classification, for mapping purposes, for instance. Of course, it
Turning now to the four average linkage clustering techniques, the first choice that must be made is between unweighted and weighted methods. In the great majority of cases an unweighted method, which assigns equal weight to each data point and hence weights each cluster according to its size, is better. But if one were studying a mixture of communities and knew that they were very unequally represented in the data, then a weighted method, which assigns equal weight to the clusters irrespective of their sizes, would be useful; it would prevent the abundantly sampled community from having an overly large influence on the shape of the dendrogram. How large is "overly large" is, of course, a question of judgment.


Choosing wisely between weighted and unweighted clustering is not always easy, but when in doubt, unweighted clustering is to be preferred.

It remains to choose between group average clustering and centroid clustering (or its weighted equivalent, median clustering). The pros and cons are very evenly divided. Each method has a notable advantage that the other lacks and, at the same time, a notable weakness which is a consequence of the advantage.

FP

d(E, D) and Clearly,

= 9.604;

d(E, F)

=

d(E, [D, F])

9 .002 ; =

d(D, F)

8.163.

d(E, [D, F]) < d(D, F) < { d(E, D) d(E, F)" The reversa] where B joins [A C] h ' as the same cause.

=

8.944 ;

G AMONG CLUSTERING METHODS

CHOOSI N

75

There are no reversals in the dendrogram on the right. Indeed, it can be proved that reversals cannot occur in a dendrogram obtained by the group average clustering methods (Lance and Williams, 1966), and hence these methods are preferred by those who regard reversals . a dendrogram as a fatal defect.

If a clustering method is incapable of giving reversals, the measure of . tercluster distance that it uses is said to be ultrametric; with an ultrametric lil f . 1 d" meaSure' the sequence o mterc uster istance values between the pair of clusters united at each successive fusion is always a monotonically (continuin

A(6,24)

T 1

20

• e e 16.4 ,10.0 l

1

[A,B]~

(/)

• E(34.2, 15)

1

w

1



ü

~

0(26,10).__

8(6, 12)

10

(/)

rr

]

'~,F

'-.... F( 34,6)

30

20

10

40

SPECIES

11

23 .3

24.2

12. 2 11.6

11.6

~I ~-

w

10 .7

u

z

w


ü

z

(/)

4

9.3

~

01

(/)

o

8 .9

8 .2

o

1

1

o

F

8.9

o

o

Figure 2.18. Illustration of how reversals appear in a centroid clustering dendrogram (left) although they are absent from the group average clustering dendrogram constructed from the same data (right). The six data points that were clustered are plotted at the top. The hollow dots are the centroids of clusters [A, B] and [D, F]. Note the scales of the dendrograms, which have been adjusted to make the reversals conspicuous.

e

B

E

A .

?:


Therefore, the clustering method is called monotonic. In group average clustering the measure of intercluster distance (the average of all the point-to-point distances between a point in one cluster and a point in the other) is ultrametric; hence the method is monotonic and the dendrograms it gives are free of reversals.

If a clustering method can give reversals, the measure of intercluster distance that it uses is not ultrametric and the method is not monotonic. In centroid clustering, the measure of intercluster distance (the distance between the two cluster centroids) is not ultrametric, as Figure 2.18 shows. Hence centroid clustering is nonmonotonic and it can give reversals.

In sum: group average clustering gives clusters with undefined centers and monotonic dendrograms; centroid (including median) clustering gives clusters with exactly defined centers and dendrograms that may contain reversals. It is logically impossible to have the best of both worlds, if a guaranteed absence of reversals is indeed "best." Reversals, where they occur, suggest that the difference between the clusters being united is negligible; unfortunately, one cannot make the converse inference, that an absence of reversals implies distinctness of all the clusters.
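The presence or absence of reversals can also be checked numerically. The sketch below uses the six points read from Figure 2.18 together with SciPy's is_monotonic, which simply tests whether successive fusion heights in a linkage table never decrease; the contrast between centroid and group average clustering is exactly the one illustrated in the figure.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, is_monotonic

# The six data points of Figure 2.18.
pts = np.array([[6, 24], [6, 12], [16.4, 10], [26, 10], [34.2, 15], [34, 6]])

Z_centroid = linkage(pts, method="centroid")   # intercluster distance = distance between centroids
Z_average  = linkage(pts, method="average")    # unweighted group average distance

print(is_monotonic(Z_centroid))   # False: the dendrogram contains reversals
print(is_monotonic(Z_average))    # True:  the group average measure is ultrametric
```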

2.9. RAPID NONHIERARCHICAL CLUSTERING

All the clustering techniques so far described in this chapter have been hierarchical. A hierarchical clustering procedure unites data points into a nested hierarchy of progressively larger clusters. For very large bodies of data such a procedure has drawbacks:

1. Hierarchical clustering by any method makes heavy demands on computer time and memory. For very large bodies of data, a computationally more economical procedure is desirable.

2. A dendrogram with a very large number (100 or more, say) of ultimate branches is too big to comprehend.


3. A large data matrix usually contains data from numerous replicate sampling units within each of the distinguishably different communities whose relationships are being investigated. Hence the earliest fusions in a hierarchical clustering are likely to be uninformative. They merely have the effect of pooling replicate quadrats, and the order of the fusions which bring this pooling about is of no interest.

Therefore, it is desirable to subject very large data matrices to nonhierarchical clustering at the outset of an analysis. The clustering should be done by as economical a method, in computational terms, as possible. Also, so far as possible, the clusters it defines should be homogeneous. This preliminary clustering should have the effect of condensing a large data matrix. It should permit batches (or "pools") of replicate quadrat records in the large matrix to be replaced by the average for each pool. Then the centroids of these batches, or pools, of virtually indistinguishable quadrats can become data points for a hierarchical clustering that will reveal their relationships.

It is unfortunate that the word clustering is at present used both for the hierarchical clustering procedures discussed in earlier sections of this chapter, and also for the rapid, preliminary nonhierarchical clustering that we are considering now. The objectives of the two operations are entirely different. Nonhierarchical clustering is just a way of boiling down unmanageably large data matrices in order to make hierarchical clustering (or other analyses) computationally feasible and ecologically informative. To avoid ambiguity, there should be different names for the two operations; perhaps rapid nonhierarchical clustering could be called pooling, since that is what it does, and a cluster defined by such a process could be called a pool, as in the preceding paragraph. These newly coined terms are used in what follows.

Several methods of data pooling have been devised; probably the best is Gauch's (1980, 1982a) technique which he calls "composite clustering." A computer program for doing it is available (Gauch, 1979). In outline, the process is as follows. There are two phases. In the first phase, points are selected at random from the swarm of data points, and all other points within a specified radius of each selected point are assigned to a pool centered on that point. The random points that act as pool centers (they are not, of course, centroids) are chosen one after another. The earliest pools are hyperspherical in shape (circular in the two-dimensional case).

LLA~~•F•c~noN

BY

e

LlJs1l~1~

78

Later pools are not allowed to overlap earlier pools (i.e., a point must remain a member of the pool it joined first) and hence these later pools tend to be small and "spikey" since they occupy the interstices among the earlier formed pools. Therefore, the procedure has a second phase in which pools with fewer than a specified quota of points are broken up. Their member points are reassigned to the nearest large pool, provided that they lie within a predetermined distance of it; on the second round the radii of some pools are slightly increased. The number of pools formed is under the control of the investigator, who must choose the radius to be used at each phase of the process. The smaller the radii, the smaller and more numerous the pools, and the more confident one can be that they are homogeneous. Points that fail to become members of any pool are rejected as "outliers." After pooling has been completed, each pool can be replaced by an average point (the centroid of all the points in the pool) and these average points then become the data points of further investigations.
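The two-phase idea just outlined is easy to sketch in code. The fragment below is only a schematic illustration, not Gauch's program: the function name, the single-pass reassignment rule, and the way the two radii are used are simplifications chosen for brevity.

```python
import numpy as np

def pool_points(points, radius1, radius2, min_size, rng=None):
    """A schematic two-phase pooling of a point swarm (rows of `points`).

    Phase 1: randomly chosen points become pool centers; unassigned points
    within `radius1` of a center join that pool.  Phase 2: pools smaller than
    `min_size` are broken up and their members reassigned to the nearest
    surviving pool, provided it lies within `radius2`; the rest are outliers.
    """
    rng = np.random.default_rng(rng)
    n = len(points)
    pool = np.full(n, -1)                      # pool label of each point; -1 = unassigned
    for centre in rng.permutation(n):          # phase 1
        if pool[centre] != -1:
            continue
        near = np.linalg.norm(points - points[centre], axis=1) <= radius1
        pool[(pool == -1) & near] = centre
    labels, sizes = np.unique(pool, return_counts=True)
    keep = labels[sizes >= min_size]           # phase 2: break up small pools
    centroids = {k: points[pool == k].mean(axis=0) for k in keep}
    for i in np.where(~np.isin(pool, keep))[0]:
        dists = {k: np.linalg.norm(points[i] - c) for k, c in centroids.items()}
        best = min(dists, key=dists.get) if dists else None
        pool[i] = best if best is not None and dists[best] <= radius2 else -1
    return pool                                # -1 marks "outliers"
```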


APPENDIX

Apollonius's Theorem

[Figure: a triangle with sides a and b enclosing the angle θ; the perpendicular of length h from their common vertex to the third side c divides θ into θ₁ and θ₂ and c into c₁ and c₂.]

To Prove: c² = a² + b² - 2ab cos θ,


where c = c₁ + c₂ and θ = θ₁ + θ₂.

Proof: First recall that

cos(θ₁ + θ₂) = cos θ₁ cos θ₂ - sin θ₁ sin θ₂.

Observe, from the figure, that

h = a cos θ₁ = b cos θ₂;   c₁ = a sin θ₁;   c₂ = b sin θ₂.

Also,

h² = ab cos θ₁ cos θ₂.

Now

c² = (c₁ + c₂)² = c₁² + c₂² + 2c₁c₂
   = (a² - h²) + (b² - h²) + 2ab sin θ₁ sin θ₂
   = a² + b² - 2(h² - ab sin θ₁ sin θ₂)
   = a² + b² - 2ab(cos θ₁ cos θ₂ - sin θ₁ sin θ₂)
   = a² + b² - 2ab cos θ.

QED

EXERCISES

2.1. Given the following data matrix X, what is the Euclidean distance between: (a) points 1 and 5; (b) points 2 and 3; (c) points 3 and 5? (Each point is represented by a column of X, which gives its coordinates in four-space.)

X=

l~

-2

8

2

-1

o

4 -1

6

3

-4

-2

-2

o


2.2. Cluster the five quadrats in Exercise 2.1 using farthest-neighbor clustering. Tabulate the results in a table like Table 2.5.

2.3. Suppose the data points whose coordinates are given by the columns of X in Exercise 2.1 were assigned to two classes: [1, 2] and [3, 4, 5]. Find the coordinates of the centroid of each of these classes. What is the distance between the two centroids?

2.4. Let M, N, and P denote the centroids of clusters of points in five-space with, respectively, m = 5, n = 15, and p = 6 members. Find the distance² between P and the centroid of the cluster formed by uniting clusters [M] and [N]. The coordinates of points M, N, and P are as follows:

         M     N     P
         1    -5    10
        -8     2    11
         8     5    -2
         9     1    -5
         7     4    -6

2.5. What is the within-cluster dispersion of the swarm of five points whose coordinates are given by X in Exercise 2.1?

2.6. For the two points in six-space whose coordinates are given by the columns of the following 6 x 2 matrix, find: (a) the chord distance; (b) the geodesic metric; (c) the angular separation between the two points.

3 4 -2 1 5 2.7.

-1 -3 -4

o

4

Obtain J accard' s and S ' . . following three . 0rensen s md1ces of similarity (J and S) for tb paus of quadr t · D a 62): (a) quadrats 1 and 2· a s m ata Matrix # 5 (Table 2.13, Pªj Prove that S must alwa ~ (b) quadrats 3 and 4; (e) quadrats 2 a11d y exceed J except when S = J = l.

EXERCISES

2. 8.

81

. The columns of the following matrix give the co a· . . or mates m four-space of seven pomts grouped mto clusters as shown. [Q] [M]

[N]

[P]

~

~

~

M1

[!

M2 N1 N2 N3 3 3 -1

o

8 7 8 6

9 5 8 9

7 5 6 6

PI

-3 -2

o 1

P2

-1) -1 4

-2

Find the following measures of the dissimilarity between clusters [P] and [Q]: (a) The unweighted average distance; (b) the centroid distance; (e) the weighted average distance; (d) the median distance.

1

Chapter Three

Transforming Data Matrices

3.1.

INTRODUCTION

This chapter provides an elementary introduction to the mathematics necessary for an understanding of the ordination techniques described in Chapter 4. But to begin, it is desirable to demonstrate a crude form of ordination to show what the purpose of ordination is and how this purpose is achieved. Consider Data Matrix #7 (Table 3.1) which shows the quantities of four species in six quadrats (or other sampling units ). Suppose one were asked to list the .quadrats "in order" or, equivalently, to rank them. Clearly, there is no "natural" way to do this; the data points do not have any intrinsic order. The task would be simple if one species only had been recorded; then the quadrats could be ranked in order of increasing (or decreasing) quantity of the single species. When two or more species are recorded for each quadrat, however, the data points do not, usually, fall in a natural ~equenc~. Although such natural ordering is not logically impossible, in pract1ce one is far more likely to find that a set of observed data points represents a diffuse swarm in a space of many dimensions. Therefore, if one wis~es.to r~ the points, it is necessary first to prescribe sorne method of assigmng s~gle numerical score to each quadrat. Then, and only th~n, can the pomts (quadrats) be ranked using the seores to decide the ranking. ' .. . D M tnx· # 7 are cover values To illustrate suppose the quantities m ata tree spec1es 2 a ' . 0 f four species of forest plants. Let spec1es 1 be a canopy '

ª

ª

.

83

TRANSFORMIN G DATA M

Al~IC~

84

. t tree species 3 a tall shrub, and species 4 a low shrub. On subdonunan ' · dd e Way . score to each quadrat would be sunp1Y to a the cover valu to ass1gn a f . .. es o ali four species. Thus Iet~in_g x iJ denote the cover o spec1e_s i m ~Uadrat j the score of quadrat J is (x1 1 + X21 + X31 + X4)· Usmg tbis scorin. the seores of quadrats 1 through 6 are found to be, respectively me th o, , d 124; 130 150 83 183 118 the ranking of the quadrats, from that with the smallest to that with th largest score, is then #6,

#1,

#3,

#5,

#4,

#2.

Alternatively, one might choose to weight the species according to the· sizes instead of treating them ali equally. There are infinitely many ways · which this could be done. For example, one might assign to quadrat j th score (4x 1j + 3x 2 j + 2x 3j + x 4 ) . Using this formula, the seores of quadrat 1 through 6 are, respectively,

340

335

221

400

234

375

and the ranking of the quadrats becomes #3,

#5,

#1,

#2,

#6,

#4.

~his is a d~ere~t order from that given in the preceding paragraph. Bot lists ~re ordmatwns of the data in Data Matrix # 7 and the fact that the are different s~o':s that the result of an ordination depends on the metho chosen . . for ass1grung seores to the quadrats or, equivalently on the we1ght ass1gned to the differen t spec1es. · The vanous . ' ordination techniques de TABLE 3.1. DATA MATR IN n = 6 QUADRATS. IX Quadrat

Species 1 Species 2 Species 3 Species 4

:t:t?. THE QUANTITIES OF s =

4 SPECIES

1

2

3

4

5

50 11 45 12

20 16 65 82

25

45

15

20 23

33

14

49

31

15

23

70

--6

60 17 37 10

vECTO

R AND MATRIX MULTIPLICATION 85

·bed in Chapter 4 are all procedures f sen . f . or deter · . biectively mstead o choosmg them arbitr il Illlrung. these weights oJ di ar y and b done in the prece ng. su ~ectively as was

3.2. VECTOR AND MATRIX MULTIP

LICATION

The operation just performed, that of transfor · Illlng a data mat · · seores that can be ranked, can be represented s b li nx to a list of Ym o cally. Vector X Matrix Multiplication

Let us write X for the data matrix which in the prevI· . ous examp1e Is an array of numbers ( elements) arranged in four rows and six· e 1 . . o umns enclosed m Iarge parentheses. It IS a 4 X 6 matrix.

X= rxll

X12

X13

XI4

XIS

XI6

X22

X23

X24

X25

X26

X3I

X32

X33

X34

X35

X36

X4I

X42

X43

X44

X45

X46

X21

The element in the ith row and jth column is written x¡J· Notice that the first subscript in X¡¡ is the number of the row in which the element appears, and the second subscript is the number of the column; this rule is invariable and is adhered to by all writers. In this book, and in most but not all ecological writing, data matrices are so arranged that the rows represent species and the columns represent sampling plots or quadrats. Therefore, when this system is used x .. means the amount of species i in quadrat J. ' lj . . The single symbol X denotes the whole matrix, made up m this case of 24 distinct numbers. I t does not denote a single number (a sea/ar). Boldface type is used for X to show that it is a matrix, nota scalar. Now let us write y' for the Iist of six seores; y' is a matrix with only one row, otherwise known as a row vector. 1t is

y'= (y1 , Ji, y3, y4, Ys' Y6). A b . The lowercase letter s efore, the boldface type shows that Y' is matnx. only one (y not Y) shows that it is a vector (a matrix with only one row, or

ª

TRANSFORMING DATA M

AlR1cEs

86

column). The prime shows that it is a row vector. If the same array elements were written as a column instead of a row, they would fo of .h . ) rm a column vector and be denoted by y (w1t out a pnme . Finally, let us write u' for the list of coefficients by which each elemen . a column of X is to be multiplied to yield an element of y'; u' is a :0lll vector and the number of elements it contains must obviously be the sa as the number of elements in a column of X. (The number of elements: column of X is, of course, the number of rows of X.) Hence (u 1 , u 2 , u3 , u4 ). Recall again the example in Section 3.1. The first list 0 quadrat seores, namely,

u':

y'= (118, 183, 83, 150, 130, 124) was obtained by adding the elements in each column of X. That is, the score for the j th quadrat was given by 4

YJ

= X1j

+ X21 + X3j + X4j

=

L

X¡j•

i=I

Altematively, this can be written as

with u1 = u 2 = u3 the row vector

=

u4 -- 1• Therefore, th ese seores were obtained using

u' = (1 , 1, 1, 1). The elements in the second list of seores, namely,

y' = (335, 340 , 221 , 400, 234, 375) were obtained from the same formula - 4 . 4 = 3, u3 = 2, and u = H . (yJ - L¡=1U¡X¡) but with u1 === , 1 4 . ence m the second case we had

U2

u'= (4 ' 3 ' 2 ' 1) . The operation by which 'wa

ª form of matrix mult.lp 1zcatwn . Y_

. . 15 s obtamed from u' and X in the two cases 1t e11t 1·¡ • s the multiplication of a matrix bY ª

ª

ECTOR ANO MATRIX MULTIPLICATION

87

w vector (or one-row matrix). Thus y' · O

is the pr d o uct of u' and X. Written

. . . asan equat10n, this IS

u'X =y'. n words: the coefficient vector u' times the d . . .d . al . ata matnx X . h '. This IS I entIC m meaning to the much 1 . Is t escore vector

e umsier equation

(u1,U2,U3,U4)

[Xn X21 X31

X41

=

(Ji, Y2,

Y3, Y4,

X12

X13

X14

X15

X22

X23

X24

X25

X26

X32

X33

X34

X35

X36

X42

X43

X44

X45

X46

X16

Ys, y 6 ).

This of the equation u'X = y' is itself a con densed form of . extended version . . six separate equat10ns, of which the first and last are:

...........................

Thus the rule for calculating each of the six elements of y', that is, for calculating the elements in the product u'X, is the formula already given: 4 Yj =

L U¡X¡j

for j= 1,2, ... ,6.

i=l

To generalize: suppose an s x n data matrix X records the amounts of s species in n quadrats. Let u' be an s-element row vector (i.e., a 1 X s matrix) of weighting coefficients; these are the weights to be assigned to each species in order to calculate the score for a quadrat. Let the resultant seores be listed in the n-element row vector Y'. Then

u'X =y' in which s

Yj =

L i=l

U¡Xij

for j = l,2, ... , n.

(3.1)

TRANSFORMIN G DATA

88

MAi-~IC

Let us reWrite Equation (3.1) with the sizes of the three matrices sho1,1¡ below them: u'

X

(1 X s) ( s X n)

= y' . (1 X n)

For the multiplication to be possible, tl;te number of columns in the firs factor, u', must be the same as the number of rows in the second factor, Sin ce u' has s columns and X has s rows, the product y' = u'X can indee be formed. It has the same number of rows as the first factor, u', and th same number of columns as the second factor, X. In other words, the size 0 y' is 1 X n. As should now be clear, the factors in a matrix product must appear · correct order. Equation (3.1) cannot be written as Xu' = y'. The produc Xu' does not exist, since n, the number of colum·1s in X, is not equal to 1 the number of rows in u'. The product u'X is described as X premultiplied by u' or as u' postmulti plied by X.

Matrix X Matrix Multiplication The preceding paragraphs showed how to premultiply a data matrix X by vector u' of weighting coefficients to obtain a vector y' of quadrat seores Before proceeding, it is worthwhile to recall the purpose of the operation. 1 is to replace a large, perhaps confusing, data matrix by a list of seores that i much more easily comprehended. To put the argument in geometric terms an s X n data matrix is equivalent to a swarm of n points in s-dimensiona space. Therefore, unless s s 3, the swarm is impossible to visualize. How ever, if the original data matrix is transformed into a list of "quadra seores" by the procedure previously described, the multidimensional swar of ? 0 ints is transformed into a one-dimensional row of points tbat ca easily be ?lotted on one axis to make a one-dimensional graph. . Reducmg an s-dimensional swarm to a one-dimensional row entail considerable sacrifice of information, of course. This raises the questíon Need multidimensional data be so severely condensed to make thern coJll prehens~ble? The answer is obviously no. A two-dimensional swartJl (a conventwnal scatter diagram) is quite as easy to understand; it e~ ~J plotted on a sheet of paper. How, then, can the original s-dirnens1on swarm be transform d t0 . a two-d1mensional swarm? e

i

vECTOR

AND MATRIX MULTIPLICATION

89

carry out the described pr d . An obvious way is to .gh . . oce ure tw1ce over usin f different vectors o we1 tmg coeffic1ents u'1 and , g tWO b · d . ' U2, say. Two vectors , and y'2 of seores are o tame , each w1th n element Th ' Y1 s. us each of the n ' · .lll ts now has two seores which can be treated as the e d. . po . . oor mates of a pomt . . two-dimens10nal space enabling the data to be plott d 1l1 e ~m~mry scatter diagram. Data Matrix # 7 again It has 1 d b To illustrate, consider . . . . · a rea y een condensed to a one-d1mens10nal list of seores m two different ways. The first condensation use? the vector (1, 1, 1, 1) = uí, and gave the result (118, 183, 83, 150, í30, 124) = y{. The second used the vector (4, 3, 2, 1) = u' 2 and gave the result (335, 340, 221, 400, 234, 375) = YÍ· It is straightforward to combine these two sets of results. W e let quadrat 1 be represented by the pair of seores (118, 335), quadrat 2 by the pair of seores (183, 340), and so on. Eaeh pair of seores is treated as a pair of coordinates and the points are plotted in a two-dimensional coordinate frame, with the first seores measured along the abscissa and the second seores along the ordinate. The result, a two-dimensional ordination, is shown in Figure 3.1.



400

• •

• 300



• 200

º

'ºº

zoo Y1 e transformation of their original Figure 3.1. The data points of Data Matrix # 7 af~er th . two-space. The solid dots show eoord'mates in four-space given in Tab1e 3·1' t0 coordinates .b dm·n the text. The h0 11ow half-dots th . ' · · al d ta descn e 1 . Id d by vectors e two-d1mensional ordination of the ongm s of the data y1e e . · al ordinatwn on the Yi and y axes show the two one-d1menswn u'1 2 and u'2, respectively.

ª

. ·

TRANSFORMING DATA

90

MAlRtc~

The two operations just performed on data matrix X could be . repre sented symbolically by the two equat10ns · U1'X

and

= Y1

I

'X-y' U2 2·

However, there is a still more compact representation, namely,

UX=Y

(3.2)

Here U has two rows, the first being ui and the second 2 X 4 matrix

u=(!

1

1

3

2

1) = (

1

uí. That is, U is the

Uu U21

N otice that the 2 X 4 matrix U is denoted by a capital letter since lowercase letters are reserved for vectors. Also, the elements of U now require a pair of subscripts to define their locations in the matrix; the first subscript specifies the row and the second the column. Likewise, matrix Y in Equation (3.2) is a matrix with two rows and six columns. It is

y= ( 118 335

=

(Yu Yi1

183 340

83

221

Y12

Yu

Y22

Y23

150 400

130 234

124) 375

To generalize: Equation (3.2) specifies that X is to be premultiplied by V to give Y. Suppose data matrix X were of size s x n. Then writing (3.2) with the dimensions of each matrix shown gives

U

X =

(2 X s) ( s X n)

Y (2 X n)

The factors are so ordered (UX, not XU) that the number of columns in tbe first factor is equal to the number of rows in the second · both are s. The ' and the saJll e pro duct Y has the same number of rows as the first factor number of columns as the second factor. W e can gener alize further. So far we have discussed one-dimensIO · nal ordination and two-dimensional ordination. There is no need to stop at tWº

/ VECTOR AND MATRIX M U LTIPLICATION 91

. . dimensions. I t is true that a three-dirnensio na1 ordmat . dimensional swarm of points that can onl b ion y1elds a three. fi Y e plotted 0 perspect1ve gure or as three separate two-a· . n paper as a . d· · . 1Illens1onal gra h . hi h dimens10na1or mat10ns requue even more two-a· . P s, g er1Illens10nal g h f h . . . _rap s or t err portrayal. However, the process of ordinating a · mu1tidunens10n 1 data points obVIously entails a trade-off One m t b ª swarm of · . · us a1anee the ad of condensmg the data agamst the disadvantages f .fi . . vantages . o sacn cmg mfor t" It is often desirable to keep more than three d · . . ma wn. . . rrnens10ns m transformed data, eqmvalently, to do a more-than-three-dimensional a· . · · · d · Ch or mat10n The top1c 1s discusse m apter 4.. For the present ' let us cons1·aer the symbolic · . . . representat10n of a p-d1mens10nal ordination of an s x n data matrix. · X. . . The requued r~presentat10n has already been given: it is Equation (3.2), unchanged. A difference appears only if the sizes of the matrices are shown. We then have U

X

=

(pXs) (sXn)

Y

.

(3.3)

(pXn)

Each of the p rows of U is a set of s weighting coefficients (how to find numerical values for these coefficients, objectively, is considered in Chapter 4). Each of the n columns of Y is a set of p seores for one of the quadrats; treating these seores as coordinates permits the points to be plotted (conceptually) in a p-dimensional coordinate frame. The rule for calculating the value of the (i, j)th element of Y, tha~ is, the i th score of the j th poin t, is summed up in the equation s

Yij

= U¡¡X¡j

+ ui2x2j + ...

+u¡sXsj

=

2: U¡,Xrj•

r=l

· · d ts of the n words the ( i j)th element of Y is the sum of the pairwise pro uc lements' in the' ith row of the first factor, U, and the Jth column of the second factor X ' · hink 0 f each of the p rows of u .Equivalently, as was just done, one can t t y' One w of y as a row vec or . . d s a row vector u', and the correspon mg ro . p· ally the p h . . · 'X ' p separate tlffies. m ' en performs the mult1plicat10n u = Y ' ther to give ectors y', each with n elements, are stacked on top of one ano he P X n matrix Y.

TRANSFORMING DATA 92

MA¡~IC~

Linear Transformations In what follows, a matr~ of size s X n, that is, with s rows and n colurnni is called an s X n matnx. lt was shown in Equation ~3.3) that when an ~ X n data matfix Xu premultiplied by a p x s matnx U, th~ pro_duct ~IS a _P X n matrix. Now matrix x specifies the locations of n pomts m s-d1mens10nal space (s-spa for short). Indeed, each column of X is a list of the s coordinates of one ~ the points. Likewise, Y specifies the locations of n points in p-space; eac of its columns is a list of the p coordinates of one of the points. We can, therefore, regard Y as an altered form of X. Both matrice amount to instructions for mapping the same swarm of n points. X map them in s-space; Y maps the same points in p-space. Therefore, if p < s, the p-dimensional swarm of points whose coordinates are given by the columns of Y is a "compressed" version of the original s-dimensional swarm of points whose coordinates were given by the columns of X. In other words, premultiplying X by the p X s matrix U has the effect of condensing the data and, inevitably, of obliterating sorne of the information the original data contained. Now suppose that p = s or, equivalently, that U is an s X s matrix (a square matrix). Premultiplying X by U no longer condenses the data since t?e product Y is, like the original X, an s X n matrix. But the premultiplica· ~10n d~es affect the shape of the swarm represented by X, and it ii mt~restmg ~o see how a very simple swarm is affected, geometrically, by a vanety of d1fferent versions of the "transforming" matrix u. To make the demonstration as clear as possible, we shall put

10

1

1

10

10) . 10

Thus X is a 2 x 4 t · . . ma nx representmg a swarm of n = 4 points in s === two· space. The pomts are at th We now evaluate UX e· corners of a square (see Figure 3.2a ). · s are given in the followin u~mg_ seve~al different Us. The numerical equat1~J1P and is left as the s b lgX, X IS wntten out in full only in the first equattº. ym o subse 1 · 111s the matrix u h . quent y. The first factor in each equatt 0 . w ose effect is b · . d 10 Figure 3.2 All th emg exammed. The results are plotte . · e transformar ·1 ts 10 spaces of more than t d" IOns 1 lustrated have their counterpar 0 wo imensions, of course, but these are difficult (wbe

VECTOR AND MATRIX MULTIPLICATION

c----

93

(b)

1

"f

D

o A---8

o

10

(e)

(f)

.~



e

t

Figure 3.2. (a) The four data points of the matrix X and also of IX = X (see text). ( b)-(f). The same data points after transformation by the five different 2 X 2 matrices given in the text. The lines joining points A, B, C, and D havé been drawn to emphasize the shape of the "swarm" of four points.

s = 3) or impossible (when s rel="nofollow"> 3) to draw. The reader should experiment with other Usas well.

(a) .

10

(~

1

10 1 10 10)=(1

1

10 1

1 10

10) 10 .

. d ted by I which has ls on ' b 1 Here U is the so-called identity matrzx, always eno the main diagonal (top left to bottom right) and Os elsewher\In 1sym0 ; the equation is, therefore, IX = X. It is apparent that premulup ica ion

~

by 1 leaves X, and the square it represents, unchanged.

(b)

(02

o1.2 )x (1.0 1.2 =

20.0 1.2

2 12

20). 12

TRANSFORMING DATA M

AlR1c~

94

. t · . that is ' its only nonzero elements are tho This U is a diagona / ma rzx' . se on . d·agonal lt is seen that each row of y - UX is the correspondih 1 the mam · 1 t · th "1& e same row of lJ . li d b the single nonzero e 1emenf m row of X mul tlp e Y h etrical effect is to change the sea es . o t e two axes, and the becomes a rectangle. Obviously, if we had put

!~:i;~º:uare

u=(~ ~) , . . the ongmal square wouId have remained a square but with sides three time as long.

(e)

( 1.5 0.9

0.1 1.0

)x

=

(

1.6 1.9

15.1 10.0

2.5 10.9

16) 19 .

The square is now transformed to a parallelogram.

(d)

( 0.1 1.0

I.5 0.9

)x

=

(

1.6 1.9

2.5 10.9

15 .1 10.0

16) 19 .

The parallelogram is of the same shape as in (e ) but the corners B and Car in terchanged.

(e)

2.0 ( 0.8

-oA)x =

-1.0

(

1.6 -0.2

19.6 7.0

- 2.0 -9 .2

There is no reason why all the elements of U or all the coordinates of th data points should be positive; although no measured species quantities are negative, of course, it is often desirable to convert these measurements to deviations from their mean values, as shown in Section 3.4, and when this i done sorne elements of X must be negative. This example illustrates the effect of setting sorne of the elements of U negative.

(/)

0.8 ( -0.6

o.6)x= (1.4 0.2

0.8

8.6 -5 .2

6.8 7.4

124) .

The original square is still a square and its size is unchanged but it has been . ' f rot~ted · A matnx U that has this effect is known as orthogonal. Because.º the1~ great importance in data transformations orthogonal matrices requúe detalled description in the following. '

VECTOR AND MATRIX MULTIPLICATION 95

first, however, a word on terminology · Ali the o · ously illustrated and others like them in hi h . perat10ns on X previ. .r. . w e U is an X . known as l mear trans1 ormatzons of X. Th . s s matnx are . . li e word linear . li eleroent m Y 1s a near function of the elem imp es that each . li ents of X that . . the eleroents are mu1tlp ed by constants and dd ' is, one m which . a ed but are t 1· · by each other or squared or raised to higher powers.' In othe no mu . . d tlplied xs are said to be m the first degree. r wor s, all the For instance, recall that

In this equation, which expresses y¡ J. as a function of x lJ'. X2)' · · ·, Xs ·, the . 1 factors un, u¡ 2 , •.• , uis are constants. They are mdependent of j. · 1, that . Now suppose that the. original data had been only one-d.1mens10na is, that onl_y one s~ec1es had been observed so that s = l. Then the transformation equat10n

U

X

(sXs) (sXn)

Y

=

(sXn)

would be reduced to

ux' =y' where u is a scalar (an ordinary number) and x' and y' are both n-element row vectors (equivalently, 1 X n matrices). Written out in extenso, the last equation is

To multiply a vector by a scalar, one simply multiplies each / - sepa_rate . y' lS a e1ement of the vector by the scalar. Thus the equat10n ux condensed form of the n separate equations UX¡

=y¡;

UXn

= Yn·

E h . . H the adJ. ective linear to ac of these is the equation of a stra1ght line. ence describe the transformation.

TRANSFORMING DATA

96

MATRtc~

Orthogonal Matrices and Rigid Rotations

It was already mentioned that the 2 X 2 matrix

u= (

0.8 -0.6

0.6) 0.8

is described as orthogonal. W e saw that the transformed data matrD; y = UX specifies a swarm of points with the same pattem as that specified by the original X; the only change brought about by the transformation is that the swarm as a whole has a new position relative to the axes of the coordinate frame. We are, therefore, free to regard the transformation as a movement of the swarm relative to the axes, or of the axes relative to the swarm (see Figure 3.3). In both cases, as the figure shows, the movement consists of a ri~d rotation around the origin of the coordinates. In Figure 3.3b the swarm of points, behaving as a rigid, undeformable unit, has rotated clockwise around the origin. In Figure 3.3c the axes bave rotated counterclockwise around the origin, relative to the swarm. We now enquire: How can a matrix U be constructed so that its only effect on X is to cause a rigid rotation of the data swarm relative to the coordinate axes or vice versa? To answer the question, envisage a single datum point in two-space with coordinates (x 1 , x 2 ). The data matrix is, therefore, the 2 x 1 matrix (or two-element column vector)

(N otice that since x denotes a column vector it is printed as a Iowercase boldface letter without a prime). N ex t suppose th e axes are rotated counterclockwise around the ongiII ·· through angle 8. Let the coordinates of the datum point relative to the new, rotated axes be given by

y=(~~) (see Figure 3.4).

10

c

D

A 0

c

D

10

10

-5 /

B

/

./ ./

./

(b)

./ ./

\

A/

\

/

B

0 \

c

,.......~~~~~~--D

10

\ \

\

,':>\. \

\ (a)

\

(c)

Scanned by CamScanner

Figur~ 3J. (a) Relative to the axes shown as solid lines, the points A, B, C, and D have coordmates given by the columns of

X=

u

10 1

1 10

10) 10 .

Relative t0 th e dashed axes the coordinates are

y

= (

1.4

0.2

8.6

6.8

-5.2

7.4

14) 2 .

. . ( b) shows the axes c show the transformation of X to Y in two different ways. d ered but h . db t the axes move . t e pomts moved; (c) shows the points unaltere u

(b) and ( )

unait

97

98

TRANSFORMINC DATA

MAl~ICl~

We need to find y 1 and Yi in terms of x 1 , x 2 , and () from str . 1 ward geometrical and trigonometrical considerations. ghtfor, Consider Figure 3.4. The points are labeled so that

ª

OA = BP = x1 ;

OB = AP = x 2 ; OR = SP = y 1 ; OS . h d. 1 AM is t e perpen 1cu ar from A to the y 1-axis.

= RP : : Y¿.

Obviously, OR = OM + MR. It is seen that OM = OA cos (} =

X1COS (}'

and MR=MN+NR = AN sin(}+ NP sin(}

= (AN + NP)sin(} = AP sin(} = x 2 sin().

9

\

A

\ \

\

\

\

\

Figure 3.4. lllustrating the co · f . s to coordinates relative t th nversion the coordmates of point P relative to the x-axe o e y-axes.

°

VECTOR AND MATRIX M ULTIPLICATION

Tberefore, Y¡ = X1COS (}

+

X

2

sin O•

3

Exactly analogous arguments (which the reader should h ( .4a) e eck) show that Ji = - X1 sin(/ + X2COS (/ . f (3.4b) Thus the pair o Equations (3.4a) and (3 4b) . f th · · give the y-co d. · · t pomt m erms o e x-coordmates and th or mates of the · · . e angle O w ·f equauons as a smg1e equat10n representin th . · n mg the pair of column vectors gives g e equality of two two-element

Here the left-hand side is y. The right-hand side is · the matnx · product cos (} ( - sin8

sin(}) ( X1) cos8 X2 =Ux

(3 .5)

in which the 2 X 2 matrix U has elements Un = U22

=

COS (};

u12

=

sinO;

U21

= -sin8.

We have now discovered how to construct ali possible 2 X 2 orthogonal matrices. Ali have the form cos (} ( - sin8

8).

sin cos (} '

O is the angle through which the axes are rotated. In the example on page

94, 8 = 36.87º, whence cos (} = 0.8 and sin 8 = 0.6. An important property of orthogonal matrices must now be describe?· F" · · the matnx rrst, a defi.nition is needed: the transpose of any matnx is obt · · I ntly its columns as amed by writing its rows as columns or, equiva e ' rows. For example, the transpose of the 2 X 3 matrix A, where

TRANSFORMING DATA

100

MAr~ICEs

is the 3 X 2 matrix

Obviously, the transpose of an s X n ~atrix (for instance) is an n >< s matrix. The transpose of a matrix is always denoted by the same symbol as the untransposed matrix with a prime added. Thus the transpose of Ais denoted by A', and the transpose of a column vector, say x, is the row vector x'.

Now let us obtain U' the transpose of U in Equation (3.5) and then fonn the product UU'. -sin O)

coso and, therefore, sin O) ( cos O -sin O) sin O coso

coso _ (

cos 2 0 + sin2 0

- sin Ocos O + sin Ocos 8 )

- sinOcos O+ sinOcos O

sin2 0 + cos 2 0

or

UU' since cos 2 0 + sin2 0

=

=

l

(3.6)

l.

Equation (3.6) is true, in general, of all orthogonal matrices of any size. Orthogonal matrices are always square. Before discussing the general s >< s orthogo 1 · · · n~ matru, It is desirable to make a small change in the symbols. The reqmred change · h · · been relabeled thus: is s own m Figure 3.5. It is seen that the angles bave (Jn is the angle between the Y -axis and the

()

.

i2 is . is . 022 is

()

21

. . x 1-axis, the angle between the y -axis and th . . i e x 2 -axis the angle between the . ' Y2-axis and the x 1-axis. the angle between the . ' Ji-axis and the x 2 -axis. i

VfCTOR

AND MA TRIX MUL TIPUCA TION 101 Xz

\

\ \

\

ij,

::x:2

y2

Y2

\ \

\

\

\

\

\

\ \

9 11

/

\

(a)

\

--

________ .-- y1

\ e22

\ - ----

----

:x::I

,/

/

\

__..- - Y1

9 12

\ \

(b)

xi

\

Figure 3.5. The angles between the x-axes and the y-ax . Th . and 012 with the x 1- and xr axes; (b) The Yraxis makese:.n(~) (}e Y1-axis m~es angles 011 g es 21 and 022 with the x1- and xr axes .

It is obvious from the figure that 811

822 is the same as the original O; also

=

that 812

=

90º - 8

or

8 = 90º - 812'.

821

=

90º

+8

or

8 = 821 - 90º.

and

The reason for giving every angle a separate symbol becomes clear when we discuss the s-dimensional case. Consider, now, how the change in symbols affects U. The old and new versions are as follows: sin8) cos

o

=

(

cos 811 cos 821

cos 812) cos 822 .

(3.7)

This result is reached using the relationships: Un = COS

u

Th O. .

12

8=

COS

811;

= sin 8 = sin(90º - 812) = cos 812; . 8 = - sm · (821 - 90º) = cos821; sm

u 21

=

-

U2 2

=

COS 8

=

COS

822 ·

.

ese equations express each u. · as the cosme

'J"

i1

of the corresponding angle

TRANSFORMIN G DATA

102

MAl~tc~

It is now intuitively clear how an s X s orth?gonal matrix U shou}d b constructed (the proof is beyond the scope of this book). Thus e

Equation (3.6), namely, U

U'

=

(s Xs) (s Xs)

1 (s Xs)

remains true. This equation states the diagnostic property of orthogonal matrices: that is, a square matrix U is orthogonal if and only if UU' === l. When U is orthogonal, the transformation

U

X

(sXs) (sXn)

=

Y (s X n)

brings about a rigid rotation of the s-dimensional coordinate axes on which are measured the coordinates of an s-dimensional swarm of n data points; their coordinates are given by the columns of X. The coordinates of the points relative to the new coordinate axes, which are still s-dimensional, are given by the columns of Y. Finally, it should be remarked that the elements of U are known as the direction cosines of the new axes relative to the old. For instance, the element u¡¡ of U, which is U¡¡ = cos ()ii' is the direction cosine of the y¡-axis relative to the x¡-axis in s-space. The problem of finding numerical values for the elements of an orthogo· nal matrix when s > 2 is dealt with in Section 3.4. One cannot, as in the two-dimensional case described in detail previously, simply choose one angle and derive all the elements of U from it. A rotation in s-space requires that s angles be known, and because they are mutually dependent, theY cannot be chosen arbitrarily.

3.3. THE PRODUCT OF A DATA MATRIX AND ITS TRANSPOSE

In this section we consider matrices of the form XX' and X'X. These are the matrix products formed when a data matrix X is postmultiplied and premultiplied, respectively, by the transpose of itself. If X is an s × n matrix, then XX' is an s × s matrix and X'X is an n × n matrix. These matrices are needed in many ordination procedures, as explained in Chapter 4. For clarity we consider XX' in detail first, and note the analogous properties of X'X subsequently.

TABLE 3.2. DATA MATRIX #8, IN RAW FORM, X, AND CENTERED, X_R.

X = \begin{pmatrix} 4 & 8 & 10 & 14 \\ 17 & 11 & 3 & 1 \\ 2 & 5 & 5 & 4 \end{pmatrix},   with species means x̄_1 = 9, x̄_2 = 8, x̄_3 = 4;

X_R = \begin{pmatrix} -5 & -1 & 1 & 5 \\ 9 & 3 & -5 & -7 \\ -2 & 1 & 1 & 0 \end{pmatrix}

The SSCP matrix R and the covariance matrix (1/n)R:

R = X_R X_R' = \begin{pmatrix} -5 & -1 & 1 & 5 \\ 9 & 3 & -5 & -7 \\ -2 & 1 & 1 & 0 \end{pmatrix} \begin{pmatrix} -5 & 9 & -2 \\ -1 & 3 & 1 \\ 1 & -5 & 1 \\ 5 & -7 & 0 \end{pmatrix} = \begin{pmatrix} 52 & -88 & 10 \\ -88 & 164 & -20 \\ 10 & -20 & 6 \end{pmatrix}

(1/n)R = \begin{pmatrix} var(x_1) & cov(x_1, x_2) & cov(x_1, x_3) \\ cov(x_2, x_1) & var(x_2) & cov(x_2, x_3) \\ cov(x_3, x_1) & cov(x_3, x_2) & var(x_3) \end{pmatrix} = \begin{pmatrix} 13 & -22 & 2.5 \\ -22 & 41 & -5 \\ 2.5 & -5 & 1.5 \end{pmatrix}

The Variance-Covariance Matrix

As always, we denote a data matrix by the symbol X. Its elements are in "raw" form; that is, they are the data as recorded in the field. The (i, j)th element is the quantity of species i in quadrat j; i ranges from 1 to s and j ranges from 1 to n. We now require a centered data matrix X_R.* Its (i, j)th element is the amount by which species i in quadrat j deviates from the mean quantity of species i in all n quadrats. Thus the (i, j)th element is x_ij - x̄_i, where x̄_i = (1/n) Σ_{j=1}^n x_ij. That is, x̄_i is the mean quantity of species i averaged over the n quadrats; equivalently, it is the mean of the n elements in the ith row of X. A simple example in which X is a 3 × 4 matrix (Data Matrix #8) is shown in the upper panel of Table 3.2. Because of the way in which it is constructed, all rows of X_R must sum to zero.

Now form the product

R = X_R X_R',

where X_R' is the transpose of X_R. R is a square s × s matrix; the ith element on its main diagonal (i.e., its (i, i)th element) is obtained, as usual, by postmultiplying the n-element row vector constituting the ith row of X_R by the n-element column vector constituting the ith column of X_R'. Since X_R' is the transpose of X_R, these vectors are "the same" except that one is a row vector and the other a column vector. Thus their product is

(x_i1 - x̄_i,  x_i2 - x̄_i,  ...,  x_in - x̄_i) \begin{pmatrix} x_i1 - x̄_i \\ x_i2 - x̄_i \\ \vdots \\ x_in - x̄_i \end{pmatrix} = Σ_{j=1}^n (x_ij - x̄_i)².   (3.8a)

Notice that if the right-hand side were divided by n,** it would give the variance of the observations x_i1, x_i2, ..., x_in, that is, the variance of the variable "quantity of species i per quadrat." It should be recalled that the variance of a variable is the average of the squared deviations of the observations from their mean. In symbols, the variance of the quantity of species i per quadrat is

var(x_i) = (1/n) Σ_{j=1}^n (x_ij - x̄_i)².

The standard deviation of this variable, say σ_i, is the square root of the variance, or var(x_i) = σ_i².

*The symbol R is used as a subscript in X_R and S_R, and also by itself, because the procedures described are part of an R-type analysis.

**n is used as the divisor since we are assuming that the n quadrats examined constitute the total population of interest. If the quadrats are a sample of size n from a larger "parent population" for which the variance is to be estimated, the divisor would be n - 1.

Next consider the (h, i)th element of R. The switch from the familiar symbol pair (i, j) to the unfamiliar (h, i) is because h and i both represent species, namely, the hth and the ith species; whereas in the pair (i, j), as used hitherto in this chapter, i denotes a species and j a quadrat. The (h, i)th element of R is

Σ_{j=1}^n (x_hj - x̄_h)(x_ij - x̄_i).   (3.8b)

This is n times the covariance of species h and species i in the n quadrats, written cov(x_h, x_i). When two variables (such as species quantities) are observed on each of n sampling units (such as quadrats), the covariance of the variables is the mean of n cross-products. The cross-product for species h and i in quadrat j is the product of the deviation from its mean of the amount of species h in quadrat j (which is x_hj - x̄_h) and the deviation from its mean of the amount of species i in quadrat j (which is x_ij - x̄_i). For given h and i, there are n such cross-products, one for every quadrat, and their average* is the covariance cov(x_h, x_i). There are s(s - 1)/2 covariances altogether, one for every pair of species. Notice that if we put h = i and calculate cov(x_i, x_i), it is identical with var(x_i). Writing R out in full, it is seen that

R = \begin{pmatrix} Σ(x_1j - x̄_1)² & \cdots & Σ(x_1j - x̄_1)(x_sj - x̄_s) \\ \vdots & & \vdots \\ Σ(x_sj - x̄_s)(x_1j - x̄_1) & \cdots & Σ(x_sj - x̄_s)² \end{pmatrix}

*As with the variance, the divisor used to calculate this average is n when the n quadrats are treated as a whole "population," and n - 1 when the quadrats are treated as a sample from a larger "population" for which the covariance is to be estimated.


where all the summations are from j = 1 to n. R is known as a sums-of-squares-and-cross-products matrix, or an SSCP matrix for short. It is an s × s matrix. Alternatively, one may write

R = n \begin{pmatrix} var(x_1) & cov(x_1, x_2) & \cdots & cov(x_1, x_s) \\ cov(x_2, x_1) & var(x_2) & \cdots & cov(x_2, x_s) \\ \vdots & \vdots & & \vdots \\ cov(x_s, x_1) & cov(x_s, x_2) & \cdots & var(x_s) \end{pmatrix}.   (3.9)

Observe that when, as here, a matrix is multiplied by a scalar (in this case the scalar is n), it means that each individual element of the matrix is multiplied by the scalar. Thus the (h, i)th term of R is n cov(x_h, x_i). R is a symmetric matrix since, as is obvious from (3.8b), cov(x_h, x_i) = cov(x_i, x_h).

The matrix (1/n)R is known as the variance-covariance matrix of the data, or often simply as the covariance matrix. The lower panel of Table 3.2 shows the SSCP matrix R and the covariance matrix (1/n)R for the 3 × 4 data matrix in the upper panel. To calculate the elements of a covariance matrix, one may use the procedure demonstrated in Table 3.2, or the computationally more convenient procedure described at the end of this section.
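The calculation is easy to mechanize. The following sketch (Python with numpy; the code itself is an added illustration, not part of the original text) reproduces the R-type quantities of Table 3.2 for Data Matrix #8.

```python
import numpy as np

# Data Matrix #8: rows = species (s = 3), columns = quadrats (n = 4).
X = np.array([[ 4.0,  8.0, 10.0, 14.0],
              [17.0, 11.0,  3.0,  1.0],
              [ 2.0,  5.0,  5.0,  4.0]])
s, n = X.shape

# Center each row (species) by its mean to get X_R.
row_means = X.mean(axis=1, keepdims=True)    # the species means x-bar_i
X_R = X - row_means                          # every row of X_R sums to zero

R = X_R @ X_R.T                              # SSCP matrix (s x s)
cov = R / n                                  # covariance matrix (1/n)R

print(R)     # [[ 52. -88.  10.] [-88. 164. -20.] [ 10. -20.   6.]]
print(cov)   # [[ 13.  -22.   2.5] [-22.   41.  -5. ] [  2.5  -5.   1.5]]
```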

The Correlation Matrix

In the raw data matrix X previously discussed, the elements are the measured quantities of the different species in each of a sample of quadrats or other sampling units. Often, it is either necessary or desirable to standardize these data, that is, rescale the measurements to a standard scale. Standardization is necessary if different species are measured by different methods in noncomparable units. For example, in vegetation sampling it may be convenient to use cover as the measure of quantity for some species, and numbers of individuals for other species; there is no objection to using incommensurate units such as these provided the data are standardized before analysis.

Standardization is sometimes desirable even when the same units (e.g., numbers of individuals) are used for the measurement of all species quantities. It has the effect of weighting the species according to their rarity so that rare species have as big an influence as common ones on the results of an ordination. Sometimes this is desirable, sometimes not. One may or may not wish to prevent the common species from dominating an analysis. It is a matter of ecological judgment. A thorough discussion of the pros and cons of data standardization has been given by Noy-Meir, Walker, and Williams (1975).

The usual way of standardizing, or rescaling, the data is to divide the observed measurements on each species, after they have been centered (transformed to deviations from the respective species means), by the standard deviation of the species quantities. Thus the element x_ij in X is replaced by (x_ij - x̄_i)/√var(x_i), say. We now denote the standardized matrix by Z_R, and examine the product Z_R Z_R' = S_R, say. (Z_R' is the transpose of Z_R.) The (h, i)th element of S_R is the product of the hth row of Z_R postmultiplied by the ith column of Z_R'. Thus it is

(1/(σ_h σ_i)) Σ_{j=1}^n (x_hj - x̄_h)(x_ij - x̄_i) = n cov(x_h, x_i)/(σ_h σ_i) = n r_hi,

where r_hi is the correlation coefficient between species h and species i in the n quadrats.

Observe that the (i, i)th element of S_R is n. This follows from the fact that cov(x_i, x_i) = var(x_i). The correlation matrix is obtained by dividing every element of S_R by n. Thus

(1/n) S_R = \begin{pmatrix} 1 & r_12 & \cdots & r_1s \\ r_21 & 1 & \cdots & r_2s \\ \vdots & \vdots & & \vdots \\ r_s1 & r_s2 & \cdots & 1 \end{pmatrix}.

It is a symmetric matrix since, obviously, r_hi = r_ih.

Table 3.3 shows the standardized form Z_R of Data Matrix #8 (see Table 3.2), its SSCP matrix Z_R Z_R', and its correlation matrix. The elements of the correlation matrix may be evaluated either by postmultiplying Z_R by Z_R' and dividing by n, or else by dividing the (h, i)th element of the covariance matrix, which is cov(x_h, x_i), by √(var(x_h) var(x_i)), taking the values of the variances from the main diagonal of the covariance matrix. Thus in the example, the (1, 2)th element of the correlation matrix is r_12 = -22/√(13 × 41) = -0.9529.

TABLE 3.3. COMPUTATION OF THE CORRELATION MATRIX FOR DATA MATRIX #8 (SEE TABLE 3.2).

The standardized data matrix is

Z_R = \begin{pmatrix} -5/√13 & -1/√13 & 1/√13 & 5/√13 \\ 9/√41 & 3/√41 & -5/√41 & -7/√41 \\ -2/√1.5 & 1/√1.5 & 1/√1.5 & 0 \end{pmatrix}

The SSCP matrix for the standardized data is

S_R = Z_R Z_R' = \begin{pmatrix} 4 & -3.8117 & 2.2646 \\ -3.8117 & 4 & -2.5503 \\ 2.2646 & -2.5503 & 4 \end{pmatrix}

The correlation matrix is

(1/n) S_R = \begin{pmatrix} 1 & -0.9529 & 0.5661 \\ -0.9529 & 1 & -0.6376 \\ 0.5661 & -0.6376 & 1 \end{pmatrix}

Yet another way of obtaining the correlation matrix is to note that (in the numerical example) it is given by the product

\begin{pmatrix} 1/√13 & 0 & 0 \\ 0 & 1/√41 & 0 \\ 0 & 0 & 1/√1.5 \end{pmatrix} \begin{pmatrix} 13 & -22 & 2.5 \\ -22 & 41 & -5 \\ 2.5 & -5 & 1.5 \end{pmatrix} \begin{pmatrix} 1/√13 & 0 & 0 \\ 0 & 1/√41 & 0 \\ 0 & 0 & 1/√1.5 \end{pmatrix}.

In the general case this is the product (1/n) B R B, where B is the diagonal matrix whose (i, i)th element is 1/√var(x_i) = 1/σ_i. Notice that when three matrices are to be multiplied (e.g., when the product LMN is to be found), it makes no difference whether one first obtains LM and then postmultiplies it by N, or first obtains MN and then premultiplies it by L. All that matters is that the order of the factors be preserved. The rule can be extended to the evaluation of a matrix product of any number of factors.
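The scaling just described is a one-line operation on a computer. The following sketch (Python with numpy; an added illustration, not part of the original text) converts the covariance matrix of Table 3.2 into the correlation matrix of Table 3.3.

```python
import numpy as np

# Covariance matrix (1/n)R for Data Matrix #8, taken from Table 3.2.
cov = np.array([[ 13.0, -22.0,  2.5],
                [-22.0,  41.0, -5.0],
                [  2.5,  -5.0,  1.5]])

# B is diagonal with 1/sigma_i on the diagonal (sigma_i = square root of the ith variance).
B = np.diag(1.0 / np.sqrt(np.diag(cov)))

corr = B @ cov @ B   # pre- and postmultiply the covariance matrix by B
print(np.round(corr, 4))
# [[ 1.     -0.9529  0.5661]
#  [-0.9529  1.     -0.6376]
#  [ 0.5661 -0.6376  1.    ]]
```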

The R Matrix and the Q Matrix

Up to this point we have been considering the matrix product obtained when a data matrix, or its centered or standardized equivalent, is postmultiplied by its transpose. The product, whether it be XX', X_R X_R', or Z_R Z_R', is of size s × s. Now we discuss the n × n matrix obtained when a data matrix is premultiplied by its transpose.

First, with regard to centering: recall that to form X_R, the matrix X was centered by row means; that is, it was centered by subtracting, from every element, the mean of its row. Equivalently, the data were centered by species means since each row of X lists observations on one species.

To center X', we again center by row means. But this time, centering by row means is equivalent to centering by quadrat means since each row of X' lists observations on one quadrat. The centered form of X' will be denoted* by X_Q'. The (j, i)th element of X_Q' is the amount by which x_ji, the quantity of species i in quadrat j, deviates from the average quantity of all s species in the quadrat. (If a species is absent from a quadrat, it is treated as "present" with quantity zero.)

*The symbol Q is used because the procedures described are part of a Q-type analysis.

TABLE 3.4. DATA MATRIX #8 TRANSPOSED, X', AND THEN CENTERED BY ROWS, X_Q'.

X' = \begin{pmatrix} 4 & 17 & 2 \\ 8 & 11 & 5 \\ 10 & 3 & 5 \\ 14 & 1 & 4 \end{pmatrix},   with quadrat (row) means x̄_1 = 7.67, x̄_2 = 8.00, x̄_3 = 6.00, x̄_4 = 6.33;

X_Q' = \begin{pmatrix} -3.67 & 9.33 & -5.67 \\ 0 & 3 & -3 \\ 4 & -3 & -1 \\ 7.67 & -5.33 & -2.33 \end{pmatrix}

The SSCP matrix Q and the covariance matrix (1/s)Q:

Q = X_Q' X_Q = \begin{pmatrix} 132.67 & 45 & -37 & -64.67 \\ 45 & 18 & -6 & -9 \\ -37 & -6 & 26 & 49 \\ -64.67 & -9 & 49 & 92.67 \end{pmatrix}

(1/s)Q = \begin{pmatrix} 44.22 & 15 & -12.33 & -21.56 \\ 15 & 6 & -2 & -3 \\ -12.33 & -2 & 8.67 & 16.33 \\ -21.56 & -3 & 16.33 & 30.89 \end{pmatrix}

Table 3.4 (which is analogous to Table 3.2) shows X', the transpose of Data Matrix #8, and its row-centered (quadrat-centered) form X_Q' in the upper panel. In the lower panel is the SSCP matrix Q = X_Q' X_Q (here X_Q is the transpose of X_Q') and the covariance matrix (1/s)Q.

The (j, j)th element of (1/s)Q is the variance of the species quantities in quadrat j. The (j, k)th element is the covariance of the species quantities in quadrats j and k. These elements are denoted, respectively, by var(x_j) and cov(x_j, x_k); the two symbols j and k both refer to quadrats. Notice that var(x_j) could also be defined as the variance of the elements in the jth row of X_Q', and cov(x_j, x_k) as the covariance of the elements in its jth and kth rows.

Next X_Q' is standardized to give Z_Q'; we then obtain the product S_Q = Z_Q' Z_Q, where Z_Q is the transpose of Z_Q'. Finally, (1/s)S_Q is the correlation matrix whose elements are the correlations between every pair of quadrats. Table 3.5, which is analogous to Table 3.3, shows the steps in the calculations for Data Matrix #8.


TABLE 3.5. COMPUTATION OF THE CORRELATION MATRIX FOR THE TRANSPOSE OF DATA MATRIX #8.

The standardized form of X' is

Z_Q' = \begin{pmatrix} -3.67/√44.22 & 9.33/√44.22 & -5.67/√44.22 \\ 0 & 3/√6 & -3/√6 \\ 4/√8.67 & -3/√8.67 & -1/√8.67 \\ 7.67/√30.89 & -5.33/√30.89 & -2.33/√30.89 \end{pmatrix}

The SSCP matrix for the standardized data is

S_Q = Z_Q' Z_Q = \begin{pmatrix} 3 & 2.7626 & -1.8900 & -1.7497 \\ 2.7626 & 3 & -0.8321 & -0.6611 \\ -1.8900 & -0.8321 & 3 & 2.9948 \\ -1.7497 & -0.6611 & 2.9948 & 3 \end{pmatrix}

The correlation matrix is

(1/s) S_Q = \begin{pmatrix} 1 & 0.9209 & -0.6300 & -0.5832 \\ 0.9209 & 1 & -0.2774 & -0.2204 \\ -0.6300 & -0.2774 & 1 & 0.9983 \\ -0.5832 & -0.2204 & 0.9983 & 1 \end{pmatrix}

Table 3.6 shows a tabular comparison of the two procedures just discussed. These procedures constitute the basic operations in an R-type and a Q-type analysis. It is for this reason that the respective SSCP matrices have been denoted by R and Q.

Computation of a Covariance Matrix

This subsection is a short digression on the subject of computations. Readers who are concentrating exclusively on principles, and who do not wish to be distracted by practical details, should skip to Section 3.4.

Consider R, the SSCP matrix used in an R-type analysis. It is the product X_R X_R'. Instead of evaluating the elements of this product as it stands, it is usually more convenient to note that

X_R X_R' = XX' - X̄X̄'   (3.10)

and to evaluate the expression on the right side of this equation.

TABLE 3.6. A COMPARISON BETWEEN PRODUCTS OF THE FORM XX' AND X'X.ª

Centered matrix. R-type: X_R is matrix X centered by rows (species); its (i, j)th term is x_ij - x̄_i, where x̄_i = (1/n) Σ_{j=1}^n x_ij. Q-type: X_Q' is matrix X' centered by rows (quadrats); its (j, i)th term (in row j, column i) is x_ji - x̄_j.

SSCP matrix. R-type: R = X_R X_R', where X_R' is the transpose of X_R; R is an s × s matrix, and each of its elements is a sum of n squares or cross-products. Q-type: Q = X_Q' X_Q, where X_Q is the transpose of X_Q'; Q is an n × n matrix, and each of its elements is a sum of s squares or cross-products.

Covariance matrix. R-type: (1/n)R; the (i, i)th element var(x_i) is the variance of the elements in the ith row of X_R (quantities of species i), and the (h, i)th element cov(x_h, x_i) is the covariance of the hth and ith rows of X_R (quantities of species h and i). Q-type: (1/s)Q; the (j, j)th element var(x_j) is the variance of the elements in the jth row of X_Q' (quantities in quadrat j), and the (j, k)th element cov(x_j, x_k) is the covariance of the jth and kth rows of X_Q' (quantities in quadrats j and k).

Standardized matrix. R-type: Z_R; its (i, j)th term is (x_ij - x̄_i)/√var(x_i) = (x_ij - x̄_i)/σ_i, where σ_i is the standard deviation of the quantities of species i. Q-type: Z_Q'; its (j, i)th term is (x_ji - x̄_j)/√var(x_j) = (x_ji - x̄_j)/σ_j, where σ_j is the standard deviation of the quantities in quadrat j.

Correlation matrix. R-type: (1/n)S_R = (1/n)Z_R Z_R'; the (h, i)th element is r_hi, the correlation coefficient between the hth and ith rows of X_R (i.e., between species h and species i); the matrix has 1s on its main diagonal since r_hh = 1 for all h. Q-type: (1/s)S_Q = (1/s)Z_Q' Z_Q; the (j, k)th element is r_jk, the correlation coefficient between the jth and kth rows of X_Q' (i.e., between quadrats j and k); the matrix has 1s on its main diagonal since r_jj = 1 for all j.

ªSymbols h and i refer to species; symbols j and k refer to quadrats.
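For readers who like to see the Q-type column of Table 3.6 worked numerically, the following sketch (Python with numpy; an added illustration, not part of the original text) carries out the quadrat-by-quadrat calculations of Tables 3.4 and 3.5 for Data Matrix #8.

```python
import numpy as np

# Data Matrix #8 again; X' has quadrats as rows and species as columns.
X = np.array([[ 4.0,  8.0, 10.0, 14.0],
              [17.0, 11.0,  3.0,  1.0],
              [ 2.0,  5.0,  5.0,  4.0]])
Xt = X.T                                   # X', of shape (n, s) = (4, 3)
s = X.shape[0]

# Center each row of X' by its quadrat mean to get the Q-type centered matrix.
XQ = Xt - Xt.mean(axis=1, keepdims=True)

Q = XQ @ XQ.T                              # n x n SSCP matrix
cov_q = Q / s                              # (1/s)Q, covariances between quadrats

# Quadrat-by-quadrat correlation matrix, (1/s)S_Q.
BQ = np.diag(1.0 / np.sqrt(np.diag(cov_q)))
corr_q = BQ @ cov_q @ BQ

print(np.round(cov_q, 2))    # matches the lower panel of Table 3.4
print(np.round(corr_q, 4))   # matches the correlation matrix of Table 3.5
```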


Here X̄ is an s × n matrix in which every element in the ith row is

x̄_i = (1/n) Σ_{j=1}^n x_ij,

the mean over all quadrats of species i. Thus X̄ has n identical s-element columns. It is

X̄ = \begin{pmatrix} x̄_1 & x̄_1 & \cdots & x̄_1 \\ x̄_2 & x̄_2 & \cdots & x̄_2 \\ \vdots & \vdots & & \vdots \\ x̄_s & x̄_s & \cdots & x̄_s \end{pmatrix}

with n columns. Hence

X̄X̄' = \begin{pmatrix} n x̄_1² & n x̄_1 x̄_2 & \cdots & n x̄_1 x̄_s \\ n x̄_2 x̄_1 & n x̄_2² & \cdots & n x̄_2 x̄_s \\ \vdots & \vdots & & \vdots \\ n x̄_s x̄_1 & n x̄_s x̄_2 & \cdots & n x̄_s² \end{pmatrix},

which, like XX', is an s × s matrix.

The subtraction in (3.10) is done simply by subtracting every element of X̄X̄' from the corresponding element in XX', that is, the (i, j)th element of the former from the (i, j)th element of the latter. These computations are illustrated in Table 3.7, in which matrix R for Data Matrix #8 is obtained using the right side of Equation (3.10). The table should be compared with Table 3.2, in which R was obtained using the left side of (3.10). As may be seen, the results are the same.

Now consider the symbolic representation of this result. The (h, i)th element of X_R X_R' is [see Equation (3.8b)]

Σ_{j=1}^n (x_hj - x̄_h)(x_ij - x̄_i).


TABLE 3.7. COMPUTATION OF THE SSCP MATRIX R FOR DATA MATRIX #8, USING EQUATION (3.10): R = XX' - X̄X̄'.

XX' = \begin{pmatrix} 4 & 8 & 10 & 14 \\ 17 & 11 & 3 & 1 \\ 2 & 5 & 5 & 4 \end{pmatrix} \begin{pmatrix} 4 & 17 & 2 \\ 8 & 11 & 5 \\ 10 & 3 & 5 \\ 14 & 1 & 4 \end{pmatrix} = \begin{pmatrix} 376 & 200 & 154 \\ 200 & 420 & 108 \\ 154 & 108 & 70 \end{pmatrix}

X̄X̄' = \begin{pmatrix} 9 & 9 & 9 & 9 \\ 8 & 8 & 8 & 8 \\ 4 & 4 & 4 & 4 \end{pmatrix} \begin{pmatrix} 9 & 8 & 4 \\ 9 & 8 & 4 \\ 9 & 8 & 4 \\ 9 & 8 & 4 \end{pmatrix} = \begin{pmatrix} 324 & 288 & 144 \\ 288 & 256 & 128 \\ 144 & 128 & 64 \end{pmatrix}

R = XX' - X̄X̄' = \begin{pmatrix} 52 & -88 & 10 \\ -88 & 164 & -20 \\ 10 & -20 & 6 \end{pmatrix}

The (h, i)th element of XX' - X̄X̄' is

Σ_{j=1}^n x_hj x_ij - n x̄_h x̄_i.

We now show that these two expressions are identical. In what follows, all sums are from j = 1 to n. Multiplying the factors in brackets in the first expression shows that

Σ(x_hj - x̄_h)(x_ij - x̄_i) = Σ x_hj x_ij - Σ x̄_h x_ij - Σ x̄_i x_hj + Σ x̄_h x̄_i.

Now note that Σ x̄_h x_ij = x̄_h Σ x_ij and Σ x̄_i x_hj = x̄_i Σ x_hj since x̄_h and x̄_i are constant with respect to j (i.e., are the same for all values of j). Similarly, Σ x̄_h x̄_i = n x̄_h x̄_i since it is the sum of n constant terms x̄_h x̄_i. Thus

Σ(x_hj - x̄_h)(x_ij - x̄_i) = Σ x_hj x_ij - x̄_h Σ x_ij - x̄_i Σ x_hj + n x̄_h x̄_i.

Next make the substitutions Σ x_ij = n x̄_i and Σ x_hj = n x̄_h. Then

Σ(x_hj - x̄_h)(x_ij - x̄_i) = Σ x_hj x_ij - n x̄_h x̄_i - n x̄_i x̄_h + n x̄_h x̄_i = Σ x_hj x_ij - n x̄_h x̄_i,

as was to be proved.
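As a quick numerical confirmation of Equation (3.10), the following sketch (Python with numpy; an added illustration, not part of the original text) computes both sides for Data Matrix #8 and shows that they agree with the R of Tables 3.2 and 3.7.

```python
import numpy as np

# Check Equation (3.10), R = XX' - Xbar Xbar', against the direct product X_R X_R'.
X = np.array([[ 4.0,  8.0, 10.0, 14.0],
              [17.0, 11.0,  3.0,  1.0],
              [ 2.0,  5.0,  5.0,  4.0]])
n = X.shape[1]

Xbar = np.repeat(X.mean(axis=1, keepdims=True), n, axis=1)   # s x n matrix of row means
R_shortcut = X @ X.T - Xbar @ Xbar.T                         # right side of (3.10)
R_direct = (X - Xbar) @ (X - Xbar).T                         # left side, X_R X_R'

print(np.allclose(R_shortcut, R_direct))   # True
print(R_shortcut)                          # [[ 52. -88.  10.] [-88. 164. -20.] [ 10. -20.   6.]]
```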

3.4. THE EIGENVALUES AND EIGENVECTORS OF A SQUARE SYMMETRIC MATRIX

This section resumes the discussion in Section 3.2, where it was shown how the pattern of a swarm of data points can be changed by a linear transformation. It should be recalled that, for illustration, a 2 × 4 data matrix was considered. The swarm of four points it represented were the vertices of a square. Premultiplication of the data matrix by various 2 × 2 matrices brought about changes in the position or shape of the swarm; see Figure 3.2. When the transforming matrix was orthogonal it caused a rigid rotation of the swarm around the origin of the coordinates (page 94); and when the transforming matrix was diagonal it caused a change in the scales of the coordinate axes. Clearly, one can subject a swarm of data points to a sequence of transformations one after another, as is demonstrated in the following. The relevance of the discussion to ecological ordination procedures will become clear subsequently.

Rotating and Rescaling a Swarm of Data Points

To begin, consider the following sequence of transformations:

1. Premultiplication of a data matrix X by an orthogonal matrix U. (Throughout the rest of this book, the symbol U always denotes an orthogonal matrix.)
2. Premultiplication of UX by a diagonal matrix Λ (capital Greek lambda; the reason for using this symbol is explained later).
3. Premultiplication of ΛUX by U', the transpose of U, giving U'ΛUX.


Here is an example. As in Section 3.2, we use a 2 × 4 data matrix representing a swarm of four points in two-space. This time let

X = \begin{pmatrix} 1 & 1 & 5 & 5 \\ 1 & 4 & 1 & 4 \end{pmatrix}.

Thus the data swarm consists of the vertices of a rectangle (see Figure 3.6a). The first transformation is to cause a clockwise rotation of the rectangle around the origin through an angle of 25°. The orthogonal matrix required to produce this rotation is (see page 101)

U = \begin{pmatrix} cos 25° & cos 65° \\ cos 115° & cos 25° \end{pmatrix} = \begin{pmatrix} 0.9063 & 0.4226 \\ -0.4226 & 0.9063 \end{pmatrix}.

It is found that the transformed data matrix is

UX = \begin{pmatrix} 1.329 & 2.597 & 4.954 & 6.222 \\ 0.484 & 3.203 & -1.207 & 1.512 \end{pmatrix}.

[Figure 3.6. The data swarms represented by: (a) X; (b) UX; (c) ΛUX; (d) U'ΛUX. The lines joining the points are put in to make the shapes of the swarms apparent. The elements of U and Λ are given in the text.]


The swarm of points represented by this matrix, which is merely the original rectangle in a new position, is shown in Figure 3.6b.

The second transformation is to be an alteration of the coordinate scales. Let us increase the scale on the x_1-axis (the abscissa) by a factor of λ_1 = 2.4 and on the x_2-axis (the ordinate) by a factor of λ_2 = 1.6. This is equivalent to putting

Λ = \begin{pmatrix} 2.4 & 0 \\ 0 & 1.6 \end{pmatrix}.

Then

ΛUX = \begin{pmatrix} 3.189 & 6.232 & 11.890 & 14.933 \\ 0.774 & 5.124 & -1.931 & 2.419 \end{pmatrix}.

The newly transformed data are plotted in Figure 3.6c. The shape of the swarm has changed from a rectangle to a parallelogram.

The third and last transformation consists in rotating the parallelogram back, counterclockwise, through 25°. This is achieved by premultiplying ΛUX by U', the transpose of U. It is found that

U'ΛUX = \begin{pmatrix} 2.563 & 3.483 & 11.592 & 12.511 \\ 2.049 & 7.278 & 3.275 & 8.504 \end{pmatrix}.

These points are plotted in Figure 3.6d.

Now observe that we could have achieved the same result by premultiplying the original X by the matrix A where

A = U'ΛU = \begin{pmatrix} 2.2571 & 0.3064 \\ 0.3064 & 1.7429 \end{pmatrix}.   (3.11)

Observe that A is a square symmetric matrix. We now make the following assertion, without proof. Any s × s square symmetric matrix A is the product of three factors that may be written U'ΛU; U and its transpose U' are orthogonal matrices; Λ is a diagonal matrix. In the general s-dimensional (or s-species) case, all three factors and A itself are s × s matrices.

The elements on the main diagonal of Λ are known as the eigenvalues of A (also called the latent values or roots, or characteristic values or roots, of A). That is, since

Λ = \begin{pmatrix} λ_1 & 0 & \cdots & 0 \\ 0 & λ_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & λ_s \end{pmatrix},

the eigenvalues of A are λ_1, λ_2, ..., λ_s. The eigenvalues of a matrix are always denoted by λs by long-established custom; likewise, the matrix of eigenvalues is always denoted by Λ. This is why Λ was used for the diagonal matrix that rescaled the axes in the second of the three transformations performed previously. The rows of U, which are s-element row vectors, are known as the eigenvectors of A (also called the latent vectors, or characteristic vectors, of A).

In the preceding numerical example we chose the elements of U (hence of U') and Λ, and then obtained A by forming the product U'ΛU. Therefore, we knew, because we had chosen them, the eigenvalues and eigenvectors of this A in advance. The eigenvalues are λ_1 = 2.4 and λ_2 = 1.6. And the eigenvectors are

u_1' = (0.9063,  0.4226)   and   u_2' = (-0.4226,  0.9063),

the two rows of U.

Now suppose we had started with the square symmetric matrix A and had not known the elements of U and Λ. Would it have been possible to determine U and Λ knowing only the elements of A? The answer is yes. The analysis which, starting with A, finds U and Λ such that A = U'ΛU, in which U is orthogonal and Λ diagonal, is known as an eigenanalysis. In nearly all ecological ordination procedures, an eigenanalysis forms the heart of the computations, as shown in Chapter 4.

The next step here is a demonstration of one way in which this may be done, first with the 2 × 2 matrix A previously constructed and then with a 3 × 3 symmetric matrix. The way in which the method can be generalized to permit eigenanalysis of an s × s symmetric matrix with s > 3 will then be clear. Of course, with large s the computations exhaust the patience of anything but a computer.
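Before the hand computation is described, it may be reassuring to verify with a computer that an eigenanalysis of this A does recover the Λ and U chosen above. The following sketch (Python with numpy; an added illustration, not part of the original text) uses a library routine rather than the iterative method described next.

```python
import numpy as np

# The 2 x 2 symmetric matrix A = U' Lambda U constructed in the text.
A = np.array([[2.2571, 0.3064],
              [0.3064, 1.7429]])

# numpy's eigh performs the eigenanalysis of a symmetric matrix directly.
eigenvalues, eigenvectors = np.linalg.eigh(A)   # eigenvalues returned in ascending order

print(np.round(eigenvalues[::-1], 4))   # [2.4 1.6], the diagonal elements of Lambda
print(np.round(eigenvectors.T[::-1], 4))
# Each row is an eigenvector of A (up to an arbitrary sign):
# proportional to (0.9063, 0.4226) and (-0.4226, 0.9063), the rows of U.
```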

Hotelling's Method of Eigenanalysis

To begin, recall Equation (3.11), A = U'ΛU, and premultiply both sides by U to give UA = UU'ΛU. Now observe that, since U is orthogonal, UU' = I by the definition of an orthogonal matrix. Therefore,

UA = IΛU = ΛU.

Let us write this out in full for the s = 2 case. For U and A we write each separate element in the customary way, using the corresponding lowercase letter subscripted to show the row and column of the element. For Λ we use the knowledge we already possess, namely, that it is a diagonal matrix. Thus

\begin{pmatrix} u_11 & u_12 \\ u_21 & u_22 \end{pmatrix} \begin{pmatrix} a_11 & a_12 \\ a_21 & a_22 \end{pmatrix} = \begin{pmatrix} λ_1 & 0 \\ 0 & λ_2 \end{pmatrix} \begin{pmatrix} u_11 & u_12 \\ u_21 & u_22 \end{pmatrix}.

On multiplying out both sides, this becomes

\begin{pmatrix} u_11 a_11 + u_12 a_21 & u_11 a_12 + u_12 a_22 \\ u_21 a_11 + u_22 a_21 & u_21 a_12 + u_22 a_22 \end{pmatrix} = \begin{pmatrix} λ_1 u_11 & λ_1 u_12 \\ λ_2 u_21 & λ_2 u_22 \end{pmatrix},

which states the equality of two 2 × 2 matrices. Not only does the left side (as a whole) equal the right side (as a whole), but it follows also that any row of the matrix on the left side equals the corresponding row of the matrix on the right side. Thus considering only the top row,

(u_11 a_11 + u_12 a_21,   u_11 a_12 + u_12 a_22) = (λ_1 u_11,   λ_1 u_12),

an equation having two-element row vectors on both sides. This is the same as the more concise equation

u_1' A = λ_1 u_1',

¡Jff LIG

ENVALUES ANO EIGENVECTORS OF A

SQUARE S\'M

METRIC MATRIX

.

121

.ch ui 1s the two-element row vect in wl1l or con . . nonzero element in tl ti stituting the fi an d "A i is the only . le rst r rst row of U ther are an e1genvalue of A and its e ow of A. Hen , , ioge orresponct· . ce 1\1 and ' tJotelling's method for obtaining the . ing e1genvect U1 r1 nuinenc l or. ments of ui when the elements of A ar . values of A d ele . e g1ven p i an the are illustrated usmg roceeds as foll stePs ows. The

ª

A= ( 2.2571 0.3064

0.3064) 1.7429 '

the symmetric matrix whose factors we alread Yk now. Step J. Choose arbitrary tria! values for the e1ements of u' D . . tnal vector by w(Ó)· It is convenient to t . l· enote this s art with w(Ó) = (l, l). / Step 2. F orm th e prod uct Wco) A. Thus ( 1,

1 )A = ( 2.5635,

2.0493 ).

Let the largest element on the right be denoted by 11. Thus 11 = 2.5635. Step 3. Divide each element in the vector on the right by /1 to give 2.5635( 1,

0.7994) = l1wó_),

say. Now wci) is to be used in place of WcÓ) as the next trial vector. Step 4. Do steps 2 and 3 again with wá) in place of WcÓ), and obtain /2 and Wá)· Continue the cycle of operations (steps 2 and 3) until a trial vector is obtained that is exactly equal, within a chosen number ~f dec~al P,~aces, ,~º the preceding one. Denote this vector by wcf-> (the subscnpt F 18 for final ). ~ben the elements of w' are proportional to the elements of ui the first e1g (F) · ' A 15 · ual to ~ 1 the envector of A and / the largest element m WcF) , eq ' ~ ' p, rgest of A. Th eigenvalue . d . h f0 ur decimal places us m the example at the nineteenth cycle an wit \Ve obtain ' ( 1,

0.4663 )A

=

(

2.4000,

1.1191)

=

2.4000( 1,

0.4663 ).

122

TRANSFORMINC DATA

"-'Al~IC¡ That is,

w

= (

1,

0.4663)

and

lp = ~l = 2.4000. We now wish to obtain uí from w<~· Recall (page 102) that UU 1 what comes to the same thing, that the sum of squares of the element~ ~ any row of U is l. Hence uí is obtained from w<~ by dividing each elemen in w<~ by the square root of the sum of squares of its elements. That is, 1 ::::

u;

= (

= (

1i2 + ~.46332 0.9063,

0.4633 ) 2 /1 + o.4633

,

2

0.4226 ).

These steps are summarized in Table 3.8. Having obtained ~ 1 and uí, the first eigenvalue and eigenvector of A, iti easy (when s = 2) to obtain the second pair. TABLE3.8. EIGENANALYSISOFTHE2 X 2MATRIXA BY HOTELLING'S METHOD.ª Cycle Number

Tri al Eigenvector

w{i)

o

I

W(i)

A -- I i+IW(i+l) I

(1,

1)

(2.5635,

2.0493) = 2.5635(1,

0.7994)

(1,

0.7994)

(2.5020,

1.6997) = 2.5020(1,

0.6793)

(1,

0.6793)

(2.4652,

1.4904) = 2.4652(1,

0.6046)

19

(1,

0.4663)

20

(2.4000,

1.1191) = 2.4000(1,

0.4663)

(1,

0.4663)

(2.4000,

1.1191) = 2.4000(1,

0.4663)

1 2

Hence A1

=

2.4000

and ui is proportional to (1, ªGiven A = ( 2.2571 0.3064

0.3064) 1.7429 .

0.4663).

fNVALUl:~ ANLJ CIUC l"IVtl...IURS

rHE flG

OF

A SQuARE SYM METRIC MATRIX 123

We knOW (page 101) that U has the form U

an

= (

sin o)

c?s () -sm()

coso

d we have just obtained uí the first row f U o Which is 0

í

=

(0.9063,

0.4226).

Therefore,

u= (

0.9063 -0.4226

0.4226) 0.9063

and u'2 is the second row of U. To find A2 , the eigenvalue corre3ponding to u;, recall Equation (3.ll), namely, A= Ul\U. Premultiply both si des by U and then postmultiply both sides by U'. Hence

UAU' = UU1\UU'. Since U is orthogonal, UU'

=

I; therefore,

UAU' = IAI = A.

(3.12)

Hence we may find A by evaluating UAU'. lt is found that 2

UAU'

=

(o2.4

O ). 1.6

Thus A1 = 2.4 (as was found before) and A1 = 1.6. h d to a 3 X 3 matrix, . 1 Next consider the application of H otelling's met od with a numenca sa B te the proce ure Y · Again it is convenient to demonstra example. Let

B

=

{~-º ~-2

0.2 5.6

-0.4

2.4)

-0.4 . 5.2

TRANSFORM ING DA 124

TA MAl~ICt\

. b tinding Ai and ui as was done with a 2 X 2 matrix. As b Begm Y . Th t · ef "th the trial matnx (1, 1, 1). e compu at10ns proceed as f 0rel start w1 . oU0 decimal places are shown, but 12 were used to obtam these w (on1y 3 results). (1, ( 1,

l)B=(8.6,

1,

0.628,

5.4,

7.2)=8 .6( 1,

0.628,

o. 837 )

0.837 )B =

(8.135,

3.381,

6.502) = 8.135(1,

0.416 ,

0.799);

• • . • . . • . • • . • • • . • • • • • • • • • • • • • • • • • • • • • • • • . . . . 1.

( 1,

0.854 )B =

-0.058 ,

-0.466 ,

( 8.038,

6.864) = 8.038( 1,

-0.058 ,

0.854).

Therefore,

A1 = 8.038, and ui is proportional to (1, - 0.058, 0.854). Dividing through the elements of this last vector by

/i

2

2

+ ( -0 .058) + 0.854 2 = 1.316

shows that

ui

= (

0.760,

- 0.044,

0.649 ).

Now, to find the second eigenvalue and eigenvector, A2 and u;, proceed as follows. Start by constructing a new matrix B · it is known as the first 1 residual matrix of B and is given by ' B¡

=

B -

=

0.760) B - 3 .o 33 -0.044 ( 0.760,

=

A1U1Ui

(

6.0 0.2 ( 2.4

= (

0.649

0.2 5.6 -0.4

1.361 0.469 -1.562

-0 .044,

2.4) ( 4.639 -0.4 - - 0.269 5 .2 3.962 0.469

4.035 -0 .170

-1.562) -0.170 . 1.817

0.649)

-0 .269 1.565 -0.230

3.962) -0.230 3.383

ft·IE EIG

ENVALUES AND EIGENVECTORS OF A

.

n1

.

A. 2 = 5.671 . .

and

SQUARE SYM

METRIC MATRIX 125

Note: this is o y approximate; for accurate r esults at the next step m (more decimal places would be needed ·) The values of A2 and u'2 may now b b . ' any A d ' . e o tallled f sarne way as 1 an "1 were obtamed from B. lt . rom B1 in exactly th is found that e u'2

= (

o. 144

'

0.984,

-0.102).

finally, smce B is a 3 X 3 matrix there ;_ 5 a third . pair still to be found, A3 and u3. To find th e1genvalue-eigenvector residual matrix of B from the equation em, compute B2 the second

or, equivalently,

and operate on B2 in exactly the same way as B and B1 were operated on. It is found that

A. 3

=

3.092

and

u'3

= (

-0.634,

0.171,

0.754).

The eigenanalysis of B is now complete. As a check, recall that, applying Equation (3.12), we should have

UBU' =A. Substituting the numerical results just obtained in the left side of this equation gives

UBU' =

(

0.760

0.144 -0 .634

( X

= (Tb .

0.760

-0.044 0.649

8.043 -0.001 (

o

-0.044 0.984 0.171 0.144 0.984 -0.102 -0.001 5.667 0.001

(6·º

0.2 5.6

0.649) 0.2 -0.102 0.754 2.4

-0.4

2.4) -0.4

5.2

-0.634) 0.171 0.754

O ) 0.001 3.091

~

( 8.0

33

o

o

~.o671 ~3.092 ) =A. d

. al Jaces were use .)

e mexactness is because only three decnn

P

TRANSFORMING DATA

126

MATRICts

' I t should now be clear how a square symmetric matrix of any s· . . ' s method . The eigenva . 1ues are always obtainect tze .is analyzed by Hotelling decreasing order of magnitude; that is, given an s X s matrix, we alway~ have A. 1 > A. 2 > · · · > A. s . A way of reducing the large numbers of computational cycles that are often needed to find each eigenvalue- eigenvector pair is outlined in Tatsuoka (1971); it is beyond the scope of this book. Finally, it is worth noticing, though it is not proved here, that the sum of the eigenvalues is always equal to the sum of the diagonal terms of the matrix analyzed, which is known as the trace of the matrix. Thus the trace of the 2 X 2 matrix A analyzed before is tr(A)

=

a 11 + a 22 = 2.2571 + 1.7429 = 4

and 2

L A¡ = 2.4 + 1.6 = 4. i=l

F or the 3 X 3 matrix B we have tr(B) = b11

+ b22 + b33 = 6.0 + 5.6 + 5.2 = 16.8

and 3

LA¡= 8.038

+ 5.671 + 3.092 = 16.801

i=l

(the discrepancy is merely a rounding error).
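A direct implementation of the cycle of operations just described (trial vector, rescaling by the largest element, and the residual-matrix deflation) takes only a few lines. The following sketch (Python with numpy; an added illustration, not part of the original text, and the function name hotelling_eigen is simply a label chosen here) reproduces the eigenvalues of B found above.

```python
import numpy as np

def hotelling_eigen(A, tol=1e-10, max_iter=1000):
    """Power iteration with deflation, following the steps described in the text:
    repeatedly form w'A and rescale until the trial vector stops changing, then
    subtract lambda * u u' (the residual matrix) and repeat for the next pair."""
    A = np.array(A, dtype=float)
    values, vectors = [], []
    B = A.copy()
    for _ in range(A.shape[0]):
        w = np.ones(A.shape[0])
        lam = 0.0
        for _ in range(max_iter):
            v = w @ B
            new_lam = v[np.argmax(np.abs(v))]     # the element of largest magnitude
            v = v / new_lam
            if np.allclose(v, w, atol=tol):
                lam = new_lam
                break
            w, lam = v, new_lam
        u = w / np.linalg.norm(w)                 # normalize so the squared elements sum to 1
        values.append(lam)
        vectors.append(u)
        B = B - lam * np.outer(u, u)              # residual matrix for the next eigenpair
    return np.array(values), np.array(vectors)

B = np.array([[6.0,  0.2,  2.4],
              [0.2,  5.6, -0.4],
              [2.4, -0.4,  5.2]])
vals, vecs = hotelling_eigen(B)
print(np.round(vals, 3))    # approximately [8.038 5.671 3.092]
print(np.round(vecs, 3))    # first row approximately (0.760, -0.044, 0.649), up to sign
```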

3.5.

THE EIGENANALYSIS OF XX' AND X'X

It was remark.ed m · section · 3.3 that, in ordinating ecolog1cal . data, 011e 11 frequently begms by forming the product XX' or X'X where X is an s '/... data matrix · Ob VIous · 1y, th ese two products are related ' in sorne way. Eacb 15 *Th e theoret.Ical · · a pau . of equal eigenvalues is ignored lll · thí5 eJeIJJ eotaíY 1 ty of finding . . possi'b'li dlSCUSSlOil.

~~

oduct of the same two factors and

Let us put F

=

1~

1 ~Y~m~

of the factors differs.

XX' and G = X'X.

Q are not used here sm· R (The syrobols R and · ce and Q w-centered matnces; see Table 3.6. The ro f are the products f ro ws o X and X, o in forrning the products F and G.) are not centered Clearly, F is a symmetric s x s matrix d G . an is a s . rnatrix. Suppose we were to do eigenanalyses 00 both F Ymmetnc n x n the results be related? and G. How would We first answer this question in symbols and th . en exanune a nu . 1 example. At every step the reader should check th t th . menea side of an equals sign are of the same size. e matnces on each

ª

Let "A¡ be the. i th. eigenvalue of F ' and let the s-element row vector u'. be the corresponding e1genvector. I t follows that '

(3.13) or, equivalently,

¡~

!~i

l

Postmultiply both sides of this equation by X to give

Then, since X'X = G by definition, the equation becomes ( u~X)G

=

A¡ (u~X) ·

(3.14)

The factor u'X on both sides is an n-element row v~ctor. · value of Gas z • • • h t A. is an eigen Comparing (3.13) and (3.14), it is ev1dent t a i f Gis either equal to wU d. ·genvector o e as of F, and that the correspon mg ei or proportional to u'.1 X. b derived froro those of Thus the eigenvalues . t rs of G can e espec1allY . 1·r ei"ther .n • 0 and e1genvec · F · ·d t omputation, . alys1s or vice versa. This fact is a great 1 0 e A direct eigenan . or s · tly exceeds s· f ons 1f n o is very large. Thus suppose n grea . ve long computa I f the n X n matnx . G = X'X would entail ry

ª

TRANSFORMINC DATA M

Al~ICts

128

e results could be obtained much faster by analyzin were large. Th e sam g the · F -- XX' · smaller s X s matnx · are g1ven. here to 3 Now cons1.der a numerical example. . The results . . . decimal p1aces, although 12 were used m the ongmal computations. The 2 x 3 data matrix is

8 2

X= (1i Then F =XX'= ( 122 105

and

105 ) 189

G

=

X 'X

=

130 46 ( 109

46 68

72

109)

72 . 113

The first eigenvalue and eigenvector of F, which can be found by Hotelling's method (or by other methods not described in this book), are ;\ 1 =265.714

and

ui=(0.590,

0.807).

This result can be checked by evaluating both sides of Equation (3.13) and finding that uiF = A1ui = (156.755,

214.552).

From the previous argument we know that A¡ = 265.714 is also a eigenvalue of G, and that the corresponding eigenvector is proportional t uiX which is 8 2

~) =

(

10.652,

6.334,

10.589 ).

liZ Then to find this eigenvect , · . the vector u'1 X· thi 18 · d or, say V1, of G lt is only necessary to norma ' s one by dividing every element in u' X by the squar root of the sum of f· 1 squares 0 lts elements, namely, 16.301. Thus v{ = ( 0.653,

0.389,

0.650 ),

and, as is necessary f or an e. t unity. igenvector, the squares of its elements sull1

129

Ví are an eigenvalUe-e1g.

ns a check that ;\1 and A

that

envector pair of G

, note

v{G

=

;\1ví

= (

173 .631,

103.255,

172.610 ) .

finding the second eigenvalue of both F and ective eigenvectors u'2 and v{ is straightf G, say "-2, and their res P . . . orward Th . arises as to what is the third e1genvalue A. of th 3 · e _quest10n now 3 2 matrix F has no third eigenvalue ~he a e x_ matnx G, since the . . · nswer is A. = o F 2X eiaenanalys1s of G gives as the third eigenvector , 3 .' urther, an º V3 correspondmg to A 3'

v; =

( -

0.4S 6 ,

-0.483,

0.748)

and it will be found that

v{G = A.3G = (O,

O,

o)

(disregarding minar rounding errors). To summarize: suppose F = XX' is an s X s matrix and G = X'X is an n X n matrix. Suppose n > s and let n - s = d. Then F and G have s identical eigenvalues and the remaining d eigenvalues of G are all zero. The seigenvectors of G that belong with its nonzero eigenvalues can be found from the corresponding eigenvectors of F. To do this, note that the ith eigenvector of G, say, v/, is proportional to u~X, where u~ is the ith eigenvector of F. The elements of v/ are obtained by normalizing the elements of the vector u~ X.
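The saving described here is easy to demonstrate. The sketch below (Python with numpy; an added illustration, not part of the original text) finds the eigenvalues and first eigenvector of F = XX' for the 2 × 3 example and then obtains the corresponding eigenvector of G = X'X by normalizing u_1'X, as in the text.

```python
import numpy as np

# The 2 x 3 data matrix of the numerical example; F = XX' is 2 x 2 and G = X'X is 3 x 3.
X = np.array([[ 3.0, 8.0, 7.0],
              [11.0, 2.0, 8.0]])
F = X @ X.T
G = X.T @ X

lam, U = np.linalg.eigh(F)          # eigenvalues of F, ascending
lam, U = lam[::-1], U[:, ::-1]      # reorder so the largest comes first
print(np.round(lam, 3))             # about 265.71 and 45.29; also the nonzero eigenvalues of G

# The eigenvector of G belonging to lambda_1 is u_1'X, normalized to unit length.
u1 = U[:, 0]
v1 = u1 @ X
v1 = v1 / np.linalg.norm(v1)
print(np.round(np.abs(v1), 3))      # [0.653 0.389 0.65 ], as in the text (up to sign)
```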

EXERCISES .l.

Consider the following three matrices:

A=

C=

uo

1). -1 '

2

[-!

1 1 2 6

B= (

j

-2

1 -1

o

1

o 3

-!);

130

What are the following products? (a) AB; (b) BC; (e) AC; (d) C .

A, (e)

CB; (f) BCA; (g) CAB. 3.2.

See Figure 3.7. Four data points, A, B, C, and D, have coorct · 1nates · two-space given by the data matrix X, where in

B 2 1

e

D

-1).

2

-1

-1

Find the 2 X 2 transformation matrices U1 , U2 , and U3 that wi respectively, transform X into Y1 , Y2 , and Y3 , as shown graphically ¡ separate coordinate frames in Figure 3.7. (The coordinates of ali th poin ts are shown on the axes in the figure.) 3.3.

Which of the following matrices U1 and U2 is orthogonal?

-0.61546 0.50709 -0.51412

( 0.49237 0.84515 0.30985

U1 =

(

U2 =

0.61546) -0 .16903 0.77814

-0.61546 0.50709 0.60338

0.49237 0.84515 -0.20806

0.61546) -0.16903 0.76983

B A

4

4 D

2

o

o e

- 1

-2

-4 D

-5 - 3

-1

3

5

-6

e -3

-1

-7

o

2 -2

y

y2

1

Figure 3.7.

See Exercise 3.2.

y3

J.4. J.5.

131

Prove that XX' is symmetric. Given the 2 X 3 data matrix X, where 4 1

find the 2 X 2 correlation matrix. 6 B. ·

Suppose A

A IS.

=

UAU ', where U is orthogonal and th d'

. e iagonal matnx

whose diagonal elements are the eigenvalues of A. What are the eigenvalues of A5 ? 7.

Eigenanalysis of a 5 X 5 matrix showed that its first eigenvector was proportional to (1, 0.87, 0.63, - 0.20, - 0.11). What is this eigenvector in normalized form?

8.

Eigenanalysis of the following matrix

s=

r

10.16 6.16 -7.48 0.24

6.16 5.36 -4.28 -1.16

-7.48 -4.28 5.84 -0.12

0.241 -1.16 -0.12 1.36

shows that the first, third, and fourth eigenvalues are

i\

=

A4

19.71;

What is A. 2? [Note: there is no need to

=

0.02.

d 0 an eigenanalysis of S to

answer the question.]

. le of your own . h a numenca1examp . XY)' . the transpose of XY, Show, using symbols, and test wit

devising, that (XV)' = Y'X'. _:'he;e

i

Show, likewise, that (ABC) - C B needed in Chapter 4.]

[N::e: these results will be . The second eigenvector

8. . 1 on page 12 . 1 e and Consider the numencal examp e the second eigenva u of F =XX' is A2 = 45.285. What are eigenvector of G = X 'X?

Chapter Four

Ordination 4.1.

INTRODUCTION

Ordination is a procedure for adapting a multidimensional swarm of data points in such a way that when it is projected onto a two-space (such as a sheet of paper) any intrinsic pattem the swarm may possess becomes apparent. Severa! different projections onto differently oriented two-spaces may be necessary to revea! all the intrinsic pattern. Projections onto three-spaces to give solid three-dimensional representations of the data can also be made but the results, when reproduced on paper as perspective drawings, are often unclear unless the data points are very few in number. This definition of ordination may, at first glance, appear to contradict that given at the beginning of Chapter 3. According to the earlier de~nition, ordination consists in assigning suitably chosen weights to the different . . . h " " can be calculated for spec1es m a many-species commumty so t at a score d h adrats can be ordere . Th each quadrat (or other sampling umt). en t e qu . . 1 " d h lt is a one-dunens10na ( ordinated") according to their seores, an t e resu stems ordination · · . ·~ nt species-we1ghtmg sy , . · are obtained. Often one wants to use two or more di ere . · al ordmat10ns an d then two or more different one-dimenswn. h wn in Figure 3.1 Th . 1 b mbmed as s o . d h ve been combme . ese separate results can convement Y e co . . 1l1 Ch · · 1 ordmat10ns apter 3, where two one-dunens1ona

ª

133

134

ORD1NAl1a~

to give a two-dimensional ordin~tion in which every ?ºint (quadrat) ha 1·t coordina tes two seores obtamed from the two dtfferent weighr s as

s 1eros.

8Ys, Obviously' if one were to use s different weighting systems (whlllg . . ere .

the number of species), the result would be an s-d1mens10nal ordinar .8 is ion , the . . 1coordinate fram swarm of data points would occupy an s- d1mens10na . h . . 1 eand by projecting the pomts ~nto. eac ~s m tum, one cou d recreate each of the ene-dimensional ord~nat10n~ y1elded by one o~ th~ chosen species. weighting systems. More mterestmgly, one could proJect 1t onto one of h · o f axes an d ob tam · a two-ctun· te two-dimensional planes defined by a pair en. sional ordination. There are s(~ - 1)/2 such pla~es, hence s(s _ l)/ 2 different two-dimensional ordinattons would be poss1ble. In practice, proba. bly only a few of them would be interesting. N ow let us consider how species-weighting systems can best be devised. Clearly, if they are chosen arbitrarily and subjectively, there are infinitely many possibilities. What is required is an objective set of rules for assigning weights to the species. A way of arriving at such a set of rules is simply to treat every species in the same way, and conceptually plot the data in s-space; the result is the familiar swarm of n data points in which the coordinates of each point (representing a quadrat) are the amounts it contains of each of the s species. One then treats the swarm as a sin~e entity and adapts it (e.g., by one of the methods described in the following) in a way that seems likely to reveal the intrinsic pattem of the swarm, if it has one, when it is projected onto visualizable two or three-dimensional spaces. Many methods of adapting a swarm of raw data points have been invented and sorne of them are described in succeeding sections of tbis chapter. What unites the methods is that each amounts to a technique for adapting raw observational data in a way that makes them (or is intended to make them) more understandable. Since the initial output of each method is an "adapted" swarm in s dimensions, we again have n data points eacb with_ s coordinates; each coordinate of each point is a function of the spec1es qua~tities. in the quadrat represented by that point. The relattonship of the two definitions of ordination should now be cle~r. When the n columns of an s-row data matrix are plotted as n points 111 s-space, the patt~m of the swarm can be changed by assigning diff~reo! sco~es to the ~pec1es (equivalently, by multiplying each row of the matnx bY a different we1ghf1n f t ) 0 . und, g ac or · r, looking at the process the other way ro t he swarm as a whol b . ore e can e modified (to make its interna! pattern 111

¡NfflO oucr10N

ns

erceptible) and, provided this is _done appropriately, the effect is to

cJearlY P t weights . differen .. to the several spec1es.

1

giveI 15 . worth notlcmg the. parallei a swarm of dat a . between ordinating . t d drawing a two-dimenswna] geographic map of the whole earth or iJltS an h h b. . poJarge par t of it. In both cases . t e o llject is to .representa . pattem in a space a dimensions t at 1t actua y occup1es, while at the same time of fewer 8 muchas possible of the infonnation it contains and (sometimes) retauung;stortion to a minimum. (Distortion is not always abad thing; see keeping 9 page i o.) apher's problem is, of course, much simpler than the ecologist's The geogr apher always starts with a visualizable pattem in only three · e the geogr · liz bl · swc . s10ns, wh ereas the ecologist starts with an unv1sua a e pattern h m Th s · diJI1en . 8 • s is often Jarge, and differs from one case to anot er. e princíple diroenswnis ' the same, however. And just as one can choose among a large

/

/

D

( b)

. t. ns illustrating the 10 . al globe, . different map ProJeCee-dimens1on America usmg art of the thr . swarm of data Figure 4.1. Two maps of s.o uth dun·ensional map of p f an s-dimens10nal d [The map · a two. · n) o · ue use · Parallel between constructmg (an ordmatIO the techroq . t 0°w in (a ) · ·onal map ngly intluence d by tral roen·a· an 1s a 6 and constructing a two-dunensi 1 . . d 15 . very stro Pomts. The result obtame hi (equaton"al)·' the cen Projection for both maps is stereograp e and at 120ºW in ( b ).] (a)

ÜRDINAl

136

I()~

number of map projections when drawing a geographic map (see Figure 4 examples) so one can choose among a large number of ordin . .1 ' . Th . ati00 for two e ments and drawba k . when ordinating ecolog1cal data. tech mques e s 0¡ the various ordination methods have been debated for years and the debate is Iikely to continue. The motives for drawing geographic maps are, of course, multifarious· show climatic data, geological data, bird migration routes, shipping rou~:o ocean currents, population densities, and so on; the list is endless. But th~ motive for doing ecological ordinations is always the same, namely, to revea] what is hidden in a body of data, and it is at this point that the parall~J between ecology and geography breaks down. A map of South America like that in Figure 4.lb, although obviously "wrong" in a way that would require severa! paragraphs to define precisely, would not mislead a sophisticated map reader. This is because the true shape of South America is thoroughly familiar, and if the distorted two-dimensional version
4.2.

PRINCIPAL COMPON ENT ANAL YSIS

Principal component analysis (PCA) is the simplest of all ordination meth· od~. ~he data swarm is projected as it stands, without any differential weightmg of the sp.e~ies, onto a differently oriented s-space. Equivalently, the axes of the ongm~l coordinate frame in which the data points are (conc~ptually) plotted is rotated rigidly around its origin. This rotation is done m such a way th ª,t re¡ative · to the new axes the pattern of the data swarm shall be, colloquially speaking, as simple as' possible. Ordinating an "Unnatural" Swarm With a Regular Pattern

Before defining the phrase "a . . . a1 terms, it is instruct· .s simple as poss1ble" exactly, in mathemauc · I tbe ive to contmue th d.1 account .of PCA that f e scuss10n at an intuitive level. n 0 11ows we env · · ht pomts . at the · tbe eig isage, as a swarm of data points, .18 corners of a cub 1·d " rJll utterly unnatural · · or box." The fact that such a swa is irre1evan t '· u nnatural assumnfl()TI" ~TP \T~lW:lhle 1·¡ tbe

°

pRINCIPAL

coMPONENT ANALYSIS 137

argumen t easier to comprehend s b ake an . · u sequently ( J1l d es devised for analyzmg "unnatural" d t page 142) the oce ur a a (the com f pr lied to more believable data swarms that i ers o a box) are app , s, ones that are diffuse and irregular. . sider the eight pomts at the corners of the box . p· on . . . . 10 igure 4.2a (only of the comers are V1Slble m the d1agram since for th k . seven . ' esa e of clanty the box is s~own as an opaq~e solid). The center of the box is at the origi~ of the coordmates. If the pomts were present alone, without the edges, and onto the plane defined by the x 1- and x 2-axes (the xv x were Pro1ected J • plane), there w~uld be a confusmg pattern .ºf points with no irnmediately2 0 bvious regulanty; the same would be true if they were projected onto the x plane, or the x 2 , x 3 plane. However, if the box were rigidly rotated the origin of the coordinates until it was oriented as in Figure 4.2b , and its comer points were then proJected onto the three planes, each

e

~~o:i

::x¡2

s

1

1

R - - - - - :X:¡

p

( b)

G- - - - - - - yl R

p

1

Q

1

:

. two different coordina te traro e in. entation that F· h dimensional .on to an on s are l~e 4.2. A box (cuboid) plotted in ~ t ~ee~t appears aiter. rotaU of the box's co~e~epth onentations. In (a) the box is oblique; m ( ) The coordinates ·dth height, an bnngs · its edges parallel with the coord.ma te . axes. the Jower grap b· Tbe w1 • ~enoted by xs in the upper graph and b~ ys 1ll f the box are PQ, QR, and RS, respect1vely. .1

.

138

. . Id show one of the three faces of the box as a rectangl pr0Ject10n wou Id b d. e; the "true" pattern of the points in three-space wou e isplayed as clearly as "bl d the fact that they formed the corners of a box would be poss1 e an come obvious. . . We now describe the task to be p~rformed m ma_thematical terms. To repeat, we wish to rotate the box relativ~ t~ the coordmate f:ame or, Which comes to the same thing, rotate the coordmate frame relative to the box However, this sentence does not specify in operational terms exactly wha; needs to be done (unless, of course, one were to do the job physically, with a wood and wires model). The actual operation to be performed consists in finding the coordinates in three-space of the corners of the newly oriented box as shown in Figure 4.2b from a knowledge of their original coordinates as in Figure 4.2a. We denote the original coordinates by xs and the new coordinates after the rotation by ys. The axes in Figure 4.2a and 4.2b are labeled with xs and ys accordingly. The original coordinates (three for each of the eight points) form the columns of a 3 X 8 data matrix. Therefore, to rotate the box, it is necessary to premultiply the data matrix by a 3 X 3 orthogonal matrix (see Section 3.2) that will bring about the rotation required. The problem, therefore, boils down to finding this orthogonal matrix. To see how it can be found, notice that the projections of the width, height, and depth of the box (the lengths PQ, QR, and RS, respectively, in Figure 4.2) have their true lengths only when they are projected onto axes parallel with the edges of the box, that is, onto the axes of the coordinate frame when it is oriented as in Figure 4.2b. Given this orientation, the projections of the edges on the axes can be seen to be as follows: Projection of edge PQ ( and the other three

( PQ =

edges parallel with it)

o

on the Ji-axis on the y -axis

O

on the J3-a~s;

Projection of edge RS (O ( and the other three = RS o edges parallel with ¡t) Projection of edge QR ( and the other three = edges parallel with it) . But given an 0 bli

2

on the ri-axis on the y2-axis



on the y 3-axis;

0

on the Yr~s on the y 3-axis.

QR

on the y 1-axis

que orientation lik · rY ' e that m Figure 4.2a for instance, eve

NorAL co

MPONENT ANALYSIS

pR I

139

dge has a nonzero "true"projection lengths. on ali three axes, and ali these projections are e the . · f h · 1 ess tban therefor e requue a rotation o t e coordmate . . frame that will cause d e of t he box to have a .nonzero . h PIOJection (equaJ to its true length) each e gaJCIS . This require. Only, and zero .proJectlons on t e other two axes. 0 o one t spec1fies m . . mathematical terms exactly what the desued rotation is to

we

men

acbieve. . the next stages of the discussion, considera numerical examp_le graphica1 representation. The box to be rotated 1s the oblique box m andToitsclanfY.

f2 1

(a)

1 1

A

1

1

e

( b) A

E

------yl

e 1

G

i· ue

1 1



an ob iq

tabl~;,

The boX 1.11 (b) The '. in Table 4.1. (a) X in tbe corners are cnven . dinates ts of the matnxoord;.,ates of· ular to the Figu,,, 4.3. The box whose_ coor ers are the elernen ate axes; tbe eare perpendic tbe widtb Position· the coordinates of its coro 'th the coordin d y3-axes . therefore, b rotated ' until its edges are parallel Wl h x an tb page, . Observe that t. e plan< ol eoreshortenuig are the elements of Y in the table. ACGE 1s m t without f P Iane of the page. Hence m · ( b ) the tace f the box are shown d(Af:)"" 12 and height d(AC) = 10 0 1

~

º'

~

b~

140

DATA MATRICES POR THE POINTS (CORNERS OF A B

41

~~~~URE 4.3a.(MATRIX X) AND FIGURE 4.3b (MATRIX Y).

O)()

The data matrix X.ª B -8.66 1.41

A

X=

(-4.04 7.07

e

D -2.88 1.41 -8.16

1.73 7.07 -4.89

E 2.88 -1.41 8.16

F -1.73 - 7.07 4.89

G 8.66 -1.41

o o 3.26 The matrix y giving the coordinates of the points as shown in Figure 4.3b , af ter rotation of the box. e D E F G H B A 6 -6 -6 -6 6 6

y=(~

lI 4.04 -7.07 -3.26

-6)

5

-5

-5

5

5

-5

-5

-4

4

-4

4

-4

4

-4

ªThe capital letter above each column is the label of the corresponding point in Figure 4.3a.

Figure 4.3a. The three coordinates of its eight comer points are the elements in the columns of the 3 X 8 data matrix X shown in Table 4.1. (The reason for choosing these coordinates becomes clear later.) In Figure 4.3 (in contrast to Figure 4.2) the three-dimensional graphs have their third axes perpendicular to the plane of the page. Therefore, what the drawing in Figure 4.3a shows is the projection of the oblique box onto the x 1 , x 2 plane. The size of the box can be found by applying the three-dimensional forro of Pythagoras's theorem. Thus d(AB), the length of edge AB whichjoins the points A = (xn, xw x 31 ) and B = (x 12 , x 22 , x 32 ), is

2

= /( -4.04 + 8.66) + (7.07 - 1.41-) 2 + (3.26 - 0) 2 =

8.

Likewise, d(AE) = /(xll - X15)2 +(x21 - X2s>2 +(x31 - X35)2 2 = /( - 4 .o4 - 2 .88) + (7.07 + 1.41)2 + (3.26 - 8.66)2 = 12.

In the same wa

y,

.t 1

may be found that d(AC) = 10.

AL coMPONENT ANALYSIS

pRrt•.iCIP

,...r w suppose 1~º

that the box is rotated ~w· dínate axes. Let the rotation brin t lts edges are

141

'ºº~re 4.Jb. The width d(AE) and hei~h~~ box into the p¿~:•llel With the figthe rotatíon and are still 12 and lO u . (AC) of the box ion shown in bY lllts, res · are uncha i.; h cannot be seen because it is at right Pectively· its d h nged w1vC . · angles to h ' ept d(AB) stil l 8 units. It 1s easy to see . that the coord"mates of tthe plane of the page 18. ' newly oriented box are given by the columns of the 3e corner points of ~he . Table 4.1. Therefore, we now need to fi nd the orthoX 8 matrix y shown in which gonal matrix U for UX=Y.

To do this notice first the form of the product of y . . transpose Y'. It is the 3 X 3 matrix postmultiplied by its

YY'

=

(

288

o

O

200

o

o

l

o '

125

a diagonal matrix. 1t is diagonal because the points whose coordinates are the columns of Y form a box that is aligned with the coordinate axes. (This last point is not proved in this book; it is intuitively reasonable, however, and should seem steadily more reasonable as the rest of this section unfolds.) Since we are to have UX

=

Y, we must also have

UX(UX)'

=

YY'

(4.1)

t ·x product UX. Now where (UX)' is the 8 X 3 transpose of the 3 X 8 man , X'U' Thus (41) becomes recall (from Exercise 3.9) that (UX) = · ·

(4.2)

UXX'U' = yY'. -,..._ . . of the same forro as Next observe that both XX' and YY' are SSCP matnces d synunetric. We R. f rse square an in Section 3.3. Both matrices are, 0 c~u ' Then (4.2) becomes use the symbol R for XX' and denote yY by Rv· (4.3 )

. Finan

URU' = Ry· ·hE

Y, compare this equation w1t

qu

ation (3.lZ).

I

It is ciear that U, U ,

142

TABLE 4.2. THE EIGENANALYSIS OF R = XX'.ª

The SSCP matrix is 205.21

- 65.21 3.74) 207.89 -46.06 . ( 3.74 -46.06 202.25 The eigenvectors of R are the rows of U where

R = XX' = _ 65.21

u= The

0.70711

-0.57735 - 0.57735 ( 0.57735

o

0.70711

-0.40825) 0.81650 . 0.40825

eigen:~e:~~ ~e( ;e8no~:º elr)en~ :~~w;,e

o o 128 The coordinates of the box's comers when it is oriented as in Figure 4.3b are given by the columns of

UX=Y=(~

4

~ -~ -~ -~ -~ =~ =~)

-4

4

-4

4

-4

4

-4

ªX is the data matrix defining the comer points of the box in Figure 4.3a.

and R v can be obtained by doing an eigenanalysis of R; the nonzero elements of Ry which, as we have seen, are ali on the main diagonal are the eigenvalues of R. Table 4.2 shows the outcome of doing an eigenanalysis of R. As may be seen, the eigenanalysis gives the eigenvectors of R; these are the rows of U. U is the orthogonal matrix we require. The product UX gives Y, the matrix of coordinates of the comer points of the box in Figure 4.3b, which has been rotated to the desired orientation with its edges parallel with the coordinate axes. Thus the original problem is solved. To summarize: the solution is found by doing an eigenanalysis of the SSCP matrix R = XX' where X is tbe original data matrix.

Ordinating a "Natural", Irregular Swarm N ow consider th li · ar111s· S e app catwn of this procedure to "realistic" data sw · uppose one had an s x d . cíes JU n ata matnx X listing the amounts of s spe

L coMPONENT ANALYSIS

pfllfloJCIPA

drats (or other sampling units) and th

n ~ua represented by a swarm of n points . at these data are th

143

beJJlg Th d in s-spa OUght of . regular. e proce ure for perfor . ce; the swarrn. . as and 1r b 1 d mmg a PCA lS d1[us nows (the sym o s use are the same, and h on such data is e fo in Section 3.3 and Table 3.6. A num . ave the same rneanm· as rhose . eneal exa 1 g, as data IIlatrix is shown m Table 4.3). mp e using a 2 ><

11

1. Center the data by species (rows). Do this by subtracting from every element in X the mean of the elements in the same row. Call the centered data matrix XR.
2. Form the s × s SSCP matrix R = XR XR'.
3. Form the s × s covariance matrix (1/n)R. As we shall see, this step is not strictly necessary, but it is usually done.

4. Carry out an eigenanalysis of R or (1/n)R. The eigenvectors of these two matrices are identical. Combine these s eigenvectors, each with s elements, by letting them be the rows of an s × s matrix U; U is orthogonal. The eigenvalues of R are n times those of (1/n)R; hence it is immaterial whether R or (1/n)R is analyzed. Let Λ denote the s × s diagonal matrix whose nonzero elements are the eigenvalues of the covariance matrix (1/n)R. Then [compare Equation (3.12)]

U[(1/n)R]U' = Λ.

It follows that

URU' = nΛ

and the nonzero elements of nΛ are the eigenvalues of R.

5a. Complete the PCA by forming the s × n matrix Y = UXR. Each column of Y gives the new set of s coordinates of one of the data points. If the points are plotted using these new coordinates, it is found that the pattern of the points relative to one another is unchanged. The only change produced is that the whole swarm, as a single entity, has been rotated around its centroid, which is the origin of the new coordinate frame.

Figure 4.4 shows graphically the results in Table 4.3. The original swarm of points, whose coordinates are the elements of X, is plotted in Figure 4.4a; the transformed swarm, which after PCA has as coordinates the elements of Y, is plotted in Figure 4.4b.

TABLE 4.3. THE STEPS IN A PRINCIPAL COMPONENTS ANALYSIS OF DATA MATRIX #9.

The 2 × 11 data matrix is

X = ( 20  26  27  28  31  33  39  41  42  48  50
      50  45  60  50  46  55  35  25  33  24  28 ).

The row-centered data matrix obtained by subtracting x̄1 = 35 and x̄2 = 41 from the first and second rows of X, respectively, is

XR = ( -15  -9  -8  -7  -4  -2   4    6   7   13   15
         9   4  19   9   5  14  -6  -16  -8  -17  -13 ).

The SSCP matrix is

R = (   934  -1026
      -1026   1574 ).

The covariance matrix is

(1/n)R = (  84.9091  -93.2727
           -93.2727  143.0909 ).

The matrix of eigenvectors is

U = ( 0.592560  -0.805526
      0.805526   0.592560 ).

The eigenvalues of the covariance matrix are the nonzero elements of

Λ = ( 211.704     0
        0      16.295 ).

The transformed data matrix (after rounding to one decimal place) is

Y = ( -16.1  -8.6  -20.0  -11.4  -6.4  -12.5   7.2  16.4  10.6  21.4  19.4
       -6.7  -4.9    4.8   -0.3  -0.3    6.7  -0.3  -4.6   0.9   0.4   4.4 ).
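For readers who want to reproduce Table 4.3, the sketch below (an added illustration, not part of the original text) carries out steps 1 to 5a of the recipe on Data Matrix #9 with numpy; as with any eigenanalysis, the rows of U and of Y may emerge with their signs reversed.

import numpy as np

# Data Matrix #9 (2 species x 11 quadrats), from Table 4.3.
X = np.array([[20, 26, 27, 28, 31, 33, 39, 41, 42, 48, 50],
              [50, 45, 60, 50, 46, 55, 35, 25, 33, 24, 28]], dtype=float)
n = X.shape[1]

XR = X - X.mean(axis=1, keepdims=True)   # step 1: center by species (rows)
R = XR @ XR.T                            # step 2: SSCP matrix
C = R / n                                # step 3: covariance matrix

lam, vecs = np.linalg.eigh(C)            # step 4: eigenanalysis
order = np.argsort(lam)[::-1]
lam, U = lam[order], vecs[:, order].T    # eigenvalues; eigenvectors as rows of U

Y = U @ XR                               # step 5a: principal component scores
print(np.round(lam, 3))                  # approximately 211.70 and 16.30
print(np.round(Y, 1))                    # matches Y in Table 4.3 (rows up to sign)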

a~.x"

F1gur~.

bown by a cross), which is X2) = (35, 41) in Figure 4.4a, has swaIJTI (\ed to the ongm at (Y1, Y2) - (O,?! m 4.46, and the swann been sJuf has been rotated so that its I_ong aius is parallel with the as a .whole bis statement is expressed more prec1sely later). . 1

(t Sa describes PCA as a process of rotatmg a swann of points . centr01 paragraph .d · lt is instructive to rephrase the. paragraph, calling it Sb, ar0 that 1·t descn.b es the process as one of rotatmg the coordinate frame und its so . the swarm. •aJ(!S

elat1ve to

Figure 4.4. Two plots of Data Matrix #9. (a) The original, untransformed data, whose coordinates are given by X in Table 4.3; the cross marks the swarm's centroid. (b) After PCA: the swarm has been rotated, the origin is at the swarm's centroid, and the coordinates of each point, measured along the new axes y1 and y2, are given by Y in Table 4.3.


5b. Form the matrix product Y = UXR as already directed. Then plot the original data swarm after centering the raw data by rows; the coordinates are the elements of XR (see Figure 4.5). We now wish to rotate the axes rigidly around the origin. This is equivalent to drawing new axes (the y1- and y2-axes) which must go through the origin and be perpendicular to each other in such a way that, relative to them, the points shall have coordinates given by Y. The problem, therefore, is to find the equations of the lines in the x1, x2 coordinate frame that serve as these new axes.

This is easily done. Note that any imaginable point on the y1-axis has a coordinate of zero on the y2-axis, and vice versa. Hence the y1-axis is the set of all imaginable points, such as point k, of which it is true that

u21 x1k + u22 x2k = 0.

Indeed, the set of all points conforming to this equation is the y1-axis, and its equation is

u21 x1 + u22 x2 = 0,   that is,   0.805526 x1 + 0.592560 x2 = 0.

To draw this straight line, it is obviously necessary to find only two points on it. One point is the origin, at (x1, x2) = (0, 0). Another point can be found by assigning any convenient value to x1 and solving for x2. In the numerical example we are here considering, for instance, let us put x1 = 10. Then

x2 = (-0.805526/0.592560) × 10 = -13.59.

Hence two points that define the y1-axis are

(x1, x2) = (0, 0)   and   (x1, x2) = (10, -13.59).

This line is shown dashed in Figure 4.5 and is labeled y1. The y2-axis is found in the same way. It is the line

u11 x1 + u12 x2 = 0,   that is,   0.592560 x1 - 0.805526 x2 = 0.
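Alternative (2), finding the rotated axes as lines in the original coordinate frame, can also be condensed into a few lines of code (an added illustration, not part of the original text); it uses the matrix U of Table 4.3 and reproduces the point (10, -13.59) found above.

import numpy as np

U = np.array([[0.592560, -0.805526],     # rows of U from Table 4.3
              [0.805526,  0.592560]])

# The y1-axis is the set of points with zero y2-coordinate, i.e. the line
# u21*x1 + u22*x2 = 0, whose slope in the x1, x2 frame is -u21/u22.
slope_y1_axis = -U[1, 0] / U[1, 1]
print(round(10 * slope_y1_axis, 2))       # -13.59, as in the text

# Likewise the y2-axis is the line u11*x1 + u12*x2 = 0.
slope_y2_axis = -U[0, 0] / U[0, 1]
print(round(slope_y2_axis, 3))            # slope of the second rotated axis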


Figure 4.5. Another way of portraying the PCA of Data Matrix #9. The points were plotted using the coordinates in the centered data matrix XR (see Table 4.3) with the axes labeled x1 and x2. The new axes, the y1- and y2-axes, were found as explained in the text. Observe that the pattern of the points relative to the new axes is the same as the pattern in Figure 4.4b.

We have now done two PCAs, the first of the corner points of a three-dimensional box, and the second of an irregular two-dimensional swarm of points. It should now be clear that, in general (i.e., for any value of s), doing a PCA consists in doing an eigenanalysis of the SSCP matrix

R = XR XR'

or of the covariance matrix

(1/n)R = (1/n) XR XR',

where XR is the row-centered version of the original data matrix. Then one can either (1) find new coordinates for the n data points from the equation

Y = UXR,

. he eigenvectors of R and re U is the orthogonal matrix whose rows are t a· ate axes from the (1 /n)R· h t ted coor in ' or (2) find the equations of t e ro

ª

148

equation

Ux =O, 1 ment column vectors where x an d Oare the s-e e

x=

and

o=

rn

. Ux = 0 denotes s equations of which the i tb is and the equat10n

The ith axis is defined by the s - 1 simultaneous equations u~x =O with = 1,2, .. .,(i -1),(i + l), ... , s. Figures 4.4 and 4.5 amount to graphical demonstrations of alternatives (1) and (2) when s = 2. k

Regardless of how large s is, there are s mutually perpendicular axes, both for the original coordinate frame and the rotated frame. The new axes, namely, the y 1-axis, the Yrax.is, ... , and the Ys -axis, are known as the first, second, ... , and s th principal axes of the data. The new coordinates of the data points measured along these new axes are known as principal compo· nent seores. For example, the ith principal component score of the jth poinl IS Y;J = U¡1X11

+

U;2X2¡

+ ... + U¡sXsj·

Thus it is the weighted d by species m ) sum of the quantities (after they have been centere eans of the s s · · the

weights. After PCA . pecies in the jth quadrat. The us are . each pomt h . · f ach spec1es in a quadrat b . as as coord1nates not the amount o e quadrat. ' ut vanously W~ighted sums of all the species in the The term princi 1 pa cornponent d po·

nent score for any d . enotes the variable" the principal coJIJ data is ata pomt"; hence the ith principal componen! o! the

oMPONENT ANALYSIS ..i[IPAL C

p~lr•

149

al step in an ordination by PCA th '[he fin d . . ' e step that b CA to be interprete ' is to mspect the pattern of ena les the result of aP projected onto planes defined by th the data points when are d . e new t tbeY . . al axes). The ata pomts can be proiect d ' ro ated axes (the nnc1p 1 . J e onto the P Jane the Yi, Y3 P ane, and, mdeed any pl Y1' Y2 plane, the y3 P ' ' ane spe ·fi J'P. rnetiroes helpful to look at a perspective d . ci ed by two axes. 15 1 U s~uck in cork-board that shows the pattern ;~: ~ or solid model of 5 pins 0f the principal axes, usually the y y and e ata points relative to 111ree . i, 2, Y3-axes. We rnust now define prec1sely the consequences f . . 1 d o rotatmg a diffuse, . gular s-dimens10na ata swarm so as to make its P . . irre . . (. . fOJections onto spaces d' . 0f fewer dunens10ns m practice, onto spaces of two or thr 'bl ,, h ee unens10ns) "as simple as poss1 e or, per aps more accurately " lin . , as revea g as possible." Rec~ that .1f one does ~ PCA of the comer points of a box-and it can be an s-dllllens10nal box w1th s taking any value-the pattem of the points when projected onto any of the two-dimensional planes of the rotated coordinate frame is always a rectangle. What can be said of the projections onto different two-spaces, defined by principal axes, of a diffuse swarm? The answer is as follows. Consider the new SSCP matrix and the new covariance matrix calculated from the principal component seores after the PCA has been done and the coordinates of the points have been converted from species quantities (xs) into principal component seores (ys). These matrices are

0

Ry

=

ª

YY' and

respectively I · · 3) d Table 4 2 (pages 141 t should now be recalled, from Equat10n (4· . an . · s. all their and 142), that both Ry and (1/n )Rv are diagonal mNatnce ~e already elernents except for those on the mam · d.iagon al are zero. ow, know [see Equation (3.9), page 106] that

l

var(Yi)

cov ( Ji, Y2 )

cov( Y1' Ys) ) cov(yz, Ys

;Rv = c?~~~2.' !:~ .. ~ª:~~2) ........... · · · · · ·. cov( Ys , Yi)

cov( Ys , Y2 )

var(Ys)

()~[)¡~

11

A.t1r,

. . . . . anees of a paus of Principal an h t the COV C()O) _ follows with ne: anP(inl 1 tberefore . t her words, they are all uncorrelated . 0 111 t seores are zero; .in otuence of a PCA, however, . .1s the 1 .following . . 1t (;·ci11ilr The chief conseq 1977) that the first pnne1pa axis is so orient
ª

Returning to the numerical example in Table 4.3, it is easily found thal the covanance matnx' of th · · . e pnnc1pa1 component seores 1s 1 1 -Ry::::: -vY' = ( 211.704 n n O

16~95) =A.

Thus the Vari.ances of the . . the eigenvalues of th · . Pnncipal eomponent seores are equal to . eu covanance . I t is. mtuitively .d matnx. evi ent fr · h t the greatest "spread" of the o·ºlll in~pection of Figures 4.4 and 4.5 t a Jso tlhakt, although !he data p mts is in the direction of the Y1-axis, and ªhen oo ed at · Points sh . ·nw lil the frame of th ow obVIous negative corre1at10 ·on ex 1 an d Xi-axes (Figure 4.4a ), this. correlatt

..1(1PAL

coMPONENT ANALYSIS

p~I,,

151

. es when the points are plotted in the 11a111sh 4 4b ). fra:rne of the Y a d 1 n f1gure · . . Y2-axe ( CA as here descnbed IS often used a s p . . s an ordin . ation method in . aJ work. Such an ordmation is a "success" .f ¡ogic . . I a lar . eco. ge proportion of th dispers1on (or scatter) of the data Is par ll rot al . a e1 With th fi e . cipal axes; for then this large proportion f h . e rst two or three pru1 . . li b . o t e mfor . mation contained . the original, unvISua za le s-dunensional d t J1l a a swarm b ce or three-space and examined This · h can e plotted in Spa 1wo· . · Is w at d. . s out to acbieve: the data swarm is to be pro· t d or mation by PCA se t . . ~ec e onto the t . ·onaJ or three-d1mens10nal frame (or frames) th t wo-dunensi a most clearly 1 the real pattern of the data. When three axes are retained . reveas . . . . ' as IS very often do the result IS shown m pnnt either as a two-dimensional . ne, hr . . perspective (or · · f isometnc) drawmg o a t ee-dimens10nal graph or el . . . . ' se as a tno of two-dimens10nal graphs showmg the swarm projected onto the Y1, Y2 p1ane, the y¡ , y3 plane, and the Yi , y 3 plane, respectively. The statement that such a two- or three-dimensional display of the riginal s-dimensional data swarm reveals the real pattem of the data is tuitively reasonable, but it is desirable to define more precisely what is eant by "real pattem." The observed abundances of a large number of ecies co-occurring in an ecological sampling unit are govemed by two ctors: first, the joint responses of groups of species to persistent features f the environment; second, the "capricious," unrelated responses of a few dividua} members of a few species to environmental accidents of the sort at occur sporadically, here and there, and have only local and temporary ects. In the present context the joint, related responses of groups of · · " nd the ec1es constitute "real pattem" or "interestmg data structure, a . ·. " . ,, (This is not to say that m Pnc1ous, sporadic responses amount to no1se. · they produce may not er contexts, environmental accidents and the nms~ Gauch (I982b) that ª researcher's chief interest.) It has been shown ~ . · ly a few 1 · f 0 rdmat10n, m on PªYmg the results of a PCA, or indeed 0 any ely pennit an e · d re than mer d· it also suppresses . ns10ns (typically two or three) oes isualiz bl . . t be v1sualize ' h . a e s-d1mens10nal pattern mponents of t e ise '' f pnncipa1 co d · This is because the first ew fiect the concerte ª--those with the largest variances-nearly always re of species (hence on · When a group 1t of ses of groups of severa! spec1es. lik ly to be the resu e · · · un e · do /ºus mdividuals) behave in concert, It IS f t that manY species ~ izect, temporary "accidents." Moreover, the ac s of the environrnen ect . t" feature ' respond in concert to the "unportan

°

m? .

..

1 ·~ p

(

152

.

hole contains redundanc1es; therefore body as a w h ... · , the t h the data d structure,, of means t ª . xes nee ed to display t e mterestmg . num ber of coordmate h s the to ta1 number of spec1es observed. t d a is far Iess t an '. from. the. redundancy.lil the a . . dinat10n pernn·ts us to profit . To sununanze. or d cy not much mformat10n 1s lost by rep of redun an ' . . re. fi eld data. Because . t ·n only a few d1mens10ns. And the discard d of data pom s I . . e senting a swarm . d d axes along which the vanances are srnall). . formation (on the d1sregar e is m . (Gauch 1982b). . . . . mostly n01se '. a PCA ordination descnbed m this sect10n can be h0 d of domg . The met shown in the next sect10n. An example of its use . . d. nous ways as mod1fie m va . 1 described may be found in Jeglum, Wehrhahn and · the way prev10us Y . . .. m commuruties in the Swan (1971). They samp led the vegetation m vanous . . boreal forest of Sas k a tchewan and ordinated theIT data and vanous subsets of it using PCA.

ª

4.3.

FOUR DIFFERENT VERSIONS OF PCA

The method given in the preceding section for carrying out a PCA can be modified in one or both of two ways. First, one can standardize (or rescale) the data by dividing each element in the centered data matrix XR by the standard deviation of the elements in its row. The resulting standardizied centered data matrix ZR then has as its (i, j)th element

as we saw in Chapter 3 (

10

.

~age 7). The SSCP matrix divided by n [i.e., da] is the correlation matrix (see Table 3.6, page 112). The PCA ~

(l/n)Z Z' .

now ca R. rne out by domg a · . . . d of the covanance · . n eigenanalys1s of the correlation matnx mstea matnx. The seconct modification co . . . analyzing Cl/n)X X' nsists m usmg uncentered data. Instead of O R a as was do · XX' f course' both th ese modifi ne. 1Il ection 4 .2 , one analyzes (l/n) · one can analyze the m t . cations can be made simultaneously. Thus discussing (dl/n)ZZ' in Which the (i, j)th element is vers1ons f p a vantages d . · us 0 two-d1rne . CA, we eompare the an disadvantages of these. dvanto a th ns1onal swarrn of 10 d resuits they give when apphe e columns 0 f ata po· . are Data Mat · lllts. The coordinates of the pomts nx #10 · tbe given at the top of Table 4.4. In

s ·

~efore . °

~~

x/ª;·

~I

o

fOU"

ptfff

RfNT VERSIONS Of PCA 153

A.BLE 4.4. FOVR DIFFERENT PCAS OF D T . . ATA MATRIX #10. fbe data matnx is

X= (

2 20

25 30

33 13

42 30

55 17

Unstandardized Uncentered PCA 1XX'= ( 3644 ; 1641

u= (

1641) 889

0.906 -0.423

A = ( 4609

0.423) 0.906

1~4)

60 42

62 27

65 25

92 25

99) 43 .

Unstandarct·ized Centered PCN 1

;¡ XRXR = ( 781.9

132.3) 93.8

132.3

u =(

0.983 - 0.183

0.183) 0.983

o)

A= ( 806.4

o

(Axes yí, y5. in Figure 4.6a)

69.2 (Axes Y1, Yz in Figure 4.6a)

Standardized Uncentered PCA

Standardized Centered PCA

!z;z1 =(4.66 n

6.06

u= (

0.561 -0.828

A = ( 13.593

o

6.06) 9.48 0.828) 0.561

O )

0.548 (Axes yí', yí' in Figure 4.6b)

!zRza=( i n 0.488 u= (

0.488) 1

0.707 -0.707

0.707) 0.707

A= ( 1.488

O ) 0.512 (Axes y¡'", y{" in Figure 4.6b)

o

ªThis is the version of PCA described and demonstrated in Section 4.2.

separate sections in the lower part of the table are given, for each of the four forms of PCA: (1) the square symmetric matrix to be analyzed, (2) the matrix of eigenvectors U, and (3) the matrix of eigenvalues A. The results are shown graphically in Figure 4.6. It should be noticed that the effect of standardizing the raw data (as in Figure 4.6b) is to make the Variances of both sets of coordinates equal to unity. Thus the dispersions of the P0ints along the x / a axis and along the x 2/ ª2 axis are the same. 1 Standardizing the data t~erefore, alters the shape of the swarm; after standard·IZation, · the swarm ' IS . noticeably . less e1ongated th n it was before.

ª

PCA Using a Correlation Matrix

Anaiysis 0 f

' . forro of PCA that is the correlation matrix (1/n )ZRZR IS h already been enuy e . 1 lit ture As as expJain ncountered in the ecologica era · d di ed centered ed, the correlation matrix is obtained from the stan ar z

frequ

ª

154

'h.'

• •

~-- ·-)· ····



(a)



.

··········· ......... y,



---+-----,= 00:- ::.X: I 75

y;'

.

,/ Y1'"



.............·

(b)

• •

1

2

4

ª

Figure 4.6. Four versions of PCA applied alon Table the 4.x¡,4)x, U• . to Data Matnx · #10 (see · ( axes. ) Unstan· dardized data. The raw, nncentered coordmates are measured gd PCA shifts the ongJJ centered PCA rota tes the axes in to the solid lines labeled YÍ, YÍ- . Cen teredotted lines y , y,. 1bl to the centroid of tbe swarm (marked +) and rota tes the axes mto the the x ¡o,,x,/•1 1 Standardized data. The nncentered but standardized data are measured along d shifts th< axes. Uncentered PCA rotales the axes into the salid lines y{', YÍ'· Centere origin to the centroid and rotates tbe axes into tbe dotted lines y{" , y{" ·

PC~

11 . z•.. In what follows we discuss the ments data matnx pros . of stan dardization t kin . f ª g or granted that the data have first been centered by rows· The

anct cons of data centering is discussect in a subsequent subsection. jrabl• In sorne . . ·bly) des· . . · in . thana1Yses ºPtion · · standard1Zat1on of the data is a (possid .tuauo turn. , o ers ll 1s a necessity. We consider these contraste s1

ns~

fOUR DI f

fERENT VERSIONS OF PCA 155

as a sean dardization is often . desirable . way of p . tlJe uncommon spec1es m a community b reventmg the "swamping" of vnJess data are standardized, the d Y_ the common or abu d ones. al · Thi ºIlllnant . n ant ,..,;nate the an ys1s. s happens beca spec1es are likely t dow. use the q .. o ·es tend to have higher variances (as ll uantities of abunda t spec1 we as high n nntities of uncommon species. Standardizat· .er means) than the qull>' . h . ion equaliz ll re ax.is rotat10n (t e analys1s itself) is c . es a the variances be fo . arned out Th . · us íf one wishes . bordinate spec1es to have an appreciable "' ~u . euect on th o-ood idea to use standardized data. e outcome, it is a º nªowever, this
Yl. =

U.1, 1 Z 1

+

U·I, 2 Zz

+ · · · +U· ¡9Z19 1

'

156

zs are the elements of the original data matrix after they h where the . ave d and standardized. N ewnham scaled the e1genvector ele een cen . ments tere b (the us) so that the !argest element of each was equal to umty (it is th relative, not the absolute, magnitudes of the elements of an eigenvector tha: matter, and one can choose whatever scale happens to be convenient). The first two principal components-those with the largest eigenvalues--. were as follows. Only the five terms with the largest coefficients (" weightq are shown here: y = 0.97z

3

1

+ z7 + 0.99z10 + z11 + 0.99z14 + 14 other terms;

Yi = z + 0.99z5 + 0.94z 13 4

-

0.73z 16

-

0.86z17 + 14 other terms.

The five most heavily weighted variables contributing to these two principal components are as follows: Contributing to y 1 : z3

winter temperature: average daily maximum, ºF;

winter temperature: average daily mínimum ºF· ' ' z10 fall temperature: average daily mínimum ºF· ' ' zu winter temperature: average daily mean ºF· ' ' Z14 fall temperature: average daily mean, ºF.

z7

Contributing to y2 : Z4

Z5

Z13 z16 Z11

spring temperature: average daily maximum ºF· summer temperature: average daily maximum ' ºF· ' summer temperature: average d il ' . a Y mean ºF· spnng precipitation: average m . me . hes· ' ' summer prec1p1tation: · · ' average in inches.

'

These two com the d ponents accounted f . or S7.4% and 29.4% of the variance 111 ata, respectively, for a tot When we consider th . a1 of 86.8%. two. pnnc1pal · · e vanables components .t . th at are weighted most heavily in the firsl 1 pomts) wh ere wmters · and' fallis seen th ªt the weather stations (tbe data component seores, and the sta/ are mild have the highest first principal 10ns where · spnngs and summers are bot a.Ild

RENT VERSIONS OF PCA ll 01ff E f0 U

157

. e the negative coefficients of z16 and z17 ) have the highest second ' (nouc

dr~ . al component seores. Thus we can draw the two-dimensional coordi-

princifpame shown in Figure 4.7a and label the four regions into which the nate r . d . . . divide it w1th a two-sentence escnpt10n of the climate: the first atesten ce puts into words the meaning of high and low values of the first sen . ·pal component, and analogously for the second sentence. pnDCl Axis 2

(a) l. Cold wlnters 2. Hot dry summers

l. Mlld wlnters 2. Hot dry summers

- - - - - - - - - + -- - - - - - --Axis 1

l. Mild winters 2 . Cool wet summers

l. Cold wínters 2. Cool wet summers

Axis 2

(b)

5

-5

+

+ ++

••• • • +• • ••

• + +

++ ++ +

+

• • •••

•• •

•• • ••

8

º~~

o

+

o

o

+ + -5

.

.

Axis 1

o

+ +

5

o

h

o o o o o o o o

.ons in tbe ordination of 70 Britisb

Figure 4.7. (a) A "qualitative" graph labeling t e reg1 bt .ned from a PCA of climatic Co¡ b. · ( b) The first two axes o ai . d um_ ia weather stations show~ m · . Axis se arates the stations into those w1tb 1 ata d1v1de the coordinate frame mto four reg10ns. . p . t tbose with hot dry summers rnild · tes the stat1ons m o and those witb cold winters. Axis 2 separa . . al ordination by PCA of tbe anct th . ( b) A two-d1mens1on ' ose Wllh cool wet summers. d tes a station where ponderosa corr l · ,, · The symbo1• eno . e ation matrix, of the 70 weather stat10ns. . . The two species are never found i~ne occurs; O denotes sitka spruce; +, ne1tber spec1es. gether. (Adapted from Newnham, 1968.)

158

The scatter diagram in Figure 4. ?b is a two-dimensional ordination f o the . . db . 70 weather stations. Each stat10n 1s represente y .ª pomt having its fir and second principal component seores as coordmates. Three diA- st . . uerent symbols have been used for the pomts: one for stat10ns where Sitka s . Pruce (Picea sitchensis) occurs ; one for stat10ns where ponderosa pine (Pin ponderosa) occurs; one for stations where neither tree species is found.: one would expect from a knowledge of these trees' habitat requirements an~ geographic ranges, Sitka spruce occurs predominantly at stations With marine climates (mild winters and cool, wet summers) and ponderosa Pine at interior stations with hot, dry summers. The figure demonstrates very well how an ordination by PCA can clarify what was originally a confusing and unwieldy 19 X 70 data matrix. Two coordinate axes (instead of 19) suffice to portray ~ large proportion (86.8%) of the information in the original data, and concrete meanings can be attached to the seores measured along each axis. This last point deserves strong emphasis. Ordinations, especially of community data, are often presented in the ecological literature with the axes cryptically labeled "Axis l," "Axis 2," and so on, with no explanation as to the concrete meaning,. the actual ecological implications, of these coordinate axes. Without such explanations the scatter diagrams yielded by ordinations are uninterpretable. As Nichols (1977) has written, "The primary effort in any PCA should be the examination of the eigenvector coeffi.cients [to] determine which species [or environmental variables] combine to define which axes, and why."

PCA Using Uncentered Data The great majority of ecological ordinations are done-with centered data but this is not always the most appropriate procedure. Sometimes it is preferable to ordinate data in their raw, uncentered form. The reason for this will not become clear until we consideran example with more than two species, an~ ?ence data ~warm occupying more than two dimensions. First, however, ~ is worth looking at the results of doing both a centered and an uncentere PCA on the same, deliberately simplified two-dimensional data swaflll chosen to demonstrate as clearly as possibie the contrast between the tWO methods. eons~'der Figure 4.8. Both graphs show the same seven data ~ oíntS ·nal plotted m raw form in the frame defined by the X1 and X2-axes. Tbe ong1

ª

DlfffRENT VERSIONS OF PCA

159

fOLJR

(a)

( b) :X:

"•.........



.. ..

)('

,

,

·-_, '',,,

"

/

Figure 4.8.

""" y~

Bo~ grap~ s~ow a row of seven data points plotted in an x 1 , x 2 coordina te frame.

(a) The dot~ed lin;s Yi an,d Y1 are the first and second principal axes of a centered PCA; ( b)

the dashed lines Y1 and Y2 are the first and second principal axes of an uncentered PCA.

data matrix is

X =(~

7 3

6

5

4

3

4

5

6

7

~ ).

Figure 4.8a shows the principal axes (the lines y 1 and Yi) yielded by a centered PCA. The intersection of these axes, which is the origin of the new frame, is at the centroid of the swarm (which coincides with the central point of the seven). The coordinates of the data points relative to the y 1 and Jraxes are given by the matrix

-2 .83

o

-1.41

o

o o

1.41

o

2.83

o

4.24)

o .

Clearly, the y 1-axis is so aligned that the points ha ve the maximum possible "spread" along it; their spread relative to the y2-axis is zero. Figure 4.8b shows the principal axes (the lines y{ and yí) yielded by an uncentered PCA. The intersection of the new· axes coincides with the intersection of the old axes; equivalently, the origin has not been shi~ted. The coordinates of the data points relative to the y{ and y~-axes are given

by

y(')= (

7.07

7.07

7.07

-4.24

-2.83

-1.41

7.07

o

7.07

1.41

7.07 2.83

7.07) 4.24 .

160

to show that it
L (Yli ). 2

and

i=I

in the c~n~ered and uncentered cases, respectively. With a centered PCA, because it is centered, thís sum of squares is proportional to the variance of the seven Yii values. With an uncentered PCA thís sum of squares is the sum of squared distances of the points (after projection onto the y{-axi 5) from the unshift d · · · ' e ongm; It bears no relation to the variance of the Yli values, which in the example is zero.

Wf e now turn to ª three-dimensional example to illustrate the ecological use ulness of an u t d p allY f ncen ere CA. In practice of course there are usu ar more than three axes (thr . ' ' rnust li · ee spec1es) and it is unfortunate that we . ffilt ourselves to three a· . . . aliZ' abl Th imensions m order to make the analysis visu e. e reader should fi d · the many-d1m · n it easy to extrapolate the arguments to L.1.uens10na1 case. The three-dimensional e 1 . . . . the three-dime · al xamp e IS shown m Figure 4 9 The pomts J.1l •...u .. ns10n scatter di · · g to agram at the top of the figure clearly belOI1

(a)







10



15



20

25

xi

AXIS 2

• 10

(b)



• AXIS



10

1

30

20 •

AXIS 2

(e)

5



• -20

•~1t~ters ~ng



-10

10





AXIS 1

20

-5 p·1¡,,, 4 · (.9.b) A( plot th data pomts m d11ferent . . tbree-space. Tbey fof!ll two qualitat1vely . . · ) Seven (e¡ from an u e pomts m the coordinate 1rame fonned by tbe first two principal axes 01 rrespond' e A. One cluster lies on axis 1 and the other very close to axis 2. ine e ncenter d PC .161 0 mg plot alter a centered PCA. Both clusters lie on axis l.

el

ª

162

two qualitatively different clusters; there is evidently a four-member set of quadrats (cluster 1) containing only species 1 and 2, anda three-member set (cluster 2) containing species 3 together with smali amounts of species 2. The data matrix (Data Matrix # 11) is 15

X=

( ~

18 10

o

21

24

o

5

10

1

2

8

15

o o

o

~) .

10

As may be seen, the axes yielded by an uncentered PCA (Figure 4.9b) could be said to "define" the two contrasted clusters. The first axis transfixes one of the clusters and the second axis grazes the other. With a centered PCA, the first axis goes through both clusters (Figure 4.9c ). There is a perceptible gap between the clusters but their qualitative dissimilari ty (cluster 1 lacks species 3, and cluster 2 lacks species 1) is not nearly so weli brought out. It should now be clear that in certain circumstances an uncentered PCA reveals the structure of the data more clearly than
fl orffER

ENT VERSIONS OF PCA

ro U

IC1J

.dentical lists of common species A , · centcrcd PCA 15 · contrast. among the quadrats is less pro called for t11e nounccd a ,, ti1 1· ~111iel1. ee rather than in kind. nu c r contents ·ff r in degr dr e ractice data are often obtained for which 1 ·t . , . In P b is not 1mmediatel . 5 whether the etween-axes heterogeneity ex d . . Y obv1ou . . Wh . cee s t1ie w1 thm-axes eneity or vice versa. en this happens it · b heterog . ' is est to do both a ed and an uncentered PCA. If the between-axes h t . r t cene . . e erogemty of the is appreciable, then there w1ll unipolar (or al mos t urnpo . ar, . be . as many . 1 data see later) axes as th~re are quahtatively d11ferent clusters of data points. Of course, the first axis of ~ uncentered PCA is automatically unipolar, regardless of wh~t~er there is a~y between-axes heterogenity. If there is not, then the first axis is merely a lme through the origin of the raw coordinate frame passing close to the centroid of the whole data swarm as in Figure 4.6 (page 154), for instance. Data are often obtained that do not clearly belong to one type or the other. Then an uncentered PCA is likely to give one or more principal axes (after the first) that, although technically bipolar, are so "unsymrnetrical" that it seems reasonable to treat them as "virtually" unipolar. A bipolar axis is said to be symrnetrical if, relative to it, the totals of the positive and negative seores are equal. Obviously, bipolar axes can range from the perfectly symmetrical to the strongly asymrnetrical; only in the limit, when the asymmetry becomes total ( all seores of the same sign), is an axis unipolar. Therefore, in ecological contexts an axis need not be strictly unipolar to suggest the existence of qualitatively different clusters within a body of data. Noy-Meir (1973a) has devised a coefficient of asymmetry for principal axes that ranges from O (for perfect symmetry) to 1 (for co.mplete asymmetry). He recommends that any axis for which the coefficient of 5Ymmetry exceeds 0.9 be regarded as virtually unipolar. * Given an uncentered PCA the coefficient, a, is defined as follows. Let th ' . d b us· let U+ e elements of the eigenvector defining an axis be denote Y '. . de Th n the coeffic1ent is note a positive element and u_ a negative element. e · g no 111

navin

ª

L:u~ a=l--L:u~ •1his .. PCA since the ~ata points lbems efi!l!tion is not applicable to the axes of a centered a}ues are only of int.erest for e Ves ha . .. d. tes As a v . phcable to Uncent ve negat1ve as well as pos1tive coor !Ila · f to make it ap erect . h f mula or a centen~ct axes, there is no point in adaptmg t e or axes. dl

164

or [u~ a=l---

[u:_

(ir ¿u¡ < ¿u:.).

Let us find the coefficients of asymmetry for axes 1 and ~in Figure 4.9b. An uncentered (and unstandardized) PCA of Data Matnx # 11 yields a matrix of eigenvectors

u=

0.931 - 0.078 ( -0.356

0.364 0.151 0.919

0 .018) 0 .985 . -0.169

Therefore, a = 1 for axis 1, since all elements in the first row of U are of the same sign. This result is also obvious from the figure, of course. For axis 2, O'. =

1-

( -0.078)2 0.151 2 + 0.985 2

=

0.994.

It is clear from Figure 4.9b that axis 2 should be treated as unipolar even though three of the points belonging to the cluster on axis 1 have small negative seores with respect to axis 2. It is these small negative seores that cause a to be just less than l. Using Noy-Meir's criterion whereby a value of a greater than 0.9 is treated as indicating a virtually unipolar axis, we may treat axis 2 in Figure 4.9b as unipolar. A clear and detailed discussion of the use of centered and uncentered ordinations on different kinds of data has been given by Noy-Meir (1973a). For an example of the practica! application of uncentered PCA to field data, see Carleton and Maycock (1980). These authors used the method to identify "vegetational noda" (qualitatively different communities) in the boreal forests of Ontario south of James Bay. Other Forms of PCA
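Noy-Meir's coefficient is simple to compute from the elements of an eigenvector. The following sketch (an added illustration, not part of the original text) reproduces the two values just obtained for the uncentered PCA of Data Matrix #11.

import numpy as np

def asymmetry(u):
    """Noy-Meir's coefficient of asymmetry for one axis of an uncentered PCA;
    u is the eigenvector defining the axis."""
    u = np.asarray(u, dtype=float)
    pos = np.sum(u[u > 0] ** 2)
    neg = np.sum(u[u < 0] ** 2)
    small, large = min(pos, neg), max(pos, neg)
    return 1.0 if large == 0 else 1.0 - small / large

# Eigenvectors (rows of U) from the uncentered PCA of Data Matrix #11.
print(round(asymmetry([0.931, 0.364, 0.018]), 3))    # axis 1: 1.0 (unipolar)
print(round(asymmetry([-0.078, 0.151, 0.985]), 3))   # axis 2: 0.994, as in the text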

Th~re

are other ways (besides those described in the preceding pages) Íil which data can be transformed as a preliminary to carrying out a pCA Data ca~ be ~tandardized in various different ways and they can be centered in vanou d'A" ' d ne s Iuerent ways. Standardizing and centering can be 0

pRINCIPAL

coORDINATE ANALYSIS

165

ly or in combination. There are num Para te erous pos ·b.li . se t of the methods are seldom used by e s1 i hes. . Ivfos co1og1sts anct . book. They are clearly discussed anct e are not dealt with ·n tlus . . omparect in N M . ~id ]lloy-Meir, Walker, and Wtl!iams (1975). oy- eu (1973a)

4.4. PRINCIPAL COORDINATE ANALYSIS The methods of ordination so far discussed (the various . vers1ons . . s-space and specify opera te on a swarm of data pomts m ct· A' of PCA) all . . Iuerent ways of projectmg these pomts onto a space of fewer than s dimensions. The origin may be shifted and the scales of the axes may be changed, but at the outset each data point has, as its coordinates, the amount of each species in the quadrat represented by the point. Principal coordinate analysis diIBers from principal component analysis in the way in which the data swarm is constructed to begin with. The points are not plotted in an s-dimensional coordinate frame. Instead, their locations are fixed as follows: the dissimilarity between every pair of quadrats is measured, using sorne chosen measure of dissimilarity, and the points are then plotted in such a way as to make the distance between every pair of points as nearly as possible equal to their dissimilarity. It should be noticed that the number of axes of the coordinate frame in which the points are plotted depends on the number of points, not on the number of species. Also, that the value of the coordinates are of no · · · mterest; · · ts shan have the desired mtnns1c they merely ensure that the pom spacing.

Before descnbmg · · · how the coordmates are found' we need an acronym fo r " Pnnc1pal · · . , None has come into common use. 1n coordinate analys1s.'

What follows it is called PCO. . . . . must first be To do a PCO, a measure of interquadrat d1ss1IDll~n~y h measure, cho sen. Any metric measure may be used. w·11hou t spec1fymg .t e d k. We !et u · . . . . b t n quadrats J an .s Wnte 8(), k) for the d1ss11nilanty e wee h t the distance requir . · pace such t ª al b e to find coordinates for n pomts m n-s ( as nearly equ etween Points 1· and k namely d(j, k ), shall be equa1 ore the points so as p . ' ' 'bl t arrang th oss1ble) to 8(J k ). Often it proves imposs1 e. o 1 and one must at th . . . ' h qurred va ues b eir Pairw1se distances have exactly t e re e content · h ¡· · ffices to 1 Wit approximate equa 1ties. . . s always su t shou}d b f 1 dunens10n con . e noticed that a space o n . ed in a one-s pace (a talll n 0 . · be contain P mts. Thus two points can a1ways

166

contained in a two-space (a plane) or. always be . . · ' in a . . three points can b colinear; sumlar1y, n pmnts can ahvays b line), ace if they chance to le dimensions at most. Hence it could be .e one-sp . ace of n . . sa1d n of the n pomts m . ( n . 1)-space rather co tained in a spfi d the coora·nates i that we need to n. . . d d true but the argument 1s s1mpler anct clea This 1s m ee ' . rer than in n-space. . , d Th required .( n - 1)-space is a subspace of thi · considere · e . fl · s 15 if an n-space th t a two-dimens10na1 oor is a subset of the same way a . h a . ( n-space m . room) Equivalently, all the pomts ave a coordinate of three-dimenswnal of the n axes,· the same axis for all of them. There is no need to zero . mm . d.' the required zeros emerge as part of the output of the keep ºº. thisone fact m computations. ri1 · f · t PCA b ecause in lt could be argued that PCO is necessa . " y m ,,enor b o · 5 placed exactly where 1t ought to e, whereas m Peo · t · h PCA eac pom 1 . . . each point is so placed that interpomt d1stance.s are as closel_Y as possible (but seldom exactly) equal to interquadrat. d1stances. ~rov1ded the aproximation is close, the imprecision of PCO is of no practica! consequence. both PCO and PCA, the final step consists in projecting the swarm to be examined onto a visualizable space of two or three dimensions, and far more information is usually lost in this reduction of dimensionality than in slight "misplacings" of the points in the n-dimensional swarm. However, PCO is not suitable for ali bodies of data. We consider later (after describing the method) how to judge when PCO is appropriate. Now fer_ the method. Any metric measure of dissimilarity may be used. The ob¡ect is to find coordinates for n points in n-space such that d(j, k), the d1stance between pom· t · d k h . 0(1,. k ), theu . d1ssunilarity . . shJ an , s hali be as nearly equal as .poss1ble to F ave chosen to measure it. . the argume, t owever or clanty ( hi h we . . numbered paragraphs ' n w e is rather long) is given in the followmg After th t h . . reiterated m· rec1pe · f.orm R d t e operations to be performed are bnefiy . delving into the rea . · ea ers who w1sh to try the method before somng that u d li · d return to the details later. n er es it should skip to the recipe an

~ith

ª'

l. As a pr limi Th e nary, note that ll erefore, to keep th sununations are over the range 1 ton. left u e symbols as u 1 . . However, lt · is . very ne subscrinstated. t . . Uttered as possible ' these limíts are h vanes each f important to observe which of t P s symbol b 1 une a sunun · he "'-,e,1. -_ e e + ow the L. Bear In · oUnd . ation f is done. I t is specified byh t es 11 c21 + · · · +e . ' or example that a sum suc nJ (In Whi h ' lues e r takes the series of va

ª

ª

coORDINATE ANALYSIS

pRl1,..10PAL 167

is not the same as .L .e . = e 2 ... ' !1 ) . J rJ rl + Cr2 + . . . . tl1e senes of values 1, 2, ... , n ). + crn (m Wruch J k~ . 1a The coordmates sough t are to be Writt 2. . h . en as an . h ach column g1ves t e n coordmates of n >< n matrix e . ivhJC e . . . one of the . in be centered. That IS, the ongm of th . Pomts. The points are to . e coordmat . 'd of the swarm of pomts. Equivalently " es is to be at the centro1 2 2 . 'LiJc, . =O f ll The distance , d (J, k), between the ·th a Jd k or .ª r. 3· 1 n th pom ts is J••

d2(J,k)=[(c.-c )2

rk ·

rJ

r

of matrix C. Each row of e . Here. r denotes the r tb row . . corresponds with an ovis lD the n-space that IS to COnta.m the SWarm of poínts b t ( ' u m contrast to pCA) the axes do not represent species. 4. It follows that ClJ'.'



2

d (J, k) =

L (e;+ c;k -

2c,Jcrk)

r

=

L:c; + L:c;k r r

2LcrJcrk·

5. Next consider the n x n matrix, say A, formed by premultiplying C by its transpose C'. 1t is

C11

A=C'C =

c12

C21

c22

••·

Cnl ) Cn2

.... ............

r

eln

e2n

I:c;1 r

···

nn

C

L c,¡c,2

C11

C21

C12

. . .

C22

C111 ) C211

. .............c..

n

C l

C11 2

•' .

1111

LCr1Crn

r

_Lc, 2c,1 .Lc?2 r r . . . . . ... .. .. ..... LCrnCrl r

r

_Lc,ncr2

LCr2Crn

r .......

_Lc;

11

r

fust find the Ais oh . . paragraphs we d el v1ously symmetrical. In the followmg . . hi h are calculate · · ·1anues w e fA rement8 of A from the interquadrat dissinu f e trom those 0 · ro1n th fi d h eiements o e eld observations. W e then fin t e

168

lt follows from paragrap

6· A we have element of ' alk

h 5 that if we write alk for thc ( k J' Jth

= L CrlCrk; r

ltemative formula for d2(j, k) is Therefore, an a

d2(J, k)

= ª11

+ akk

- 2alk.

(4.4)

7. Notice, for later use, that L1ª1k = O. This follows from the fact that

since the sum E¡crJ is zero (see paragraph 2). Because A is symmetric, it is also true that l,kaJk = O. 8. We now wish to find a1k as a function of d 2 (j, k). Rearranging Equation (4.4) shows that

(4.5) 9. We now find ªJJ and akk as functions of d 2 (j, k ). To do this, sum every term in (4.4) over all values of j, from 1 to n. Thus

Ld . J

Put "[,1

2

(i,k) = I:a .. +°"a -2°"a ~ kk ~ lk· . n j

J

ª . = x· note that t Jªkk =

L:1.a1k. == ó. Th,ere f ore, 1

td2(J,k)=x+ j

nakk

nakk,

whence

j

and recall from paragraph 7 tbat

akk

=

ni( Ld2(J, k) - ) X •

1

Likewise summi , ng every term . ( ht m 4.4) over ali values of k, it is seen t

ª

L d 2 (J' k) =

na ..

k

11

Observe that " '--kªkk

=

+

) x Whence ajj = ~ ( L d 2 (j , k) - x .

t lall' Which .

k

we have already denoted by x.

cooRDINAI t ANAL ni~

..,0 rAL 1

pRl ~

169

. n (4.5) thus becomes BquatlO

01i

~!{-d 2 (J,k) +¿(Ld 2

~

2

"d2( . 1 -21 d2( ),. k) + l_ 2n Í-J J,k) +-¿d 2 ( 2n

J

an

+.!.(" n '-/2(J,k)-x))

(J,k)-x)

J

.

J'

k

k)

X

- ;

(4.6)

d it remains to express x as a function of d 2( . k) h 6. J, . IO. In paragrap it was shown that Q ..

}}

="

2 Í-J crj. r

Therefore,

Lª11 = J

Lj LC~· ·

(4.7)

r

Recall from paragraph 1 that the centroid of the n points whose coordinates \\'e seek is to be at the origin. Therefore, L,/0 is the square of the distance of the jth point from the origin, and L,¡LrC~· is the sum of the squares of the distances of ali n points from tbe origin. This latter sum is equal* to (1/n) times the sum of the squares of the distances between every pair of points. Tbat is, " " c 2. = ~~ 0 j r

.!n "~

d 2 (J k) =

j
'

2 22n I:. l:d (J, k ). J

k

· h ll be considered only (Th e form L. specifies that each pair of pomts s 1
ª

a. = _ _!d2(J k) 1k 2 '

. .

+_!_~d2(J,k) 2n

J

_!_ '°'d2(1' k) - ~ '[ '[d2(J, k). + 2n ~ ' 2n J· k k ltii~ fact w

. roof is found as used in another context earher. A P

.

in

je!OU

p

(1977), p. 320.

170

.

2

, t d2( . /<) (th~ d1stanc.,c bc.:,t ..vecn fJ<,Jnt . ·t'pul at(; tJta ·1' . 1 '2( k Ja,, 12 w e noWS 1 ' 1 C'• ual atl po~f,Jb<.I) t<J () /; ) Wcthc f, . • 1 • • i11al (or as nca r y '1 ie '

/1) IS lo

') (.i

l:L1

pul

i_s 2( .,·, k> ' 2n i

2

i./>2U, k > 1

+ _J_ ~0 2 () , k) 2n

k

. es of 0 2( 1· k) have been ca1cu1ated for every p&n (f Thc numenca1 va1u ' . . ' . . d. , Hence ali the elements of A can be gJ ven numenca] ·i&Juer.. pomts J an 1<.. •· It remains to fi.nd the elements of C. 13. Since A is a square symmetric matrix, we know that A = U'AU,

where A is the diagonal matrix whose nonzero elements are the eigenvalues of A, and U is the orthogonal matrix whose rows are the correspondmg eigenvectors of A; U' is the transpose of U. Since A is a diagonal matrix, it can be replaced by the product A112A11: in which the nonzero elements are · A. 1(2, A.1{ 2, ... , A.1t2. Thus A= (U'A1/2)(A1/2U) . Each of these factors is the transpose of the other. Now recall (from paragraph 1) that, by definition, A= C'C. Therefore

' U'A112

and

A112u = e

. e elements of C · of . Then the first princi al ' .we therefore carry out an eigenanalys1s e~ements in the first r p coordmates of the data points which are tbe e1g ow of e ' first \l /~nvector of A). The , are obtained from A.1/ 2u' (u' is the /\2 U'2 and second p . . 1 1 i . d fro.rJJ ' so on. lf as is nncipal coordinates are obtaine . ' very oft h d. auoil en t e case. a two-climensional or 10

A

14. To find th

-- e,

AL coORDINATE ANALYSIS pfllNCIP 171

e

ted only the first two rows of ¡5 wan ' . . need be dinates of the n pomts m two-space . evaluated· they . coor . Wlth the . ' give the distance between every pa.tr of points a . po1nts so arranged th t11e . . al pproximat at tbeir dissi.mJ.lanty as e cu1ated at the outset. es as closely as possible To reiterate,. without explanations: the operations . . are the followmg. requ1red to do a PCO

l. Calculate. the between every pau . of d . . dissimilarity . some chosen d1sslIIlllanty measure. Denote by 8 . k qua ~a~s, using between quadrats j and k. Put the squares of th ese (J, d1ss ) .the·¡ d1ss11nilarity ·· elements of an n X n matrix A. irm anties as the 2. Find the elements of the n X n symmetric · (4.8) which is matnx A from Equation

'f} 2(J, k) is the sum of the elements in the jth row of A; 'f.k8 2(J, k) is the sum of the elements in the k th column of A; 'f.l,k8 2 (J, k) is the sum of all the elements in A.

Amore compact formula for determining A from A is given in Exercise 4.6.

3. Do an eigenanalysis of A. 4. For a two-dimensional PCO, calculate the first two rows of C which are

and

liere ;\1 and A are the first and second eigenvalues of A; (un u12 ... uln) and (u21 u222 . . . u2n) are the respective eigenvectors.

172

.nts. Their coordina tes are (e n' 5. Plot the Pol ( C111• Cz ,,).

)

c21 ,

(e i2,c

22 ).,

We now consider an example. XAMPLE. A simple example }s shown in Table 4.5 and Figure 41 E . # 12 . h h · O. The 2 x 5 data matrix, Data Matnx ' is s own at t e top of the tabJ · . . e. As always, its (i , j)th eleme~t deno t es th e amo~n t of spec1es z m quadrat J, bu¡ these elements are not, m PCO, the . coordmates of the. data . points to be ordinated. The quadrats have labeling numbers 1, ... , 5 (m Italics) above the

TABLE4.5. AN EXAMPLE OF PRINCIPAL COORDINATEANALYSIS (SEE FIGURE 4.10). 1

2 3 9 8 10 14 The matrix of squared dissimilarities is

4

X=(¡

o tl

Matrix A whose ele

100

=

o

5

15 8

169 25

o

2~ ) is Data Matrix # 12.

196 64 169

o

529 225 400 81

. O ments are given by Equation (4.8) is 120.48

-119.92 -25.92 - 78.52 . 55.68 An eigenanalysis of A h 168.68 ' s ows that its eigenvalues · /\¡ , A , A A A _ (to two decimal places) are 2 The first two ~¡ge4 , s - 310.77, 85.78, 4.50 O - 9 45 nvectors are ' ' . . A=

Ben

12.48 4.48

12.88 26.88 74.28

- 25.92 - 17.92 - 35.52 23 .68

u} : ( -o. 517 -0.152 -O 3111 O233 o. 747 ) u 2 - ( - O629 · · ). ce the first t 0.169 O.721 -O .234 -o.oz7 Worows f · o e = A112u . d' tes. are cí) 'which are the required coor 1na :::::( - 9.117 2 13176) ( C2 - 5.823 - .675 - 5.485 4.101 0'254 . l.567 6.681 - 2.171 - . 1

L coORDINATE ANALYSIS

rRH.JCIPA 173

Axis 2 10

. 3

03

2

2~ -10

-15

-5

4o

.

5

15

Axis 1

4

Qf

-5

- 10

Figure 4.10. The solid dots are the data points (projected 0 t t of data matrix X (Data Matrix # 12) in Table 4.5. Each np~n;~~sfaªbcel) dyiel~ehd by a PCO · th 1 f X th t · e e w1t a number deoo!Jilg . e co umn o a represents lt. The hollow dots show the sam ed centered PCA e data after uns andardiz , · t

respective columns of X, and these are used to label the points in Figure 4.10. It is now necessary to choose a dissimilarity measure for measuring the illssimilarity between every pair of quadrats. Let us use the city-block distance CD (page 45). Then the dissimilarity between quadrats 3 and 5, for instance, is CD(3, 5) = jx13

-

x 15 j + jx 23

-

x 25 1 = 18 - 231+114 - 91

= 15 + 5 = 20. The dissimilarity between every pair of quadrats is measured in this ~ay,

5 and the squared dissimilarities are the elements of the 5 X matnx

· a·

!:::..f

are o shown in Table 4.5; all the elements on the mam iagon ' course, zero since CD(j, j) = O for all J. . (4 8) For examNext, the elements of A are determined from Equatwn · · ple,

a 35

-

-

-=.Af)Q 2

+

763

10

+ .ill2 - 3~56 10

= -

al of

!:::..

78 .52.

·ght halves are Sine . · ly their upper n w. e matnces A and A are symmetnc, on rttten out. ble and also its first A·18 · nin the ta then analyzed Its eigenvalues are give • t\Vo . eigenvectors.

\ ·. 11 yaliv . Tliis i111pli '8 Lliat it i:) ¡, 1h:at ",, IH i' ' 1· J 'l!iq~~.il,¡ 11 will h ~L . , . "1 spa<.; of u11y lllllllr)t-r o t un t1:-i1011) ~ 11, • JOl'ldS 111 ,1 1 C.I ' 1 ll 1lrlYl! . . .1 ' 111 '• 11 \;'l1:1 ·11 1·111g f¡yi.; 1 . . 1'11.' i'llKC:S :-;fia CXr1Cl y lJi(.; Y'il ' ' . JHll WISC ( lJ1.;~, w· iy thal 1lu;n 1 . .1 Slll'lllci i11 iabsolutc mag111tuuc U , n A 11 4. ' : ne A' is mue ' • ' . . 11 ~JJ(j ~ , 1low vc1, si . 1 • 1 caii 1-:afdy he 1gH<>l(..:;d. Wc a1e, 1t1 a11 y (.;¿ . tio1111111mLICli< • 'f l~L,,)J, the d1stor . llw f)()Í1ILS JJJ lwo-spéHA..... o use <., ll Y 0101 1a11 t 1 1 · · )fl~Sml ti 11g . intcrcste<. m "I . . 1 cxamplc ( 111 wh1d1 thcrc are 011ly lwo w,, . • : · 111 thc P' usen . · · ' P01..1c d111wnswns . . of an ordinal1on, wh1ch is to reduce lhe . \) ld lcf "1 l thc puipo:-;c (J111cr 1 1 wou <. .'
,1

'

'

'



'

1

1

These points are shown as the solid dots in Figure 4.1 O. For comparison, thc results of carrying out an unstandardized, centered PCA on thc samc data are also shown (by hollow dots) on the same figure. It is interesting to compare the desired interpoint distances (the sq uares of these distances are the elements of A) and the actual interpoint distances in the two-dimensional ordination yielded by PCO. The two matrices to be compared are shown in Table 4.6.

TABLE4.6. A COMPARISON OF (1) INTERQUADRAT DISSIMILARITJES (CITY-BLOCK MEASURE) AND (2) INTERPOINT DISTANCES FOLLOWJNG THE TWO-DJMENSIONAL Peo SHOWN IN TABLE 4.5 AND FIGURE4.JO.

---=--------------------~~~~~---Interquadrat Dissimilaritiesª o

1

10

o

13 5

o

14 8 13

o

Interpoint Distances'

23 15 20 9

o :?he elemcnts of A in Table 4 5

o

9.8

o

13.0 5.8

o

13.7 7.7 13.0

o

---

23.0

16.0 19.9 9.3

o

using Pythagoras' · thare the sq uarcs of lhcsc dissi mil ari 1i c.<. ven in Wi 2Computcd. 5 X matrix at the botto 0 fsT eorem, from thc coordinates of thc points which are gi m abJe 4.5.

p~1NCIP

AL coORDINATE ANALYSIS

175

, can be seen, the discrepancics b~tw~en d , .· A~ ight. Such as t11ey are, th~y result r cs1rc
The problem deserves investigation. . . of PCO is found in A good example of the practica! applicatwn d h mpling units Kempton (1981). The organisms stud1e · d were moths .an t e saplaced at 14 (" . b k) were hght traps quadrats" in the terminology of this 00 d with discoverlocaf1 . t was concerne . ons throughout Great Britam. Kemp on t·ons based on one 1 lllg h w ether an ordination (by PCO) of th ese 14 oca 1 r after year. The seaso ' . hly the same yea . ns moth collections remamed roug f ig term valid1ty, or 1 ~Uestion is: Is an ordination of moth communities oh o1 na-lysis of a single is th that t e ,ere so much variation from year to year . 1 Iy is an important, Year s b . ? This e ear , " e th o servations is virtually mearung1ess. ' d that there was soro ough K ton f oun . rs a e . rarely pondered problem. emp . . consecuuve yea ' ons1st ,, ' . b . ed ll1 s1x ency among the ordinations o tain

ª

.

176

biogeographers who wish to ordinate ge · g resu1t for . . . h . ograph¡ fairly reassunn. unity compos1t10n m s ort-lived organis e . the basis of comm ms. s1tes on

5

~R

RECIPROCAL AVERAG ING, CORRESPONDENCE ANALYSIS

. ·ng and correspondence analysis are alternative names f Reciproca1 averag1 . . or the same technique, one that is deservedly popular for ordmatmg ecologicat data. lt is commonly known by the acron.ym RA (Gauch, 1982a). It is ye¡ another version of PCA (besides those d1scussed ~n page 152, and many others) and, as such, might seem to have n.o cla1m to special mention. However, as we shall see, it has one great ment shared by no other version of PCA. Thus consider one-dimensional ordination. Recall (page 83) that, by one definition, a one-dimensional ordination consists in assigning a score to each quadrat so that the quadrats can be ordered ("ordinated") along a single axis according to these seores. Each quadrat's score is the weighted sum of the species-quantities it contains. What differentiates one ordination technique from another is the system used for assigning weights to the species. In RA the quadrats and the species are ordinated simultaneously. Seores are ~s~igned to each quadrat and to each species in such a way as to ma~e the correlation between quadrat seores and species seores (as explamed later). In the discussion ' web egm · b Y cons1denng · . RA as merely another versio· n of PCA. Af ter that 1·t · h . ' is s own how the same result (a scoring system for both spec1es and q d ) · l averaPi ,, ua rats can be obtained by the so-called "reciproca o-'ng procedure. RA as a Form of PCA

It was pomted · out li f four different way eüar er that an "ordinary" PCA can be done in one o s. ne choo fi Jeave them uncentered Th . ses rst whether to center the data or 5 whether to standa a· en, mdependently of this first choice, one choose_ da a· r ize the dat ª or leave them unstandardized (to. staJ1 ·ded r IZe them, the eleme t . bYthe st d n s m each · d1v1 an ard deviation f row of the raw data matnx are d tbe 0 data mat · nx may be left a11 the e¡ements in the row). In other wor s.daf' un transfo d . staJI rme or it may be centered, or

CAL AVERAGING, OR CORRESPONDENCE

RfCfpRO

ANAL YSIS

..,..,

'

/ /

Or both centered and standardized Whi h di ed · e eve 0 f h zh 'sen the next steps are the same: the dat ~ t e four po 'ibiliti1.:s' 15· e o ' a matnx ( not) is postmultiplied by its transpose and the w1lether tr~ns~ormed 0 product matn · 1 then : analyzed (for examples, see Table 4 e1gen . " . ·4, page 153) Th . we1ghts" to be att h · en each e1genve1:Onsists of a list of wr e . ac ed to ea h . " ores" (which are we1ghted sums of species . . e pec1e so that se quantit1es) e b for each quadrat. The seores are the coordinates of h ~n ~ computed . ation · t e pornts m a plot of tbe or dm RA differs from the four versions of PCA alread a· . . is . transformed before the eige Y iscussed 111 the way in which the data matnx al . . . . nan ys1s, a.nd m the way in which the e1genvectors are transformed into seo f h . . analysis. We cons1der these two procedures in turn Theyres a dter t e e1genare emonstrated . · in Tables 4.7 and 4.8, which show the RA ordination of a 3 x 5 matrix (Data Matrix # 13). The reasons for the operations will not become clear until we attain the same result by "reciproca! averaging." Here they are presented in recipe form, without explanation. Since in RA seores are assigned both to quadrats and to species, the procedure yields an R-type and a Q-type ordination simultaneously. (Recall that an R-type analysis gives an ordination of quadrats and a Q-type analysis an ordination of species.) In the following account we first consider the Q-type part of the analysis which gives the species seores (Table 4.7), and then the R-type part of the analysis, which gives the quadrat seores (Table 4.8). The data are not centered and they are transformed as follows. Each element in the data matrix is divided by the square root of its row total and by the square root of its column total. . d As always, let the number of spec1es . (the num ber of rows rn the . ata X) m· h ber of colurnns rn ( atnx X) be s, and the number of quadrats t e num ben.

Let r;

=

.;, x 11.. be the total of the ith row of X;

L.,.¡

j=l

s . h olurnn of X; let c.= '\""' x 1). . be the total of the Jt e J L.,.¡ i=l

let N

s

= '\""'

L.,.¡

n

'\""' x . .

L.,.¡

i=l j=l

lj

~

n

= L..,¿ " eJ. = '=1 '-- r; )=l

I

be the arand total. l::J

TABLE4.7.

RA ORDINATION OF DATA MATRIX #13; THE EIGENANALYSIS GIVING THE SPEClES-SCORES.

The 3 X 5 data matrix X witb row and column totals shown is o 2 15 15 6 x19 5 7

o

20

15

25

r:

o 1//30

o

o ) (15

l/~

i

2

o

6 7

15 5

2

o 8

2

1

8

29

10

30

o

20 30 50 .

100

1/fü

o o

o o

o o

o

l/fiü

o

o

o o

o l/v'lO

o

1//30

1//25

o

o o o

2~)

o

( 0.67082

= 0.32863 0.02828

0.11547 0.28284 0.25560

P = MM' =

o

o

0.14142

0.61237 0.15811

0.04082)

0.35777

0.74878

( 0.48500 0.25311 0.12965

0.25311 0.56300 0.17842

0.12965) 0.17842 . 0.77980

o

o

o

.

Eigenanalysis of P gives

o 0.56056

o

o ). 0.26715 '

0.44721

u= ( 0.49281 0.74641

The matrix of species seores is V= /NUR -

1 2 /

= ( i.i02

1 0.929

1.669

-1.213

-0~998). 0.060

0.54772 0.50885 -0.66413

0.70711) -0.70584 . 0.04236

RfCIP

fbell

ROCAL AVERAGING, OR CORRESPON DENCE i'\N

ALYs1s

179

tbe (i, j)th element of the tran f

s orrnect m . atnx, say M . , lS

X¡)

¡r;c;

with i = 1

m· · = - 11

,... ,s

and i :::: 1 , . .. , n.

The whole transformation can be eompactly · . Rdenote the s X s diagonal matrix wh wntten in matrix f . h ose nonzero 1 orm. Let totals of X. Thus m t e example in Table 4.7, e ements are th e row ( r¡

R

=

~

o '2

o

o

o) o = (20o

30

O

o

sH

1//20

o

o o

1//30

o o

o

1//50

r3

Next, note that

R-1;2

=

-1/2 '1

o

o o

-1/2 '2

o

oo 1,3-1/2

(The reader should confinn, by matrix multiplication, that R- 112 R- 112 R-1, and then that R- 1 R = RR- 1 = 1, the identity matrix.)

=

The n X n diagonal matrix c- 1/ 2 is obtained from the column totals of Xin the same way; its nonzero elements are the reciprocals of the square roots of the column totals. lf we now wri te (4.9) M = R-112xc-112 ad

. d b h quation it is easily n carry out the matrix multiplication spec1fie Yt e e ' seen that m the (i 1·)1h element of M, has the required value. M' Call W IJ' ' . . d b its transpose . e now find the product of M postmuluplie Y the Product, which is an s X s matrix, P. Thus (4.10)

P=MM'. pis

i enanalysis is performed in the matrix that must now be analyzed. The e g 1..," A (whose nonzero the u a· onal rnat rix V (whose rows are el sual way and yields an s X s iag ernents , d s X s rnat are the eigenvalues of P) an an .lft

180

the corresponding eigenvectors of P). The results for the numerical . examp¡ are shown in Table 4.7. 1t w11l be seen that A1, the largest eigenvalue of p .e unity. This is always the case, and the explanation is given subsequent) , is It remains to derive the species seores. This is done by postmultiply~· by a diagonal matrix whose jth element is /'j. gU

i/N

Denoting by V the s X s matrix whose rows are the sets of species seores we therefore have, when s = 3, ,

V= U

JN /r1

O

o

O

JN/r2

O

o

o

JN/r3

=

/NUR-112.

(4.11)

(The reader should check that postmultiplying U by a diagonal matrix has the effect of multiplying each element in the jth column of U by the jth element of the diagonal matrix.) The rows of V are the required vectors of species seores. It will be seen that the first row of V, corresponding to the largest eigenvalue, A1 = 1, is (1, 1, 1). This result is true in general. That is, for any s, the largest eigenvalue is always A1 = l. The s-element row vector of species seores 112 corresponding to this eigenvalue, which is the first row of V = IN UR- , is always (1, 1, ... , 1). It is a "trivial" result, and the reason why it is invariably obtained becomes clear subsequently, when we use the reciproca! averaging procedure to do the same RA ordination. Only the rows of V after the first are of interest. N ow for the R-type part of the analysis, which gives the quadrat seores. Recall that for the Q-type part we analyzed the s x s matrix P, where

p =MM' and M = R-112xc-112. Notice that M'

=

c-1;2X'R-112.

·ces is

(1 t should be recalled that the transpose of the product of two matn h . . d r see t e product of thelf respective transposes multiplied in the reverse or e.' Exercise 3.9, page 131. And it should also be noticed that transposlflg a diagonal matrix leaves it unaltered.) Thus, written out in full ' (4JZ)

JPRoCA

L AVERAGING, OR CORRESPONDENCE ANALYSIS

~f C

181

.

ow c}ear, from considerations of s rt 1s nwe must analyze the n X n matrix YIIUnetry ' that to ordinate th qtiadra ts e

Q = (c-112x,R - 112 )(R -112

xc - 112)

. ( 4.13) surning that s < n, we find that Q has only . As .h h . s nonzero e 1 e identical wit t e e1genvalues of p (see igenva ues and ar u1eY . page 127). Table 4.8 demonstrates the analys1s using Data M . atnx # 13 As 1 . . 5 which in this case is a X 5 matrix) has the eige .· a ways, V( . nvectors as its rows Th X 5 matnx W, whose rows are the quadrat-scores, is · e

5

w = muc-112.

(4.14)

'

this equation should be compared with (4.11 ).

TABLE4.8.

RA ORDINATION OF DATA MATRIX #13'

THE EIGENANALYSIS GIVING THE QUADRAT-SCORES. X' is the transpose of X in Table 4.7. M' is the transpose of M in Table 4.7. 0.55880

0.17764 0.15867

Q = M'M =

0.20572 0.21362 0.40000

[

0.10499 0.10778 0.05657 0.14800

0.04856 0.19610 0.11839 0.27366 0.56233

(Q, like P, is symmetrical and the elements below the principal diagonal have here been omitted.) Qhas the same nonzero eigenvalues as P, namely, A1 = l; A2 = 0.56056; A3 = 0.26715. The matrix of eigenvectors is

u=

0.5000 0.6382 -0.5488 - 0.1740 -0.1061

0.3873 0.0273 0.1757 0.1113 0.8978

The matrix of quadrat seores is W

=

0.4472 0.2671 0.7739 0.0420 -0.3577

ffeuc-1/ 2

1

1

1

1.276 - 1.098 -0.348 -0.212

0.070 0.454 0.287 2.318

0.597 l.730 0.09 4 -0.800

0.3162 -0.2442 -0.2336 0.8657 -0.1906

0.5477 -0.6790 -0.1203 -0.4539 -0.1359

111 .l.

Ax i s 2 20

15

10 o3

02 1

1

- o~

10

I '>

-

1

_¡_

0 .5

-

~--....!_ Axis 1.0 1.5

1

40

·~

-o5

.4

.,

O! -10

. 411 lls ,show the outcome of RA ordination of . Data Matrix #13 (Tables 4 7 arid 4 H). ThL hollow dob show t~e same data after unstandardized, centered PCA; they are plottcr.l in thl' planc of PCA axes l and 2.

As in the ordination of the speeies, the first set of seores (the first row of consist~ of ones and is of no interest. The seores on the first and second RA axes are given by the elements of the seeond and third rows of W. Using

W)

thcsc seores as coordinates, the five points representing the quadrats give the two-dimensional RA ordination shown in Figure 4.11 (solid dots). The rcsult of doing an unstandardized centered PCA on the same data is shown for comparison (hollow dots). The ~orrelation Between Quadrat Seores and Spec1es Seores The analyses just describ d h . . (the rows of V in Tabl 4 e ave prov1ded sets of seores for the species . 7 Tahlc 4.8) s e · ) and sets of seores for the quadrats (the rows of Win · upposc the spec· . . of V, and thc quadrats th ies are ass1gned the seores in the k th row . TahJc 4.9 the . e seores in the k th row of W Then as demonstrated in ' square of th · , saY e correlation coeffieient between these seores,

L AVERAGING, OR CORRESPONDENCE

ANAL Ys1s

IPRoCA Rf C

183

.

s eq

ual to 'A", the kth eigenvalue of p

1 1

r,. · . ~ l, ... , s. . k correlauon

The

.

.

coeffic1ent r k is calculated f

rk =

1 N

s

anct Q T . · his holds t rue for rom the formula

n

.L L 1=1 J=l

X;Juk;wk .. J

(4.15)

X) is treated h . Here xi;. (from the dataf matrix . .. as t ough lt were " h uency of occurrence o spec1es r m quadrat . ,, S . t e freq . J· ometunes the l f Xare indeed frequenc1es; even when they aren t h e ernents o d h . o ' owever, for exarnple even when they recor t e b10masses of the species in th d . e qua rats, they are treated as frequenc1es for t~e purp~se of the present calculations. The term v is the k th seo re of the z th spec1es; likewise w . is the k th ki ' k1 score of the jtb quadrat. Table 4.9 shows the computation of r2 • As may be seen, r22 = A . The 2 reader is invited to check that r 32 = 'A 3 (see Exercise 4.8). It is now obvious why the trivial result A1 = 1, with v1 = (1, 1, ... , 1) and w1 = (1,1, ... , 1), is always obtained when P and Q are analyzed. If ali the species and ali the quadrats are assigned a score of unity and, equivalently, if we put v1¡ = 1 and w11 = 1 for ali i and j, then the right side of (4.15) TABLE 4.9. THE CORRELATION BETWEEN MATCHED SETS OF SPECIES SCORESAND QUADRAT SCORES FOR DATA MATRIX #13.

Computation of r2 . ª

15 X= (

r "" ' 2

Quadrat Seores

i

1.276

2

2

o

~

l~

0.070

0.597

o 8

1)

Species Seores

1.102 0.929 29 -0.998

o

-0.772 -J.240 29( - O. 998)( -1.240)}

.

100{15(1.102)(1.276) + 2(1.102)(0.070) + . . + : : 0.7487 2 ' ':::: 0.561 ~

. f iJ1 the second (' 'ta11cs) ro d f species seores in J s' trom tbe secon r011, of yllgbt of h . . h ond set o drat score , . t e data matnx is t e .sec . ond set of qua row of w 1 ~ Table 4.7. Below it (in italics) 15 tbe sec - A2.

''l-

io lhe .

in Table 4.8.

184

beco mes

r

i

=

_!_ L LX;¡· N . . I

J

. that r = 1 since, by definition, N = t;t ·X utomatica11Y I RA d. . 1 iJ· 1' 0 It follows lt . discarded when an or mat10n is done this tnvial resu is . Th . . e spec1es seores repeat, . th repeating is the followmg. . . h and A ther pom t wor no f d by RA are such as to maxuruze t e correlat' d at seores oun f hi b k . ion qua r them. Th e Proof is beyond the scope o t s oo ; it may be found between in Anderberg (1973, P· 215 )·

ª ..

The Reciproca! Averaging Technique RA ordination can also be done by "reciproca! averaging." In outline the procedure is as follows. First, arbitrary trial .values are chosen for the species seores. Next, a first set of quadrat seores is computed from these species seores. Then a second set of species seores is computed from the first set of quadrat seores, then a second set of quadrat seores from the second set of species seores. And so on, back and forth reciprocally, until the vectors of seores maintain constant relative proportions. Table 4.10 illustrates the procedure numerically, using Data Matrix #13 again. At every stage, each quadrat score is the weighted average of the last-derived species seores (and vice versa). In computing these averages, the speeies seores being averaged are weighted by the amounts of the species in the quadrat (and mutatis mutandis when quadrat seores are averaged). 1 vCº\ v< >, ... denote the successive vectors of speeies seores, In symbols let 0 1 and let w< ) wC ) d h p· ' ' · · · enote t e successive vectors of quadrat seores. rrst, values for the elements of vCO) are chosen. It is convenient to use per· centages, ranging from O for the lowest score to 100 for the highest. In the example the ehosen seores are

= ( u~O)' v~O)' u~O)) = (100, 50, O).

vCO) 0The

w< ) is

elements of wCO) ar

wj

Wh

ere

cJ

.

(0) -

[

is the j th colu

X

. . t of e now computed. For mstance, the 1th elemen vCO)

lJ 1

o

+ x 2J·V~)+

...

+x Sj.uCº)]¡c . s J

mn total of the data matrix.

(4.16)

185

v(%) 2 6 7

15 9

o

2

1

15 5

o

o

8

29

20.0 33.3 37.5 20.0 40.9 51.7 60.l 20.0 45.3 ........ .... .. 93.0

) (100)

50.0 68.8 (52.1) (73.0)

20.0 (18.6)

v<2\%)

iOO

64.0 (lOO) 50 48.8 (68.9) 69.9 (100) o 15.J (O) 59.5 (80.l) 17. 7 (O)

3.3 3.3 3.3

(O)

91¡8 190 0.929 1.102 73.0 100 0.597 1.276

1

Row 2 of V - 0.998

o

w

76.9 (100) 72.J (91.8) 20.9 (O)

3.3

o

V

V(%)

18.6 0.772

52.1 0.070

1

Row 2 of W - 1.240

1

The. data matrix is· shown above and to the left of the doubl e lin e. eCies seores are m the columns on the right, labeled v<º) v
Success1ve . a · · S . ppro~mat~ons to the uccessive approximat10ns to the

Thus in the numerical example

wfº> = ( (15

X

100) + (9

X

50) + (1

X

0)] /25

=

78.0.

1

When the n elements of w<º) have been found, v< ) is computed. lts ith element 01CI> is v
v\1> = 1'L

+ (2

33.3) + (0

( (15 X

78.0)

+ (2

20.0) + (1 X 3.3)]/20

X

X

X

~

37.5) 64 0 ·

f

they are used to

co elements of v <1) are rescaled to percentages be ore 1 d but it is not rnpute wCl) Id also be resca e nece · (The successive w vectors cou ssary.) lHe

186

The procedure is continued until the vectors stab~ze (i.e., until any o-ive unchanged results). The final results m the exampl . . . e are f ur th er stePs er hown by the column on the extreme nght m Table 4.10 (which gives the final species seores as percentages) and the row at the bottom (which giv the final quadrat seores as percentages). As is ~hown in the lower part of t~: table, these seores are the same (apart from bemg rescaled as percentages) as row 2 of v (in Table 4.7) and row 2 of W (in Table 4.8). Thus they are the required seores for, respectively, a one-dimensional ordination of the species, and a one-dimensional ordination of the quadrats. The seores on the second RA axes (i.e., the third row of V and the third row of W) can be obtained by a similar, though computationally more laborious procedure. It is not described here. Details are given by Hill (1973). The reader should confirm that if v<0) = (1 , 1, 1), then wC0J == (l. L 1, 1, 1). This is the trivial result mentioned on page 181. We now show the equivalence between the reciproca} averaging procedure just described and the outcomes of the eigenanalyses of matrices P and Q in Equations (4.10) and (4.13). Suppose reciproca! averaging has been continued back and forth (rescaling the species seores as percentages each time) until stability has been reached. Then we can rewrite Equations (4.17) and (4.16) (in that order), dropping the superscripts in parentheses. Thus (4.17) becomes

for i

=

1, ... , s;

(4.18)

(4.16) becomes wj

= (x1jV1 + X2 }·D2 + · · · +x Sj.uS )/ej

forj=l, ... ,n. (4.19)

Next let us write these t . Let R-1 b h . wo equat10ns more compactly in matrix forro. e t e s X s diagonal · c- 1 be the n X a· matnx whose ith element is l/r¡; let n iagonal matrix h . . · . w ose _Jth element is l/c1. The rnat~ versions of (4.lS) and are (419 · ), with the size off each matrix shown below it,

and

V = R-1 X (sXl) (sXs) (sXn)

w

(n X 1)

=

(

c-1 X'

(n~l)

V • nXn) (nXs) (sXl)

(4.20)

(4.21)

CAL AVERAGING, OR CORRESPONDEN RforRO

CE ANAL Ys1s 187

_'tuting the right side of (4.21) for the subsll V =

.

w in (4.20) gives

(R -lX)( c-1X'v)

now operate on (4.22). Notice that . (4.22) We b . matrices I mu lt iplied as may e. convement, provided the1r. ordernay. be factored or ntheses are put m wherever they help to ak is never changed. pare h . m e the st der should check t e s1zes of the matrices at eps e1earer. The rea . every step to b multiplications are poss1ble. e sure that all First premultiply both sides of (4.22) by R1;2. Then R1f :!v = (R1;2 R - 1)(xc-1x')v (4.23) =

R - 1;2 (xc-1x')(R - 112 R1;2 )v.

(4.24)

Here the interpolated factor (R - 112 R1l 2) is simply a factored form of the identity matrix and leaves the right side of the equation unchanged. The reason for interpolating it becomes clear in a moment. Writing c- 1 = c- 112 c- 112 and rearranging parentheses, we now see that R1/ 2v = (R-112xc-112)(c-112x'R-1; 2)(R112v). (4.25) On substituting from (4.12), this becomes (Rl/2 V)

=

(4.26)

P(Rl/2 V).

. . 1 ment column vectors). Now Both sides of (4.26) are s X 1 matnces (i.e., s-e e Th s transposing the row vectors. u transpose both sides to convert t h em t0 left si de gi ves

anct transposing the right side gives , 1;2 p' [P(R1;2v)]'

Th

=

(R.1/2v )'P'

e last equality follows from the fact lience

= vR

===

v R1 12 P · 1

.

. etncaiso that Pis synun

that p

===

P'.

(4.2

?)

'R1/2)P. (v'R.1;2) = (v . . true for all s rhis result 1s lt f 1 tor of P. o lows that (v'R1;2) is an eigenvec

188

vectors of species seores, that is, for all s rows of V. Hence

VR.112 ex U

(4.28)

where U is the s X s matrix whose rows are th e eigenvectors of P. Postmul1 2 tiplying both sides of (4.28) by R- 1 shows that

V ex UR - 112 which, apart from the constant of proportionality

(4.29)

IN,

is identical with

(4.11). This explains why the species seores can be obtained either by reciproca! averaging or by eigenanalysis of P; the results are the same. It is left to the reader (Exercise 4.9) to derive the analogous relation between matrix Qin Equation (4.13) and the vectors of quadrat seores.

4..6.

LINEAR ANO NONLINEAR DATA STRUCTURES

The methods of ordination di.s cussed in this chapter so far (PCA, PCO, and RA) are ali achieved by projecting an s-dimensional swarm of data points onto a space of fewer dimensions. In the simplest method (PCA) the coordinates of the points before projection are the measured quantities of the s species in each of the n quadrats; centering and standardizing the data (both optional) merely amoun t to changing the origin and the scale of measurement, respectively. In PCO and RA, the measurements are adjusted in a more elabora te fashion (as described in Sections 4.4 and 4.5) before the swarm is projected onto a space of fewer than s dimensions. But, to repeat. the final step in all these ordinations consists in projecting a swann of points onto a line, a plane, or a three-space. It is obvious that whenever such a projection is done, there is a risk t~at the original pattern of the swarm will be misinterpreted; this risk is the pnce that must be paid for a reduction in dimensionality. We now ask wh~ther projection of the swarm is likely to produce a pattern that is posiuvelY

. 1 d" baS a ~s ea mg. The answer depends on whether the original data swarrn !mear or non linear structure. · Figure 4.12 demonstrates the d1fference. . · m~ The three-dimens10nal swar . . 1h . . 10 rd111a th . e upper pane as a linear structure; if a one or two-dimens10na hon 0 f th . . . plaJle. e swarm were done by proJectmg the points onto a line or

.. ic¡\R AND NONLINEAR DATA STRUCTU RES Lli,~ 189

(a)

.. ....,..,.,' .... ------

,."',,,.

.,-,:

,/.

,,~· --~-/ ""----- ----

( b)

Figurethe4.12. Linear and r( b). n_onlinear . case hollow(a)dots are the data swarms (solid dots) in three-space. In eacb coordinate frame. p OJectwn of the swarm onto the two-dimensional "ftoor" of Lhe

the result be sausfactory. . . data wouldwould b Sorne of the mformation in the original anothe e lost, of course, but the positions of the points relative to one rwould . the sense that p01nts . clase to each oth . b e reasonably well preserved m each oth:r m the original three-dimensional swarm would remain clase to Th ~ m the one or two-dimensional projections. obvio e spual swarm m · the lower panel has a nonlinear structure. There is· . Pro;e ~Yoo way of orienting a Jine or a plane so that when the swarm is 1 ªPprJ cted · ont 0 it . the relationships of ali tbe points to one another are even nto preserved. For instance, suppose the swarm were projected 0 e floor of the coordinate frame; it would be found tbat the pomts at

~;:mately

190

each end of the spiral, which are far apart in three-space, would be .f h d. . 1 . el ose together in two-space. In d ee d , 1 t e two- lffiens10na p1cture were the ' available representation of the swarm, it would be impossible to whether its original three-dimensional shape had been that of a ge a'1 a

. :~ly spi~

hollow cylinder, or a doughnut. It should now be clear that ordination by projection, for example b PCA, PCO, or RA, although entirely satisfactory if the data swarm is linea; may give misleading results if the swarm is nonlinear. 1t is sometimes said that PCA, for example, gives a distorted representation of nonlinear data. This is a misuse of the word " distorted." The picture of a many-dimensional swarm that PCA yields is no more distorted than, say, a photograph in which both distant and nearby objects appear.. One would not call such a picture distorted because the images of a distant mountain peak and a nearby tree-top, say, were close together on the paper. In the same way, the circle of points on the ft.oor of the coordinate frame in Figure 4.l2b is not in the least distorted. But it is rnisleading. What we require is a method of ordination that deliberately introduces distortion of a well-planned, spe· cially designed kind, that will correct the misleading impression sometimes given by truly undistorted data. Various methods of ordination that achieve this result have been devised. They are known collectively as nonlinear ordination methods. A note on terminology is necessary here. The contrast between linear and nonlinear ordination methods is that they are appropriate for linear and nonlinear data structures, respectively. The term "linear ordination" should not be used (though it occasionally is) to mean a one-dimensional, as opposed to a two or three-dimensional, ordination. The term catenation , suggested by Noy-Meir (1974), is a useful and unambiguous synonym for " nonlinear ordination." W e now consider how nonlinear data swarms can arise in practice. Then a good method of ordinating such data, known as detrended correspon· dence analysis, is described. The Arch Eff ect

~col?gical data often have a nonlinear structure. An example of an investigatwn that would yield such data is worth considering in detail. . Imagi.ne an ecological community occupying a long environmental gradt' ent, for.mstance the vegetation on a mountainside. The vegetation forros d coenocline ' a commuru·tY wh ose spec1es-composition · changes srnoo thlY aJl

LI

- RUCTURES

·· ese rcipenie . F1 gure .La giYe a diagrammatic portrayal of such a coenocline. Each i.·Te repre~enL ne species. the horizontal axis measures distance along the ~3. ·em. and the height of a particular species' curve above this axis shows the . ·~y the species responds to the varying environmental conditions along

íhe ~a dien t

·:-o · ima~e that the coenocline is sampled by placing a row of quadrats · 'd ) Will the - a ed at equal intenrals along the gradient (up the mountainsI e · . be linear f data points) "sultant ··data structure" (the shape of the swarm 0 . f the segment of gradient 0 h N nonlinear? The answer depends on the lengt

th . ac 15 sampled.

. ntains the peaks of one or Suppose the sampled segment IS long, and co d monotonically m . , .es do not respon ore speaes response curves. These speci . th y do not increase to th nt· that IS, e h they first increase e gradient oYer the length of the segme '. contmuouslv. or decrease continuously, along it. Rat er,is nonlinear. anct then d;crease As a consequence, the data structurent of the gradient . h rt segme . h But if samplin º is confined to only a s o h species present w t .e 0 1f' of eac · lf IS tgure 4.13b) over which the response curve h data structure itse s gm . li ar then t e ent is at least approximately ne ' .ªºain that results \, · ªPProximately) linear. onlinear data swarm a sampled e now consider the shape of the n d crease, aiong hen . and tben e severa! species first IDcrease,

192

(a)

(b) .. ·~.

· tal radient Utbe Figure 4.13. (a) The response curves of eight species along an envi~onmen g tb ti.e data 1 community (a coenocline) were sampled at a series of points along 1ts whole Ieng ' ¡ed at . swarrn would be nonhnear. ( b) An enlarged segment of (a). lf the coenocline werebsamp espoose a number of closely spaced points within the segment, wbich is so short tha_t t e ~at Jea t curves are not appreciably nonlinear within it, the data swarrn would be linear approximately).

. . . here are onlY grad1ent. Figure l4.4a shows a very simple artificial example, t uallY · . d and eq ' · three spec1es and their response curves are identically shape d t 1he · ¡0 cate spaced. Assume that n = 12 quadrats are examined; they are . . e (oí . . . . . d VISUa 1lZ s1tes marked on the honzontal axis. The reader 1s mVIte to uence .h . . . the seq construct, w1t stiff wire) the curve in three-space connecting . tes 1JJe · ordJllª of data pomts these quadrats would yield (each point has asco . repre· oiJlt amounts of each of the three species in the quadrat that the P bY tJ1e sents). lt will be found that the curve is the same as that shoWI1

ª

1l

t

\

4 1

IENT

4

-~



4



10

t:i

12

A 1 2

\

\ 9

'q

\

\ 1

••

_

.,..Axis 1

10

\ 12 /

o

~

- d hne in Fi,,,nr 4.14b. The latter is a two-dimensional pCA ordination dll.1. and a, u l . , d. t d .. picture" of the points; it shows 1 t fits thelll au~~mnr d L 1 is an un 1stor e . d t theplanet1a ,t . h' l uced ' ·ben they . . are proJecte on. ¡1 r:recf (so111et101es called 1 1 - · 1e 1.:urve e lub1ts the so-called 01e C.1. • whtill data fro m a long .. cr ). and the fact that it appear 1 by !lAl detracts tt 'rdin ted by PC A ( and also. as shown ~~:· drawback is tbis: fulne:: of PCA asan ordinat10n method.

°

.

ÜRDINI\ l l()N

one would like the result of ordinating the quadrats observ d . . h li e a1ong a 11 grad1ent to ave a near pattern themselves, in the present near . . case to for more or less stra1ght row m two-space. But they do not. The d h ma . . . . as ed curve. Figure 4.l4b, which IS almost a closed loop, gives a misleadin 1"d g ea of th . h h. . . grad1ent even t oug 1t 1s an und1storted picture of the data swa S . . rm. uppo one were to ask for a one-d1mens10nal ordination of these dat ª· Th · o f t h e quadrats along axis 1 turns out to be ord ermg r

3 4 5 2 1 6 7 12 8 11 9 10

'

an ob~iously meaningless resu!t. Bu~ if ordination by PCA (or by RA or by PCO) 1s performed on data w1th a lmear structure (as in Figure 4.13b), the result accords with what one intuitively expects; for an example, s Exercise 4.10. It might be argued that the PCA ordination in Figure 4.14b would not mislead in practice. The points are numbered according to their position on the gradient and can be joined, in proper order, by a smooth curve. But it should be recalled that this is an artificial example with only three species. Given field data with many . species, one always has to project the data swarm onto a space of far fewer dimensions than it occupied originally; · the swarm is a "hyper-coil" (a multidimensional analogue of the dashed curve in Figure 4.14b ), then when it is projected it will automaticall "collapse" and yield as meaningless a pattern in the line, the plane, or three-space as the one-dimensional ordering of the artificial example listed previously. The problem is, of course, compounded when the gradient sampled is Iess obvious than that of a mountainside, or is not even apparent at all. Indeed if environmental variables such as soil moisture, soil texture, and the like are varying haphazardly in space, there may be no gradient in the ordin~ sense. Then the quadrats will have no particular ordering befo~e an ~aly~~ is carried out, and the purpose of the analysis is to perceive theu ordenng there is any) and diagnose its cause. . t to What is required, therefore, is an ordination method that is not ~ubJe~aY the arch effect, one which will ordinate a nonlinear data swarm in tion· that clearly exhibits iri one, two, or three dimensions the true interrela ships among the quadrats. oti·d A The s Let us first consider whether RA is an improvement over PC · and curve in Figure 4.14b shows the RA ordination of the same 12 quadrats

ª

¡';

~¡N(.4~ AN

D NON LINEAR DA TA STRUCTURES 195

s the sort of results that RA is f resen t . ound t . reP It is an 1mprovement over PCA in Y1eld in practice . da 13 · ated. However, the effect is still p t~at the arch A' wi~h real gger resent m . euect is 1 3 ~,x d s not give an erroneous ordering .nuld form, and 1h ess il oe . h d. . on axis 1 . a t ough , it still pr d ·ngless pattem m t e uection of axis 2. mea.fll b . , a true o uces a ? quadrats would o v10usly consist of a row f ~epresentation of th 1.t . o equ1sp d e on axJ.s . a co ace . points along axis. 1 with no componen . . 2. Also, there is h end of the grad1ent: the pomts at the end ntraction of scale at eac . d . . . are more clo 1 ose at the nuddle an this vanation in spacing d se Yspaced than . . oes not corr th iation m the steepness of the enVIronmental d. espond to any var gra ient Therefore, although RA
°

Oetrended Correspondence Analysis Detrended correspondence analysis (DCA) is an ordination method that overcomes the two defects of ordinary RA. It fiattens out the misleading arch, and it corrects the contraction in scale at each end of an RA-ordinated data swarm. DCA does this by applying the requisite adjustments to an ordinary RA ordination. In general terms, the adjustments (to a two-dimensional ordination) are as follows. . The arch effect is removed by dividing the RA-ordinated data swar~ mto several short segments with dividing lines perpendicular to the first axis, and th so that the arch . . 1hen slidmg the segments with respect to one ano er . d. hift d up or down m such a isappears. More precisely ' the segments. are s. hi e ch segment (.i.e., the way that the average height of the po~nts wit n ~al The scale contrac~verage of their seores on the second axis) are all e~ htf~rward rescaling of tion at each end of the swarm is corrected by stra~g d d cription of these thos . . d d A detaile es e parts of the axes where 1t 1s nee e · . h m out have been P~ocedures and a FORTRAN program for carry1~; tr~cedures consist in given by Hill (1979a) who devised the method. T . p order to force .an overt ' . ·ed out U1 . · tuiuve . ' systematic data manipulat10n, carn ossible with in orct111 · . well as P . y sorneation mto a forro that accords as s intuitions rna d ta expect . . . k h t erroneou .th real a tirn ations. Therefore, there is a ns t experience W1 therein) atidest be forced upon a body of data. flow;~:Za and references ests With artificial data (see Gauch,

ª

suggest that the method gives useful results and permits ecologically e . orrect interpretations to be denved from confusing multivariate data. An example, using real data, of the contrast between RA and DCA ¡ shown in Figure 4.15. The data consist of observations on the aquatic5 vegetation at 37 sites in oxbow lakes in the floodplain of the Athabasca River in northem Alberta. The lakes dilfered among themselves in a variety of abiotic factors, the most important of which was salinity. Ordination of

RA

X X

o

o

o

DCA

..

• •





x••

• • • xX





e

o

o



e

X

X

ru~

X

x f the Athabasca ) lakes in the valley o d Nitelfa RA and DCA ordinations. ~f 37 oxbow. angiosperms (plus Cha~a ~ (•). those

or~inated on the basis of the c~~:t~ru;:;sd~sfti:i'::s~~d Fi ure 4.15.

o

X

X

(



th~m.

cl(a~e)s

those with

T~~a~~Xº~dinaUa~ ;'

growing m Three d those with neither spec1es (X). [ kindly provided by . with Triglochin manttma ' an . . f the same data was . a- (1984) . The RA ordmation o adapted from L1e11ers Lieffers (pers. comm.).]

.Ap,ARIS

eº'''

º

NS ANO CONCLUSIONS 197

,..111ple sites by RA and by DCA are h e sai.. . s own · !l1 of the figure. D1fferent symbols hav b 111 the upper a d ane1s . e een used f n lower P whether they contamed Typha latz:r ¡· or the sites d . g 011 10 za (whi h epend in ) Triglochin marítima (which thrives i . e cannot tolerate s linwater , n satine w a e ecies never occurred together. The RA . ater), or neither th tWO sP ord1naf , e the arch effect and the scale contraer ion clearly exhibits bo th d . ion effect d . pear when the ata are ordmated by DCA ' an both effects d1sap .

4.7. COMPARISONS AND CONClUSIONS Ali four . of the . . ordination methods described in this ehap ter have ment. m appropnate crrcumstances. Three of the m~thods are suitable ~or data with a linear structure, and such data are obtamed very frequently m ecological studies (van der Maarel 1980). PCA has the merit that it is the most straightforward, conceptually: of all methods; it allows the user to look at a visible projection of a multidimensional and hence unvisualizable swarm of points. In addition, uncentered PCA aids in the recognition of distinct classes of quadrats (see page 162). PCO sometimes allows one to construct and inspect a data swarm in which the distances between every pair of points corresponds (approximately) with sorne chosen measure of their dissimilarity. RA provides simultaneous ordinations of quadrats and species. With nonlinear data, DCA is the ordination method at present fav?red . . . d that it removes two likely by the maJonty of ecologists. lts a vantages are . . d t ordinated by ordmary sources of error that arise when nonlinear are d f t · t" effect Its e ec is that RA, namely, the arch effect and the scale contr~c wn . · by deliberately · . · lation that 1s, tt gams these advantages by data mampu . ' tbe scales of the ft · · 1 ¡ adJustments to atterung the arch and by applymg oca . 1 artifacts devoid of 1 mathemauca axes. lf these troublesome effects are tru Y . emove them. But · . · 1 deslfable to r h ecolog1cal meaning then it 1s obVIOUS Y etirnes lead to t e o ' d "d fects" may soro verzealous correction of suspecte e . f 1 ·nformation. unwitting destruction of ecologically meamng u l . st but are beyond the oh . nli ear data eXl d vised by t er methods of ordinatmg no n . . method has be~n e 4 and scope of this book. An especially pronus111g.b d in Noy-Meir (197al)t,ernaShe . · 1 0 descn e · g" or, b. Par~ and Carroll (1966); 1t is s " ararnetric 111appJJl d·fficult than 1 1 / efiy m Pielou (1977). lt is known. as pthernaticallY more ively ' as " continuity analys1s. · " It 1s rna

ªª

ª

l~H

DCA, but is free of the rather contrived "corrections" that . · · Parametnc · mapping could make De A ord mat10ns somewh a t sub.~ectlve. . . profitabl b tested in ecologtcal contexts; it may prove to have the merit Y. e without its defect of artificiality. s of DCA There is, however, a valuable byproduct of RA and DCA orct· . inat1on~ that more than compensates for any defects they may have They . . . · prov1de the mformat10n needed to rearrange the rows and columns of a data m atnx in such a way as to make the raw data themselves easily interpretable. Thi is possible because both methods ordinate the quadrats and the specie: simultaneously. Consider the artificial example in Table 4.11. (An artificial example ¡5 used for the sake of clarity.) The two matrices in the table contain identical information. Both record the abundances of 10 species in 10 quadrats. The upper matrix shows the data as they might have been collected in the field. I t displays no discernible pattern, and there is no reason to suspect that it contains a concealed pattern. It is typical of the sort of matrices obtained when observations are first recorded in a field notebook. N either the species nor the quadrats are listed in any particular order; indeed, one often
7,l,10,9 ,5,2,8,6,4,3, and the ordering of the species is:

2,9,8,1,4,6,10,7,5,3. gedin . Let the d ata matnx be rewritten with the quadrats (columns) arran th d h · · · the order e or ~r s own m the_ first hst, and the species (rows) ~r~anged 111 l. The shown m the second hst. The result is the lower matnx m Table 4.1 pattern or "structure" of the data is now strikingly obvious. An example using real data is given by Gauch (1982a). x1 ·bit 11 An altemative method of rearranging data matrices in order to e en their structure has been devised by van der Maarel Janssen, and LouPP (1978), who give a program, TABORD, for carryin~ it out.

r

1\I

~

1

¡S 11)<)

f ,\JJ

1.J-' t t. .

s

NtllNli RUl UR

ns

¡ rRlN l

fhl'

-- :irn rdcr~d dat ·1 matri ·

rtl\ '

Quadrnt 1

,

-

,l

'

4

5

(>

8

2 3

3 ./ 5

4

9

3 4

4

2

l 3

1

2 2 3 3

J

.,

3 1

..+

2

2

1 3

10

1

3

1

J

4

4

l

4 l

10

-

1

(1

8

4

o

2

2

3

3

3 2

-l

4

_,'

3

4

The sarne data váth the r w and e lumn" n~arrnng,ed

.,

9

4

3

8

2

1

1

4 6 10

7

1

10

o

3 4 3 2 1

2 3

1

4

3

3

4

1

.,

5

-

8

ó

1

1

3

3 4

3

2

3

4

1 2 3

3

4

1

1 2 3 4 3 2

1 1

5

2 1

3

l

'.!

3

.+

3

3

.+

EXERCISES 4

,

f thc e varinm:e

Consider Table 4.2. What are the eigenvnlue~ lllatrix yielded by the SSCP matrix R? \ = UX be the 4.2. L t tr1·x and let 1t1.tie e X be a row-centered data roa ' CA \ hnt d 1 the quni transformed matrix obtained by doine> p · .l,

ª

1

200 1

tr(X:X') and tr(YY represent in geometric terms? Why . . Wou1 expect them to be equal? [Rerrunder: tr(A) 1s the trace of 0. You i.e., the sum of the elements on the main diagonal.] rnatnx A )

1

4.3.

Show that the eigenvectors of a 2 X 2 correlation matrix are (0.7071

and

0.7071)

( -0.7071

a1Way~

0.7071) .

4.4.

Refer to Table 4.4 and Figure 4.6. From the table, determine the angles between: (a) the x 1-axis and the y 1-axis in Figure 4.6a ; (b) the x 1-axis and the y{-axis in Figure 4.6a; (c) the (x 1/a 1)-axis and the y{'-axis in Figure 4.6b; (d) the (x 1/a 1 )-axis and the y¡"'-axis in Figure 4.6b.

4.5.

Refer to page 164. What is the coefficient of asymmetry of axis 3in the example described in the text? (Note: axis 3 does not appear in Figure 4.9b because it is perpendicular to the plane of the page.)

4.6.

Let A, M, N, and 1 ali be n X n matrices. A is the matrix whose (i, j)tb element is given in paragraph 12, page 170. The (i, j)th element of Mis - -!-8 2 (), k ). Ali the elements of N are equal to l/n. 1 is the identity matrix. Show that (1 - N)M(I - N) = A.

4.7.

The quantities of two species in quadrats A, B, and C are given by the data matrix A

X=(!

B 5 4

Pe.rfo~ a PCO on these data by simple geometric construction. usmg city-block distance as measure of the dissimilarity between quadrats. (Show the result as a diagram of the pattem of the tbJee po~ts after the ordination; do nor compute the coordinates of the pomts.) 4.8.

Refer to Table 4 7 4 8 · , .. and 4.9. Confirm tbat ,32

=A 3.

Here r3 i the co r l · . 3 of V bl r e ation between the species seores in row a e 4. 7) and the quadrat seores in row 3 of W (Table 4.8).

(T

201

~.9.

Refer to Equation (4.27) on page 187 sh . f r a RA ordination are related to ;he º:"'ing how the species seo , o e1genvectors 0 r res Equation (4.12) (page 181). Derive the an J P defined in . a ogous relat" b the quadrat seores an d t h e e1genvectors of Q d ~on etween

(4.13).

efined Jn Equation

4JO. Consider the following data matrix species and the columns quadrats.

X=

11 17

12 19

22 30 34

27 27 28

in

which the rows represent

13 21 32 24 22

14 23 37 21 16

Find matrix Y, giving the coordinates of the four points after an unstandardized, centered PCA of the data. (Hint: with these data there is no need to construct the covariance matrix and do an eigenanalysis. To perceive the structure of the data, inspect the results of plotting against each other the quantities of every pair of species.)

Chapter Five

Oivisive Classification

5.1.

INTRODUCTION

In this chapter we return to the topic of classifying ecological data. Several methods of classification were described in Chapter 2; all were so-called agglomerative methods. Here we consider divisive methods. The distinction is as follows. In an agglomerative classification one begins by treating ali the quadrats (or other sampling units) as separate entities. They are then combined and recombined to form successively more inclusive classes. The process is often called "clustering." Metaphorically, construction of the classificatory dendrogram (tree diagram) starts with the twigs and progresses towards the trunk.

A divisive classification goes the other way. lt starts with the trunk an.d pr h llection of quadrats 1s ogresses towards the twigs. That is, the w 0 1e co bd. ·_ treat d . h divided and the su iv1 . e as a single entity at the outset. 1t is t en Stons re d.iv1ded, . . again and agam. . hods have one notaCompared with divisive methods, agglomeratr:e mhet mallest units (the blqe dis ªdvantage. It arises because they start w1 th t e .sal quadrats in the uadrats themselves). lf, by chance, there are a f ew atyp1c 203

204

DIVISIVE ClASSIFICAnoN

data set, th.ese quadrats are ,~kel~' to ~ave a strong ~ffe~t on ~he first round of a clustenng process, and bad fus10ns at the begmmng w1ll influence later fusions. The obvious (but, with one exception, impracticable) solutio is to adapt the agglomerative methods so that they can be used the oth way round. It is easy (in theory) to devise a method of classification division that proceeds as follows. The whole collection of quadrats is fu divided into two groups in every conceivable way, and one then judg which of the ways is "best" according to sorne chosen cri terion. lf there n quadrats, there will be 2n - l - 1 different divisions to compare with 0 another (for a proof, see Pielou, 1977). Having discovered the best possib division to make at the first stage, the whole process must be repeated each of the two classes identified at this stage, and then on each of the fo classes identified at the second stage, then on each of the eight class identified at the third stage, and so on. N ot surprisingly, the comput requirements of such methods are so excessive that they are infeasible unle n is very small. However, there is another method (actually, a whole set of relat methods) of doing divisive classifications that avoids these computation diffi.culties. lt consists in first doing an ordination of the data (in any wm one chooses) and then dividing the ordinated swarm of data points wi suitably placed partitions. The procedure is known as ordination-spa partitioning. The term describes a large collection of methods since one e choose any one of a number of ways of doing the initial ordination, an then any one of a number of ways of placing the partitions. Gauch (198 has reviewed the development of these methods. Collectively, they constitu a battery of exceedingly powerful procedures for interpreting ecologic data. They yield an ordination and a classification simultaneously, and th classification, being divisive, avoids the disadvantage of agglomerative clas sifications previously described. lt should be noticed, however, that ordination-space partitioning is much more "rough and ready" method of classifying quadrats than th agglomerative methods described in Chapter 2. Even so, it is probabl adequate .for most, if not all, ecological applications. And, as so ofte happe~s m efforts to interpret ecological data, one is faced with tli pere~mal problem of choosing, judiciously, one of a large number 0 poss1ble and only slightly different procedures. In the following sections, we considera few representative methods.

coNS rRlJ(T

iNG ANO PARTITIONING A MINIM

UM SP

AN N1Ne TREE 205

2

coNSTRUCTING ANO p ARTITIONINC A

~ 1 N1MUM SPANNING TREE

, rernarked previously that one (and

lt was . . on1y one) f s of classificat10n can be done in the agglomerat· met110d . reverse. The 1ve

°

. l1bor clustenng (see page 15), also known a . .method is nearest. . s smgle li k . do a nearest-neighbor classification dI·v·ISIVe . 1y the nn age clustermg· o . d T lotted in s-space ( n IS the number of quad t' ata points are rs . . . ra s to be cla ·fi d P fi tnurnber of species); this is equivalent to dom· . . ssi e , and s t11e . . . . g an ordmat10 f h with no reducuon m dunens1onality. n t e data The. points of . . . tree A . the swarm . are then linked . . by a mimmum spannzng spanning tree IS a set of line. segments . m . the swarm · . linking . . all the n pomts in such .a way that every pau of pomts 1s linked by one and on1Y one path (i.e., a line segment, or sequence of connected line segments). None of the patbs form closed loops. The length of the tree is the sum of the len ths of ¡15 constituent line segments. The minimum spanning tree of the is the spanning tree of mínimum length. (Note: Do not confuse a spanning tree with a tree diagram or dendrogram.) Figure 5.1 shows a simple example with n = 10 and s = 2. The coordinates of the 10 data points in two-space are given in Data Matrix # 1 (see Table 2.1, page 18), and the swarm of points is identical with the swarm in Figure 2.3a (page 17). For clarity, the data points are here labeled with the letters A, B, ... , J instead of with the numerals 1, 2, ... , 10. The length of ne1g

°

sw~rm

each line segment is shown in the figure.

Partitioning the Tree

. . . method is now done by Nearest-neighbor classification by the divisi:e then the second e tf · t link in the tree, u mg, m succession, first the 1onges ess is illustrated, up longest link, then the third longest, and so on. The procs i·n Figure 5.2. The t h 0 f dendrogram 0 ~ e penultimate step, by the sequence . of all yields a dendrogram ~ltimate step, that of cutting the shortest link ' 1 there are b idenf ¡ . . . ica With that m Figure 2. 3 · ;11 the examP e, . Th t when, as .u.. h rniJUIDum e procedure is very easy to carry ou lotted, and t e only t m can be P Let us now wo species· for then the data swar t of paper. span · ' d. sional shee mng tree drawn on a two- 1men

'

'\I

/\

e

1,¡

G

N

') ('

lf)

11 \

w

w

o...

lf)

13

.lb

H

?'t'

E

•,o

10 78

o

o

10

20

SPECIES

30

40

1

Figure 5.1. Thc points of Data Matrix #1 linked by their mínimum spanning tree. The coordina tes of thc quadrnt poinl, (hcre Jabclcd with the lctters A, B, . . . , J) are given in Table :U. Thc distancc bctwccn cvcry pair of joined points is also shown.

describe the procedure in such a way that it is applicable whatever the value of s. It is still convenient to use Data Matrix #1 asan example. lf the data swarm is man y-dimensional and hence unvisualizable, the line segments forming the mínimum spanning tree must be found by inspecting the n X n distance matrix showing the distance between every pair of points. The distance matrix for Data Matrix # 1 is given in the upper panel of Table 5.1. It is identical with the distance matrix in Table 2.1 (page ~S) except for two changes. In Table 5.1 the quadrats have been labeled with . ' . · the lette~s mstead of n~merals, as explained. And sorne of the d1stances the matnx have been g1ven superscript numbers· these label the segments . . . . , d TheY are rrummum spanmng tree m the order in which they are foun · f . . oower ound as follows. (The method is due to Prim, and is descnbed in and Ross, 1969; Rohlf, 1973; Ross, 1969.) ·n the The first segment ~f the tree corresponds to the shortest distance ~ Elt table. The shortest d1stance is 2 2 = d(E H) the length of segmen bY h . · · · ' ' . find, t erefore, supe1scnpt 1 is attached to this d1stance. We next . ce t dista 11 . s.ear~ hmg the rows and columns headed E and H, the shortes :; :; 3,6; linkmg a third point to either E or H. It is the distance d(E, B)

:r

coNlf

RlJCflNG ANO PARTITIONING A M

INI MlJM SPA NNINC TREE 207

CUT 2

cur 1

CUT 3

e

CUTS 6 87

CUTS 4 8 5

I

C

G

D

1

I

C

GAF

D J

1

___ L CUT 8

r co•FoJB ruc on of Data Matrix # . the const (10º.0 f the dendrogram givmg ti igure 5.2. St ages m . . a nearest-neighbor classificacut !tves the complete d1- Each successive cut perrruts a division to be made. The final (ninth) endrogram, which is shown in Figure 2.3b. F'

searchin~ ~uperscnpt

lherefore . 2 is attached to this distance. We next find, by linlting a / e rows .and columns headed E, H, and B, the shortest distance therefore, pomt to _any of E, H, or B. It is the distance d(H, J) = 5.0; d(B,H) perscnpt 3 is attached to this distance. Nouce that although of th . .Ois shorter than d(H J) segment BH is not ad!Illss1ble as part 4 e nuni . tree as ,it would ' not pe . mu spanrung forma loop BHE and Joops are

~

~~rth

01

rnutted · ntinuing ts needed to Co eteih · · m the same way• all the n - 1 ::::: 9 .segmenh nd in the cornp1

e tree are found. They are listed, with thelf Jengt s ª

208

DIVISIVE ClASSIFI(

ATION

TABLE 5.1. A DIVISIVE NEAREST-NEIGHBOR CLASSIFICATION OF DATA MATRIX #1.ª The distance matrix : tJ e B A A

o

o

e

o

D

F

E

25.0 15.8 27.0

16.5 11.38

14.4

o

B

D

5.7 20.0 21.5 29.2 23.6

18.0 3.6 2 12.5 14.9

o

E

G 7

6

6.1 9.2 5 15.1 19.l 12.7 11.2

o

F G H

H

o

17.9 4.0 14.4 12.7 2.2 1 23.3 12.2

o

J

27.3 19.4 24.8 8.1 13.69 19.2 40.3 7.8 4 25.5 7.2 31.0 24.4 27.9 13.3 27.6 5.0 3 o 32.5

o

J

The lengths of the segments of the mínimum spanning tree, in the order in which they were found: 4: d(J, D) = 7.8;

2: d(E, B) = 3.6; 5: d(B, G) = 9.2;

3: d(H, J) = 5.0; 6: d(G,A) = 6.1;

7: d(A, F)

8: d(B, C)

9: d(C, I) = 13.6.

1: d(E, H) = 2.2; =

5.7;

=

11.3;

Diagram of the mínimum spanning tree, constructed from the segments whose lengths are given above. The segments are labeled 1, 2, ... , 9 from longest to shortest, showing the order in which they are to be cut. FA



6



G

5



BE

l~

8



J

H

9



7



D

4



:see Table 2.1 and Figures 5.1 and 5.2. The row and column headings refer to the quadrats.

order in which they were found, in the center panel of Table 5.1. Observ that they were not a·iscovered m · order of mcreasmg . . length; some of th later-found segments are shorter than sorne found earlier. Now that th seg~ents have been found, it is easy to draw a two-dimensional diagrarn matic representat· f h · · ;,, the IOn t e ffilrumum spanning tree as has been done ~· . bottom panel of Table 5.1. The segments are linked together in the order in

°

coNS

fRUCTING ANO PARTITIONINC A M

·a

INIMUM SP

.

ANNINC TREE 209

. h theY were 1 entified; they have b ,vJJ.lC f h 1 een assig ¡enª {hs with 1 or t e ongest up to 9 f or th hned ranks acc or d.mg to th · º o obtain. a nearest-neighbor class·fi . e s ortest. e1r T 1 cation f e it remams to cut the tree's segment rom the minimum . 1re , . . . . s, one af te spanrung e largest. This partitiorung process h r another, beginrun· . h . as airead b g w1t th ·aure 5.2. Of course, w1th a many-dimen . y een demonstrated . flº .. s1onal d t m be plotted and partlt10ned as was the tw~-d· . a a swarm which cannot .. . tb d imens1onal s . e part1t10nmg mus e one on the d1'ag . warm m Figure 5 1 !h . rammatic · · · ' constructed as shown m Table 5.1. The dia mirumum spanning tree b gram can alw . dimensions, regardless of the value of s 1 ays e drawn in two . . · n practice of in the classificat10n can be done by comput . ' course, all the steps Rohlf (1973). The foregoing description ofe:h: pro~ram has ~ee~ given by met od explams Its princip1es.

Clarifying an Ordination with a Minimum Spanning Tree

When s-dimensional data (with large s) have been ordinated in two-space, for example by PCA, there is obviously always a risk that two points that are far apart in s-space will appear close together in two-space. This is

particularly likely to happen if the data have a nonlinear structure (see Chapter 4, Section 4.5). To avoid being misled by the spurious proximity of points that are, in fact , widely dissimilar, it often helps to draw the minimum spanning tree on an ordination diagram. Then the fact th~t 11 apparently similar points are not linked by a segment of the tree makes obvious that the similarity is only apparent. Figure 5.3 shows an example. The data are from Delaney and Healy (1966) and the analysis from Gower and Ross (1 969 )· The purpose of. the . . 1O isolated populat10ns research was to investigate the relat10nships among t f . f veral skull measuremen s. 0 0 shrews of the genus Crocidura on the basis ~e . 2 11 ws the 10 Ürdinating the data and reducing their dimensionality. to th: ~gure. The dat · s shown in pomts to be plotted in two-space, comes trom five po l · h One group pu at1ons belong to two groups of five eac . 1 d· the other group of th . ( of Eng an , d e Scilly Islands off the southwestem 1P of France, an 1 co ' ff h north coas rnes from four of the Channel Islands 0 t e d. tion did not show frorn 1 d lf the or ina f h th ap Gris N ez on the French mam an · elude that two 0 t e e rn · b tural to con Inimum spanning tree, it would e na

ª

ª

e

.

210

DIVISIVE CLASSIF

ICA110N

Channel Island populations (those from Jersey and Sark, labeled 1 were closely similar. In fact, as the mínimum spanning tree shows t:nd S) more similar to one of the Scilly Island populations than to each other ey are Thus a mínimum spanning tree is a useful adjunct to an ordinati~n can be helpful i~ pre.venting misinterpre.ta~ions. When .the number of po:~ being ordinated is fa1rly low, then the rmmmum spanrung tree can be show as part of the ordination, as in Figure 5.3. When there are a large number ~ points, a diagram showing the tree as well as the points may be too confusing to be useful as a final portrayal of one's results; even so,

0

J

~ s 1
11

11 \1

o---

----<{---- - 1

\1

---

11

1

b

B

. · g 1he F1gure 5.3. A two-d · · al . . . m shOWJll . . . imens10n ordination of a ni.ne-dimensional data swar brews. rruru mum spanrung t (d h . ts on s ree as ed line). The original data were skull measuremen dia-ercol 11 JI co ected from 10 lo 1 . . frorn ' · d . h . ca popu 1ations. F1ve of the populations (solid dots) are · d (circlcsl is1an s m t e Sc11ly Isl d Th r Jan s 5 1 and Ca G . an s. e hollow syrnbols refer to four of the Channe 196 9.) fhC . p ns Nez (square) ; J = Jersey; S = Sark. (Adapted from Gower and Ross, 1I )' OP ~s~_t ~ap . shows the locations of the two island groups (A = Scilly Is., B = Cha.Dlle s. ' ns ez is on the north coast of France , far to th e eas.t

,, 111

ning

t tm 1 n1 ,r unn tt. d . th

tn\lsti

IN

N

•·111011

so that

thod

'tmpk. unifü.:ial e ampk -erves a a demonstration. We uadrat' that e 1ntain. t:1gether. eight species. The data are in LIT 1 . in the tL p panel :-if Table 5.2. The columns are headed euer: . . . . . F. '·l ch ar the quadrat labels. An unstandardized, p · - carried 1Ut on these data in order to determine the ' mp nent .-, r ( the e ordinates mea ured on the principal 1 e data p int ·. Th s cMrdinate , rounded to one decimal e \ ·en - t e e [umn: f matrL Y in the center panel of Table 5.2. ter !he p . the , t point: are rdinated in five-space. Hence Ju e 1rr spondin,, l each row is shown on the

=

-~-

malle-1 tfi th

te~

ei~en ~

n the fi.fth

1lu is much :maller than others, .ªºd a· g the to one decunal

is ar all z ro after roun

in

212

CLASSIFICATION OF SIX QUADRATS TABLE 5.2.VITCH'S METHOD. BYLEFKO :_:_:_~~~~~~~~~~~-------Data Matrix # l 4 :

A

X=

B

C

D

E

9

10

4

o o

28

25

3

1

1

37 14 2 8 1 7

39 15 1 11 3 7

50 65 20 19 21 30

40 50 8 15 10 25

45 42 10 12 11 24

F

1

o 46 40

o o 50 23

The matrix of coordmates a ft er PCA (the principal component seores): .

Y=

A

B

- 36.8 0.3 2.2 -0.5 O.O

- 33.8 O.8 2.2 0.1 O.O

C

D

30.5 -14.3 9.5 0.2 O.O

11.8 -11.5 - 7.2 - 3.9 O.O

E

F

7.5 - 7.2 - 6.8 4.3 O.O

20.9 31.9 0.1 - 0.3 O.O

= 676.l 234.l ;x. 3 = 32.9 .\4 = 5.6 ;x. 5 = 1.6 ;\. 1

,\2 =

The matrix of signs: A

+

y= (+ s

B

+ + +

e

D

E

F

+

+

+

~)

+ +

+

Figure 5.4a shows the data swarm projected onto two-space; equiva; lently, it is a two-dimensional PCA ordination of the data. The coordmate of the points are given by the first two rows of Y. h Matrix Y, in the bottom panel of the table gives the signs of 1 e . elements in Y, which is all the informat10n . · d for correspondmg reqmre carrying out the classification. To make the first division, consider the :~1 row of Y,. We see that A and B both have rninus signs, whereas C, D, E, Bl F ali have plus signs. Hence the first division is into the two classes (A.ro· and (C, D, E, F). The division is shown diagrammatically in the first us 10 . ªso l obvio gram . p·igure 5.4b. That this should be the first division is

[Figure 5.4. (a) Two-dimensional PCA ordination of the six data points of Table 5.2. (b) The dendrograms produced by the successive divisions of the classification.]

The second division consists in breaking row 2 of Ys; equivalently, looking at the scatter diagram, points A, B, and F have positive coordinates, and points C, D, and E have negative coordinates, on axis 2. The classes after the second division are, therefore, (A, B), (C, D, E), and (F), as shown in the second dendrogram in Figure 5.4b. The third division cannot be seen in the two-dimensional ordination diagram. (To do so, the points would have to be plotted in three-dimensional space.) From row 3 of Ys it is seen that, at this stage, C has a positive coordinate and D and E have negative coordinates. We therefore separate those points that differ in sign for the first time, namely, (C) from (D, E). This is the third division shown in Figure 5.4b. The two two-member classes persist through the third division. At the fourth division, (A, B) splits into (A) and (B), and (D, E) splits into (D) and (E). This is clear from row 4 of Ys, but it cannot be visualized geometrically since we are now concerned with the coordinates of the points on four axes rather than three. Observe that points that have become separated at an early stage of the classification cannot become reunited at a later stage.
In the final dendrogram, the heights of the nodes (counting from the top) correspond to the successive axes that were "broken." It is often desirable to stop the subdivision while the "real" clusters are still undivided; for instance, in the example in Figure 5.4 one might regard (A, B) and (D, E) as true, natural clusters and treat the classification as complete at the third stage. This is one of the advantages of a divisive classification: the subdivision process need not be continued beyond the stage at which all truly separate clusters have been separated. The associated disadvantage is that a decision must be made as to how a "real" cluster shall be defined; equivalently, a rule has to be devised for deciding when the sequence of subdivisions should stop. Such a rule is unavoidably arbitrary, and there are several possible criteria for so-called stopping rules.
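The sign-splitting rule is simple enough to automate. The following is a minimal sketch, not Lefkovitch's published procedure: it takes a small species-by-quadrat data matrix, obtains principal component scores from an unstandardized centered PCA (via a singular value decomposition), and then splits every existing class on the signs of successive rows of scores. The function name, variable names, and the use of NumPy are assumptions made for this illustration only.

```python
import numpy as np

def lefkovitch_classes(X, n_axes=None):
    """Divisive classification of the columns (quadrats) of X by the signs of
    their principal component scores (a sketch of the sign-splitting idea)."""
    Xc = X - X.mean(axis=1, keepdims=True)        # center each species (row)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    Y = s[:, None] * Vt                           # scores: one row per principal axis
    n_axes = Y.shape[0] if n_axes is None else n_axes
    classes = [list(range(X.shape[1]))]           # start with one class of all quadrats
    history = []
    for k in range(n_axes):
        new_classes = []
        for cls in classes:
            minus = [j for j in cls if Y[k, j] < 0]
            plus = [j for j in cls if Y[k, j] >= 0]
            new_classes += [c for c in (minus, plus) if c]
        classes = new_classes
        history.append([c[:] for c in classes])
    return history    # the list of classes after each successive division
```

Calling lefkovitch_classes on a data matrix returns, for each division in turn, the classes of quadrats (identified by column index) recognized at that stage.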

Noy-Meir's Method

Noy-Meir's method resembles Lefkovitch's but is not quite so simple. As with Lefkovitch's method, the data to be classified are first ordinated (with no reduction in dimensionality) by PCA, and the principal axes are then broken in turn, one after another, starting with the first. Noy-Meir's method differs in the way in which the "break point" is chosen on each axis. It is so placed as to make the sum of the (within-group) variances of the principal component scores of the groups of points on either side of the break point as small as possible.

As an example, consider the first principal axis of the ordination in Table 5.2; the scores are the elements of the first row of Y. Suppose the axis were broken between points D and E. Then there would be a group of n1 = 4 points (A, B, C, and D) on one side of the break, with scores

(y1, y2, y3, y4) = (-36.8, -33.8, 30.5, 11.8)

and within-group variance

(1/(n1 - 1)) { Σ y_i² - (1/4)(Σ y_i)² } = 1121.98,    the sums being over i = 1, ..., 4.

TABLE 5.3. CLASSIFICATION OF SIX QUADRATS BY NOY-MEIR'S METHOD.ª

First Break
The scores on the first principal axis are
    Point:    A      B      C      D      E      F
    Score:  -36.8  -33.8   30.5   11.8    7.5   20.9

                         Within-Group Variances
    Break Between    Left Group    Right Group    Sum of Variances
    A and B               0           608.17           608.17
    B and C               4.50        104.31           108.81*
    C and D            1445.46         46.81          1492.27
    D and E            1121.98         89.78          1211.76
    E and F             883.97          0              883.97

The smallest sum is marked with an asterisk. Make the break between B and C.

Second Break
The scores on the second principal axis are
    Point:    A      B      C      D      E      F
    Score:   0.3    0.8  -14.3  -11.5   -7.2   31.9

                         Within-Group Variances
    Break Between    Left Group    Right Group    Sum of Variances
    A and B               0           351.70           351.70
    C and D              73.57        571.81           645.38
    D and E              61.65        764.41           826.06
    E and F              46.45          0               46.45*

Make the break between E and F.

Third Break
The scores on the third principal axis are
    Point:    A      B      C      D      E      F
    Score:   2.2    2.2    9.5   -7.2   -6.8    0.1

                         Within-Group Variances
    Break Between    Left Group    Right Group    Sum of Variances
    A and B               0            48.05            48.05
    C and D              17.76         16.84            34.60*
    D and E              46.85         23.81            70.66

Make the break between C and D.

ªSee Table 5.2 for Data Matrix X and the matrix of principal component scores Y.


The scores of the n2 = 2 points (E and F) on the other side of the break point are

(y5, y6) = (7.5, 20.9)

with variance

(1/(n2 - 1)) { Σ y_i² - (1/2)(Σ y_i)² } = 89.78,    the sums being over i = 5, 6.

The sum of these two variances is 1211.76. (It should now be clear how all the entries in the table are computed.)

It is seen that the smallest sum of variances is obtained when the break on this axis is made between points B and C. Hence the first division of the classification is into the classes (A, B) and (C, D, E, F). The second division is made by breaking the second axis. In this case the break comes between the points E and F. Therefore, the three classes recognized after the second division are (A, B), (C, D, E), and (F). Likewise, the four classes after the third division are (A, B), (C), (D, E), and (F). The ultimate step, which needs no computation, is to break A from B and D from E.

It should be noticed that the sequence of breaks is identical with that yielded by Lefkovitch's method. In the example, each axis was broken exactly once. In another version of the method, an axis may be broken more than once if such a break gives a smaller sum of variances than would the breaking of a hitherto unbroken axis. Details are given by Noy-Meir (1973b). He also discusses applications of the method to data ordinated by other forms of PCA (besides unstandardized centered PCA as used here). And he suggests possible stopping rules. The method does not lead to unwanted splitting of tight clusters of points as Lefkovitch's method sometimes does when a cluster happens to be skewered by one of the principal axes (see Exercise 5.2).

It is interesting to note the resemblance of the method to minimum variance clustering (page 32). It is not true to say, however, that Noy-Meir's partitioning method amounts to minimum variance clustering done "in reverse." Thus one of the advantages of the partitioning method is the comparatively small amount of computation required (page 204). The only divisions examined are those corresponding to breaks of the principal component axes; hence, given n points, there are only n - 1 possible break points on each axis.
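As a concrete illustration of the break-point criterion, the sketch below evaluates every break between adjacent scores on a single axis (taking the points in order of their scores) and returns the break that minimizes the summed within-group variances. It is not taken from Noy-Meir's paper; the function names, the ordering convention, and the use of NumPy are assumptions made for this illustration.

```python
import numpy as np

def _var(g):
    # sample variance; a group of one point contributes zero
    return float(np.var(g, ddof=1)) if len(g) > 1 else 0.0

def best_break(scores):
    """Break one principal axis so that the summed within-group variances of
    the scores on either side of the break point are as small as possible."""
    order = np.argsort(scores)                 # points in order along the axis
    best = None
    for split in range(1, len(scores)):        # the n - 1 possible break points
        left, right = order[:split], order[split:]
        total = _var(scores[left]) + _var(scores[right])
        if best is None or total < best[0]:
            best = (total, sorted(left.tolist()), sorted(right.tolist()))
    return best    # (sum of variances, one group of point indices, the other group)

# First-axis scores of quadrats A, ..., F from the worked example:
axis1 = np.array([-36.8, -33.8, 30.5, 11.8, 7.5, 20.9])
print(best_break(axis1))
```

For the first axis of the worked example this returns the classes (A, B) and (C, D, E, F) with a variance sum of 108.81, in agreement with the first break of Table 5.3.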

5.4. PARTITIONING RA AND DCA ORDINATIONS

A partitioning method devised for application to PCA or PCO ordinations can, of course, be applied to RA and DCA ordinations as well, and vice versa. Hill (1979b; and see Hill, Bunce, and Shaw, 1975) developed a partitioning procedure that was applied to RA ordinations, but there is no reason why it should not be used with PCA and PCO ordinations. In principle, it consists in carrying out a one-dimensional RA ordination and breaking the axis at the centroid so as to divide the data points into two classes. Each of the two classes is then itself split, in exactly the same way, to give a total of four classes; then each of the four classes is split to give eight classes, and so on. The method is known as two-way indicator species analysis; a computer program for doing it, called TWINSPAN, is available (Hill, 1979b) and, as its author comments, it is "long and rather complicated." This is because, at each step, the required one-dimensional RA ordination is first done in the ordinary way to give a "crude" partitioning of the data points; it is then redone (at least once and sometimes twice) with the species quantities weighted in such a way as to emphasize the influence of especially useful diagnostic species (i.e., of differential, or "indicator," species) identified by the first ordination. These (and other) refinements are thought to make the classification more natural by ensuring that "indifferent" species (those that are not diagnostic of true natural classes) do not affect the results. However, the price of such refinements is the loss of simplicity. Whenever a simple basic method of analysis is refined and elaborated, the number of possible modified forms of the original method increases exponentially, and choosing among them becomes increasingly subjective.
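To make the unrefined basic step concrete, here is a minimal sketch of the idea underlying each division: a one-dimensional RA (correspondence analysis) ordination of the quadrats, split at the centroid of the axis. It omits all of TWINSPAN's refinements (indicator-species reweighting and so on); the function names and the use of NumPy are assumptions made for this illustration only, and the matrix is assumed to have no empty rows or columns.

```python
import numpy as np

def ra_axis1_scores(X):
    """First reciprocal-averaging (correspondence analysis) axis scores for
    the columns (quadrats) of a species-by-quadrat abundance matrix X."""
    P = X / X.sum()
    r = P.sum(axis=1)                        # species (row) masses
    c = P.sum(axis=0)                        # quadrat (column) masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    return Vt[0] / np.sqrt(c)                # quadrat scores on the first RA axis

def split_at_centroid(X, quadrats=None):
    """One division: quadrats with negative versus non-negative first-axis
    scores.  The weighted mean of the scores is zero, i.e., the centroid."""
    quadrats = np.arange(X.shape[1]) if quadrats is None else np.asarray(quadrats)
    scores = ra_axis1_scores(X[:, quadrats])
    return quadrats[scores < 0], quadrats[scores >= 0]

# Applying split_at_centroid repeatedly, to each class in turn, gives the
# 2, 4, 8, ... classes of the unrefined divisive procedure.
```

This captures only the "crude" partitioning of each step; TWINSPAN's reweighting by indicator species, which is what makes the program long, is deliberately not reproduced here.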

It is worth considering an example (in Figure 5.5) of a TWINSPAN analysis in order to demonstrate how clearly the results can be displayed. One can have the best of two worlds (classification and ordination) by presenting a two- or three-dimensional ordination of the data under investigation, and then drawing the partitions that yield the classification directly on the ordination scatter diagram. To complete the representation, the classification dendrogram is given as well.

Figure 5.5 shows the result of an ordination-plus-classification of vegetation. It is adapted from a figure in Marks and Harcombe (1981). The scatter diagram shows a two-dimensional RA ordination of 54 sample plots representing the range of natural vegetation in the coastal plain of southeastern Texas. The data matrix was also classified, using the TWINSPAN program, and gave the classification dendrogram shown as an inset on the graph. The four groups distinguished by the classification are demarcated by the partitions drawn on the scatter diagram.
[Figure 5.5. A two-dimensional RA ordination, and TWINSPAN classification, of data on 54 sample plots of vegetation in southeastern Texas. The vegetation classes distinguished are: P, sandhill vegetation and forest on upper slopes; HP, hardwood and pine forest; PO, pine-oak forest and wetland pine savanna; F, flatland forest, wetlands, and shrub thickets. Adapted from Marks and Harcombe (1981). In the original paper the classification is carried further and the ordination diagram is partitioned into 10 classes rather than merely 4.]


It should be noticed that when an ordination and a classification are done simultaneously, it becomes possible to represent the classification dendrogram in the most natural way possible. Thus, consider Figure 5.5. If the only analysis to which the data had been subjected had been a classification into four classes, the resultant dendrogram, thought of as a mobile capable of swiveling at every node, could have been drawn in any one of eight ways; for instance, one of the other possible versions (in addition to the one shown in Figure 5.5) is the following.

[A redrawing of the dendrogram with its branches arranged in the order HP, F, PO, P.]

However, only two of the ways (that in Figure 5.5 and its mirror image) show that, for example, HP is closer (more similar) than F to PO, and that the greatest separation (dissimilarity) is between F and P. It should now be clear that the numerous possible ways of drawing a dendrogram are not all equally informative. One of the merits of the TWINSPAN program is that it arranges the dendrogram's branches in a way that puts similar points close to each other, so far as is possible in a two-dimensional representation. Since a TWINSPAN classification entails a new one-dimensional RA ordination at each step, the same result would be obtained if DCA ordinations were used. This is because the order of the points on the first axis is the same with a DCA as with an RA ordination. Partitioning a two-dimensional DCA ordination is yet another way of doing a divisive classification; it has been proposed and demonstrated by Gauch and Whittaker (1981), who gave the procedure the name DCASP. The partitioning is done subjectively. Partitions are drawn through parts of the scatter diagram where the data points can be seen to be sparse, and therefore the method is unlikely to make "false" divisions. But there is a risk that some "true" divisions may escape notice. Data points may appear close in a two-dimensional diagram even though they are far apart in many-dimensional space (for an example, see Figure 5.3). It may be feasible to guard against being misled by drawing the minimum spanning tree of the data swarm that is to be partitioned.
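Constructing the minimum spanning tree from a matrix of pairwise distances is straightforward. The following small sketch uses Prim's algorithm; the function name and the use of NumPy are illustrative assumptions, not part of the text.

```python
import numpy as np

def minimum_spanning_tree(D):
    """Prim's algorithm on a symmetric matrix D of pairwise distances.
    Returns the tree's segments as (i, j, distance) triples."""
    n = D.shape[0]
    in_tree = [0]                    # start from an arbitrary point
    out = set(range(1, n))
    segments = []
    while out:
        # shortest link joining a point in the tree to a point outside it
        i, j = min(((i, j) for i in in_tree for j in out), key=lambda p: D[p])
        segments.append((i, j, float(D[i, j])))
        in_tree.append(j)
        out.remove(j)
    return segments
```

The set of segments returned does not depend on the starting point, although the order in which they are found may. Superimposing the segments on a two-dimensional ordination shows at once whether points that plot close together are in fact far apart in the full space.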


EXERCISES

5.1. The following distance matrix gives the pairwise distances between points in a swarm of 10 points in nine-space. The points are labeled A, B, ..., J. Find the segments of the minimum spanning tree and list them with their lengths in the order in which they were found (as in the center panel of Table 5.1). Draw a diagram of the minimum spanning tree.

        A      B      C      D      E      F      G      H      I      J
A       0     1.88   2.33   2.26   1.74   2.93   3.30  10.73   8.83   8.57
B              0     2.54   2.97   2.05   4.00   4.52  10.89   9.09   8.78
C                     0     3.22   1.54   4.01   4.10  11.28   9.66   9.21
D                            0     2.68   4.51   3.46  10.01   8.20   8.24
E                                   0     3.84   3.56  10.54   9.04   8.64
F                                          0     3.37  10.99   9.00   8.74
G                                                 0    10.44   8.96   9.07
H                                                        0     3.27   3.77
I                                                               0     3.00
J                                                                      0

5.2. Refer to Table 5.2. If Data Matrix #14 is altered by putting x61 = x62 = 0, it is found that the matrix of principal component scores becomes

              A      B      C      D      E      F
          -37.5  -34.8   32.2   13.2    8.4   18.4
           -1.2    0.6  -12.5  -11.7   -7.7   32.5
    Y =     2.5    1.8    9.5   -7.0   -6.4   -0.3
           -0.7    0.4    0.2   -3.8    4.3   -0.3
           -1.3    1.4    0.0    0.2   -0.1   -0.2
            0.0    0.0    0.0    0.0    0.0    0.0

Do a divisive classification of these data by Lefkovitch's method. Plot the two-dimensional ordination.

5.3. Carry out a divisive classification of the data described in Exercise 5.2 using Noy-Meir's method. Stop when four classes have been distinguished.

Chapter Six

Discriminant Ordination

6.1. INTRODUCTION

The data matrices that have been described, ordinated, and classified so far in this book have all been treated in isolation. It has been assumed that an investigator has only one data matrix to interpret at any one time. We now suppose that several data matrices are to be interpreted jointly. It is desired to ordinate all of them together, that is, in a common coordinate frame, and an ordination method is wanted that emphasizes as much as possible the contrasts among them. Here are several examples of the kinds of investigations in which joint ordinations are helpful.

1. Suppose one were investigating the emergent vegetation (or the benthic invertebrate fauna, or the diatom flora) of several lakes. The data would consist of several data matrices, one from each lake.

2. One might be sampling the insect fauna (or some taxonomic subset of it) in wheat fields in July in several successive years. Then the data would consist of several data matrices, one for each year.

3. One might be comparing environmental conditions in several geographically separate regions. Within each region a number of environmental variables are measured in each of a number of "quadrats" (or other sampling stations) and the result is a data matrix summarizing conditions in

that region. Then the total data consist of several data matrices, one for each region.

It should now be clear that situations frequently arise in which it is desirable to ordinate several data matrices jointly. The researcher usually wants to know whether the separate data matrices (from different lakes, years, regions, or whatever it may be) differ from one another, and may do a multivariate analysis of variance to judge, objectively, whether they do. But independently of any statistical tests that may be done, it is clearly advantageous to be able to see, in a two-dimensional ordination on a sheet of paper, how the several sets of data are interrelated. A way of achieving this is to ordinate all the data matrices jointly by means of a discriminant ordination (Pielou, unpublished). Before the method is described, we devote a section to necessary mathematical preliminaries.

6.2. UNSYMMETRIC SQUARE MATRICES

All ordination methods so far discussed in this book have entailed the eigenanalysis of a symmetric square matrix. A discriminant ordination requires that an unsymmetric square matrix be eigenanalysed. This section, therefore, describes some of the properties of unsymmetric square matrices and shows how they differ from symmetric matrices. In all that follows, the symbols A and B are used for symmetric and unsymmetric square matrices, respectively.

Factorization of an Unsymmetric Square Matrix

Recall that given a symmetric matrix A one can always find an orthogonal matrix U and a diagonal matrix Λ such that

A = U'ΛU.    (6.1)

As always, U' denotes the transpose of U. To make the discussion clearer, we now change the symbols by putting U' = V; consequently, U = V'. Equation (6.1) now becomes

A = VΛV'.    (6.2)


Let us rearrange (6.2). Postmultiplying both sides by V, and using the fact that, since V is orthogonal, V'V = I, it is seen that

AV = VΛV'V = VΛI

or, more simply,

AV = VΛ.    (6.3)

The columns of V (which are the rows of U) are the eigenvectors of A, and the elements on the main diagonal of Λ are the eigenvalues of A. Indeed, (6.3) is the equation that defines the eigenvalues and eigenvectors of A. An exactly analogous equation, namely,

BW = WΛ    (6.4)

defines the eigenvalues and eigenvectors of the unsymmetric matrix B. As before, the elements on the diagonal of the diagonal matrix Λ are the eigenvalues of B and the columns of W are its eigenvectors. But in this case

B ≠ WΛW'.

This is because W is not an orthogonal matrix; in symbols, WW' ≠ I.
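Equation (6.4) is exactly the relation that general-purpose eigenroutines solve. A minimal numerical sketch (NumPy, the example matrix, and the variable names are assumptions made for this illustration):

```python
import numpy as np

B = np.array([[2.0, 1.0],
              [0.0, 3.0]])              # an unsymmetric square matrix

eigvals, W = np.linalg.eig(B)           # columns of W are the eigenvectors
Lam = np.diag(eigvals)                  # diagonal matrix of eigenvalues
# (for some unsymmetric matrices the eigenvalues are complex; this one keeps them real)

print(np.allclose(B @ W, W @ Lam))          # True: BW = W*Lambda, equation (6.4)
print(np.allclose(W @ W.T, np.eye(2)))      # False in general: W is not orthogonal
```

The second check makes the point of the paragraph above: unlike the eigenvector matrix of a symmetric matrix, W multiplied by its transpose does not give the identity matrix.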

The Inverse of a Square Matrix

We now ask whether, given the matrix W, another matrix, to be denoted by W⁻¹, can be found such that WW⁻¹ = I. The answer is yes; W⁻¹ is known as the inverse of W. Indeed, apart from exceptions noted in Exercise 6.1, every square matrix has an inverse. Excluding the exceptions from consideration here, we can say that for any square matrix, say M, which may be symmetric or unsymmetric, another matrix, say M⁻¹, can be found such that

MM⁻¹ = M⁻¹M = I.

(Note that when a matrix is multiplied by its inverse, the order of the factors does not matter.) Moreover if, and only if, M is orthogonal, M⁻¹ = M'. If M is not orthogonal, M⁻¹ ≠ M'.
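A quick numerical check of these statements (NumPy and the example values are assumptions made for this illustration):

```python
import numpy as np

M = np.array([[ 2.0, 4.0, 3.0],
              [-3.0, 2.0, 1.0],
              [-1.0, 3.0, 2.0]])        # a nonorthogonal square matrix

M_inv = np.linalg.inv(M)

print(np.allclose(M @ M_inv, np.eye(3)))    # True: M M^-1 = I
print(np.allclose(M_inv @ M, np.eye(3)))    # True: M^-1 M = I
print(np.allclose(M_inv, M.T))              # False: M is not orthogonal, so M^-1 is not M'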


Hence in deriving an equation of the form of (6.2) from (6.4), we postmultiply both sides of (6.4) by W⁻¹ (not W') to get

BWW⁻¹ = WΛW⁻¹

or, more simply,

B = WΛW⁻¹.    (6.5)

The difference between (6.2) and (6.5) should be noted. It arises from the fact that V, whose columns are the eigenvectors of a symmetric matrix, is orthogonal. In contrast, W, whose columns are the eigenvectors of an unsymmetric matrix, is nonorthogonal. Finding the inverse of an orthogonal matrix presents no problem: one has only to write down its transpose. But finding the inverse of a nonorthogonal matrix requires very laborious computations. We do not describe the steps here. Clear expositions can be found in many books, for example, Searle (1966) and Tatsuoka (1971). For our purposes it suffices to note that usually (but see Exercise 6.1) an inverse for a square matrix can be found and that most computers have a function for obtaining it. As we see in the following, finding the inverse of a nonorthogonal matrix is one of the steps in carrying out a discriminant ordination. As an illustrative example of a matrix and its inverse, suppose

    M = (  2    4    3 )
        ( -3    2    1 )
        ( -1    3    2 )

Its inverse is

    M⁻¹ = (  1     1    -2 )
          (  5     7   -11 )
          ( -7   -10    16 )

The reader should check that MM⁻¹ = M⁻¹M = I.

The Geometry of Orthogonal and Nonorthogonal Transformations

Let us n transfi

orrn

.

·

ow mvestígate the results of using a nonorthogonal JJlat[lt

another matrix.

10

uNSV MMHRIC SQUARE MATRICES 227

first, recall what happens when an orthogonal matrix is used to effect a formation. It was shown m Chapter 3 (page 94) that the effect of a data matrix by an orthogonal matrix is, in geometric terms, 10 pre otate the whole "data swarm" (the points defined by the transformed r b h · · f h · matrix) rigídly a out t e ongm o t e coordmate frame; equivalently, one think of the data swarm as fixed and the transformation as rotating the can dinate frame (see Figure 3.3, page 97). coor · by premu1Up · 1ymg · 1t · by a nonorthogNow Jet us trans f orm a data matnx

tran~ultiplying

onal matrix. As an example, let the matrix that is to be transformed be X

=(-11 11 -11 -1) -1 '

whose co1umns give the coordinates of the comers of a square with center at the origin (see Figure 6.la).

A

A

D

B

--

____.. D -

e

(a) (e) . (a) The original ·r:r nt TX in two dlllere .th a nonorthogon al matnx. . sformation of X mto ints unaltered but sforming data Wl Figure 6.1. The effect of tran d (e) show tbe tran d· (e) shows th.e po 3 3) "data swarm " a square X·' (b) an d but the polll · ts move , (Compare Figure . · ' \Vays: ( b) shows the axes un altere through the ang1es shown. lhe axes rotated, independently,

:au .

·tru c.: t a 2 / 2 ma trix T that is to be us<:d to tran~forrn X

Wc now con s . , .. L . · L· . ~ ., e forrn as tli e 11gltt-liand matnx rn cquat1on (3.7) (pag, c:t it be of lh<.i SrHn e l(JJ). hlll í ~,

let us pul

(6.6) w1.th

e12 =

O 1 and 021 . eacJi ·row of T ·sum to men ts m choose different val ues for 011 therefore T is nonorthogonaJ as A a particular example, Jet

90º

J

T _ ( cos 30º - cos 110º

90• º t 022 • (Hcnce the sguares. of the el e. • urnty.) But, unlike U m Equat1on (3 ·7) we . and 022 • Thts en sures that Tf' =f. I and 1

required.

cos 60º) = ( 0.866 cos 20º - o.342

0.500 ) 0.940 .

Then

TX

=

(-0 .37 1.28

1.37 0.60

0.37 -1 .28

-1.37) -0 .60 .

Matrix TX is plotted in Figure 6.1 in two different ways. Figure 6.lb shows the points plotted in an "ordinary" coordinate frame with the axes perpendicular to each other. As may be seen, the square in Figure 6.la has been distorted into the shape of a rhomboid, as well as being rotated; compare it with Figure 3.3b, in which the square, though rotated, is undistorted. In Figure 6.lc (which is comparable to Figure 3.3c), the square is the same shape and has the same orientation as in Figure 6.la , but the coordinate axes have been rotated. In the present case (in contrast to Figure 3.3c) the axes ha ve not been rotated as a rigid frame; instead, each axis has been rotated separately and the axes are no longer perpendicular. The angles between the new (y) axes and the old ( x) axes are shown in the figure. Exercise 6.2 invites the reader to look even more closely at the geometry of nonorthogonal transforrnations. . Of course, not all nonorthogonal matrices are of the form of T in wb.Icbd the elements of each . . riente . . row are the duect10n cosines of a new1Y 0 d coor mate axis with th · whose transpose 1s . not 1denf . e axes perpendicular. Any matnx .. · 1 ·not. mutually . d fj01U0 11 The effects of such ica. Wlth lts mverse is nonorthogonal by .e have matnces when used to transform other matnces

229

been illustrated in Figure 3 2b- e ( .1¡rea dy · page 93) Th ' sc arately) and alter the scales as well. A matrix like. . ey rotate the axes (. P,,..,es· it leaves the scale of each axis unch d T m (6.6) only rotates ange . 111e Lastly, it should. be noticed that although ' fo r convemence . the d. ·scussion deals w1th a data swarm in two-di . 1 prece mg ' 1 • • mens10na space and a 2 x 2 d 1rnnsformat10n matnx T, all the arguments can be e t l d . . h x rapo ate to as many dirnens1ons as we w1s . áft

,



Eigenanalysis of an Unsymmetric Square Matrix The last matter to consider in this section on mathematical preliminaries is the eigenanalyses of unsymmetric square matrices. The eigenvalues and eigenvectors of such a matrix, say B, are found b solving the set of equations implicit in Equation (6.4), namely, Y BW =WA.

If Bis an n X n matrix, then there are n eigenvalues (the elements on the main diagonal of A) and n eigenvectors (the columns of W). There are various ways of solving (6.4) but they are not described in this book. The principles are fully explained and illustrated in, for example, Searle (1966) and Tatsuoka (1971); applying them, except in artificially simple cases, entails heavy computations. Here we merely give a simple example to illustrate the results of such an eigenanalysis.

Suppose B

= (

~~ ~~

\ 41 .

Then th e equat10n

59

-148) -42 . -92

BW - WA is satisfied (as the reader should confirm) by -

putting

A=(~

o 2

o

J)

and W = (-;

-1

. a·agonal of A are the eigenvalt on the mam i It follows that the e1emen 5 . 1 to the eigenvectors of B; ues of B. The columns of W are proport10na

DISCIUMH4AN 1 230

(}fU)INA

11( )~~

. . still satisfied if the elements in any 1 ation is co urn notice that the equ t rnultiple of the values shown. To n n cil w constan . <>rrnar· b are replaced Y h rnes to the same thmg, to put their ele . '~e lhc . wluc co . . rncn t~ eigenvectors or, . it is necessary to d1v1de through each . ' 1n lhl . colurn . tion cosines, form of duec f the sum of squares of 1ts elements. Jn th n CJI h are root o e c;xarnp1 b W y t e squ of W is, therefore, L, the normalized form

ª.

0.5345 -0.8018 ( -0.2673

0.7428 0.3714 0.5571

0.8018) 0.2673 . 0.5345

6.3. DISCRIMINANT ORDINATION OF SEVERAL SETS OF DATA N ow that the groundwork has been laid, we consider how several sets of data may be simultaneously ordinated in a common coordinate frame in

such a way as to separate the different swarms of points as widely as possible. The method is described here in recipe form because the underlying theory is beyond the scope of this book. Theoretical accounts may be found in, for example, Tatsuoka (1971) and Pielou (1977). To illustrate the method, it is applied to real data. The data consist of values of 4 climatic variables observed at 14 weather stations in 3 geo· graphic regions. The purpose of the analysis is to ordinate the stations on the basis of their climates. In detail, the data are as follows. The locations of the weather stations are shown · p igure · . . on the map m 6.2, and the place names appear ascolurnn hea~mgs ~ Table 6.1. The three regions are: 1 the southern part of Yukon Terntory m th e d' ' · the e ana tan boreal forest · 2 northern Alberta, also 1I1 . boreal forest b t t 1 ' ' e diaD rairi Th ~ a. a ower latitude; 3, southern Alberta in the ana P es. e climatic v · bl 1 The d . ana es are listed in a footnote to the tab e. ata, which could b . . one for each reo1 0 h e wntten out as three separate matnces, ;" b'" n, ave thus b b watrJJ'' hereafter called X h ~en rought together as the single large t ¡be , s own m T bl 6 . epara e hr t ee regions Th lim . e .1. The dashed vertical lines 5 . ...,5) hr . e e atic ob . ID rol'l t OUgh 6 Of the m t · Servations for each Station are shOWJl nrid z. a nx. 1t remains · · ws 1 a.v to explain the elements 1.l1 ro

ª

5

10~1MINAN T ORDI NATION

or SlVLR Al

. SLIS ' f>I l)t\IA

:n1

/

ALAS KA

I

.....

)

YU KO N )

o

(

' )

I

(

N.W T

\

-,_ • I I I

BRITI SH COLUMBIA



• •

• I

I

I I I

ALBERTA /

- -- -:i::'.egion '

6.2.

Map showing the locafions ol the 14 weather stalions lisled in Table 6.1. Stations 1 (Yukon) O; stat10ns m Region 2 (northem Alberta) • ; statioos io Region 3

(southern Alberta) ® .

These are "dummy" variables which show to which of the three regions each of the stations belongs. As may be seen, every station has two dummy variables associated with it, x 1 and x 2 . They are assigned as follows. for all stations in Region 1 for all stations in Region 2 for all stations in Region 3.

TABLE6.l. DATA MATRIX GIVING THE VALUES OF 4 CLIMATIC VARIABLES AT 14 STATIONS IN 3 REGIONS.ª·b

Region 2 Northern Alberta

Region 1 Yukon

e

.¡ rl) rl)

~ u

e o

rl)

~

Q

,.J

o

j¡I-,

e

ofil

= = = ~

= ~ ~

(J

rl)

,,Q

e

ofil

a t:o

j¡I-,

= t: = ~ ;j

~

= ~

~

<

~

~

~

~

t:o

= ~ =

,,Q

=

~

Xl

1

1

1

1

o

o

o

o

o

X2

o

o

o

o

-18.9 12.8 11.5 11.3

-29.4 15.6 13.8 18.3

-25.0 14.4 10.1 18.4

-20.0 14.4 17.2 23.0

1 -17.2 15.6 14.7 31.9

1 -12.8 15.6 13.1 34.2

1 -25.0 16.7 11.5 20.4

1 -22.8 16.1 14.8 30.1

1 -18.9 15.6 10.3 31.8

X3 X4 X5

x6

Region 3 Southern Alberta

1 1 1 1 1 1

~

Q.l

1

~

1

o

1 1

~ 1 1 1 1

1 1 1 1 1 1 1 1

1 1

~

·e:

o

~(J

~

·o

o

t:o

Q.l

~

-

~

o o

o o

o o

-11 .1 20.0 1.1 29.1

-8.9 17.8 11.9

-8.3 18.3

26.2

26.6

~

o

~

~

-= = Q.l

~

e

1

Oll

11.5

e

:e ~

o o

e

e

~

~

o o

- 11.1 -8.3 20.6 18.9 9.8 12.9 22.8 26.2

ªData from "Climatic Summaries for Selected Meteorological Stations in the Dominion of Canada, Volume I." Meteorological Division, Department of Transport, Canada, Toronto, 1948. b x 1 and x 2 are dummy variables; see text; x 3 and x 4 are daily mean temperatures in degrees C in January and July, re pectively: x 5 a.nd x 6 are precipitation in cm for October to March and April to September, respective1y.

isCRIMINANT ORDINATION OF SEVERAL O SETS OF DATA

233

In. general, if there were k reg·ions, k _ d . 1 equired to label all the stations Th ummy vanables would b · ey would be e r (l,O, ... ,o)

\º.' ~ ~ ..' :'. ~)

forall st ations · in Region 1 for all stations in Region 2

( 0 , 0 , .. ., 1)· ·

·f~; ~il ~t~~i~~~ in. R.~~~~ ·k :__ ·i

(O' O' ... ' O)

for all stations in Region k'

with k - 1 elements in each vector. . . Thus, when there are n stations altogether grouped mto . k reg10ns and s vanables are observed at each station ' X h as s + k - 1 rows 'and n columns. In the present case with s = 4 , k = 3, and n -- 14, x 1s · a (6 X 14) . matnx. The operations to be carried out on matrix X are now described in numbered paragraphs. l. Center and standardize the data as described earlier. That is, replace the (i, j)th element of X, x; 1 , by (xiJ - x¡)/a; where X; and a; are the mean and standard deviation of all the elements in the ith row. (Note: it makes no difference to the result whether the dummy variables are standardized. In the computations shown in Table 6.2 they are standardized. Every row must

be centered.) 2. Postmultiply the matrix by its transpose to obtain the SSCP matrix S. Sis shown in Table 6.2. 3. Partition S into four submatrices Sw Sw Sw and S22 as shown by the dashed lines. The four parts into which S has been divided are as foil . S · (k _ 1) x (k - 1) = 2 x 2 matri.x giving the sums of OWS. · · ummy variables x 1 and x 2 ; S22 1s 11 lS a squares and cross-products o f t h e two d f th · · · the sums of squares and cross-products o es X s = 4 x 4 matnx givmg . S is a (k _ 1) x s = 2 x 4 the ~bserved variables X3, X4, X5, and x~, ro~ucts formed by multiplying matnx whose elements are all sums of cross P d · bl . s = S' one of the dummy variables by one of the observe vana es, 21 12. (See Exercise 6.3.) l 4. Obtain the inverses of Sn and Sw name y, Written out in full in Table 6.2.

5-1 11

and

s-221 . They are

STEPS IN THE DISCRIMINANT ORDINATION OF THE DATA IN TABLE 6.1.

TABLE 6.2.

The SSCP matrix Sis 14.00

- 6.60

1

-

8.34

- 9.41

3.54

-10.44

-~~j~ __ }~~~J--~~~~--=--~~~---~~~---JY~ 1

- 8.34 - 9.41

- 3.66 - 3.28

1 1

3.54 3.38 1 -10.44 7.88 1 The inverses of S 11 and S 22 are 11 =(º·º9184 0.04329

s-1

-

0.04329)· 0.09184 '

14.00 9.05

9.05 14.00

- 4.22 - 7.20

5.89 4.78

4.22 5.89

- 7.20 4.78

14.00 -0.41

- 0.41 14.00

0.13303 -0.01552 0.00038 -0.03014

s-1 _

22

-

r

The product matrix D is

D

-1

=

-1

S22 S21 S11 S12

The eigenvalues of D are ;\ 1 = O. 96396 and

=

0.40734 0.53979 - 0.00263 -0.05904

r

-0.07552 0.15650 0.05717 -0.02004

0.41975 0.57768 -0.00195 -0.02255

-0.25893 -0.29694 0.00330 0.11999

;\ 2 = 0.59832.

The first two eigenvectors of D (normalized) are the rows of W' -(-0.61003 2 ( ) -0.35471

-0.78007

0.00494

0.13902)

0.02458

0.01981

0.93444 .

0.00038 0.05717 0.10047 -0.01677 -0.07804) 0.21171 0.01165 . 0.57397

-0.03014) -0.02004 -0.01677 . 0.09046

oisCRIMINANT ORDINATION OF SEVERAL SETS OF DATA

235

s.

Find the matrix product D defined as

n -- s-22 1s21 s-111s12 . Dis shown in Table 6.2. 6. Do an eigenanalysis of the unsymmetric square matrix D. The results are shown in Table 6.2. The number of nonzero eigenvalues is always the Jesser of s (the number of variables of interest) and k - 1 (where k is the number of groups of data). Hence in the example in which s = 4 and k - 1 = 2 there are only two nonzero eigenvalues ,\ and ,\ , and they are 1 2 shown in the table. Only the two eigenvectors corresponding to the two Jargest eigenvalues of D are required for a two-dimensional ordination. (In the present case, of course, the two largest eigenvalues are the only nonzero eigenvalues.) These eigenvectors are shown as the rows of the 2 X s = 2 X 4 matrix Wá). . 7. The required coordinates for the data pomts are given by the columns of the 2 X n = 2 X 14 matrix Y, defined as

in which X 4 is the s X n = 4 X 14· matrix obtained by deleting. the ( ) · bles from the centered and standardized k - 1 = 2 rows of dummy vana data matrix. . Figure . 6·3ª· For comparison, a PCA ordinaThe ordination is shown m · of the same d ata is · o-P1ven in Figure .6.3b. hon ch more clearly d'"' 111erent'1As may be seen, the three sets of pomts arPeCmAu The only outlier in the d' · than by · 1 ated by discriminan! or mat10n Alb t station that seems to be ong discrirninant ordination is the northern h er ~h its own group. This outlier . n s more · t ers With the Yukon group of sta t10 . t an bl Wl it has the col dest wm . b seen m Ta e 6.1' IS Fort Chipewyan; as may e Alberta group. . . \ anct driest summers of the northem a· tion is that it penmts the difThe advantage of diseriminant or dma·th maximum e1an·tY· The process ferences among data sets to be displayed w1 es) for several batches of data fi seor shall be as compact ' and as nds new coordinates (i.e., trans formeh batch p. s that eac h s as possible. ºlllts in a way that ensure 'Nidely separated from the other bate e '

236

(a)

•••• ®

(\J

®



®®

(f)

X

o

o o

®


o AXIS

(b)

• •



o



®

(\J

®®

(f)

-

X


o o

o

®

AXIS

1

Fim•re 6 3

Two ordinations of the 14 weather stat10ns: ' s · (a) a·1scnmman · · t ordinatiw(b) PCA 'to~ • • (centered and standardized). The symbols for the three reg1ons · are the same a ordination Figure 6.2. lil.

The number of ways that now exist for classifymg and ordinating · . ecological data is already large. No doubt the invent10n of new . ' more has ingenious techniques will continue. It could be argued that the t!Dl; lf come for calling a halt to the endless proliferation of new _metho ~le ecologists are ever to lit their individual contributions together mto 5 gd · · knowledge, it seems desirable that few gootly um·11 ed body of sc1en1Jfic methods of data ana!ysis should be adopted widely and used consisten and that unproven methods should be consigned to the scrap-heap. re At . h a mo tractJve t ough this argument may be, it collapses before ret· persuas1ve counterargument. This is that the development of data interp

ª ª .

¡XI t< ISI S

ing ml'lhods is an int., cgr"l. 1 part

f

°

237

.

11nd s1lOll 1l t not come t . 11 sc1cntific · pr gress. and ' as such will w1·11• 1o· ' fo11 ow d by nnt ,ª . alt: Ev ery 1mpr vei • e e 1ing un nent in e ' not 1 w11on, and 1f th interp . pr vements ¡11 t 1 . omputer capabilif rnttlt1on f ec 111iques f 1es . d un mm s of ecol gists th ec 1ogical data . . or data interpreJ 1\l g ncral principie llt;d musl f amiliarize Lh ts lo remain in the hands er ymg meth ds f d· emselves thoroughly w·th 1

?-

ata handling.

EXERCISES following three 3 X 3 mv r e . matrices H 1, H 2, and H3 do not have

6.1. :h

n fr 1

H,=

2

H2 = (

3

H, = (

~

-3

1 2

-6

~ -1

2

o -2

-i); -3

-n

Use each in turn to multiply the 3 X 8 matrix X (which

cube) where

represents a

Examine the products H 1 X, H 2 X, and H 3 X and determine the dimensionality of the figure into which tbe cube has been transformed in each case. What relationship does this suggest between matrices that cannot be inverted and the transformations that such matrices bring about?

6·2. Construct a diagram like that in Figure 3.4 (page 98) but with the Y1 and rraxes not perpendicular to each other. Let the angles between the o1d an d ew axes /J , /J , 821 , an d 822 be defined as on page 100. 12 11 11 Denve equations analogous to (3.4a) and (3.4b) on page 99.

DISCRIMINANT

238

e x 3 matrix X is partitioned as shown 6.3. Suppose th 6 . denoted by X1 and X2. matnces Xu

X12

X13

Xz¡

Xzz

X23

---------

X=

X31

X32

X33

X41

X42

X43

X5¡

X52

X53

X61

X62

x63

O~DINl\110~

1"\t 1~' o tWQ

SUb-

(~:)

Write out t~~ _6_x -6 SSCP matrix XX' in full, and show how it can be partitioned-into four submatrices that are identical with those in the product

(Keep track of the sizes of the various submatrices and their products.) Note that the multiplication of partitioned matrices is carried out according to the same rules as ordinary matrix multiplication except that submatrices take the place of individual elements.

Answers to Exercises CHAPTER 2 .t. (a) 10.15; (b) 8.49; (e) 10.10.

2.2.

Step

Fusion

"Farthest Points"

Distance Between Clusters

1

1, 3

1, 3

4.58

2

[1, 3], 2

2, 3

8.49

3

4, 5

4, 5

9.17

2.3. The coordinates of the centroids are: Cluster [1, 2]

Cluster [3, 4, 5]

5.5

0.333

o

1.667

2

2.667

-3

1.667

them is 7.190. ) The distance between lt is independent of P· 2.4. d1([P], [M, N]) = 554.125. (Note: The resu

2.5. 173.2. 239

240

. (b) 1.297; (e) 1.297 radians = 74.3º . 2.6. (a) 0.2 705 ' 3 S _ 6. ( ) J _ i s = i. (b) J = 4, - 7, e - S = 1. 2. 7. (a) J = 3, 12 ' Proof that S ~ :

S = 2a/(2a + b + e) = 2a/(2a + f) on putting b + e : : : f; J=a/(a+f). S/l = (2a

+ 2/)/(2a + f) >

1 when f > O or

=

1 wh

en f:::: O.

CHAPTER 3 3.1.

- 1 3

(d)

1

~

23

(f)

(- 6819 78

3.2.

4

o

o 4

-

-2

11

3)· 3 '

(b)

~).

22 - 14 24

(e) cannot be formed;

12

(e) cannot be formed;

- 3 ' - 1

10

(

- 7

-8) - 3

[ 14 (g) 30

- 10

20 124

3 1 7 13

o 8

-4 20

3 9 3 33

[Note: To form the r0 d BC by A B P uct BCA, for example one may postmultiply or by CA.] '

-1). 3 '

u2 = (

-1 -1). - 2 2 '

3.3. U2 is Orthogonal. U . ' i is not 3.4. L XetXX'=A.Thena .. ist~ of 11 . and the Jth col e sum of cross-products of the ith row is the sum of eros ~mn of X' (which is the jth row of X). Likewise, ap X' ( hi . s Product 0 f . ...,11 of w ch is the ·th s the J th row of X and the i th colu11u· z row of X) H . · ence a l.j. = aji.. for all i, J.

241

.5. ( _¿.5 5

.6. The eigenvalues of A are 2 5

= 32

-0.5) 1 . and 3s

=

243 ·

Thi s follows from:

As = (U'AU)(U1\.U)(U'AU)(U1\.U)(U'AU)

= U'A(UU')A(UU')A(UU')A(UU')AU = U'A5U since UU' =l. 5

Hence the eigenvalues of A are the eigenvalues of A raised to the fifth power.

( 0.6733 0.5858 0.4242 -0.1347 .8. .\ 2 = 2.55 [from tr(S) = L¡AJ

.10. .\ 2 is the same for G and F. Hence .\ 2 tor of F is

-0.0741 ). 45.285. The second eigenvec-

=

Then the second eigenvector of G, namely, v{, is proportional to u'2X. Hence v{ = ( 0.61 -0.78 -0.14 ).

CHAPTER 4 U. The covariance matrix is (l/n )R with n .\ 3 =

=

8. Hence Ai

=

36; A.2 = 25;

16.

U. Tr(XX')}

is the sum of squares of the distances of ali points in the swarm from the

Tr(YY ')

origin of the

{ x-coordinate framt y-coordinate framt:

. ·d e at the centroid d. ate frames comc1 . Since the orio1ns of the two coor m hi 1·1 follows easily that er , _ (YY'). [From t s of the swarm, tr(XX ) - tr (l/n)tr(XX') = L¡A¡; see page 126.]

242

4.3. Let the

2 X 2 correlation matrix be

p

=

(~

Let the matrix of eigenvectors be

U= (

C?S0

- smO

sinO)

coso .

Then since UPU' = A, it follows that the (1, 2)th element of UPU' is O. That is, p(cos20 - sin28) = O. Hence cos O= ±sin O;

o= 45°.

Therefore the eigenvectors are: ( cos O

sin O) = ( 0.7071

0.7071)

and ( - sin O cos O) = ( -0.7071 4.4.

0.7071 ).

(a) 10.6º; (b) 25.0º; (e) 55.9º; (d) 45º.

4.5. 0.816. 4.7. The ordinated data form a triangle the lengths of whose sides are: d(A,B)=4; 4.9.

d(B,C)=3;

d(A, C) = 7.

The following equations are numbered to correspond with those in tbe text, except that primes have been added here.

w = c-1x,R-1xw c112w

=

c-1;2(X'R-1X)(c -112c112)w

(4.22') (4.24')

= (c- 112x,R-1;2 )(R-1;2 xc-1;2 )(c112w) (4.25') =

Q( c112w).

(4.26')

243

Therefore. "

'Cl

1

- = W'Cl/ 2Q

whence

(4.27')

wc1, 2 a: u where UQ is the n

( 4.28')

Q

n matrix of e1genvectors of

w a:

Q Th ·

f ere ore,

uQc-112.

( 4.29')

uo. -12.99

Y=

o o o o

-4.33

o o o o

4.33

o o o o

12.99

o o o o

Ibis follows from the fact that the data points líe on a straight line in 5se-space (i.e .. they are confined to a one-dimensional subspace of the 5xe-space ). The PCA places the first principal aus on this line; therefore. the coordinates of the points can be found by determining the distances separating them. An eigenanalysis of the covariance matrix would show that it has only one nonzero eigenvalue. The number of nonzero eigenvalues of a covariance matrix, which is equal to the number of dimensions of the subspace in which the data points lie, is known as the rank of the covariance matrix. In this example, the rank is l.

CHAPTER 5 5.I. Th e segments of t h e

· · m spanninP0 tree are:

ffilillIDU

1: d(C. E)

=

1.54:

2: d(A. E)= 1.7 4 :

4: d(D. A)

=

2.26:

5: d(F.A) = 2.93:

7: d(l. D)

=

8.20;

8: d(J. I) = 3.00;

3: d(B, A)= 1.88; 6: d(G. A)= 3.30;

9: d(H. I)

=

3.27.

G

F

D

J

B

E

e

G

~ \

~

F

11

'11

\1

I

H

~-

---'f-- - 1

1

b

\1

1

------

-----

D_~A

--~ ~

E \--.C

J

• B

One of the many possible diagrammatic representations of the minimum spanning tree is shown in the figure (upper panel). Compare it with the two-dimensional ordination of the data, with the mínimum spanning tree superimposed, in the lower panel. The lower panel is a reproduction of Figure 5.3 (page 210) with the points labeled to match. The distance matrix in the exercise was taken from Gower and Ross (1969).

5.2. Successive divisions give the folloWiflg classes: 1: (A, B) (C, D, E, F);

2: (A) (B) (C, D, E) (F);

3: (A) (B) (C) (D, E) (F); 4 : (A) (B) (C) (D) (E) (F).

245

N atice that A and B are sep . f . arated at the . .second step eve th still orm a close pau in th e or~mation pattem whin h ~ugh they indistinguishable from the pattern m F c is almost . separat10n occurs because th igure 5.4a. This " ,, dB e second princi 1 . unnatural an . pa axis passes between A

5.3. Successive divisions give the f 0 11owmg . classes: 1: (A, B) (C, D, E, F);

2: (A, B) (C, D, E) (F); 3: (A, B) (C) (D, E) (F). Observe that the close pair (A, B) has not been divided.

CHAPTER 6 6.1. The points of H 1 X form a straight line, hence form a figure of one dimension. The points of H 2 X (and also of H 3 X) are confined to a plane, and hence form a figure of two dimensions. This leads to the conjecture that a matrix can be inverted only if it brings about no reduction in the dimensionality of a swarm of points when it is used to transform the swarm. Proof of the correctness of this conjecture, and methods for judging whether a given matrix can be inverted, are beyond the scope of this book. See, for example, Searle (1966), Chapter 4, or Tatsuoka (1971), Chapter 5. Matrices that can,. and cannot, be inverted are called, respectively, "nonsingular" and "smgular." 6·2· The equations are

Y1

= X¡COS 011

Y2 =

X¡COS 021

+

X2COS

012

+ X2COS 022.

Glossary (Words in italics are defined elsewhere m . the glossary.)

Agglomerative classification. Same as clustering. Arch effect.

The appearan ce o f a proJected . data swarm as a curve (" arch") when. the ~ata were obtained from sampling units ranged along a one-drmens10nal gradien t.

Asymmetry, coeflicient of.

A measure of the degree to which an ordination axis approaches unipolarity (see unipolar axis).

Average distance between two clusters. The arithmetic average of all the distances between a point in one cluster and a point in the other.

Average Iinkage clustering.

Collecti ve term for all clustering methods in which the dístance between two clusters depends on the locations of all points in both clusters. Contrast nearest-neighbor and farthest-neighbor clustering.

Average Jinkage clustering criterion. The dissimilarity between clusters [P] and [Q], where [Q] is formed by the fusion of clusters [M] and [N]. Four measures of this dissimilarity are: Centroid distan ce, trom the centroid of [P] to the centroid of [Q]. . ·a f [P] to the midpoint of the lme Median distance, from the centro1 o joining the centroids of . [M] and [N]. the average distance between [P] Unweighted average dzstance,

and [Q].

same as 247

qOssA~v

248

·stance the average of the average dist average d1 b anee b ' · d Weighte M] and the average d1stance etween [P] anct [N e. n [P] and [ · xhib. ]. twee •ty The heterogene1ty e ited by a d t beterogene• · ª a sw Between-axes more subswarms confined (or almost confi ar117 . t'ng of two or . . nect) t cons1s t of the total space contammg the whole o ·i:rerent subspaces . swarm d111' . h. -axes heterogenetty. . Contrast wtt zn ·sting entirely of. zeros and. ones. . d ta Data consl Bmary ª · = { 1 if speci~s i is present m quadrat }, X¡; o otherw1se. . d'nation axis on which the data points have sofue B. 1 r axis An or l ipo pos1t1ve ª . . ·an d some negative seores. Contrast unipolar axis. "l

. A n ordination designed to show, as clearly as possible' the Catenat1on. structure of nonlinear data. Centered data. Da ta in which the observations . . are expressed as deviations from their mean value. Hence their sum 1s zero.

Centroid. The center of gravity (or "average point") of a swarm (or cluster) of points in a space of any number of dimensions. The coordinate of the centroid on each axis is the mean of the coordinates on that axis of all the points in the swarm (or cluster). Centroid clustering. A clustering technique in which the distance (or dissimilarity) between two clusters is set equal to the distance (or dissimilarity) between their centroids. Centroid distance. See average linkage clustering criterion. Cbaining. In a clustering process, the tendency for one cluster to grow by the repeated addition, one at a time, of single points. Characteristic values or roots. Same as eigenvalues. See eigenanalysis. Characteristic vector. Same as eigenvector: See eigenanalysis. Chord distance. The shortest (straight line) distance between two points on the same circle, sphere, or hypersphere. City-block distance CD Th · a· te ' · e d1stance between two points, in a coor roa frame of any numb 0 f a· · nts er imens10ns, measured as the sum of segme . parallel w1th the axes p . . · or pomts J and k, s

L

CD = lxiJ - xikl · Cluste · T i=l rmg. he process of 1 · . · · ·milaJ Points to form e assifymg data points by combmmg 51 er 11 sma classes, then combining small classes into Jarg

249

classes, and , . el ;r· ;¡:; so. on. Samc as av
vector. A matrix with onl y . coJurno written as a vertical column.

one colum 11 E • -
rnplete-linkage clustering. Sam e as 1arthest-neighbo r ·/ . C0 · · l l t · r e ustenng cornbtnatona e us ermg methods. Th ose , .ll1 wh1ch . each .· matrix can be constructed from the d' . success1ve distance t· h prece mg d1stance ma nx; t e raw data are need e d only to con truct the r.i1rsl d'islance matrix . .obtai d b correlation . . . coefficient. h . A tandardized f orm of covanance d1v1dmg t e covanance of two variables s d ne ofY · · ' ay x an Y, by the product . m . [ -1 1] It the stand ar d d eviat1ons of x and y · lts val ue a1ways 1ies measures the degree to which x and y are related. ' ·

Correlation ma~ix~ A symmetric matrix in which the (h , i)th element, when h =I= z, is the correlation coefficient between the h th and ¡ th variables. All the elements on the main diagonal (top left to bottom right) are l. Covariance. When two variables, say x and y, are measured on each of a number of sampling units, their covariance is the mean of the crossproducts of the centered data. The ith cross-product is (x¡ - x)(y¡ - Y) where

x and y

are the means of the xs and ys.

Covariance matrix. A symmetric matrix in which the (i , í)th element is the variance of the ith variable, and the (h, i)th element is the covariance of the h th and i th variables. Czekanowski's Index of Similarity.

Same as percentage similarity.

t~= ::i:~sv~;

Data matrix. A numerical table in which each columhn listlis. a:: . . · ( adrat) and eac row s tions on one samplmg unzt or qu ' one of the observed variables in all quadrats. . . multidimensional space of one Data point. A geometric representat10n 1ll column of a data matrix. . upies a space of · which usua11Yocc Data swarm. The set of all data pomts many dimensions. . . tionships produced . the hierarchica1 re1a Dendrogram. A diagram showing nts except those on by a hierarchical classiftcation. . . which all e1eme Diagonal matrix. A square matnx 1ll · gbt) are zero. the main diagonal (top left to bottolll n

.

. . Th co mes of tJ e angl es madc by o rnes. e h . f th ec ·o 1 _ d' t fram e and t e axes o e . . f a coor ma e ong1n o f h pr ~ection s onto the axes of a thc Jengths o t e line.

lhc

A matrtx showing the distance from cach point to e . vcry other point in a data swarm •.. 'fi ti'on. The process of classifying data points by fir&t divid t ' ' e e1as l ca h d. . . ing the whole swarm of points into classes, t en re 1v1dmg &orne C>r al! ·nto subclasses and so on. Contrast clustenng of the e elass es l ' · • • The process of fi.nding the eigenvalue- eigenvector pairs of a E1genana1 1 • guare matnx A. The eigenvalues are the elements of .the diagonal matríx A and the eigenvectors are the rows of U (eqmvalently, the columns of V') where A= U'AU. i tan e ma

.

any linc lhrou1r~ f , &1 lhc ramc. E{luiv· . , u1cnu umt segmt.nt 0 f y,

.

E"gen ·alue-eigenvector pair of a matrix A .. An eigenvalue (or "eigencalar"'J-e1genvector pair of A are, respect1vely, a scalar number 'A and a row vector u' related by the equation u' A = A.u'. If A is an n x n matrix. there are n such pairs. See also eigenanalysis. Element of a matrix. One of the individual numbers composing a matrix. Tbe (i , J )th element is the number in the ith row and the jth column

of the matrix. Euclidean distance. The distance between two poin ts in the ordinary sense in one, two, or three dimensions, or the conceptual analogue of distance in spaces of more than three dimensions. farthest-neighbor clustering. Clustering in which the distance (dissimilarity! between two clusters is taken to be the longest distance between a

parr of points with one member of the pair in eaph cluster. Contrast nearest-neighbor clustering. Geodesic metric· Th e great cuc · le d1stance · (shortest over-the-surface d'is· tance) between tw ·

o pomts on a sphere or hypersphere.

Group-average clust · . .h ermg methods. Cluster;ng methods that use the un ltetg ted or weighted · · · ·1 ·ly [

r

r .

o a pa1r of clusters.

average dzstance as measures of the d1ssll1ll aI1

Hierarchical classificat. . d Every indi' ·d bion. A classification in which the classes are rank.e · vi ua1 elongs t0 n]dJlg

class, up to the highe

ª

cl~ss, and every class to a high~r-ra t class which is the totali ty of all indiv1duals.

251

orsesboe effect. Same as arch effect. entity matrix. A square matrix in which all th

¡ diagon al ( top left to bottom right) e e ements on the main are ones and all h zeros. ot er elements are

ternode. See node. verse of a square matrix. The inverse of an n 1 roatrix

x-

SUCh that

. X. h xx-1 = x-ix = l. (1 XIS. nthematidentity . rzx IS t en X n matrix )

accard's Index t The ratio . a/(a ·+ f) · ofh Similarity between two quad ras. where a is t e number . of species comm on t b oth quadrats and f is the numb er of spec1es present in one or other (b ut not both) of the

°

quadrats. Compare S(í}rensen's Index.

Latent values or roots.

Same as eigenvalues. See eigenvalue-eigenvector

pair.

Latent vector. Same as eigenvector. See eigenvalue-eigenvector pair. Linear data. A data swarm is (approximately) linear if its projection onto any two-dimensional space, however oriented, gives a two-dimensional swarm whose long axis is (approximately) a straight line. If any projection yields a (projected) swarm with a curved axis, then the data are non linear. Linear transformation. A transformation of one set of points into another done by defining the coordinates of the transformed points as linear functions of their coordinates befare transformat10n. The ongmal · h fi d e· e they are never squared coordinates appear only ID t e rst egree i. ., or raised to a higher power). Manhattan metric. Same as city-block distance. . MS A measure of the dissimilarity of Marczewski-Steinhaus d1stance, · d' 1 dex of Similarity. MS = two quadrats, the complement of J afccar ~es npresent in only one (not . the number o specI h f /(a + /) where f is . umber of species coromon to bot both) of the quadrats, and ª IS the n of them. Th meaning of each num. 1 ay of numbers. e . h Matrix. A two-dimensiona arr . . . in the matrix, that is, on t e ds on its pos1t10n ber (or element) d epen . occurs. . which It . A row and the coluillil in d et AB of two roatnces . n of the pro u d Matrix multiplication. The formatl~B is the sum of the paifWise pro ucts and B. The (i, j)th elernent of

GLossl\~y

252

· the ¡ th row of A and the jth column of B. Ben f h Iernents m . ce AB o t ee ·r the number of columns in A is equal to the nurnbe ·sts only i A d r of eXI . . AB has the same num ber of rows as an the same nurnb rows m B, s B· AB is A postm ultiplied by B or, equivalently eBr of colurnns •n/ied by A. In general, AB -=fo BA. premult Ir

ª '

,

. g method A clustering method that uses the medi Median clustenn · . . . . an . b etween clusters as a d1ssurulanty measure. dzstance . d'is t ance . See average-linkage clustering criterion. Median Metric measures of dissimilarity. triangle inequality axiom.

Measures th at, like distance, satisfy the

Mínimum spanning tree. The shortest spanning tree that can be constructed in a given swarm of points. Mínimum variance clustering. Clustering in which the two clusters united at each step are those whose fusion brings about the smallest possible increase in within-cluster dispersion. Monotonic clustering methods. sals is impossible.

Methods in which the occurrence of rever-

Nearest-neighbor clustering. Clustering in which the distance (dissimilarity) between two clusters is taken to be the shortest distance between a pair of points with one member of the pair in each cluster. Contrast f arthest-neighbor clustering. Nodes and intemodes. The parts of a dendrogram. The nodes are the horizontal lines linking classes of equal rank. The internodes are the vertical lines linking each class to the classes above and below it in rank. Nonlinear data.

See linear data.

Normalized data. The coordinates of a data point or the elements of a vector rescaled so that their squares sum to unity. Ordination. The ord · f enng o a set of data points with respect to one or more axes. Alt~rnatively, the displaying of a swarm of data points inª two or three-d1mens. 1 . hi wna coordmate frame so as to make the relations ps among the points · . . . . . _ m many-d1mens10nal space v1Slble on mspect ion. Ordination-space partiti . . . . swarm of d _om~g. The placmg of partitions in an ordmated ata poznts m ord classes. The result . . .. er to separate the points into groups or is a dwiswe classi.fication.

253

ogonal matrix. A square rnatrix that h ·· . ' w en used as a lransformation causes a ng1d rotation of the d t a a swarm arou d 1 . f . the coor d mate rame without any cha f n t 1e origin of · . nge o scale Th d orthogonal matnx and its transpose is th .d . · e pro uct of an . e l ent1ty matrix. rtitioned matr1x. A matrix that has been s bd .. d . a u iv1 ed mto sub 1 · placing one or more horizontal (between-row) .. marices by · 1 (b partit1ons and/or one more vertica etween-column) partitions so d. . or as to iv1de the mat nx · into rectangular blocks.

ortbrnatrix,

Same as percentage dissimilarity.

percentage difference.

percentage dissimilarity, PD. PD = 100 - PS.

The complement of percentage similarity PS.

Same as percentage dissimilarity.

Percentage distance.

Percentage remoteness, PR (new term]. The complement of RuZil:ka's Jndex of Similarity RI. PR = 100 - RI. Percentage similarity.

The percentage similarity of quadrats j and k is

S = 200

s

L i=l

min(xi}' X¡k)

+

Xij

percent

X¡k

are the quantities of species i in quadrats j and k, . · . '1 . the lesser of the two quant1t1es. and IIlln( xi 1 , x ¡ k) is . d ts from a homogeneous b h 0 f rephcate qua ra Pool (in this book]. A ate h are due only to chance. . ·as among t . em populat10n. Diuerence Postmultiply. See matrix multiplica~wn. · ultiplicatwn. ·t Premultiply. See matrlX m for a swarrn of data pom s, coordinate axes . of the data. Each Principal axes. The new . . cornponent analysis 1 . d b doing a pnnc1pa f the data. obtame Y . . ¡ component o . . component . t a przncipa b a pnnc1pa1 axis represen s . bles derived Y . ht d sum of the New vana h is a we1g e Principal components.. a bodY of dat~. Eac r of the centered and/or analysis to describe red) variables, o . . ll rneasu "raw" (as angina y .ponent for an . bl s . ipal com d standardized vana e . The value of a pnncoint on the correspon t score. d. ate of the p Principal componen the coor in . t J-Ience individual poin · ing principal axis. where x. . an d


Q-type ordination. An ordination of species. The data points represent species and the coordinate axes (before ordination) represent quadrats. The (j, k)th element of the covariance matrix analyzed is the covariance of the quantities (of all species) in quadrats j and k. Contrast R-type ordination.

Quadrat. In this book, an ecological sampling unit of any kind.

R-type ordination. An ordination of quadrats. The more usual type of ordination. The data points represent quadrats and the coordinate axes (before ordination) represent species. The (h, i)th element of the covariance matrix analyzed is the covariance of the quantities (in all quadrats) of species h and i. Contrast Q-type ordination.

Rank of a covariance matrix. The number of its nonzero eigenvalues. Equivalently, the number of dimensions of the space in which the data points lie.

Residual matrix. The rth residual matrix of the square symmetric matrix A is

    A - λ_1 u_1 u_1' - λ_2 u_2 u_2' - ... - λ_r u_r u_r'

where λ_i and u_i are the ith eigenvalue and eigenvector of A.
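A minimal numerical sketch of the residual matrix just defined, assuming the form A - λ_1 u_1 u_1' - ... - λ_r u_r u_r' for the rth residual; the 3 x 3 symmetric matrix below is invented, and NumPy's eigh merely stands in for the eigenanalysis treated in the text:

    import numpy as np

    # A small symmetric matrix (hypothetical values standing in for a covariance matrix).
    A = np.array([[4.0, 2.0, 0.5],
                  [2.0, 3.0, 1.0],
                  [0.5, 1.0, 2.0]])

    lam, U = np.linalg.eigh(A)              # eigenvalues and unit eigenvectors of A
    order = np.argsort(lam)[::-1]           # largest eigenvalue first
    lam, U = lam[order], U[:, order]

    R = A.copy()
    for i in range(2):                      # form the 2nd residual matrix
        u = U[:, [i]]                       # ith eigenvector as a column vector
        R = R - lam[i] * (u @ u.T)          # subtract lambda_i * u_i u_i'

    # Only the eigenvalue not yet extracted remains in the residual matrix.
    print(np.round(np.linalg.eigvalsh(R), 6))   # approximately [0, 0, lam[2]]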

Reversal (in clustering). A reversal occurs when a fusion made late in a clustering process takes place at a lower level of dissimilarity than a fusion made earlier.

Row vector. A matrix with only one row. Equivalently, a vector written as a horizontal row.

Ružička's Index of Similarity, RI. The Ružička index of similarity between quadrats j and k is

    RI = 100 [ Σ_{i=1}^{s} min(x_ij, x_ik) ] / [ Σ_{i=1}^{s} max(x_ij, x_ik) ]  percent

where x_ij and x_ik are the quantities of species i in quadrats j and k; min(x_ij, x_ik) and max(x_ij, x_ik) denote, respectively, the lesser and the greater of the two quantities.

Sample. A collection of sampling units or quadrats.

Sampling unit. An individual plot or quadrat. A collection of many such units, each of which is a different small fragment of the community under study, constitutes a sample.
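As a worked example of the percentage similarity, percentage dissimilarity, Ružička, and percentage remoteness measures defined in this glossary (the species quantities below are invented, and the calculations follow the formulas as given above):

    import numpy as np

    # Hypothetical quantities of s = 4 species in quadrats j and k.
    xj = np.array([10.0, 0.0, 5.0, 2.0])
    xk = np.array([ 6.0, 3.0, 5.0, 0.0])

    mins = np.minimum(xj, xk)                   # lesser of the two quantities, species by species
    maxs = np.maximum(xj, xk)                   # greater of the two quantities

    PS = 200.0 * mins.sum() / (xj + xk).sum()   # percentage similarity
    PD = 100.0 - PS                             # percentage dissimilarity
    RI = 100.0 * mins.sum() / maxs.sum()        # Ruzicka's index of similarity
    PR = 100.0 - RI                             # percentage remoteness

    print(round(PS, 1), round(PD, 1), round(RI, 1), round(PR, 1))   # 71.0 29.0 55.0 45.0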

Scalar. An "ordinary" number, in contrast to an array of numbers (a matrix).

Single-linkage clustering. Same as nearest-neighbor clustering.

Sørensen's Index of Similarity between two quadrats. The ratio 2a/(2a + f), where a is the number of species common to both quadrats and f is the number of species present in one or other (but not both) of the quadrats. Compare Jaccard's Index of Similarity.

Spanning tree. A set of line segments joining all the points in a swarm of points so that every pair of points is linked by only one path (i.e., there are no loops).

SSCP matrix. Same as sum-of-squares-and-cross-products matrix.

Standard deviation. The square root of the variance.
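A quick check of Sørensen's index on two invented presence-and-absence lists; Jaccard's index, which the entry asks the reader to compare, can be written a/(a + f) in the same notation:

    # Presence-and-absence (binary) data for two hypothetical quadrats.
    quadrat_j = {"sp1", "sp2", "sp3", "sp5"}
    quadrat_k = {"sp2", "sp3", "sp4"}

    a = len(quadrat_j & quadrat_k)        # species common to both quadrats
    f = len(quadrat_j ^ quadrat_k)        # species in one or other, but not both

    sorensen = 2 * a / (2 * a + f)        # 2a/(2a + f) = 4/7
    jaccard = a / (a + f)                 # a/(a + f)   = 2/5
    print(round(sorensen, 3), round(jaccard, 3))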

Standardized data. Data that have been rescaled by dividing every observation by the standard deviation of all the observations.

Stopping rule. A rule for deciding when a divisive classification should stop.

Submatrix. A subset of a given matrix that is itself a matrix. It is delimited en bloc from the "parent" matrix, with the arrangement of the elements unchanged. See also partitioned matrix.

Sum-of-squares-and-cross-products matrix. The matrix obtained by multiplying a data matrix by its transpose. The (i, i)th element is the sum of squares of the ith variable. The (h, i)th element is the sum of cross-products of the hth and ith variables.

Symmetric matrix. A square matrix that is symmetric about its main diagonal (top left to bottom right). Thus the (h, i)th element and the (i, h)th element are equal for all h, i. A square matrix of which this is not true is unsymmetric.

Trace of a square matrix. The sum of the elements on the main diagonal (top left to bottom right). The trace of A is written tr(A).

Transformation matrix. A matrix used to premultiply a data matrix in order to bring about a linear transformation of the data.

Transpose of a matrix. The transpose of the s × n matrix X is the n × s matrix having the rows of X as its columns (and, consequently, the columns of X as its rows). It is denoted by X'.
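A brief numerical illustration of the transpose, sum-of-squares-and-cross-products, and trace entries, using a small invented data matrix with s = 2 species (rows) and n = 3 quadrats (columns):

    import numpy as np

    X = np.array([[1.0, 2.0, 3.0],      # quantities of species 1 in the 3 quadrats
                  [4.0, 0.0, 2.0]])     # quantities of species 2

    Xt = X.T                            # transpose: a 3 x 2 matrix, rows and columns swapped
    S = X @ Xt                          # SSCP matrix: diagonal elements are sums of squares,
                                        # off-diagonal elements are sums of cross-products
    print(S)                            # [[14. 10.]
                                        #  [10. 20.]]
    print(np.trace(S))                  # 34.0, the sum of the main-diagonal elements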


Tree diagram. Same as dendrogram.

Triangle inequality axiom. The axiom that the distance between any two points A and B cannot exceed the sum of the distances from each of them to a third point C; that is, d(A, B) ≤ d(A, C) + d(B, C).

Ultrametric distance measures. Those that cannot in any circumstances cause a reversal in a clustering process.

Unipolar axis. An ordination axis on which the data points all have scores of the same sign (all positive or all negative). Contrast bipolar axis.

Unsymmetric matrix. See symmetric matrix.

Unweighted average distance. Same as average distance. See average-linkage clustering criteria.

Variance. The mean of the squared deviations, from their mean value, of a set of observations.

Variance-covariance matrix. Same as covariance matrix.

Vector. A row vector or column vector. In some contexts, the n (say) elements of a vector constitute the coordinates of a point in n-dimensional space.

Weighted average distance. See average-linkage clustering criteria.

Weighted centroid distance. Same as median distance. See average-linkage clustering criteria.

Weighted and unweighted clustering methods. Weighted methods treat clusters as of equal weight irrespective of their numbers of points. Unweighted methods treat data points as of equal weight so that the weight of a cluster is proportional to its number of points.

Within-axes heterogeneity. The heterogeneity exhibited by a data swarm consisting of two or more subswarms, when all subswarms occupy the same many-dimensional space. Contrast between-axes heterogeneity.

Within-cluster dispersion of a cluster. The sum of squares of the distances from every point in the cluster to the cluster's centroid.
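To make the variance, standard deviation, standardized data, and within-cluster dispersion entries concrete, a small sketch with invented observations:

    import numpy as np

    obs = np.array([2.0, 4.0, 4.0, 6.0, 9.0])
    variance = np.mean((obs - obs.mean()) ** 2)   # mean squared deviation from the mean: 5.6
    sd = np.sqrt(variance)                        # standard deviation
    standardized = obs / sd                       # every observation divided by the s.d.

    # Within-cluster dispersion: sum of squared distances from each point to the centroid.
    cluster = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]])   # a hypothetical cluster
    centroid = cluster.mean(axis=0)
    dispersion = np.sum((cluster - centroid) ** 2)              # 4.0
    print(round(variance, 2), round(sd, 3), round(dispersion, 2))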

Bibliography

Anderberg, M. R. (1973). Cluster Analysis for Applications. Academic Press, New York.
Carleton, T. J., and P. F. Maycock (…). … : 1643-1649.
Gauch, H. G., Jr., and R. H. Whittaker (1972). Coenocline simulation. Ecology 53: 446-451.
Gauch, H. G., Jr., and R. H. Whittaker (1981). Hierarchical classification of community data. J. Ecol. 69: 135-152.
Goodall, D. W. (1978a). Sample similarity and species correlation. In "Ordination of Plant Communities" (R. H. Whittaker, Ed.), W. Junk, The Hague, pp. 99-149.
Goodall, D. W. (1978b). Numerical classification. In "Classification of Plant Communities" (R. H. Whittaker, Ed.), W. Junk, The Hague, pp. 249-285.
Gordon, A. D. (1981). Classification: Methods for the Exploratory Analysis of Multivariate Data. Chapman and Hall, New York.
Gower, J. C. (1967). A comparison of some methods of cluster analysis. Biometrics 23: 623-637.
Gower, J. C., and P. G. N. Digby (1981). Expressing complex relationships in two dimensions. In "Interpreting Multivariate Data" (V. Barnett, Ed.), Wiley, New York, pp. 83-118.
Gower, J. C., and G. J. S. Ross (1969). Minimum spanning trees and single linkage cluster analysis. Appl. Statist. 18: 54-64.
Hill, M. O. (1973). Reciprocal averaging: an eigenvector method of ordination. J. Ecol. 61: 237-251.
Hill, M. O. (1979a). DECORANA-A FORTRAN Program for Detrended Correspondence Analysis and Reciprocal Averaging. Cornell University, Ithaca, NY.
Hill, M. O. (1979b). TWINSPAN-A FORTRAN Program for Arranging Multivariate Data in an Ordered Two-Way Table by Classification of the Individuals and Attributes. Cornell University, Ithaca, NY.
Hill, M. O., R. G. H. Bunce, and M. W. Shaw (1975). Indicator species analysis, a divisive polythetic method of classification, and its application to a survey of native pinewoods in Scotland. J. Ecol. 63: 597-613.
Hill, M. O., and H. G. Gauch, Jr. (1980). Detrended correspondence analysis, an improved ordination technique. Vegetatio 42: 47-58.

Jeglum, J. K., C. F. Wehrhahn, and M. A. Swan (1971). Comparisons of environmental ordinations with principal component vegetational ordinations for sets of data having different degrees of complexity. Can. J. Forest Res. 1: 99-112.
Kempton, R. A. (1981). The stability of site ordinations in ecological surveys. In The Mathematical Theory of the Dynamics of Biological Populations II (R. W. Hiorns and D. Cooke, Eds.), Academic Press, New York, pp. 217-230.

Lance, G. N., and W. T. Williams (1966). A general theory of classificatory sorting strategies. I. Hierarchical systems. Computer J. 9: 373-380.

Lefkovitch, L. P. (1976). Hierarchical clustering from principal coordinates: an efficient method for small to very large numbers of objects. Math. Biosci. 31: 157-174.
Levandowsky, M. (1972). An ordination of phytoplankton populations in ponds of varying salinity and temperature. Ecology 53: 398-407.

Levandowsky, M., and D. Winter (1971). Distance between sets. Nature 234: 34-35.
Lieffers, V. J. (1984). Emergent plant communities of oxbow lakes in northeastern Alberta: salinity, water-level fluctuations and succession. Can. J. Botany 62: 310-316.
Maarel, E. van der (1980). On the interpretability of ordination diagrams. Vegetatio 42: 43-45.


Maarel, E. van der, J. G. M. Janssen, and J. M. W. Louppen (1978). TABORD, a program for structuring phytosociological tables. Vegetatio 38: 143-156.
Marks, P. L., and P. A. Harcombe (1981). Forest vegetation of the Big Thicket, southeast Texas. Ecol. Monogr. 51: 287-305.

Morrison, D. F. (1976). Multivariate Statistical Methods. 2nd ed. McGraw-Hill, New York.

Newnham, R. M. (1968). A classification of climate by principal component analysis and its relationship to tree species distribution. Forest Sci. 14: 254-264.
Nichols, S. (1977). On the interpretation of principal component analysis in ecological contexts. Vegetatio 34: 191-197.
Noy-Meir, I. (1973a). Data transformations in ecological ordination. I. Some advantages of non-centering. J. Ecol. 61: 329-341.
Noy-Meir, I. (1973b). Divisive polythetic classification of vegetation data by optimized divisions on ordination components. J. Ecol. 61: 753-760.
Noy-Meir, I. (1974). Catenation: quantitative methods for the definition of coenoclines. Vegetatio 29: 89-99.
Noy-Meir, I., D. Walker, and W. T. Williams (1975). Data transformations in ecological ordination. II. On the meaning of data standardization. J. Ecol. 63: 779-800.
Orlóci, L. (1978). Multivariate Analysis in Vegetation Research. W. Junk, The Hague.

Pielou, E. C. (1977). Mathematical Ecology. Wiley, New York.
Rohlf, F. J. (1973). Algorithm 76: Hierarchical clustering using the minimum spanning tree. Computer J. 16: 93-95.
Ross, G. J. S. (1969). Algorithms AS 13-15. Appl. Statist. 18: 103-110.
Searle, S. R. (1966). Matrix Algebra for the Biological Sciences. Wiley, New York.
Shepard, R. N., and J. D. Carroll (1966). Parametric representations of non-linear data structures. In "Multivariate Analysis" (P. R. Krishnaiah, Ed.), Academic Press, New York.

Sneath, P. H. A., and R. R. Sokal (1973). Numerical Taxonomy. W. H. Freeman & Co., San Francisco.
Strauss, R. E. (1982). Statistical significance of species clusters in association analysis. Ecology 63: 634-639.
Tatsuoka, M. M. (1971). Multivariate Analysis. Wiley, New York.
Whittaker, R. H. (Ed.) (1978a). Ordination of Plant Communities. W. Junk, The Hague.
Whittaker, R. H. (Ed.) (1978b). Classification of Plant Communities. W. Junk, The Hague.

Index

Page numbers in boldface indicate substantial treatment of a topic.

Anderberg, M. R., 184
Apollonius's theorem, 48, 78
Arch effect, 193, 196
Asymmetry, coefficient of, 163
Axes, principal, 148
Binary data, 53
Bipolar axis, 162
Bunce, R. G. H., 218
Carleton, T. J., 164
Carroll, J. D., 197
Catenation, 190
Centered and uncentered data, 103, 158
Centroid, 25
Chaining, 22
Characteristic value, see Eigenvalue
Characteristic vector, see Eigenvector
Classification:
    agglomerative, see Clustering
    divisive, 203
    hierarchical and nonhierarchical, see Pooling
Cluster, 10
Clustering, 10, 13, 203
    average linkage, 32, 63, 69, 73
    centroid, 25, 51, 70, 74
    complete linkage, 22, 72
    farthest-neighbor, 22, 72
    median, 70
    minimum variance, 32, 72
    nearest-neighbor, 15, 72
    single linkage, 15, 72
Clustering methods, combinatorial, 70
    group average, 70, 74
    monotonic, 74, 76
    nonhierarchical, see Pooling
    weighted and unweighted, 70, 73
Components, principal, 148
Correlation coefficient, 108, 183
    matrix, see Matrix, correlation
Correspondence analysis, see Reciprocal averaging
Cosine separation, 50
Covariance, 105
Covariance matrix, 103, 111
    rank of, 243
Czekanowski's index of similarity, see Similarity
Data, linear and nonlinear, see Linear data
Data matrix, 1
Data point, 8
Data swarm, 8
Delaney, M. J., 209
Dendrogram, 19, 220
    nodes and internodes of, 19
Detrended correspondence analysis, 190, 195
Difference, percentage, see Dissimilarity, percentage
Digby, P. G. N., 175
Direction cosines, 102
Discriminant ordination, 223
Dispersion, within-cluster, 32
Dissimilarity, percentage, 43, 55, 61

Distance:
    Euclidean, 14, 41, 59
    median, 68
    percentage, see Dissimilarity, percentage
    weighted centroid, 68
Eigenanalysis, 116, 126, 229
    Hotelling's method, 120
Eigenvalue, 116, 225
Eigenvector, 116, 225
Gauch, H. G., Jr., 9, 77, 151, 152, 176, 191, 195, 198, 220
Horseshoe effect, see Arch effect
Indicator species, 218
Jaccard's index of similarity, 57, 61
Janssen, J. G. M., 198
Jeglum, J. K., 152
Kempton, R. A., 175
Lance, G. N., 65, 69, 70, 75
Latent value, see Eigenvalue
Latent vector, see Eigenvector
Lefkovitch, L. P., 211
Levandowsky, M., 45, 57
Newnham, R. M., 155
Nichols, S., 155, 158
Nonlinear data, see Linear data
Normalized data, 46, 51
Noy-Meir, I., 107, 161, 164, 215, 217
Ordination, 83
    discriminant, 223
    Q-type, 8, 71
    R-type, 8, 103, 218
Ordination-space partitioning, 72, 73
Orlóci, L., 32, 45, 50, 57, 58


Pielou, E. C., 34, 150, 169n, 197, 204, 214, 224, 230
Pooling, 76, 77
Presence-and-absence data, see Binary data
Principal component analysis, 136, 152
Principal coordinate analysis, 165

.--

Q-type ordination, see Ordination, Q-type
Quadrat, 9
Reciprocal averaging, 176, 184
Redundancy, 152
Remoteness, percentage, 44, 55, 61
Reversal, 74
Rohlf, F. J., 206, 209
Ross, G. J. S., 206, 209, 244
Rotation:
    nonrigid, 227
    rigid, 96, 116, 137
R-type ordination, see Ordination, R-type
Ružička's index of similarity, 44, 57, 61
Sample, 9
Sampling unit, 9
Scalar, 85
Scatter diagram, 6, 88
Searle, S. R., 226, 229, 245
Shaw, M. W., 218
Shepard, R. N., 197
Similarity, percentage, 43, 61
Sneath, P. H. A., 70, 72
Sokal, R. R., 70, 72
Sørensen's index of similarity, 56, 61
Spanning tree, 205
    minimum, 205

SSCP (sums-of-squares-and-cross-products) matrix, see Matrix, SSCP
Standard deviation, 104
Standardized data, 107, 155
Strauss, R. E., 71
Submatrix, 233
Swan, M. A., 152
Table arrangement, 4, 198, 199
TABORD, 198
Tatsuoka, M. M., 12, 126, 226, 229, 230, 245
Transformation:
    linear, 92
    orthogonal and nonorthogonal, 226
Tree diagram, see Dendrogram
Triangle inequality axiom, 41
TWINSPAN, 218, 220
Uncentered data, see Centered and uncentered data
Unipolar axis, 162
Variance, 104
Variance-covariance matrix, see Covariance matrix
Vector, 85
    column, 86, 96
    row, 85
Walker, D., 107, 165
Wehrhahn, C. F., 152
Whittaker, R. H., 72, 191, 220
Williams, W. T., 65, 69, 70, 75, 107, 165
Winter, D., 45, 57
